Coding: Fleeting Thoughts

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Wed May 09, 2012 6:46 pm UTC

Thesh wrote:
Thesh wrote:So apparently, .NET implements PBKDF1 which allows you to specify a hash algorithm, and PBKDF2 using HMAC-SHA1. I need PBKDF2 using HMAC-SHA384, which means I have to write my own implementation (or steal someone else's). For the level of abstraction that System.Security.Cryptography has, is a generic PBKDF2 class too much to ask for?
And 70 lines of code later, I have a working generic, static Pbkdf2 class. See, microsoft, it's that easy.
Wait, you just wrote cryptographic code.

Do you know the rule about writing cryptographic code?
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6598
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Wed May 09, 2012 7:23 pm UTC

Yakk wrote:
Thesh wrote:
Thesh wrote:So apparently, .NET implements PBKDF1 which allows you to specify a hash algorithm, and PBKDF2 using HMAC-SHA1. I need PBKDF2 using HMAC-SHA384, which means I have to write my own implementation (or steal someone else's). For the level of abstraction that System.Security.Cryptography has, is a generic PBKDF2 class too much to ask for?
And 70 lines of code later, I have a working generic, static Pbkdf2 class. See, microsoft, it's that easy.
Wait, you just wrote cryptographic code.

Do you know the rule about writing cryptographic code?

There's a difference betweeen writing your own cryptographic protocols, and implementing a specification, while using existing, built in library functions for the underlying algorithms.

Of course, I should mention this is for a sample implementation of a cryptographic standard I'm developing and planning on publishing so it can be reviewed.
Summum ius, summa iniuria.

User avatar
Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: Coding: Fleeting Thoughts

Postby Shivahn » Thu May 10, 2012 2:59 am UTC

So I've studied interpreted languages (well, interpreters), but not compilation. I have a quick question relating something I see in C++ to interpreted stuff, and I realize I may be way off.

Static member functions are said to be unrelated to any object of the class, and each member of a class has its own copy of non static functions. Does each member of a class have its own frame in the environment where the attributes and methods are defined, while static member functions are defined in the global environment? Or am I trying to apply some scheme that doesn't make sense onto a compiled language?

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6598
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Thu May 10, 2012 3:24 am UTC

Shivahn wrote:So I've studied interpreted languages (well, interpreters), but not compilation. I have a quick question relating something I see in C++ to interpreted stuff, and I realize I may be way off.

Static member functions are said to be unrelated to any object of the class, and each member of a class has its own copy of non static functions. Does each member of a class have its own frame in the environment where the attributes and methods are defined, while static member functions are defined in the global environment? Or am I trying to apply some scheme that doesn't make sense onto a compiled language?


In C++, each static method of a class is just like a non-class function except it can access both public and non-public static members of that class. Non-static methods are not copied for each class, instead they are passed a reference to the class, essentially as a behind the scenes parameter (which you can access via the "this" keyword).



Unrelated, I was checking how the speed of my PBKDF2 function using HMAC-SHA-1 compares to the .NET library class and I was happy to see it was less than 1% slower (I want my slow hash function implementation to be fast, dammit!). Then I tested the function with HMAC-SHA-384 and noticed it was half the speed of HMAC-SHA1. I changed the build options to build as 64 bit (.NET 4.0 builds all exe's as 32 bit by default), and it was three times as fast as HMAC-SHA-1. So I decided to run a few different modes and get some benchmarks, these are my results:

64-bit execution time (milliseconds)

Code: Select all

Pbkdf2<HMACSHA1>;      1483.0849
Pbkdf2<HMACSHA256>;     455.0261
Pbkdf2<HMACSHA384>;     575.0328
Rfc2898DeriveBytes;    1453.0832

32-bit execution time (milliseconds)

Code: Select all

Pbkdf2<HMACSHA1>;      1751.1002
Pbkdf2<HMACSHA256>;     950.0544
Pbkdf2<HMACSHA384>;    3433.1963
Rfc2898DeriveBytes;    1745.0998


I would have expected SHA-384 to be half the speed on 32 bit (it primarily uses 64 bit integer operations), and the rest to be barely any different on 64-bit since they are made up entirely of 32 bit integer operations. I'm curious as to the reason for the significant performance difference. I'm guessing the rotations have something to do with the SHA-384 performance, but I would think it would take four operations to rotate a 32 bit integer on a 64 bit processor, that doesn't explain why it was 6 times slower. Anyone have any insight?
Summum ius, summa iniuria.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Coding: Fleeting Thoughts

Postby EvanED » Thu May 10, 2012 3:30 am UTC

Shivahn wrote:So I've studied interpreted languages (well, interpreters), but not compilation. I have a quick question relating something I see in C++ to interpreted stuff, and I realize I may be way off.

Static member functions are said to be unrelated to any object of the class, and each member of a class has its own copy of non static functions. Does each member of a class have its own frame in the environment where the attributes and methods are defined, while static member functions are defined in the global environment? Or am I trying to apply some scheme that doesn't make sense onto a compiled language?


I'm not sure how to relate the idea of an interpreted-language style environment to compiled C++, especially as I'm not sure what interpreted languages you're familiar with.

First, everything I'm about to say deals with member variables only; I'll come back to functions at the end.

Basically each object has some block of memory. If you use automatic allocation (if it's a local), then that block is on the stack; if you use new, then it's in the heap; if it's a global, than it's in the "static" area. (More on that in a sec.) Wherever it is, that block of memory has one slot for each (non-static) member variable, along possibly with some padding and a vptr (more on that in a sec too).

Depending on what you're familiar with, you might be able to call that a frame in your environment.

Class-static variables are essentially globals. They're allocated along with globals in the same area of memory. The only difference is the namespace it's in and how it's referred to; a static variable x in a class C might essentially be desugared to just talking about some global C__x or something. (In fact you can see this in real-compiler mangled names, they just use more complicated schemes.) You could probably consider the globals the top-level environment if you want.

Functions are different. Non-virtual functions are basically the same no matter whether they are static or not. They're put in the same place, and objects have no reference to them. The binding to those functions are done statically, so there's no reference needed. The only difference between a static function and a non-virtual instance function is that the instance function will have an implicit "this" parameter.

Virtual functions are a bit different because of the runtime binding. The actual function code is still put in the same place, but now objects contain references to them. Each class C (not object) gets something called a "vtable" that has pointers to the functions in question. (Unlike most interpreted languages this isn't a map from name->function, just an array; the offset from the beginning of the vtable is known at compile time, and subclasses keep the same layout of their vtable as their parent classes for the parts where they overlap.) Then each object has what's called a "vpointer" to its class's vtable. When a.foo() executes, the code will look at a's vptr, follow it to the vtable, look up the offset of foo, then call that address.

(Technically the stuff in this post is implementation details, but practically speaking it's universally true. Different compilers can use different vtable layouts, but as this is a very efficient way of doing things and it's perfectly satisfactory as far as language semantics goes, this is what everyone does.)

User avatar
Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: Coding: Fleeting Thoughts

Postby Shivahn » Thu May 10, 2012 3:35 am UTC

Thesh wrote:
Shivahn wrote:So I've studied interpreted languages (well, interpreters), but not compilation. I have a quick question relating something I see in C++ to interpreted stuff, and I realize I may be way off.

Static member functions are said to be unrelated to any object of the class, and each member of a class has its own copy of non static functions. Does each member of a class have its own frame in the environment where the attributes and methods are defined, while static member functions are defined in the global environment? Or am I trying to apply some scheme that doesn't make sense onto a compiled language?


In C++, each static method of a class is just like a non-class function except it can access both public and non-public static members of that class. Non-static methods are not copied for each class, instead they are passed a reference to the class, essentially as a behind the scenes parameter (which you can access via the "this" keyword).

Er, do you mean that non-static methods are passed a reference to the calling object? So like, when you call some object.method(<args>), the method function is essentially passed (<args>,Class* this) and the instruction someMemberVar=3 is translated to this->someMemberVar=3?

And static functions are basically the same thing, only they don't get passed a pointer?

Or am I misunderstanding entirely? I'm sorry if these questions are really basic, but I am struggling to apply what I've learned to relatively different situations.

Edit for didn't see another post:

Oh, ok, thanks. I think I see. What I'm familiar with is the (really basic) MIT 6.001 class, which used the SICP book and Scheme as its language. What I basically learned was that when evaluating a procedure, a local table is formed that binds the arguments to the parameters and contains a pointer to the parent environment of the procedure. So that's the basics of my understanding - I thought maybe C++ was basically forming a global table for static functions and then treating objects as frames with their class methods bound in the tables.

Thanks a lot for the writeup and help. I really appreciate it.

User avatar
Thesh
Made to Fuck Dinosaurs
Posts: 6598
Joined: Tue Jan 12, 2010 1:55 am UTC
Location: Colorado

Re: Coding: Fleeting Thoughts

Postby Thesh » Thu May 10, 2012 3:53 am UTC

Shivahn wrote:Er, do you mean that non-static methods are passed a reference to the calling object?


Yeah, that's what I meant.

Shivahn wrote: So like, when you call some object.method(<args>), the method function is essentially passed (<args>,Class* this) and the instruction someMemberVar=3 is translated to this->someMemberVar=3?


Yup, that's essentially what it's doing.

Shivahn wrote:And static functions are basically the same thing, only they have greater access and don't get passed a pointer?


The greater access was in comparison to your basic C functions, which are outside the class entirely. Static member functions have the same access level as non-static member functions, except they don't get passed a pointer so they can't see non-static variables and can't call non-static function (since there's no reference to pass).
Summum ius, summa iniuria.

User avatar
Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: Coding: Fleeting Thoughts

Postby Shivahn » Thu May 10, 2012 3:57 am UTC

Haha, yeah, the "greater access" part wasn't supposed to be there, but snuck in because brains are funny things sometimes.

Thanks a lot! I have a much better understanding of parts of C and C++ (and compilation) now.

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Thu May 10, 2012 3:46 pm UTC

Shivahn wrote:Er, do you mean that non-static methods are passed a reference to the calling object? So like, when you call some object.method(<args>), the method function is essentially passed (<args>,Class* this) and the instruction someMemberVar=3 is translated to this->someMemberVar=3?

The traditional translation of this->member(foo) is member(this, foo) -- ie, "this" is considered the first parameter.

In some OO frameworks and languages, the passing of the this pointer to the member function is explicit. And almost always the first one. This syncs up nicely with default parameters (which usually have to be trailing parameters if you are using parameters-by-order passing).

The second important part is that private methods and members are not usable outside of the class (in a strange quirk, they are visible, just not usable). So a closer approximation of the C++ object model (not including virtual yet!) is:

Code: Select all

class Foo
{
public:
  int a_public_value;
  void print_values();
private:
  int a_private_value;
};

void Foo::print_values()
{
  printf("Public: %d", a_public_value);
  printf("Private: %d", a_private_value);
}

// C version:
struct PublicFoo
{
  int a_public_value;
};

void print_private_value(PublicFoo* this);

struct PrivateFoo
{
  PublicFoo public_part;
  int a_private_value;
};

void print_private_value(PublicFoo* this_)
{
  PrivateFoo* this = (PrivateFoo*)this_;
  printf("Public: %d", this->public_part.a_public_value);
  printf("Private: %d", this->a_private_value);
}

but that is only approximate.

Virtual inheritance and Virtual functions are another complication.

If you have Virtual functions, each type of class has a table of function pointers and each instance has a pointer to such a table.

Derived classes create copies of the Virtual function table of their parent classes, and replace some of the function pointers. Derived instances have their class pointer pointing at the derived function table.

When you call the function on an instance of the class, what you actually do is look up the function table in that instance's class function table.

This gets tricky with multiple inheritance -- but you just end up with one table for each parent class (and sometimes a single virtual function in the derived class can override both parent tables).

Virtual inheritance is something else entirely. Here we have a table of pointers to class tables, each of which contains pointers to functions. At this point, learning how you interact with it, rather than how it is implemented, is probably sufficient (unless you are doing header-file based transparent DLL versioning or something else stupid like that).
And static functions are basically the same thing, only they don't get passed a pointer?
Yep.

Note that static data members also exist. They are not stored in the instance of the class, but access control on them depends on being member or non-member.

And that visibility and access control are two different things. This is important if you are doing certain kinds of overloading -- overload resolution depends on visibility, so you can call a function and get a private overload instead of a slightly less good match on a public overload, and then the compiler says "not valid, you cannot call a private member function".

But that is getting somewhat esoteric.

C++ is a big language.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

User avatar
Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: Coding: Fleeting Thoughts

Postby Shivahn » Thu May 10, 2012 10:49 pm UTC

Yakk wrote:C++ is a big language.

Yes, that about sums it up. It might just be me, but it seems horrifically complicated compared to.. well, most other things I see. I could implement a shitty Lisp or Python interpreter at my current skill level (with some internet help), but C++ is mind boggling in trying to understand the behind-the-scenes stuff.

User avatar
TheAmazingRando
Posts: 2308
Joined: Thu Jan 03, 2008 9:58 am UTC
Location: San Diego, CA

Re: Coding: Fleeting Thoughts

Postby TheAmazingRando » Thu May 10, 2012 11:44 pm UTC

Making a scrollbar in MFC has got to be the most complicated way possible to do the simplest thing.

Ben-oni
Posts: 278
Joined: Mon Sep 26, 2011 4:56 am UTC

Re: Coding: Fleeting Thoughts

Postby Ben-oni » Fri May 11, 2012 12:47 am UTC

EvanED wrote:Virtual functions are a bit different because of the runtime binding. The actual function code is still put in the same place, but now objects contain references to them. Each class C (not object) gets something called a "vtable" that has pointers to the functions in question. (Unlike most interpreted languages this isn't a map from name->function, just an array; the offset from the beginning of the vtable is known at compile time, and subclasses keep the same layout of their vtable as their parent classes for the parts where they overlap.) Then each object has what's called a "vpointer" to its class's vtable. When a.foo() executes, the code will look at a's vptr, follow it to the vtable, look up the offset of foo, then call that address.

Wait.

Vtables aren't actually part of the language specification, are they? This could get ugly...

Code: Select all

#include <iostream>
using namespace std;

// Empty structure to act as function signature
template <int N> struct S {};

// Base classes
template <int N>
class A {
public:
   virtual void classNum(S<N> s) {
      cout << "CNum = " << N << endl;
   }
};

// Derived classes; B<N> inherits from all of A<0> to A<N>
template <int N>
class B : public A<N>, public B<N-1> {
public:
   void classNum(S<N> s) {
      cout << "DNum = " << N << endl;
   }
};

template <>
class B<0> : public A<0> {
public:
   void classNum(S<0> s) {
      cout << "DNum = " << 0 << endl;
   }
};


So, B<N> inherits from N classes of A, and overloads each of their methods. So, the vtable for each A must have N entries, right? Otherwise there could potentially be a conflict:

Code: Select all

template <int N>
void callClassNum(A<N> &a) {
   S<N> s;
   a.classNum(s);
}

So, callClassNum<N> can take either an A<N> or a B<M> where M ≥ N. So, by using a constant offset on the vtable, some A<N> must have N entries on it's vtable.

For simple cases this isn't bad, but this could get terribly messy, couldn't it?

Code: Select all

#define CLASS_NAME(C, M, N) public: virtual void className(S<M>,S<N>){cout << C << "<" << M << ", " << N << ">" << endl;}

template <int M, int N>
class C {
   CLASS_NAME('C', M, N)
};

template <int M, int N>
class D : public D<M-1, N>, public C<M, N> {
   CLASS_NAME('D', M, N)
};

template <int N>
class D<0, N> : public C<0, N> {
   CLASS_NAME('D', 0, N)
};

template <int M, int N>
class E : public E<M, N-1>, public D<M, N> {
public:
   CLASS_NAME('E', M, N)
};

template <int M>
class E<M, 0> : public C<M, 0> {
   CLASS_NAME('E', M, 0)
};

template <int M, int N>
void callClassName(C<M, N> &c) {
   S<M> sm; S<N> sn;
   c.className(sm, sn);
}

Now gcc whines when I try to declare an E<50,50> class. It only requires about 5,000 classes, which really isn't all that much. Is it because each of those 5,000 classes requires a 5,000 entry vtable? Hmm... (5,000)2 * 4 bytes ≈ 100 MB (assuming 32 bit pointers). That would make for a rather large executable. No wonder gcc whines.

Is there an option to not use vtables?

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Coding: Fleeting Thoughts

Postby EvanED » Fri May 11, 2012 12:54 am UTC

Ben-oni wrote:
EvanED wrote:Virtual functions are a bit different because of the runtime binding. The actual function code is still put in the same place, but now objects contain references to them. Each class C (not object) gets something called a "vtable" that has pointers to the functions in question. (Unlike most interpreted languages this isn't a map from name->function, just an array; the offset from the beginning of the vtable is known at compile time, and subclasses keep the same layout of their vtable as their parent classes for the parts where they overlap.) Then each object has what's called a "vpointer" to its class's vtable. When a.foo() executes, the code will look at a's vptr, follow it to the vtable, look up the offset of foo, then call that address.

Wait.

Vtables aren't actually part of the language specification, are they? This could get ugly...

"(Technically the stuff in this post is implementation details, but practically speaking it's universally true. Different compilers can use different vtable layouts, but as this is a very efficient way of doing things and it's perfectly satisfactory as far as language semantics goes, this is what everyone does.)"

Is there an option to not use vtables?

No.

But vtables aren't your problem either. Remember that each of those functions have to exist as well. And the actual function code, even for a no-op function, is going to be rather larger than a pointer.

Ben-oni
Posts: 278
Joined: Mon Sep 26, 2011 4:56 am UTC

Re: Coding: Fleeting Thoughts

Postby Ben-oni » Fri May 11, 2012 2:13 am UTC

EvanED wrote:
Is there an option to not use vtables?

No.

But vtables aren't your problem either. Remember that each of those functions have to exist as well. And the actual function code, even for a no-op function, is going to be rather larger than a pointer.

Yes, the problem is precisely the vtables. I admit I've specifically gone for a case where vtables perform badly, but it was to illustrate a point.

5,000 functions isn't a problem. The choice of a fixed method of referencing them is. Regardless of the approach, virtual functions require some way of mapping both a class and function identifier to the function address, i.e. η : C × I → F. Understanding that η need not be fully defined, we can satisfactorily represent it as a 0-1 matrix M. When M is dense, vtables work wonderfully. But, when M is sparse*, vtables are a terrible solution.

So, I want two things:
  1. For people to stop thinking about virtual functions being intrinsically linked to the vtable implementation.
  2. For compilers to have a backup plan when it looks like vtables won't perform well.


*I don't actually mean that M being sparse is a sufficient condition. Let M be a permutation of the direct sum of irreducible matrices: PTMP = A1⊕…⊕An. Then for any Ai that is sparse, vtables may be inefficient.

jareds
Posts: 436
Joined: Wed Jan 03, 2007 3:56 pm UTC

Re: Coding: Fleeting Thoughts

Postby jareds » Fri May 11, 2012 5:56 am UTC

Ben-oni wrote:5,000 functions isn't a problem. The choice of a fixed method of referencing them is. Regardless of the approach, virtual functions require some way of mapping both a class and function identifier to the function address, i.e. η : C × I → F. Understanding that η need not be fully defined, we can satisfactorily represent it as a 0-1 matrix M. When M is dense, vtables work wonderfully. But, when M is sparse*, vtables are a terrible solution.

This is not based on an understanding of how vtables are normally (i.e., always minus epsilon) implemented in the presence of multiple inheritance. Let's ignore vtables for a second and talk about multiple inheritance. Consider the following code:

Code: Select all

struct A {
    int a;
};
struct B {
    int b;
};
int getA(A *a) { return a->a; }
int getB(B *b) { return b->b; }
struct C : A, B { };
int f(C *c) {
    return getA(c) + getB(c);
}
How does this work? There is only one copy each of getA and getB, each reading from a fixed offset from their argument. Does B have 4 bytes of unused padding at the beginning (for example) so that A and B can combine while sharing the same address? Certainly not. Instead, converting a C* to a B* entails automatically adding 4 to the value of the C* (for example). There is nothing that says that an object must be at the same address as all its base objects, and in fact that would not lead to a good implementation of multiple inheritance. Of course, invariably one of the base objects will share its address with the object. Furthermore, allowing inheritance to affect the layout of the base class would be incompatible with separate compilation.

The situation is similar with virtual functions and multiple inheritance. If you have a class inheriting from a long chain of single inheritance, it will only have a single vptr field. If you have a class with multiple inheritance and virtual functions everywhere, it will have a separate vptr field for each leaf in the inheritance graph. In your example, no vtable has more entries than its class actually has functions (so no A or C vtable has more than one entry), but E<50,50> will have roughly 2500 vptrs (and nothing else, since it has no data members) and there are roughly 5000 vtables in total if a E<50,50> is instantiated. Each instantiation of callClassName will almost certainly have equivalent object code, but if you do "E<50,50> e; callClassName<X,Y>(e);" an offset will be added to e's address that depends on the constants X and Y.

I can't say specifically why GCC is choking. When I tried this, I used "return (M<<16)|(N<<8)|C" rather than I/O because I wanted to look at the assembly and didn't want to pull in a bunch of crap. And GCC did not choke for me. All the compilers I tried used tons more memory and CPU time than I would have thought ideal. In any event, it looks more like a general quality of implementation issue than a vtable strategy issue.
Ben-oni wrote:So, I want two things: 1. For people to stop thinking about virtual functions being intrinsically linked to the vtable implementation.

I sort of agree. People sometimes do horrific things like *(long*)obj to get the vptr for some nefarious purpose. However, C++ is a language that people use when they care about efficiency, and it's perfectly reasonable to understand widely used implementation techniques so that you know how efficient your code is likely to be.
Ben-oni wrote:2. For compilers to have a backup plan when it looks like vtables won't perform well.

C++ users expect separate compilation of individual object files, not whole program compilation. Therefore, all C++ implementations follow some sort of ABI, varying from implementation to implementation, that governs things like class and vtable layout, and this ABI cannot be deviated from without creating binary incompatibility. It is generally quite unacceptable for turning on an optimization to create binary incompatibility with unoptimized code from the same implementation. Therefore, any such "backup plan" would have to be decided at the level of the ABI. However, you haven't actually demonstrated a real problem at the ABI level with normal C++ ABIs.
Last edited by jareds on Fri May 11, 2012 6:24 pm UTC, edited 1 time in total.

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Fri May 11, 2012 2:39 pm UTC

So, the vtable for each A must have N entries, right?

No, the vtable for A<N> has one entry. Because it has one virtual function.

The vtable for B<N> has N entries, one for each virtual function it inherits. (I guess you could say that it has N vtables, one for each A<n>).
So, B<N> inherits from N classes of A, and overloads each of their methods. So, the vtable for each A must have N entries, right? Otherwise there could potentially be a conflict:

B<N> inherits from A<N> and B<N-1>. There is A<0..N> in its inheritance tree.

B<N>'s class causes a single function, callClassNum(S<N>). It uses a vtable that looks a hell of a lot like B<N-1>'s vtable, with an additional entry for callClassNum(S<N>).

I suspect you didn't mean that "each A must have N entries", but rather "each B must have N entries"?

Note that B<N>::callClassNum(S<N>) from 0...N are the only functions that need exist. Each B<N> however requires a vtable of size N. So we end up with O(N(N+1)/2) vtable action and O(N) functions.

Would virtual inheritance reduce the overhead? I suspect so, but am uncertain.

The real trick here is that the various tables of functions are compatible, if there is no "header" information in the table of function pointers. So you could, in theory, use a particular offset into the vtable of B<100> to represent the vtable of B<99>. However, I doubt vtables are optimized for this.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

Ben-oni
Posts: 278
Joined: Mon Sep 26, 2011 4:56 am UTC

Re: Coding: Fleeting Thoughts

Postby Ben-oni » Fri May 11, 2012 6:55 pm UTC

Wait another sec, there. Let's see if I understand this right. You're saying that casting a pointer from one type to another changes the pointer value?!

Struct A uses a certain structure to represent objects. Likewise for B. Struct C simply concatenates the two structures, and appends its own unique structure as well.

So, casting pointers of type C* to type B* or A* adds a constant to the pointer value, so that it now points to the correct structure. Big problem.

Code: Select all

int main(int argc, char** argv) {
  C* c = new C();
  A* a = (A*)c;
  B* b = (B*)c;
  cout << "a = " << (void*)a << endl;
  cout << "b = " << (void*)b << endl;
  cout << "c = " << (void*)c << endl;
  return 0;
}

But, apparently it actually does change the pointer.

However, this violates my intuition about how pointers work. For instance,

Code: Select all

C* cs[5];
B** bs = (B**)cs;

I've now lost the knowledge that each *bs[i] needs to be shifted by a number of bytes. If I try to use it, I'll end up with junk. But this is C++, and I should have been using vectors anyways:

Code: Select all

vector<C*> cs(1);
vector<B*> bs = cs // Type error
vector<B*> bs(cs.begin(), cs.end()); // Cast the elements of cs individually, no type error

This, of course, is a rather common newb error, is it not? And with obvious reasons: The laws of OOP state that a objects of a child class are also objects of the super class, and so we say a subclass is a type of the superclass. But this doesn't work either in C++'s template system, nor in its pointer system! Distinct class and struct types are different, regardless of inheritance, and have nothing in common.

Of course, it could have been done right, even while maintaining ABI support. It looks like C++ compiler writers traded correctness with simplicity. Grr.

Thank you Java, once again, for teaching me wrong. Thank you, C++, once again, for doing it the wrong way.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

Re: Coding: Fleeting Thoughts

Postby korona » Fri May 11, 2012 7:16 pm UTC

Ben-oni wrote:Wait another sec, there. Let's see if I understand this right. You're saying that casting a pointer from one type to another changes the pointer value?!

If you're using multiple inheritance this is indeed the case (and the canonical way implement multiple inheritance so C++ did nothing wrong here).

Code: Select all

C* cs[5];
B** bs = (B**)cs;

That does not even work with single inheritance or c style structs, so your intuition was wrong from the beginning:

Code: Select all

struct type_a { int a; };
struct type_b : public type_a { int b; };

type_b *some_ptrs[10];
type_a **obviously_not_working = (type_a**)some_ptrs;


Of course, it could have been done right, even while maintaining ABI support. It looks like C++ compiler writers traded correctness with simplicity. Grr.

I disagree: The way it is implemented IS the right way. I also don't see any way to implement the functionality you want without additional runtime information or a more complex type system (i.e. a pointer-to-a-subclass-stored-in-array type).

Ben-oni
Posts: 278
Joined: Mon Sep 26, 2011 4:56 am UTC

Re: Coding: Fleeting Thoughts

Postby Ben-oni » Fri May 11, 2012 7:29 pm UTC

korona wrote:That does not even work with single inheritance or c style structs, so your intuition was wrong from the beginning:

Code: Select all

struct type_a { int a; };
struct type_b : public type_a { int b; };

type_b *some_ptrs[10];
type_a **obviously_not_working = (type_a**)some_ptrs;

Why don't you think about that for a bit and get back to me, k? It obviously works.

jareds
Posts: 436
Joined: Wed Jan 03, 2007 3:56 pm UTC

Re: Coding: Fleeting Thoughts

Postby jareds » Fri May 11, 2012 7:34 pm UTC

Oh wow, I was so hung up on the fact that vtables aren't in any way a sparse matrix that I missed the fact that the first case (classes A/B) isn't even sparse under Ben-oni's interpretation because each B<N> has N distinct member functions in the first place. Thanks Yakk.

However, with the second example (classes C/D/E), with E<N,N>, the number of total vtable entries is only cubic in N and thus sub-quadratic in K=N*N.
Yakk wrote:Would virtual inheritance reduce the overhead? I suspect so, but am uncertain.

Why? If any class appears twice in the inheritance tree, I fail to see it.
Yakk wrote:The real trick here is that the various tables of functions are compatible, if there is no "header" information in the table of function pointers. So you could, in theory, use a particular offset into the vtable of B<100> to represent the vtable of B<99>. However, I doubt vtables are optimized for this.

Unfortunately, there is invariably header information in vtables for RTTI.
Ben-oni wrote:However, this violates my intuition about how pointers work. For instance,

Code: Select all

C* cs[5];
B** bs = (B**)cs;

I've now lost the knowledge that each *bs[i] needs to be shifted by a number of bytes. If I try to use it, I'll end up with junk. But this is C++, and I should have been using vectors anyways:

Code: Select all

vector<C*> cs(1);
vector<B*> bs = cs // Type error
vector<B*> bs(cs.begin(), cs.end()); // Cast the elements of cs individually, no type error

This, of course, is a rather common newb error, is it not? And with obvious reasons: The laws of OOP state that a objects of a child class are also objects of the super class, and so we say a subclass is a type of the superclass. But this doesn't work either in C++'s template system, nor in its pointer system! Distinct class and struct types are different, regardless of inheritance, and have nothing in common.

Of course, it could have been done right, even while maintaining ABI support. It looks like C++ compiler writers traded correctness with simplicity. Grr.

Thank you Java, once again, for teaching me wrong. Thank you, C++, once again, for doing it the wrong way.

Um, NO. If you think casting a Derived** to a Base** should work, you haven't thought it through and you don't understand the "laws of OOP".

Code: Select all

struct B { void methodB(); };
struct C : B { void methodC(); };
C *cs[5];
B **bs = (B**)cs;
bs[0] = new B();
cs[0]->methodC(); // OOPS -- we just called methodC() on a B object

Note that C++ correctly won't allow you to assign cs to bs unless you use an unsafe cast.

This doesn't show up exactly in Java because it doesn't have pointers to pointers in the first place. Java does let you convert a C[] to a B[] and then throws an exception if you store a B into the B[], but I thought this was generally regarded as legacy weirdness. In any event, in C++ that would require carrying type information with all class pointers and checking it on stores to a pointer, which obviously would not be considered viable. If you want run-time checking for everything, you shouldn't be using C++.

Also, Java doesn't even have multiple inheritance in the first place.

I can't even begin to imagine how you think multiple inheritance should be implemented such that a derived object shares the same address with 2500 base objects. Keep in mind data members, not just virtual functions. Please enlighten us.

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Fri May 11, 2012 7:58 pm UTC

The general OO heirarchy is problematic.

Squares are restricted types of Rectangles, right?

No. That isn't actually true.

Immutable Squares are restricted types of immutable Rectangles. The standard OO pattern that both Squares and Rectangles can be changed (SetWidth) means that a Square is NOT a restricted type of Rectangle, as the following operations works on a Rectangle but not a Square:

Code: Select all

rect->SetWidth(5);
rect->SetHeight(10);
rect->GetWidth() != rect->GetHeight();


For a similar reason, if B is a subclass of A, then you can turn a B* into an A*, but you cannot turn a B** into an A**. There are operations on an A** that are illegal to perform on a B**, namely A** a; *a = new A();

In comparison, a B*const* should be changeable into an A*const*, in theory. In practice, I don't think C++ allows it (am I right? Dunno!), probably due to implementation details.

...

Please note, stop using C style casts in C++. There are C++ casts that work for every C style cast you want to do. And B** b; A** a = static_cast<A**>(b); will fail to compile, unlike A** a = (B**)b;, where you just implicitly reinterpreted memory in a way that is highly unlikely to be valid.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

Ben-oni
Posts: 278
Joined: Mon Sep 26, 2011 4:56 am UTC

Re: Coding: Fleeting Thoughts

Postby Ben-oni » Fri May 11, 2012 8:11 pm UTC

jareds wrote:

Code: Select all

struct B { void methodB(); };
struct C : B { void methodC(); };
C *cs[5];
B **bs = (B**)cs;
bs[0] = new B();
cs[0]->methodC(); // OOPS -- we just called methodC() on a B object

Note that C++ correctly won't allow you to assign cs to bs unless you use an unsafe cast.

Ah, mutation, of course. It's been so long since I used it, it hardly ever registers.

I can't even begin to imagine how you think multiple inheritance should be implemented such that a derived object shares the same address with 2500 base objects. Keep in mind data members, not just virtual functions. Please enlighten us.

It has to do with analyzing the structure of the entire class tree (a tree being a graph with no cycles), and storing certain metadata in the compiled files. It's non-trivial, and certainly not a fleeting thought. If you'd like the full solution, then I hope you're willing to sign off on a dissertation.

I freely admit that I primarily use interpreted or functional languages, and only dabble with C++, which is why I wondered what was going on here. My own interest in C++ is primarily in templates and multiple inheritance, as they allow for the declaration of arbitrary lattices (if only there wasn't a hiccup when inheriting from the same class multiple times!), a fundamental structure in logic. Logicians out there may know where I'm going with this...

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Fri May 11, 2012 8:34 pm UTC

if only there wasn't a hiccup when inheriting from the same class multiple times!
What hiccup? With non-virtual, you get two instances of the class in your hierarchy. With virtual, you get one instance of the class in your hierarchy.

You can get N instances by doing virtual inheritance with template tagging.
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

jareds
Posts: 436
Joined: Wed Jan 03, 2007 3:56 pm UTC

Re: Coding: Fleeting Thoughts

Postby jareds » Fri May 11, 2012 8:53 pm UTC

Ben-oni wrote:
I can't even begin to imagine how you think multiple inheritance should be implemented such that a derived object shares the same address with 2500 base objects. Keep in mind data members, not just virtual functions. Please enlighten us.

It has to do with analyzing the structure of the entire class tree (a tree being a graph with no cycles), and storing certain metadata in the compiled files. It's non-trivial, and certainly not a fleeting thought. If you'd like the full solution, then I hope you're willing to sign off on a dissertation.

I didn't mean that it's impossible, just that it would suck. If you're forced to do a table lookup to find the offset to use for every member access, even for classes that aren't part of a multiple inheritance tree, and require all objects to carry around information to allow polymorphism, even if they don't use virtual functions, that would suck. Just like run-time error checking, if you want object members to be based on dictionaries attached to objects, don't use C++.

Ben-oni
Posts: 278
Joined: Mon Sep 26, 2011 4:56 am UTC

Re: Coding: Fleeting Thoughts

Postby Ben-oni » Sun May 13, 2012 2:55 am UTC

Yakk wrote:Squares are restricted types of Rectangles, right?

No. That isn't actually true.

Immutable Squares are restricted types of immutable Rectangles. The standard OO pattern that both Squares and Rectangles can be changed (SetWidth) means that a Square is NOT a restricted type of Rectangle, as the following operations works on a Rectangle but not a Square:

Code: Select all

rect->SetWidth(5);
rect->SetHeight(10);
rect->GetWidth() != rect->GetHeight();


After thinking about this for a bit, I think I'm starting to get the joke.

A rectangle is a kind of square where the Width and Height don't have to be equal.

From a mathematical perspective, all quadrilaterals are squares under some projection, so this sort of makes sense.

Next up, a square is a kind of circle where π is allowed to vary...

User avatar
Steax
SecondTalon's Goon Squad
Posts: 3038
Joined: Sat Jan 12, 2008 12:18 pm UTC

Re: Coding: Fleeting Thoughts

Postby Steax » Sun May 13, 2012 4:01 pm UTC

So I need help sanity-checking this.

I've decided to build my own MySQL database versioning tool within PHP. It's to help multiple developers version their database schemas; right now, it's quite complicated if 2 people commit 2 different schema changes at the same time. Someone needs to go in and examine both schemas, and figure out how to apply both changes.

So the tool works like this:
- Developer checks out latest version of code
- Developer does changes to the schema on his machine's local testing server
- Developer goes to a page that triggers a PHP script that extracts the current schema, compares it with the previous schema (made by this same step the last time this tool works), and figures out the delta (in SQL).
- Tool generates a SQL script in a file (filename based on timestamp), and it goes into a special folder.
- Developer sends this to remote server.
- Developer goes to remote server, open a page that triggers a PHP script that backs everything up, examines the special folder, detects new files, runs them, logs them, and notes the current timestamp (for the next "detect new file" bit).
- Remote server is now in sync. If 2 developers submitted 2 different deltas, they would both be run, hopefully with no conflicts.

I just, then, need to figure out how to detect conflicts.

Good idea? Or am I missing an easy way to do this? Any obvious holes?
In Minecraft, I use the username Rirez.

User avatar
Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: Coding: Fleeting Thoughts

Postby Shivahn » Sun May 13, 2012 6:56 pm UTC

I have a bit of a newbie C++ question. I am working on a game thingy off and on (just starting a new on phase, I think), and I'm implementing a component entity system. It seems to me that one of the easiest ways to do this is to have an addComponent method in my Entity class that is basically just a huge switch that takes an enumerated int and then adds the component (So, say I enumerate input to 0, I'd call entity.addComponent(input) and it would add the component). Component pointers are stored within the entity as a vector.

So my basic question is this: Is there a good way to do this? Because if I create an instance of my InputComponent class within the Entity::addComponent call, then put its pointer in my vector, the component itself is destroyed when the scope ends. Obviously, I need the component to remain intact because later on I'm going to need to call it to do its stuff. I know I could subclass Entity for each individual entity type and then actually give it a member for each component, but that both leads to a lot of extra code and is less modular.

User avatar
Sc4Freak
Posts: 673
Joined: Thu Jul 12, 2007 4:50 am UTC
Location: Redmond, Washington

Re: Coding: Fleeting Thoughts

Postby Sc4Freak » Sun May 13, 2012 7:23 pm UTC

If you really want to do that then allocate it on the heap, not the stack.

Code: Select all

vector<int*> ptrs;

{
    int* x = new int(5);
    ptrs.push_back(x);
}

assert(ptrs[0] == 5);


But it sounds like you have a problem with your design anyway. Why have a giant switch statement in the first place? Why is the entity responsible for creating components? If you want to initialize a component when you create it, how are you going to pass parameters to its constructor? And if you call entity.addComponent(0) then you already know the type you want to add, so why not just say entity.addComponent(new InputComponent) instead of doing it the roundabout way by going through a huge switch statement?

User avatar
Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: Coding: Fleeting Thoughts

Postby Shivahn » Sun May 13, 2012 7:31 pm UTC

Hmm. Good points. I'll have to rethink the design.

Edit: Ok, I have a very newbish question. What happens when you pass something like 'new Component component' as a parameter? Does it allocate a component on the heap, then perform the function?

Osha
Posts: 727
Joined: Wed Dec 03, 2008 3:24 am UTC
Location: Boise, Idaho, USA

Re: Coding: Fleeting Thoughts

Postby Osha » Wed May 16, 2012 7:50 am UTC

Code like this

foo(new string());

will allocate an object on the heap, and then pass the pointer to the function.

So foo might look something like this:

Code: Select all

void foo(string* sPtr) {
        cout << *sPtr << "\n";
        delete sPtr; //need to delete it ourself since the calling code doesn't have it anymore
}


Passing new right in a function call is done a lot when dealing with smart pointers, you allocate something with new, pass it to a smart pointer constructor, and the smart pointer handles freeing the memory when it's done with the memory.

I hope that answers the question...

User avatar
Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: Coding: Fleeting Thoughts

Postby Shivahn » Wed May 16, 2012 6:00 pm UTC

Thanks! That does answer my question. I appreciate the help.

User avatar
Yakk
Poster with most posts but no title.
Posts: 11129
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Coding: Fleeting Thoughts

Postby Yakk » Wed May 16, 2012 6:36 pm UTC

In C++, if you are going to take a pointer and delete it, tell your caller by using a std::unique_ptr<Foo> in your signature.

They'll have to wrap their calling new in a std::unique_ptr<Foo>( new Foo() ), but at least they'll know you are going to take ownership.

(If the called object may or may not take ownership, you can move a std::unique pointer to another std::unique pointer using std::move. The pointer lifetime is as long as the unique_ptr's lifetime.)

Generally, leaks (where you presume they take ownership, but don't) and heap corruption (where you presume they don't take ownership, but do) are a serious problem in non-trivial sized C++ applications. Nip them in the bud.

(To extract the Foo* from a unique_ptr, call get() -- but be careful when doing so, as ownership stays with the unique_ptr.)
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

Osha
Posts: 727
Joined: Wed Dec 03, 2008 3:24 am UTC
Location: Boise, Idaho, USA

Re: Coding: Fleeting Thoughts

Postby Osha » Wed May 16, 2012 8:15 pm UTC

Learning when to use unique_ptr and shared_ptr and weak_ptr will save you so much debugging time down the line it's not even funny.

I went from tracking down subtle memory allocation related bugs and memory leaks all the time to just not having to worry about them much. I recently programmed a barnes-hut tree without a single memory related bug (that valgrind could find) where that sort of thing would have been unheard of for me before, It was great :o

(There was a missing parenthesis bug that was messing up order of operations that took a couple hours to track down, which is either a sign I need to learn the rules better, make less premature assumptions about where a bug is, actually start writing unit tests, or to always put parenthesis around EVERYTHING in the ternary operation, probably the second and third...)

Ben-oni
Posts: 278
Joined: Mon Sep 26, 2011 4:56 am UTC

Re: Coding: Fleeting Thoughts

Postby Ben-oni » Wed May 16, 2012 8:27 pm UTC

Osha wrote:put parenthesis around EVERYTHING

Since nobody actually remembers all the rules, why don't we just use lisp style parens for everything anyways? Ask yourself: is another set of parens going to hurt? Is there somebody who might potentially read the code and not know the order of operations? Might you yourself potentially be one of those people? More parens never hurt anybody. Besides, a closing parade of parens is a badge of honor: the longer the parade, the better.

User avatar
hotaru
Posts: 1045
Joined: Fri Apr 13, 2007 6:54 pm UTC

Re: Coding: Fleeting Thoughts

Postby hotaru » Wed May 16, 2012 9:16 pm UTC

Ben-oni wrote:Ask yourself: is another set of parens going to hurt?

yes.

Code: Select all

factorial product enumFromTo 1
isPrime n 
factorial (1) `mod== 1

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Coding: Fleeting Thoughts

Postby EvanED » Wed May 16, 2012 9:28 pm UTC

Ben-oni wrote:More parens never hurt anybody.

An analogy.

User avatar
headprogrammingczar
Posts: 3072
Joined: Mon Oct 22, 2007 5:28 pm UTC
Location: Beaming you up

Re: Coding: Fleeting Thoughts

Postby headprogrammingczar » Wed May 16, 2012 9:50 pm UTC

I broke the '(' key on my laptop with just a single semester of Lisp coursework.
<quintopia> You're not crazy. you're the goddamn headprogrammingspock!
<Weeks> You're the goddamn headprogrammingspock!
<Cheese> I love you

User avatar
Shivahn
Posts: 2200
Joined: Tue Jan 06, 2009 6:17 am UTC

Re: Coding: Fleeting Thoughts

Postby Shivahn » Wed May 16, 2012 10:54 pm UTC

Osha wrote:Learning when to use unique_ptr and shared_ptr and weak_ptr will save you so much debugging time down the line it's not even funny.

Could you give me a brief explanation of when to use them? I'm sort of lost with smart pointers.

User avatar
Sc4Freak
Posts: 673
Joined: Thu Jul 12, 2007 4:50 am UTC
Location: Redmond, Washington

Re: Coding: Fleeting Thoughts

Postby Sc4Freak » Wed May 16, 2012 11:15 pm UTC

Basically a general rule of thumb is to never say "new" without immediately putting the result into a smart pointer.

Code: Select all

{
   Foo* x = new Foo(); // bad
   /* ... */
   delete x; // have to call delete manually otherwise you'll leak
}

Code: Select all

{
   unique_ptr<Foo> x(new Foo()); // good
   /* ... */
} // x magically goes away!

troyp
Posts: 557
Joined: Thu May 22, 2008 9:20 pm UTC
Location: Lismore, NSW

Re: Coding: Fleeting Thoughts

Postby troyp » Wed May 16, 2012 11:30 pm UTC

I often use spacing to denote grouping when I'm using operators, eg.

Code: Select all

a*b + c*d  ?  d  :  e
I do it mainly just to make the expression easier to parse visually, but it's also good for indicating order of operations if it may be unclear. Of course, that's only useful for reading code - you still have to check the order when you write it. And you probably want to make sure your spacing always matches the actual grouping or it may be confusing.

Not that I have anything against parens! But I prefer whitespace :-)
To me, the beauty of sexprs is what they enable: a really simple and versatile syntax. I consider them a minor inconvenience that buys you a lot, rather than a good thing in themselves. Of course, there is also a conditioning effect at work: sometimes when I see sexprs, I get a warm feeling and think "ah, Lisp code, how I've missed you."


Return to “Coding”

Who is online

Users browsing this forum: No registered users and 4 guests