Zero Initialisation for Classes
(Number 5 in a series of posts about Vectors and Vector based containers.)
This is a response to comments on a previous post, roll your own vector, and has also been rewritten and updated fairly significantly since first posted.
In roll your own vector I talked about a change we made to the initialisation semantics for PathEngine's custom vector class. In my first followup post I looked more closely at possibilities for replacing resize() with reserve() (which can avoid the initialisation issue in many cases), but so far I've been concentrating pretty much exclusively on zero initialisation for built-in types. In this post I come back to look at the issue of initialisation semantics for class element types.
Placement new subtleties
At its root, the changed initialisation semantics for our vector all come down to a single (quite subtle) change in the way we write one of the placement new expressions.
It's all about the placement new call for element default construction. This is required when elements need to be initialised, but no value is provided for initialisation by the calling code, for example in a call to vector resize() with no fill value argument.
As shown in my previous post, the standard way to implement this placement new is with the following syntax:
new((void*)begin) T();
but we chose to replace this with the following, subtly different placement new syntax:
new((void*)begin) T;
So we left out a pair of brackets.
Note that this missing pair of brackets is what I'm talking about when I refer to 'changed initialisation semantics'. (Our custom vector class does not omit initialisation completely!)
What those brackets do
So what do those brackets do, and what happens when we remove them?
Well, this is all about 'zero initialisation'.
In certain cases the memory for the object of type T being constructed will get zero initialised in the first version of the placement new call ('new((void*)begin) T()'), but not in the second version ('new((void*)begin) T').
You can find these two initialisation types documented on cppreference.com, in 'default initialisation' and 'zero initialisation', and there is some additional explanation of the two sets of construction semantics in this Stack Overflow answer, as well as in the related links. (Strictly speaking, the standard's term for what the bracketed form requests is 'value initialisation', which leads to zero initialisation in some cases.)
This makes a difference during element construction for built-in types (as we saw with the buffer initialisation overhead in my previous post), but also for certain types of classes and structs, and this is what I'll be looking at in this post.
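As a minimal sketch of the difference between the two forms for a built-in type (the helper names here are hypothetical, and alignas assumes a C++11 compiler), the following fills some raw memory with non-zero bytes and then constructs an int in it. With brackets the result can safely be read, and is zero. Without brackets the value is indeterminate, so the sketch writes to the int before reading it back.

```cpp
#include <cstring>
#include <new>

// Hypothetical helper: construct an int with 'T()' in memory pre-filled with 0xff bytes.
int ConstructWithBrackets()
{
    alignas(int) unsigned char buffer[sizeof(int)];
    std::memset(buffer, 0xff, sizeof(buffer));
    int* p = new((void*)buffer) int(); // brackets: the int is zeroed
    return *p;
}

// Hypothetical helper: construct an int with 'T' (no brackets).
int ConstructWithoutBrackets(int valueToStore)
{
    alignas(int) unsigned char buffer[sizeof(int)];
    std::memset(buffer, 0xff, sizeof(buffer));
    int* p = new((void*)buffer) int; // no brackets: value is indeterminate
    *p = valueToStore;               // so write before reading
    return *p;
}
```

Reading the int in the second helper before assigning to it would be reading an indeterminate value, which is why tests like the ones in this post can only ever show what a particular compiler happens to do.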
Initialisation of built in types
It's quite well known that initialisation for built-in types works differently for global variables (which are usually created as part of the program's address space) and local variables (which are allocated on the program stack).
If we start with the following:
#include <cassert>

int main(int argc, char* argv[])
{
    int i; // deliberately left uninitialised
    assert(i == 0);
    return 0;
}
This runs through quite happily with the debug build, but if I turn assertions on in the release build then this assertion gets triggered. That's not really surprising. This kind of uninitialised local variable is a well known gotcha and I think most people with a reasonable amount of experience in C++ have come across something like this.
But the point is that the local variable initialisation here is using 'default initialisation', as opposed to 'zero initialisation'.
And if we change i from a local to a global variable the situation changes:
#include <cassert>

int i; // global, so zero initialised

int main(int argc, char* argv[])
{
    assert(i == 0);
    return 0;
}
This time the variable gets zero initialised, and the program runs through without assertion in both release and debug builds.
The reason for this is that global variables can be initialised in the linked binary for your program, at no cost (or else very cheaply at program startup), but local variables get instantiated on the program stack and initialising these explicitly to zero would add a bit of extra run time overhead to your program.
Since uninitialised data is a big potential source of error, many other (more modern) languages choose to always initialise data, but this inevitably adds some overhead, and part of the appeal of C++ is that it lets us get 'close to the metal' and avoid this kind of overhead.
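Where a zeroed local is actually wanted, it can be requested explicitly. A small sketch (the names are made up for illustration):

```cpp
// Static storage duration: zero initialised before main() runs.
int g_counter;

int ReadGlobal()
{
    return g_counter;
}

int MakeZeroedLocal()
{
    int local = int(); // explicit brackets, so guaranteed to be zero
    return local;
}
```

The cost of the zeroing is then paid only at the places where it has been asked for.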
Zero initialisation and 'value' classes
What's less well known (I think) is that this can also apply to classes, in certain cases. This is something you'll come across most commonly, I think, in the form of classes that are written to act as a kind of 'value type', and to behave in a similar way to the C++ built in types.
More specifically, it's all about classes where internal state is not initialised during class construction, and for which you could choose to omit the class default constructor.
In PathEngine we have a number of classes like this. One example looks something like this:
class cMeshElement
{
public:
    enum eType
    {
        FACE,
        EDGE,
        VERTEX,
    };
    //.. class methods
private:
    eType _type;
    int32_t _index;
};
Default construction of value classes
What should happen on default construction of a cMeshElement instance?
The safest thing to do will be to initialise _type and _index to some fixed, deterministic values, to eliminate the possibility of program execution being dependent on uninitialised data.
In PathEngine, however, we may need to set up some fairly large buffers with elements of this type. We don't want to limit ourselves to only ever building these buffers through a purely iterator based paradigm (as discussed in my previous post), and sometimes want to just create big uninitialised vectors of cMeshElement type directly, without buffer initialisation overhead, so we leave the data members in this class uninitialised.
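To make this concrete, here is a heavily simplified sketch of a buffer that constructs its elements with the bracket-less placement new. This is not PathEngine's vector: cSimpleBuffer, cElementSketch and FillAndSumIndices are hypothetical names, and there is no growth, copying or exception safety.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical value type with no default constructor.
struct cElementSketch
{
    int type;
    int index;
};

// Hypothetical fixed size buffer, just to show the construction step.
template <class T>
class cSimpleBuffer
{
    T* _begin;
    std::size_t _size;
    cSimpleBuffer(const cSimpleBuffer&);            // not copyable in this sketch
    cSimpleBuffer& operator=(const cSimpleBuffer&);
public:
    explicit cSimpleBuffer(std::size_t size) :
     _begin(static_cast<T*>(std::malloc(size * sizeof(T)))),
     _size(size)
    {
        if(size != 0 && _begin == 0)
        {
            throw std::bad_alloc();
        }
        for(std::size_t i = 0; i != _size; ++i)
        {
            new((void*)(_begin + i)) T; // no brackets: default initialisation
        }
    }
    ~cSimpleBuffer()
    {
        for(std::size_t i = _size; i != 0; --i)
        {
            _begin[i - 1].~T();
        }
        std::free(_begin);
    }
    T& operator[](std::size_t i) { return _begin[i]; }
};

// Every element is written before it is read.
int FillAndSumIndices(std::size_t count)
{
    cSimpleBuffer<cElementSketch> buffer(count);
    for(std::size_t i = 0; i != count; ++i)
    {
        buffer[i].type = 0;
        buffer[i].index = static_cast<int>(i);
    }
    int sum = 0;
    for(std::size_t i = 0; i != count; ++i)
    {
        sum += buffer[i].index;
    }
    return sum;
}
```

For an element type like cElementSketch the construction loop has nothing to do, so setting up the buffer should cost little more than the allocation itself.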
Empty default constructor or no default constructor?
So we don't want to do anything on default construction.
There are two ways this can be implemented in our value type class. We can omit the class default constructor completely, or we can add an empty default constructor.
Omitting the constructor seems nice, insofar as it avoids a bit of apparently unnecessary and extraneous code, but it turns out there's some unexpected complexity in the rules for C++ object construction with respect to this choice, and to whether an object is being constructed with 'zero initialisation' or 'default initialisation'.
Note that what the two terms refer to are actually two different sets of object construction semantics, with each defining a set of rules for what happens to memory during construction (depending on the exact construction situation), and 'zero initialisation' does not always result in an actual zero initialisation step.
We can test what happens in the context of our custom vector, and 'value type' elements, with the following code:
#include <cstdlib>
#include <iostream>
#include <new>

class cInitialisationReporter
{
    int i;
public:
    ~cInitialisationReporter()
    {
        std::cout << "cInitialisationReporter::i is " << i << '\n';
    }
};

class cInitialisationReporter2
{
    int i;
public:
    cInitialisationReporter2() {}
    ~cInitialisationReporter2()
    {
        std::cout << "cInitialisationReporter2::i is " << i << '\n';
    }
};

template <class T> void
SetMemAndPlacementConstruct_ZeroInitialisation()
{
    T* allocated = static_cast<T*>(malloc(sizeof(T)));
    signed char* asCharPtr = reinterpret_cast<signed char*>(allocated);
    for(int i = 0; i != sizeof(T); ++i)
    {
        asCharPtr[i] = -1;
    }
    new((void*)allocated) T();
    allocated->~T();
    free(allocated);
}

template <class T> void
SetMemAndPlacementConstruct_DefaultInitialisation()
{
    T* allocated = static_cast<T*>(malloc(sizeof(T)));
    signed char* asCharPtr = reinterpret_cast<signed char*>(allocated);
    for(int i = 0; i != sizeof(T); ++i)
    {
        asCharPtr[i] = -1;
    }
    new((void*)allocated) T;
    allocated->~T();
    free(allocated);
}

int main(int argc, char* argv[])
{
    SetMemAndPlacementConstruct_ZeroInitialisation<cInitialisationReporter>();
    SetMemAndPlacementConstruct_ZeroInitialisation<cInitialisationReporter2>();
    SetMemAndPlacementConstruct_DefaultInitialisation<cInitialisationReporter>();
    SetMemAndPlacementConstruct_DefaultInitialisation<cInitialisationReporter2>();
    return 0;
}
This gives the following results:
cInitialisationReporter::i is 0
cInitialisationReporter2::i is -1
cInitialisationReporter::i is -1
cInitialisationReporter2::i is -1
In short:
- If our vector uses the 'zero initialisation' form (placement new with brackets), and the value type has its default constructor omitted, then the compiler will add code to zero element memory on construction.
- If our vector uses the 'zero initialisation' form (placement new with brackets), and the value type has an empty default constructor, then the compiler will leave element memory uninitialised on construction.
- If the vector uses the 'default initialisation' form (placement new without brackets), then the compiler will leave element memory uninitialised regardless of whether or not there is a default constructor.
Zero initialisation in std::vector
The std::vector implementations I've looked at also all perform 'zero initialisation' (and I assume this is then actually required by the standard). We can test this by supplying the following custom allocator:
#include <cstddef>
#include <cstdlib>
#include <new>

template <class T>
class cNonZeroedAllocator
{
public:
    typedef T value_type;
    typedef value_type* pointer;
    typedef const value_type* const_pointer;
    typedef value_type& reference;
    typedef const value_type& const_reference;
    typedef typename std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    template <class tTarget>
    struct rebind
    {
        typedef cNonZeroedAllocator<tTarget> other;
    };

    cNonZeroedAllocator() {}
    ~cNonZeroedAllocator() {}
    template <class T2>
    cNonZeroedAllocator(cNonZeroedAllocator<T2> const&)
    {
    }

    pointer address(reference ref)
    {
        return &ref;
    }
    const_pointer address(const_reference ref)
    {
        return &ref;
    }

    // fills the allocated block with non-zero bytes
    pointer allocate(size_type count, const void* = 0)
    {
        size_type byteSize = count * sizeof(T);
        void* result = malloc(byteSize);
        signed char* asCharPtr = reinterpret_cast<signed char*>(result);
        for(size_type i = 0; i != byteSize; ++i)
        {
            asCharPtr[i] = -1;
        }
        return reinterpret_cast<pointer>(result);
    }
    void deallocate(pointer ptr, size_type)
    {
        free(ptr);
    }

    size_type max_size() const
    {
        return 0xffffffffUL / sizeof(T);
    }

    void construct(pointer ptr, const T& t)
    {
        new(ptr) T(t);
    }
    void destroy(pointer ptr)
    {
        ptr->~T();
    }

    template <class T2> bool
    operator==(cNonZeroedAllocator<T2> const&) const
    {
        return true;
    }
    template <class T2> bool
    operator!=(cNonZeroedAllocator<T2> const&) const
    {
        return false;
    }
};
Oh, by the way, did I mention that I don't like STL allocators? (Not yet, I will in my next post!) This is a bog standard STL allocator with the allocate method hacked to set all the bytes in the allocated memory block to non-zero values. The important bit is the implementation of the allocate and deallocate methods. The rest is just boilerplate.
To apply this in our test code:
int main(int argc, char* argv[])
{
    std::vector<cInitialisationReporter,
                cNonZeroedAllocator<cInitialisationReporter> > v1;
    v1.resize(1);
    std::vector<cInitialisationReporter2,
                cNonZeroedAllocator<cInitialisationReporter2> > v2;
    v2.resize(1);
    return 0;
}
And this gives:
cInitialisationReporter::i is 0
cInitialisationReporter2::i is -1
Class with no default constructor + std::vector = initialisation overhead
So if I implement a 'value class' without a default constructor, and then construct a std::vector with elements of this type, then I get initialisation overhead. And this accounts for part of the speedups we saw when switching to a custom vector implementation (together with the corresponding issue for built-in types).
But there's a clear workaround for this issue, now, based on the above. To use std::vector, but avoid initialisation overhead for value type elements, we just need to make sure that each of our value type classes has an empty default constructor.
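A sketch of how this might look in practice, for the kind of 'resize, then fill from a low level source' pattern discussed in the earlier posts. cRecord and LoadRecords are hypothetical names, and the memcpy assumes C++11 rules, under which a class like this is still trivially copyable despite its empty default constructor.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical value type. The empty default constructor is the workaround.
struct cRecord
{
    int type;
    int index;
    cRecord() {}
};

// Sizes the vector with resize(), then overwrites every element from a raw source.
std::vector<cRecord> LoadRecords(const cRecord* source, std::size_t count)
{
    std::vector<cRecord> result;
    result.resize(count); // elements are constructed through the empty constructor
    if(count != 0)
    {
        std::memcpy(&result[0], source, count * sizeof(cRecord));
    }
    return result;
}
```

Whether the zeroing really disappears from the generated code is still something to check on your own compiler and standard library, for example with the allocator test above.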
Extending to a wrapper for working around zero initialisation for built-in types
In the comments (commenting on the original version of this post!) Marek Knápek suggests using the following wrapper to avoid zero initialisation, in the context of built-in types:
template<typename T>
// assuming T is int, short, long, std::uint64_t, ...
// TODO: add static assert
class MyInt{
public:
    MyInt()
    // m_int is "garbage-initialized" here
    {}
public:
    T m_int;
};
And sure enough, this works (because of the empty default constructor in the wrapper class). But I really don't like using this kind of wrapper in practice, as I think that this complicates (and slightly obfuscates!) each vector definition.
Using default initialisation semantics for our custom vector avoids the need for this kind of workaround. And, more generally, if we take each of the possible construction semantics on their merits (ignoring the fact that one of these is the behaviour of the standard vector implementation), I prefer 'default initialisation' semantics, since:
- these semantics seem more consistent and avoid surprises based on whether or not an empty default constructor is included in a class, and
- value type classes shouldn't depend on zero initialisation, anyway (since they may be instantiated as local variables)
Type specialisation
One thing to be aware of, with this workaround, is that it looks like there can be implications for type specialisation.
When I try the following (with clang 3.2.1):
cout << "is_trivially_default_constructible<cInitialisationReporter>: "
     << is_trivially_default_constructible<cInitialisationReporter>::value
     << '\n';
cout << "is_trivially_default_constructible<cInitialisationReporter2>: "
     << is_trivially_default_constructible<cInitialisationReporter2>::value
     << '\n';
I get:
error: no template named 'is_trivially_default_constructible' in namespace 'std'; did you mean 'has_trivial_default_constructor'?
and then when I try with 'has_trivial_default_constructor':
cout << "has_trivial_default_constructor<cInitialisationReporter>: "
     << has_trivial_default_constructor<cInitialisationReporter>::value
     << '\n';
cout << "has_trivial_default_constructor<cInitialisationReporter2>: "
     << has_trivial_default_constructor<cInitialisationReporter2>::value
     << '\n';
I get:
has_trivial_default_constructor<cInitialisationReporter>: 1
has_trivial_default_constructor<cInitialisationReporter2>: 0
This doesn't matter for PathEngine since we still use an 'old school' type specialisation setup (to support older compilers), but could be something to look out for, nevertheless.
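On a toolchain where the C++11 trait is available, the same check can be written as compile time assertions. This is only a sketch, with hypothetical class names, and the classes are kept free of destructors because some implementations also take the destructor into account for this trait.

```cpp
#include <type_traits>

struct cNoConstructor
{
    int i;
};

struct cEmptyConstructor
{
    int i;
    cEmptyConstructor() {}
};

struct cDefaultedConstructor
{
    int i;
    cDefaultedConstructor() = default;
};

// No constructor declared: trivially default constructible.
static_assert(std::is_trivially_default_constructible<cNoConstructor>::value,
              "expected a trivial default constructor");
// Empty but user-provided constructor: no longer trivial.
static_assert(!std::is_trivially_default_constructible<cEmptyConstructor>::value,
              "expected a non-trivial default constructor");
// Defaulted on first declaration: not user-provided, so still trivial.
static_assert(std::is_trivially_default_constructible<cDefaultedConstructor>::value,
              "expected a trivial default constructor");
```

As far as I understand the rules, the '= default' version also behaves like the class with no constructor when constructed with brackets (so it gets zeroed), which means it keeps the trait but does not give us the workaround.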
Conclusion
The overhead for zero initialisation in std::vector is something that has been an issue for us historically, but it turns out that, for a std::vector of value type classes, zero initialisation can be avoided without resorting to a custom vector implementation.
It's interesting to see the implications of this kind of implementation detail. Watch out how you implement 'value type' classes if they're going to be used as elements in large buffers, and maximum performance is desired!
Comments (discussion closed)
I read all your articles about custom vector class. You were afraid of some "auto-initialization" of data in your "buffers" managed by std::vectors because it costs some time to finish the work and it is often not important whether the data is initialized or not. You were not satisfied by std::vector's resize method because it initializes new elements and also not by reserve method because it doesn't permit you access to allocated (and uninitialized) memory/elements (technically they are not elements because no constructor was run for them, they are not "alive").
I think this article kills it. You can use std::vector<MyInt<int>> (code below) and throw your custom vector away. You can call standard resize method without paying price to initialize all elements and pass this vector to your low-level file loading API or whatever.
This has big advantage that you and your customers/users/developers have fewer things to learn, fewer things to write, debug and maintain. I hope that everyone knows the std::vector, its interface and how to use. It is well-known standard class, is optimized by tons of smart people, and handles thrown exceptions and corner cases well (push_back middle element).
If I’m wrong, correct me, please.
Marek.
[code]
template<typename T>
// assuming T is int, short, long, std::uint64_t, ...
// TODO: add static assert
class MyInt{
public:
MyInt()
// m_int is "garbage-initialized" here
{}
public:
T m_int;
};
[/code]
It looks like this (MyInt) will also get zero initialised, but try it if you like.
Try the following, and see what v[0].m_int gets set to:
But even if this would work, what a mess, no? A vector of a simple built-in type seems much cleaner than a vector of some wrapper class like this.
And there are other reasons for us to use our custom vector class.
See the capacity management tweaks I described in the original roll your own vector post, for example, but there are also issues with custom allocation that I will be looking at in the future.
And I think you are maybe a bit too worried about the whole idea of replacing std::vector.
In practice this has worked out fine for us, and I have no regrets about this based on my practical experience.
Note that we try and stick pretty closely to std::vector semantics for the subset of std::vector functionality that we actually require (apart from changes we make specifically for optimisation purposes), and so we get the advantages of reusing this _model_...
Tested the code and it returns -1 as expected (no extra "initialization" performed by std::vector you are so worried about - no time spend there). My point is to understand and obey std::vector first and bend it to some custom vector later if everything else fails. So I get standard behavior that everyone else understands - this is very important for me (not to learn new things if it can be easily achieved by std things). Maybe I missed something in your comment because I'm drunk from new year celebration. std::vector::resize[MyInt[int]] will be cheap because of "garbage-initialization" (not class itself, but of member variable m_int) and physical layout of MyInt[int] will be the same as int so there is no problem (if you replace int by cFace) of putting it to low-level API. I agree that my solution is a bit of a mess, as you say, but it can be easily typedefed as your cFace is. Next just a few bullets:
- I will read your earlier posts about capacity management tweaks again.
- Yes, I'm worried about replacing std lib elements by half-working-my-use-case-specific clones.
- I'm also worried about your last sentence that you stick to std: what about move semantics: it gives you performance benefit for free deep inside std lib if you switch to C++11 compiler.
- About custom allocators: I think they changed in C++11 version, they can be stateful now (I'm not sure) this is big game changer.
- I really liked your article about vector of vectors (in your specific use-case).
Please don't get my opinions much negative, I just write what I think, maybe it is wrong (technically) maybe it doesn't make sense (architecturally) but I would like to continue in constructive debate. I will read your articles again and reply in few days.
Marek.
> Tested the code and it returns -1 as expected (no extra "initialization" performed by std::vector you are so worried about - no time spend there).
Yep, looks like you're right there.
And then it looks like we can also get the same effect in cInitialisationAsserter by adding an empty default constructor to that class (no need for a non-empty default constructor in fact).
(Thought I checked for this previously, but I guess not.)
So if we are careful about our exact class setup it seems like this is actually something that we can work around in practice for class element types.
(I'll look at updating the post to reflect this..)
> Please don't get my opinions much negative, I just write what I think, maybe it is wrong (technically) maybe it doesn't make sense (architecturally) but I would like to continue in constructive debate.
I'm very pleased for your feedback, in fact.
As I noted in the post I'm not being completely rigorous in my approach (not reading the standard for example), and more generally my intention with these posts is not to provide some kind of definitive truth, (otherwise it would be much harder for me to find the time to make these posts!), but just (hopefully) something that is worthwhile to discuss. I'd rather be wrong, but about something that is non-obvious (and that perhaps other people also did not understand), than post something that is obviously true but of no interest.
The whole zero initialisation thing is then interesting, I think: a) because it made a significant difference to us in practice, and b) because there are these kinds of tricky details that can make a difference to this, and constructive debate about this topic (and in general) is definitely appreciated!
> About custom allocators: I think they changed in C++11 version, they can be stateful now (I'm not sure) this is big game changer.
Ok, yes that's interesting. We still need to support pre C++11 compilation environments (or environments where standard library implementations still have issues with stuff like this), but I should probably look into this a bit before posting about the custom allocator side of things. ;)
> I'm also worried about your last sentence that you stick to std: what about move semantics: it gives you performance benefit for free deep inside std lib if you switch to C++11 compiler.
Yes move semantics are very cool. In practice not so much of an issue for us, however, since our performance critical situations all involve vectors of POD types..
> Yep, looks like you're right there.
> And then it looks like we can also get the same effect in cInitialisationAsserter by adding an empty default constructor to that class (no need for a non-empty default constructor in fact).
Actually, it seems like this depends on whether or not c++11 support is turned on (for the compiler I am testing on at least). Will look at adding some additional info about my findings for this to the main post..
(Rewrote the post to reflect the actual final situation for this, following your comments. Thanks, again, for this feedback.)