Oct. 13, 2019, 2:14 p.m.

C/C++ Include Guidelines

Some (opinionated) guidelines for include file organisation in C/C++.

Intro

I present a list of points, probably best considered as guidelines, but which I'll refer to as rules throughout the rest of this post, since it trips off the tongue that bit more easily (or perhaps I just have some hidden authoritarian streak).

I'll assume the reader knows something about writing code in C or C++, and understands at least the basics about how compilation with include files works. Sometimes I'll say things are obvious to this intended audience, where I think this is necessary for completeness.

This is about an ideal way to organise code. For large existing code bases, changing the code to follow all of the rules may not be straightforward, but it should be possible to take incremental steps towards the code structure I describe, with a corresponding incremental improvement in overall code quality.

Many of the rules have exceptions, and it can be interesting to explore reasons for these exceptions, as well as the reasons for the rules themselves.

Modules

As an aside, please note that this is a 'pre-modules' document. If you're using C++20 and modules the situation will be different.

(At the time of writing I don't have any experience working with modules, and I'm not aware of all the details about how modules will work. I'm hoping that if we follow the rules described here, we should be reasonably well placed for moving to C++20 modules, but if that's not the case, and there are module related details that you think should be taken into account in this document, please let me know in the comments.)

The rules

  • 1. Headers are for linkage details, implementation goes in source files.
  • 2. Each source file should have a single matching header.
  • 3. Linkage to a source file should go only through its header.
  • 4. Guard against repeated includes.
  • 5. Each header should be self-sufficient.
  • 6. Each source file includes its own header first.
  • 7. Low level includes come after high level includes.
  • 8. Headers should not have side effects.
  • 9. Avoid unnecessary namespaces.
  • 10. Keep namespace identifiers short.
  • 11. Minimise dependencies.
  • 12. Each header should have one specific purpose.
  • 13. Header file name should correspond to what is provided.
  • 14. Make it easy to find header files.
  • 15. Make include paths explicit.

Let's go through these, one by one.

Linkage details

The first rule is a bit odd because it's a rule we can't actually obey in many situations:

  • 1. Headers are for linkage details, implementation goes in source files.

I put this rule first because I feel like this is kind of the original purpose of header files. Header files make the most sense where we can split linkage details from implementation, and the rule captures what header files are fundamentally about.

Where this works

The archetypal situation where we can obey the rule is when writing headers for functions. Function declarations are pretty much just the minimal information needed by the compiler in order to generate calls to the function from other compilation units. A function declaration can then go into a header, with the meat of the function (the function definition) in a source file.
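As a minimal sketch of this split, with both files shown in one listing for convenience (Clamp, Clamp.h and Clamp.cpp are made-up names, purely for illustration):

```cpp
// ---- Clamp.h (hypothetical header): just the linkage details ----
int Clamp(int value, int low, int high);

// ---- Clamp.cpp (hypothetical source file): the implementation ----
// (a real Clamp.cpp would start with: #include "Clamp.h")
int Clamp(int value, int low, int high)
{
    if(value < low)
    {
        return low;
    }
    if(value > high)
    {
        return high;
    }
    return value;
}
```

Other source files only ever see the one line declaration, so the body of Clamp can change without those files needing to be recompiled.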

We can split class method declarations and implementations in the same way, and can 'header' other things like class and struct names, and linkage to global variables, in a similar sense, but separating linkage from implementation gets more difficult in other circumstances, and for other C and C++ features.

When this is possible, this works well, from several viewpoints. In particular:

  • Source file compilation is insulated from the implementation details of other source files.
  • Only the headers for other, referenced, source files need to be considered, and not the contents of the source files themselves. (Makes compilation fast.)
  • Separation from implementation details of other source files is a very effective encapsulation mechanism (with header as interface and implementation details encapsulated behind this interface).
  • Implementation details can change without other source files being affected, as long as linkage details remain the same.

Where this doesn't work

As soon as we start defining classes and structs, one key 'linkage detail' is the size of the objects of the resulting class or struct. With information about object size, the compiler should be able to do implementation independent things like allocating memory or stack space for objects in advance of construction, but there's no way for us to provide the compiler with information about object size without making the whole class or struct definition visible.

(Imagine a world where we could provide some kind of 'augmented' forward declaration for classes or structs, specifying object size in addition to class or struct name, or where class definitions in headers could omit private data and methods but specify size explicitly. Kind of strange, but perhaps more consistent with the header file mechanism..)

We also find ourselves needing to put function or method definitions back into headers when we start using C++ template features, or if we want code to be inlined by the compiler.

Suggestions

  • Since we're working within a 'compile and link' framework, we should try and work to its strengths.
  • Put function declarations in a header and function definitions in a source file.
  • Put class method declarations in a header and method definitions in a source file.
  • Separate other implementation details out from headers, as far as reasonably possible. (Try and restrict header files to a minimal 'interface' to code implemented in source files.)
  • Use the 'pimpl' (or 'pointer to implementation') technique to remove class implementation details from headers.
  • Just because we're writing C++ that doesn't mean everything has to go in a class. Don't hesitate to put code in free-standing (or namespaced) functions when this makes sense.
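A rough sketch of the pimpl suggestion, assuming C++11 and std::unique_ptr (other ways of holding the pointer are possible). LineLog and its members are hypothetical names, and header and source file are shown in one listing:

```cpp
#include <memory>
#include <string>

// ---- LineLog.h (hypothetical header) ----
class LineLog
{
public:
    LineLog();
    ~LineLog(); // defined in the source file, where Impl is complete
    void Append(const std::string& line);
    int LineCount() const;
private:
    struct Impl;                 // forward declaration only
    std::unique_ptr<Impl> impl_; // size is known without seeing Impl
};

// ---- LineLog.cpp (hypothetical source file) ----
// (a real LineLog.cpp would start with: #include "LineLog.h")
#include <vector>

// the implementation details, hidden from code that includes LineLog.h
struct LineLog::Impl
{
    std::vector<std::string> lines;
};

LineLog::LineLog() : impl_(new Impl) {}
LineLog::~LineLog() = default;

void LineLog::Append(const std::string& line)
{
    impl_->lines.push_back(line);
}

int LineLog::LineCount() const
{
    return static_cast<int>(impl_->lines.size());
}
```

In this sketch the vector include is only needed by the source file, so code using LineLog.h doesn't pick up that dependency. The cost is a heap allocation per object and an extra indirection on each call.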

Regarding inlining:

  • Don't assume that inlining will always improve performance.
  • (See 'Do inline functions improve performance?', in the C++ FAQ.)
  • Look for measurable benefits before inlining.
  • Be aware of disadvantages of inlining with regards to code organisation, dependencies, and compile times, and weigh these things against any performance benefits.
  • Consider using linker features such as link-time code generation for performance critical builds, rather than inlining on a case by case basis.

Regarding templates:

At one point the standard included a possibility to 'export' templates (with the export keyword, and the 'export template' syntax here) but compiler support was very limited and this was removed from the language for C++11. (See this C++ FAQ entry.) If we have a limited number of concrete types to instantiate, however, it may still be possible to move the implementation of the template into a source file, using explicit template instantiation.
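Explicit template instantiation might look something like the following sketch. Sum, Sum.h and Sum.cpp are hypothetical names, and the assumption is that int and double are the only types we ever need:

```cpp
// ---- Sum.h (hypothetical header): declaration only, no template body ----
template <class T>
T Sum(const T* values, int count);

// ---- Sum.cpp (hypothetical source file) ----
// (a real Sum.cpp would start with: #include "Sum.h")
template <class T>
T Sum(const T* values, int count)
{
    T total = T();
    for(int i = 0; i != count; ++i)
    {
        total += values[i];
    }
    return total;
}

// explicit instantiation for the concrete types we support
template int Sum<int>(const int* values, int count);
template double Sum<double>(const double* values, int count);
```

Code that includes Sum.h can then call Sum for int and double, and link against the instantiations in Sum.cpp. A call with any other type compiles but fails at link time, which is the price for keeping the template body out of the header.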

Otherwise:

  • Consider using virtual interfaces for generality, instead of templates (virtual function calls may be less significant than you think - see this article for some discussion of this point)
  • Before templating for performance reasons (as with inlining), make sure there is an actual measurable and significant performance benefit, and weigh this against the (often also quite significant) cost in terms of code structure, build times, and executable size.
  • Where template code is unavoidable, try and 'insulate' other parts of the code (where performance is less important, or generality is not required) from the headers that contain the template code. Sometimes it can be worth providing both templated and polymorphic versions of an interface, for this purpose.
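To illustrate the virtual interface suggestion, here is a small sketch in which a function is written against an abstract interface, so that its body can live in a source file instead of in a templated header. All of the names (iShape, TotalArea, Square) are invented for this example:

```cpp
// ---- iShape.h (hypothetical): a virtual interface, no template code ----
class iShape
{
public:
    virtual ~iShape() {}
    virtual double Area() const = 0;
};

// ---- TotalArea.h (hypothetical): works for any iShape ----
double TotalArea(const iShape* const* shapes, int count);

// ---- TotalArea.cpp (hypothetical): implementation stays out of headers ----
double TotalArea(const iShape* const* shapes, int count)
{
    double total = 0.0;
    for(int i = 0; i != count; ++i)
    {
        total += shapes[i]->Area(); // virtual call, resolved at run time
    }
    return total;
}

// ---- Square.h (hypothetical): one concrete shape ----
class Square : public iShape
{
    double side_;
public:
    explicit Square(double side) : side_(side) {}
    double Area() const override { return side_ * side_; }
};
```

A templated TotalArea would avoid the virtual call, but would also need its body to be visible to every caller.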

I guess it's just a case of using C++ features pragmatically and weighing the advantages of these features against practical considerations for the build system.

Let's move on to the other rules, most of which we should be able to follow reasonably consistently..

Source/header pairing

The next rule is:

  • 2. Each source file should have a single matching header.

Headers can be stand-alone (where this makes sense), but source files should each come with a corresponding header, and the file names for the resulting source/header pair should match.

So we might have SomeClass.cpp and SomeClass.h, or just SomeTemplateStuff.h (without a matching source file), but SomeOtherObject.cpp by itself is not allowed.

Motivation

If we use a header (from other code), we need to know which object to link with, to satisfy the resulting linkage requirements.

Conversely, when working on code inside the source file, we can look at the header to see what linkage other source files expect from us.

Switching between source file and corresponding header is a very common operation, and it should be straightforward and obvious how to do this, which is why editors and IDEs often have a built in 'toggle header' command, which depends on names corresponding. Our code should then be organised to work well with this kind of tooling.

Exception - standard linkage elements

An exception to this rule is something like Main.cpp (or Winmain.cpp, or whatever), which serves as application entry point.

The linkage to Main.obj is expected to be something like:

int main (int argc, char *argv[]);

So this is known linkage, standardised for a reason (to serve as an entry point), and there are other situations where we might want to define some kind of standard linkage.

Consider, for example, the following assert macro definition:

// in "Assert.h":

int AssertFailed(const char*, int, const char*);

#ifdef FULL_RELEASE
 #define assert(expr) do{}while(0)
#else
 #define assert(expr) do\
 {\
   if(!(expr))\
     AssertFailed(__FILE__,__LINE__,#expr);\
 } while(0)
#endif

The details of the assert macro itself aren't important. The point is, code using this macro is likely to be linked into different contexts, where different actions should be taken on assertion failure. Perhaps the code is linked into a windowed application, where a message box should be displayed, and a server, where the assertion failure should be logged. So we just assume that a function named AssertFailed is defined, somewhere, with standard linkage. Each application must then fill this in appropriately.
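For a server application, the fill-in might look something like this sketch. Logging to stderr and returning a running failure count are both assumptions made for the example (nothing above says what the int return value should mean), and g_assertFailureCount is a made-up name:

```cpp
#include <cstdio>

// (in some source file belonging to a hypothetical server application)

// number of assertion failures seen so far (assumption, for illustration)
static int g_assertFailureCount = 0;

// fulfills the standard linkage expected by the assert macro
int AssertFailed(const char* file, int line, const char* expression)
{
    std::fprintf(stderr, "%s(%d): assertion failed: %s\n", file, line, expression);
    ++g_assertFailureCount;
    return g_assertFailureCount;
}
```

A windowed application would provide a different definition, with the same signature, that displays a message box instead.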

Exception - splitting up large classes

Less commonly, imagine we have a really big old monolith class definition. (Obviously, try not to end up with big old monolithic class definitions, but, well, it happens.) Something I've seen, then, is for the implementation of that class definition to get split across a number of source files, like this:

MassiveClass.h
MassiveClass.cpp
MassiveClass__Persistence.cpp
MassiveClass__Rendering.cpp

If we do find ourselves working with something like this, we should try and split the class 'properly', first of all. So try and factor some of the logic out into separate classes that can be treated as components, or into stand-alone functions. But, in the meantime, I guess it's good to at least break things down into smaller pieces, in some way, and splitting the monolith class across multiple source files might be better than nothing.

A broader definition

If we broaden our definition of 'matching', some of the above cases don't actually need exceptions.

Specifically, we could interpret the rule as saying something like: 'For each source file there should be a single header, clearly related to that source file by naming conventions, that describes linkage into the source file.'

For the assertion example, the function header for AssertFailed should be pulled out from the macro definition, into something like StandardLinkage/AssertFailed.h:

// in "StandardLinkage/AssertFailed.h":
int StandardLinkage_AssertFailed(const char*, int, const char*);

Code to fulfill this requirement could then go into WindowedApplication/StandardLinkage/AssertFailed.cpp and ServerApplication/StandardLinkage/AssertFailed.cpp (with the relevant source file brought in, somehow, depending on target platform).

Similarly, for MassiveClass.h, double underscore in source file names could be reserved for this kind of naming, and then at least the various source files all point back to the same header, and the files all appear next to each other in a sorted directory listing.

(For extra points, we could even consider extending our IDE to understand these relationships, and do the right thing when the 'toggle source/header' hotkey is pressed.)

Don't bypass headers

  • 3. Linkage to a source file should go only through its header.

This follows on fairly directly from the previous rule. Each source file has a matching header, describing the possibilities for linkage with the contents of the source file, and this header should then be used for that purpose.

It's possible to bypass header files, with code like the following:

// (in some cpp file)

// nasty direct linkage to a 
// function defined in 'foo.cpp'
// bypassing 'foo.h'
int Foo_InternalFunction(bool);

void SomeCode()
{
    int result = Foo_InternalFunction(true);
}

I'm just going to say: Don't do this! :)

Include guards

Certain things we might want to put in a header, such as class definitions, should only be seen by the compiler once per compilation unit (with repeated definitions resulting in a compile error).

Because headers can include other headers, it's common for certain include lines to be encountered more than once, and we have to take steps to guard against repetition of the included content. The next rule is then:

  • 4. Guard against repeated includes.

(C/C++ programmers surely already know this but I'm including the rule nevertheless, for completeness.)

I personally recommend using #pragma once at the top of each header, because this doesn't depend on the programmer choosing an identifier that's unique to each header file, and adds less visual noise to the header.

While this is officially non-standard, it's very widely supported in practice. Note the caveats here, though, which may apply if there's a possibility for the compiler to see the same include file at multiple locations (e.g. directly and then through a symbolic link). (Using old style include guards is also reasonable enough.)
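For comparison, here is a sketch of what old style include guards look like, and what happens when the same header content is reached twice in one compilation unit. Point.h, POINT_H_INCLUDED and ManhattanLength are hypothetical names, and the header content is pasted in twice to simulate the repeated include:

```cpp
// ---- content of Point.h (hypothetical), first time it is included ----
#ifndef POINT_H_INCLUDED
#define POINT_H_INCLUDED
struct Point
{
    int x;
    int y;
};
#endif

// ---- the same content, reached again through some other header ----
#ifndef POINT_H_INCLUDED // already defined, so everything up to #endif is skipped
#define POINT_H_INCLUDED
struct Point
{
    int x;
    int y;
};
#endif

// code following the includes sees exactly one definition of Point
int ManhattanLength(const Point& p)
{
    int dx = p.x < 0 ? -p.x : p.x;
    int dy = p.y < 0 ? -p.y : p.y;
    return dx + dy;
}
```

Without the guard lines, the second struct definition would be a redefinition error. With #pragma once, the whole guard reduces to a single line at the top of the header.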

Exceptions?

Some things can be repeated multiple times per compiled object, without causing a compilation error.

This is true for forward declarations of classes, for example, and function declarations, and include guards could then be omitted from something like the following, without any problems:

// (SingleEdgesPartition.h)

class iGraph;
class cBitVector;

void
SingleEdgesPartition(
        const iGraph& graph,
        cBitVector& edgeSplitFlags
        );

Guarding against repeat inclusion for such a header doesn't hurt, however, and will probably also result in slightly faster compilation.

'Using' include files

Use of include files is a very common part of the average C/C++ programming work flow.

By 'use of' an include file, I'm referring to situations where we're working on some code, find we need to reference some other code, and add a new include line for this purpose. We could also talk about being an include file 'consumer'.

You might have seen the following quote, about the importance of making source code readable:

"Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. ...[Therefore,] making it easy to read makes it easier to write."

--- Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship

Well I think a similar principle applies to include files, i.e. more time will be spent 'using' than writing the average include file. Ease of use trumps other considerations, and we should 'optimise' header files for this.

This is a theme which then underlies many of the rules in this post.

When we add a new include line somewhere in our source code, one of two things can happen, depending on how our headers are set up:

  1. Affected objects continue to compile correctly, and we can go on with our work.
  2. Or the new include line breaks the build. We have to figure out why, potentially solving some kind of constraint satisfaction puzzle, and change other include relationships, or the order of other includes, to get things working again.

For such a common action I think it's imperative to avoid costly breaks in programmer flow, and ensure the first of these two things happens, as much as possible.

Don't make the user resolve dependencies

The following rule follows directly from this:

  • 5. Each header should be self-sufficient.

Headers shouldn't make assumptions about what has already been included. Anything that needs to be included in order for the header to compile should be included, directly, at the top of the header itself.

Moving towards self-sufficiency

This rule, btw., is something that's usually easy to work towards incrementally. If we have a large code base that doesn't satisfy this requirement, we can often add include lines to headers as and when we notice that these include lines are required in order to make a certain header file more self-sufficient.

'Standard' headers

It's a common pattern to see certain things used throughout a particular codebase, and to pull the relevant includes out into a single standard header, to be included at the top of every source file. Maybe this standard header is also set up as a precompiled header.

To avoid confusing repetition of the word 'header' I'll replace 'the standard header', in the following couple of paragraphs, with stdafx.h, a filename often used for precompiled headers on Windows.

Standard (and precompiled) headers are a bit of a thorny issue, when it comes to code organisation, but my advice with regards specifically to this rule number 5 is to explicitly add include lines for all other headers theoretically required by a header, whether or not these are already included by stdafx.h, with the following reasoning:

  • The include lines at the top of a header provide useful documentation about dependencies and linkage requirements.
  • Omitting include lines from a header because they are in some stdafx.h makes the header dependent on that specific stdafx.h file.
  • Changes to stdafx.h can then break code using the header.
  • And it becomes difficult to use the header in another project (which may have a different stdafx.h file set up), and from other code, generally, in the future.

Leaving out include lines for dependencies is an ease of use problem stored up for the future, and the onus is then on the person writing a header file to include everything required by the header, explicitly.

Checking self-sufficiency

It's not enough to require header self-sufficiency. We need to actually check this, which brings us to the next rule:

  • 6. Each source file includes its own header first.

So, for example:

// (in Foo.cpp)

// include our own header
// (checks self sufficiency)
#include "Foo.h"
// other headers then follow
#include "Bar.h"
#include "AnotherHeader.h"
#include <vector>

//.. rest of the code in Foo.cpp

This ensures that, in at least one compiled object, the compiler gets presented with the header first, before any other headers are included. If the header isn't self-sufficient then we should get a compilation error for the 'hosting' object.

Standard headers

Coming back to standard headers, we can see that there is some conflict between the practice of using standard headers, and this rule number 6.

If we have a 'standard' header that isn't set up as a precompiled header, we should try and move the standard header include line at least below the source file's own header. (To get the benefit of self sufficiency checking for the source file's own header.)

In practice, however, standard headers are often also set up as precompiled headers, (or a precompiled header is set up for build optimisation, and then becomes a kind of de facto standard header) and then in this case we have to include the precompiled header first, because that's just how this compiler feature works.

It's difficult to simply forbid precompiled headers, because they can make a very significant difference to total build times (in some cases), but my experience is that precompiled headers tend to make source file organisation, and particularly include relationships, significantly worse.

When using precompiled headers, we should at least put the hosted header right after the precompiled header, to get as much checking for header self sufficiency as possible.

We can then either just accept that header self-sufficiency is not checked directly, (and fix self-sufficiency issues on a more ad-hoc basis, as and when they come up), or figure out some other way to check for header self-sufficiency (more on this, perhaps, in a future blog post).

Double checking self-sufficiency

The next rule is based on the same kind of logic, and also helps check that headers remain self-sufficient:

  • 7. Low level includes come after high level includes.

Consider the following sequence of includes:

// (Foo.cpp)

#include "Foo.h"
#include <string>
#include "Bar.h"

//.. rest of the code in Foo.cpp
//.. (using std::string)

and:

// (Bar.h)

#pragma once

struct Bar
{
  std::string name;
  int number;
};

The Bar.h header uses std::string without including the necessary system header, breaking the rule about self-sufficiency. The problem is covered up (in this compilation unit at least), by the fact that we include the string standard header before Bar.h.

What rule 7 tells us to do is move the string include line down to the bottom of the includes (after Bar.h is included).

After this change, (assuming string is not also included by Foo.h), we get a compiler error about the std::string use in Bar.h. This is exactly what we want, in the interest of 'failing fast'! (At which point we can go ahead and fix the Bar.h header.)
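The fix itself is then just a matter of adding the missing include line to the header. A sketch of the repaired Bar.h follows, written with an old style include guard (BAR_H_INCLUDED is a made-up identifier) so that the listing also compiles as a stand-alone file:

```cpp
// (Bar.h, after the fix)

#ifndef BAR_H_INCLUDED
#define BAR_H_INCLUDED

// the header now includes what it needs, directly
#include <string>

struct Bar
{
  std::string name;
  int number;
};

#endif
```

With this in place, Bar.h compiles no matter where it appears in a sequence of includes.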

Note that we're not worried about the opposite problem, i.e. the string header using stuff defined in Bar.h, because the string header is 'lower level', in some sense, than the Bar.h header.

Ordering non-system includes

The same kind of logic can be applied to non-system includes.

Imagine that we have some header, ReportError.h, with error handling in our application being pretty straightforward, in particular without any chance that this communicates over the network. This can then be considered as lower level, in the same sense as above, with respect to NetworkConnection.h.

// (CreateSession.cpp)

#include "CreateSession.h"
#include "NetworkConnection.h"
// ^ higher level
// v lower level
#include "ReportError.h"

//.. rest of the code in CreateSession.cpp

In some cases, we can even do this purely on the basis of language features, without knowing anything about the actual application logic. Consider a header that exists solely to provide the definition of an enumeration, and another header that provides a struct. The enumeration header can then be considered lower level, since structs can contain enums but enums cannot contain structs:

// (after some other includes)
#include "struct/Foo.h"
// ^ higher level
// v lower level
#include "enum/Bar.h"

Don't sweat it

This rule in particular is definitely more of a guideline than a strict rule, and the intention is certainly not to make it more difficult to add include lines.

We can note that the self-sufficiency checking is actually redundant, to some extent, if the previous rule (6) has already been applied consistently throughout a codebase (although this does remain useful for checking stand-alone headers, or other headers for which we don't actually compile source code, and when we're in the process of working through a codebase and working towards a state where headers are self-sufficient).

And we shouldn't have to spend time analysing headers, or try to force include lines into some kind of strict ordering (which probably doesn't exist). Rather, we should just aim for some kind of 'partial ordering' as far as it's clear which includes should be considered higher or lower level than others.

It is nice to have an idea of source code structure, however, and I think that having some kind of high to low level ordering between includes can help with this, as well as helping with checking header self-sufficiency.

Avoid side effects

Going back to our goal of making headers easy to 'use', an important part of this is that it should be as straightforward as possible to add a new include line to existing code. We should be able to add a new include line without worrying about the change breaking other stuff in the same compilation unit. To avoid collisions, or unexpected interactions between headers, the effects of including a given header should then be clearly defined and limited.

  • 8. Headers should not have side effects.

A classic example of bad practice here is the Windows header defining min and max. Preprocessor defines like this are particularly horrible, but the same principle applies to other compilation effects that can 'leak' out of the header and affect following code.

Using declarations

Another commonly quoted example is that we shouldn't put using directives (such as using namespace std;) in header global scope. This has the effect of 'leaking' the entire contents of the namespace, to potentially clash with identifiers referenced anywhere after the header is included. The same goes for using declarations, such as using std::string;, although with the effects limited to one specific identifier in this case.

We can find discussion about this point in various places around the internet, and I think that there's a kind of general agreement that this is bad form. See this stack overflow question, for example, and this blog post. The blog post talks about using function or namespace scope to avoid side effects, but we notably can't use either form in class scope to pull in names from a namespace. This is genuinely a pain in many cases, since explicit namespace specification can mean a load of extra typing and visual noise, but the point is that ease of use for a header trumps ease of writing the header, and side effects cause a lot more pain than visual noise.
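As a small sketch of the function scope approach, for code that has to live in a header (TotalLength is a hypothetical function, and C++11 is assumed for the range-based for loop):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// (in some hypothetical header)

// the signature spells out std:: in full, so nothing leaks into global scope
inline std::size_t TotalLength(const std::vector<std::string>& lines)
{
    using std::string; // only visible inside this function body
    std::size_t total = 0;
    for(const string& line : lines)
    {
        total += line.size();
    }
    return total;
}

// code after this point cannot refer to 'string' without the std:: prefix
```

This only helps inside function bodies. Function signatures and class members in the header still need the explicit namespace.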

Namespaces aren't free

The following pair of rules follow on directly from the observation, above, that explicit namespace specifiers are difficult to use in headers:

  • 9. Avoid unnecessary namespaces.
  • 10. Keep namespace identifiers short.

Namespaces can be an important tool for code organisation, but keep in mind that code using the namespace contents cannot always elide the namespace identifiers, so there's some tradeoff involved.

Think about how likely it actually is for identifiers to collide, in practice, if left outside of a namespace, and don't add namespaces just because it seems like a nice representation of program structure.

Dependencies

Dependency reduction is an essential part of code organisation. Code with fewer dependencies means faster builds. Reducing dependencies minimises the effects of code changes, and makes it easier to work with parts of our code in isolation. The next rule then simply states:

  • 11. Minimise dependencies.

This is something we can do, to a large extent, one header file at a time, with each header designed to bring in the smallest number of dependencies necessary to achieve the task at hand.

So:

  • Use forward declarations wherever possible.
  • Follow the various bits of advice for removing implementation details from headers from the discussion about rule 1. (Using pimpl, explicit template instantiation, etc.)
  • C and C++ dependencies tend to get pulled in by the compiler's need to know object size, so consider using pointer types (preferably smart pointers) purely as a way to separate out dependencies, in cases where the cost of the resulting heap allocation is not significant.
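The first and last of these points can be combined, as in the following sketch. Vehicle and Engine are hypothetical classes, C++11 is assumed for std::unique_ptr, and the files are shown together in one listing:

```cpp
#include <memory>

// ---- Vehicle.h (hypothetical header) ----
class Engine; // forward declaration, instead of #include "Engine.h"

class Vehicle
{
public:
    Vehicle();
    ~Vehicle(); // defined in the source file, where Engine is complete
    int Power() const;
private:
    // holding a pointer means the compiler doesn't need the size of Engine here
    std::unique_ptr<Engine> engine_;
};

// ---- Engine.h (hypothetical header, only included by Vehicle.cpp) ----
class Engine
{
public:
    int horsepower = 90;
};

// ---- Vehicle.cpp (hypothetical source file) ----
Vehicle::Vehicle() : engine_(new Engine) {}
Vehicle::~Vehicle() = default;

int Vehicle::Power() const
{
    return engine_->horsepower;
}
```

Code that includes Vehicle.h never sees Engine.h, so changes to Engine only trigger a rebuild of Vehicle.cpp. Holding Engine by value would save the heap allocation, but would force every user of Vehicle.h to also pull in Engine.h.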

Knowing what to include

The rules up to here should help ensure we can 'just add the include line for a new header, and carry on coding', but before we get to that point we need to figure out that we actually want to include that header.

Maybe there's some container class requirement coming out of the code we're working on, and this can be satisfied by an existing object somewhere in our code base, or perhaps we need to instantiate an object of a class for which we know the name.

More generally, we need to know when there is a header that can help us with a given coding task, and, if so, where to find that header. The next three rules should help with this.

  • 12. Each header should have one specific purpose.
  • 13. Header file name should correspond to what is provided.
  • 14. Make it easy to find header files.

Single purpose

The 'single responsibility principle' is commonly applied to things like classes and functions, but I think the same principle applies equally well to header files.

We should aim to organise our headers so that, as far as reasonably possible, each header file does just one thing, but does it well.

It's not always clear what 'doing just one thing' should actually mean, in the general case, and, in particular how fine grained this 'one thing' should be, but one very common header file use case where I think this is pretty clear is the provision of class definitions.

In this case the language arguably doesn't allow us to break things up enough (see the previous discussion about splitting linkage details from implementation for class definitions), and we should then at least aim for the class definition as a kind of minimum level of granularity.

So if we find ourselves with a bunch of class definitions (perhaps related to some aspect of our application) concatenated together into a single header (e.g. AnimationSystemClasses.h), we should try and split this up into separate headers for each class (e.g. Bone.h, Skeleton.h, and so on). (This usually turns out to be quite a straightforward refactoring step, in practice, since the old category header can initially just go ahead and include all of the class specific headers.)
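A sketch of how this refactoring step might look, with all three headers shown in one listing and include lines for our own headers left as comments. The members of Bone and Skeleton are invented for the example:

```cpp
// ---- Bone.h (new, one class per header) ----
struct Bone
{
    int parentIndex; // hypothetical member
};

// ---- Skeleton.h (new, one class per header) ----
#include <vector>
// #include "Bone.h"
struct Skeleton
{
    std::vector<Bone> bones; // hypothetical member
};

// ---- AnimationSystemClasses.h (old category header, kept for now) ----
// Existing code that includes this header continues to compile,
// and can be switched over to the specific headers bit by bit.
// #include "Bone.h"
// #include "Skeleton.h"
```

Code that only needs Bone can then include Bone.h by itself, and no longer depends on the vector header or on Skeleton.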

For other C/C++ 'things' it can be less clear what level of granularity to choose. For a bunch of preprocessor defines, splitting each one into its own header would be pretty silly, but, in my experience, it can be useful to split enums and supporting structs out from other headers, in particular if these bring in other dependencies, or have fewer dependencies than the other stuff they are combined with.

Beyond a certain size, functions definitely qualify as a 'single thing' (and therefore benefit from being split out into their own source file/header pair), but the actual size threshold for this is a judgement call.

In each case, having a single, specific, purpose for each header file enables us to identify more clearly exactly what that header does, when looking at the header file name, but also helps with dependency reduction, since we can then choose to only pull in headers for the classes we actually need.

Don't hesitate to create small files, when this makes sense

There is a trade-off with file system costs for handling lots of small header files, but I think that file caching tends to work pretty well these days, and dependency reduction is important. For these reasons I think that, in many situations, we should pretty much abandon any preconceived notion of minimum header size, and if splitting something off into a small header helps reduce dependencies significantly then we should do this.

Header naming

Once we have headers that do just one thing, the next step is to make it as clear as possible, from the file names and file structure, exactly what this one thing is.

If the header provides a class definition, the header file name should correspond with the name of that class.

For code bases with some kind of class name prefix character, e.g. where a Bone class is actually defined as class cBone; (something that's kind of out of fashion these days), it's worth following the same convention for our header files (so, in this case, the header file should be named cBone.h).

For code bases without class name prefixes, an alternative is to put class headers into a 'class' subdirectory (and then enums into 'enum', and so on). The point is just to make it that much clearer, when looking at a list of header files, in the file system, what each file actually provides.

Likewise, if the header provides linkage details for a single function, the header file name should match the function name. If the header provides definitions of a bunch of stuff wrapped in a namespace, the header file name should match the name of the namespace, and so on.
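A small sketch of the function and namespace cases follows. The names ClampToRange and AngleUnits are invented for this example, and the header/source pairs are flattened into a single listing so that it compiles as one unit.

```cpp
// ---- ClampToRange.h (hypothetical): linkage for exactly one function ----
#ifndef CLAMP_TO_RANGE_H
#define CLAMP_TO_RANGE_H

int ClampToRange(int value, int low, int high);

#endif // CLAMP_TO_RANGE_H

// ---- ClampToRange.cpp: the matching source file ----
// #include "ClampToRange.h"  <- first include in the real file

int ClampToRange(int value, int low, int high)
{
    if (value < low) return low;
    if (value > high) return high;
    return value;
}

// ---- AngleUnits.h (hypothetical): everything sits in namespace AngleUnits ----
#ifndef ANGLE_UNITS_H
#define ANGLE_UNITS_H

namespace AngleUnits
{
    const int DegreesPerTurn = 360;
    int WrapDegrees(int degrees);
}

#endif // ANGLE_UNITS_H

// ---- AngleUnits.cpp: the matching source file ----
// #include "AngleUnits.h"  <- first include in the real file

int AngleUnits::WrapDegrees(int degrees)
{
    // Map any angle into the range [0, DegreesPerTurn).
    const int wrapped = degrees % DegreesPerTurn;
    return wrapped < 0 ? wrapped + DegreesPerTurn : wrapped;
}
```

In both cases someone who sees a call to ClampToRange or AngleUnits::WrapDegrees can guess the header name without searching.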

Finding headers

Beyond individual file names, we should do whatever we can to make it easy to find header files (and the 'things' they provide).

A good, logical file structure helps with this (e.g. container classes grouped together), as does logical decomposition into components, indexing and documentation.

Include search paths

Consider two code snippets, corresponding to different ways to set up include search paths.

// (in CreateSession.cpp)

#include "CreateSession.h"
#include "Session.h"
#include "Settings.h"
#include "Dialogs.h"
#include "ReportError.h"
#include "RakPeerInterface.h"
#include "BitStream.h"

//.. rest of the code

// (in CreateSession.cpp)

#include "Network/CreateSession.h"
#include "Network/Session.h"
#include "Application/Settings.h"
#include "Application/Dialogs.h"
#include "Application/ReportError.h"
#include "ExternalLib/RakNet/RakPeerInterface.h"
#include "ExternalLib/RakNet/BitStream.h"

//.. rest of the code

The headers are at exactly the same file system locations, in each case. The difference is that, in the first snippet, the Network, Application and ExternalLib/RakNet directories have each been added individually to the compiler include search path, so that less information is required at the point of include.

Our final rule tells us to prefer the second code snippet:

  • 15. Make include paths explicit.

There are two main reasons for this. We want to make it clear what file is included, and from where, first of all, for anyone looking at the include directives in the source code. And then explicit paths also help avoid header collisions and confusing situations where headers are included from unexpected locations.

The RakNet BitStream header referenced by the code snippets above is a good example of both points. In the first snippet, it's not clear to someone auditing the code that the BitStream header represents an external dependency, with the BitStream class implementation actually provided by the RakNet external library. And then, without the additional include path information, this header now 'collides' with any BitStream.h we might choose to create for our own application.
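The collision can be illustrated with a toy model of header lookup. This is only a sketch: ResolveInclude and ExampleTree are names made up for this example, the Application/BitStream.h file is hypothetical, and real compilers apply additional lookup rules (such as searching the directory of the including file for quoted includes) that are ignored here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of include lookup: try each search directory in order and
// return the first candidate path that exists. An empty string is
// returned if nothing matches.
std::string ResolveInclude(const std::vector<std::string>& searchDirs,
                           const std::set<std::string>& filesOnDisk,
                           const std::string& includeText)
{
    assert(!includeText.empty());
    for (const std::string& dir : searchDirs)
    {
        // An empty entry stands for the single root code directory.
        const std::string candidate =
            dir.empty() ? includeText : dir + "/" + includeText;
        if (filesOnDisk.count(candidate) != 0)
        {
            return candidate;
        }
    }
    return std::string(); // not found on any search path
}

// Hypothetical tree: RakNet's BitStream.h, plus one we add ourselves later.
std::set<std::string> ExampleTree()
{
    return {"Application/BitStream.h", "ExternalLib/RakNet/BitStream.h"};
}
```

With Network, Application and ExternalLib/RakNet each on the search path, in that order, a bare "BitStream.h" resolves to Application/BitStream.h in this model, and which file wins depends purely on search order. With a single root on the search path, "ExternalLib/RakNet/BitStream.h" and "Application/BitStream.h" each name exactly one file, and a bare "BitStream.h" does not resolve at all.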

Similar considerations then also apply to internal code. As codebases expand, organisation into subdirectories is important for code structure. Code subdirectories should reflect some kind of modular structure of the source code. Include relationships tell us about dependency relations between code subdirectories, and namespace collisions between code subdirectories should be avoided.

Suggested setup

As an ideal case, where possible, I suggest a single root directory for our application internal includes (which could be our application 'code' subdirectory, or something like this). This also has the effect of reducing the amount of project configuration (particularly if our source code is split into a number of different projects internally), which is also a good thing.

External libraries may need some additional setup. For example, in a concrete situation involving the RakNet headers mentioned above, I actually moved these from a directory like ExternalLibraries/RakNet into something like ExternalLibraries/RakNet/external_api/ExternalLib/RakNet. Note that ExternalLib/RakNet is actually the only thing inside external_api, and, from a file system perspective, this seems kind of superfluous. The point is that we are going to add the external_api directory to our include path, and this has the effect of dumping everything in that directory into a kind of global includes 'namespace'. ExternalLib/RakNet is then added purely for namespacing purposes. (You may recognise this setup from other third party libraries, which already come with similar, seemingly superfluous, 'namespacing' subdirectories.)

Relative paths

I also suggest avoiding relative include paths, for similar reasons. This means both explicit relative paths (with '..') and include paths that only resolve because the directory of the including cpp file is itself included in the search.

Wrapping up

So that wraps up the rules list.

Mechanical improvements

The rules listed here are relatively mechanical. By this I mean that they can be applied locally to individual bits of source code, without strokes of genius or flashes of inspiration, but can nevertheless significantly improve code organisation. This is particularly useful in the case of large legacy code bases.

Quarantine

If we find ourselves obliged to include certain specific headers which break a lot of these guidelines, it can be a good idea to attempt to 'quarantine' such headers, i.e. restrict the part of the code that actually deals directly with the header as much as possible. (One header in particular springs to mind here, which is the windows.h platform header on Windows.)
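One way this can look in practice is a thin wrapper, so that only a single source file ever includes windows.h. The sketch below is an assumption about how such a wrapper might be organised: TickCountMilliseconds and the Platform directory are hypothetical names, GetTickCount64 is the real Win32 call being wrapped, and the non-Windows branch is there only so that the sketch also compiles on other platforms. The header/source pair is flattened into one listing.

```cpp
// ---- Platform/TickCountMilliseconds.h (hypothetical wrapper header) ----
// Deliberately includes nothing: callers never see windows.h.
#ifndef TICK_COUNT_MILLISECONDS_H
#define TICK_COUNT_MILLISECONDS_H

unsigned long long TickCountMilliseconds();

#endif // TICK_COUNT_MILLISECONDS_H

// ---- Platform/TickCountMilliseconds.cpp ----
// The only file in the code base that touches the platform header.
// #include "Platform/TickCountMilliseconds.h"  <- first include in the real file

#ifdef _WIN32

#include <windows.h>

unsigned long long TickCountMilliseconds()
{
    return GetTickCount64();
}

#else

#include <chrono>

// Stand-in for other platforms, so the sketch builds everywhere.
unsigned long long TickCountMilliseconds()
{
    using namespace std::chrono;
    return static_cast<unsigned long long>(
        duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
}

#endif
```

The rest of the code then includes only the wrapper header, and none of the macros or typedefs from windows.h leak beyond the one source file.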

Further considerations

Other important questions, extending beyond the topic of include file organisation into more general source code architecture, are: 'How do we avoid the possibility of circular include relationships?' and 'How best to split code across different source code components?'. These are things I may come back to in future blog posts.

(Reddit discussion)