May 8, 2024, 11:43 a.m.

This post is about risks in the development of increasingly capable AI, in particular the risk of losing control to AI and extinction risk. I'll suggest that a key question is, "When do we need to take this kind of risk seriously?"

We'll look at the issue of 'agency overhang', which suggests that adding agent-like abilities to AI could result in a sudden and surprising increase in these kinds of risks.

I'll draw on intuitions about humans taking administrative and political control (with reference to the 'Dictator Book Club') and rephrase agency overhang as 'power play overhang'.

I'll finish by suggesting that a lot of people may be making a subtle but important mistake in imagining just one fairly specific path to dangerous AI.

Post

Oct. 13, 2019, 2:14 p.m.

Some (opinionated) guidelines for include file organisation in C/C++.

Post

May 31, 2018, 3:33 p.m.

Automatic Object Linkage, with Include Graphs

(Source code sharing without static libraries)

In this post I describe an alternative approach to sharing source code elements between multiple build targets.

It's based on a custom build process I implemented at PathEngine for our C and C++ code-base (but the core ideas could also be relevant to other languages with compilation to object files).

We'll need to enforce some constraints on the way the source code is set up, notably with regard to header file organisation, but can then automatically determine what to include in the link operation for each build target, based on a graph of include relationships extracted from the source files.

The resulting dynamic, fine-grained, object-file-centric approach to code sharing avoids the need for problematic static library decomposition, and the associated 'false dependencies'.
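
To give a flavour of the core idea, here's a minimal sketch (with hypothetical file naming conventions, not the actual PathEngine build code): walk the include graph from a target's main source file, and collect an object file for each implementation file reached.

    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    // Minimal sketch: 'includes' maps each file to the headers it includes,
    // 'sourceFiles' lists the known .cpp files, and a header "foo.h" is
    // assumed to pair with an implementation file "foo.cpp".
    std::set<std::string> objectsToLink(
        const std::string& targetMain,
        const std::map<std::string, std::vector<std::string>>& includes,
        const std::set<std::string>& sourceFiles)
    {
        std::set<std::string> objects;
        std::set<std::string> visited;
        std::vector<std::string> toVisit{targetMain};
        while(!toVisit.empty())
        {
            std::string file = toVisit.back();
            toVisit.pop_back();
            if(!visited.insert(file).second)
                continue; // already processed
            if(sourceFiles.count(file))
                objects.insert(file.substr(0, file.size() - 4) + ".o");
            auto it = includes.find(file);
            if(it == includes.end())
                continue;
            for(const std::string& header : it->second)
            {
                toVisit.push_back(header);
                // follow each header through to its paired implementation file
                // (assumes '.h' headers and '.cpp' implementation files)
                std::string paired = header.substr(0, header.size() - 2) + ".cpp";
                if(sourceFiles.count(paired))
                    toVisit.push_back(paired);
            }
        }
        return objects;
    }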

Post, Comments

Sept. 23, 2017, 3:12 p.m.

A cautionary tale about statically-linked libraries, as generated by C/C++ build tools.

As a project accumulates features and complexity, it gets harder to understand exactly what's going on, and to find your way around the source code. You need to find some way to organise the code and try to keep things manageable.

A common idea in this situation is to group some source files together and split them out as a static library.

I'm going to argue that this actually does very little, in itself, to increase modularity, can have the effect of significantly increasing dependencies, and is maybe not such a good idea after all.

Post, Comments

Feb. 16, 2017, 2:52 p.m.

Who Owns Who?

(Inverting the PathEngine ownership graph)

Designing an API with creation and destruction of objects can be tricky, particularly if the objects have dependency relationships.

How do we ensure that constructed API objects get destroyed at the appropriate time (and are not 'leaked')? How do we prevent code from attempting to access objects after destruction? And how do we do all this in situations where some objects depend on the existence of other objects?

In this post I talk about how I approached this in PathEngine (a pathfinding library, in C++).

When I first designed the PathEngine API, I chose an approach based on larger objects each owning a set of smaller 'contained' objects. That kind of worked out, but it turns out there's another, much better way to do this. And so it was that I recently found myself going through the SDK and turning the whole thing upside down.
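
For a rough idea of what that containment-based approach looks like, here's a minimal sketch (hypothetical names, not the actual PathEngine interfaces): the larger object creates and owns its dependent objects, so a contained object can never outlive the object it depends on.

    #include <memory>
    #include <vector>

    // Sketch only: hypothetical classes illustrating containment-based ownership.
    class Agent
    {
    public:
        explicit Agent(int position) : position_(position) {}
        int position() const { return position_; }
    private:
        int position_;
    };

    class Mesh
    {
    public:
        // Agents created through the mesh are owned by it, and destroyed with it,
        // so an agent can never dangle past the mesh it was placed on.
        Agent* createAgent(int position)
        {
            agents_.push_back(std::make_unique<Agent>(position));
            return agents_.back().get();
        }
    private:
        std::vector<std::unique_ptr<Agent>> agents_;
    };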

Post

Dec. 29, 2016, 4 p.m.

Skeleton Meshes

(Break it all apart!)

This post explores a slightly unusual design choice for some key data structures at PathEngine.

More specifically, it's about polygon mesh data structures (in C++), and follows on from a previous post about polygon meshes 'in the style of std::vector'.

One consequence of a vector-like approach is that mesh elements can be referenced by index, and accessing elements by index means we can break stuff apart.

It's a story of violated object oriented programming principles, broken encapsulation, and unsafe code, but the code we end up with is then more functional and a lot more fun to work with!
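
To give a rough flavour (hypothetical names, not the actual PathEngine data structures): once faces are referenced by plain index, per-face data can be broken out into separate, parallel vectors, completely outside the core mesh structure, and looked up with exactly the same indices.

    #include <cstdint>
    #include <vector>

    // Sketch only: core connectivity in one place...
    struct MeshConnectivity
    {
        std::vector<int32_t> faceFirstEdge; // one entry per face
        std::vector<int32_t> edgeNextEdge;  // one entry per edge
        std::vector<int32_t> edgeVertex;    // one entry per edge
    };

    // ...and per-face attributes broken out into separate, parallel buffers,
    // indexed by the same face indices.
    struct FaceAttributes
    {
        std::vector<float> surfaceCost;
        std::vector<int32_t> regionIndex;
    };

    inline float costForFace(const FaceAttributes& attributes, int32_t face)
    {
        return attributes.surfaceCost[face];
    }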

Post

Nov. 6, 2015, 5:52 p.m.

Over the years, I've learnt to be suspicious of any dynamic allocation in performance critical code, and moving allocations back up the call stack can often yield significant speedups.

But I'm guilty of being a little religious about this, on some occasions, and this hasn't always been for the best.

Dynamic allocation helps a lot with writing flexible and robust code. Changing code to remove dynamic allocations can make the code more complicated, brittle, and harder to maintain, and sometimes generalised memory allocation is really just the best tool for the job.

The first rule of optimisation is to measure. In this post I show how to measure when and where dynamic allocation is actually a significant overhead (and a worthwhile optimisation target), in a simple platform independent way.

(Sample code is in C++ but the core idea is straightforward and can be applied in other languages.)
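
One simple, platform-independent way to get this kind of measurement (a minimal sketch of the general idea, not necessarily the exact code from the post) is to replace the global allocation operators with counting versions, and read the counters around the code you care about.

    #include <cstdio>
    #include <cstdlib>
    #include <new>

    static long long allocationCount = 0;
    static long long freeCount = 0;

    // Replace the global allocation operators with counting versions.
    // (Array forms and nothrow overloads omitted for brevity in this sketch.)
    void* operator new(std::size_t size)
    {
        ++allocationCount;
        if(void* p = std::malloc(size))
            return p;
        throw std::bad_alloc();
    }
    void operator delete(void* p) noexcept
    {
        ++freeCount;
        std::free(p);
    }

    int main()
    {
        long long allocationsBefore = allocationCount;
        // ... run the code being measured here ...
        std::printf("allocations during measured code: %lld\n",
                allocationCount - allocationsBefore);
        return 0;
    }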

Post

Aug. 26, 2015, 4:21 p.m.

So I wanted to look up some local travel info the other day.

There's a detailed PDF file available with exactly the information I need, but my smart-phone really struggles with this heavy-weight PDF.

Luckily, it turns out to be easy to pre-render the PDF to a big old jpeg and view this image with my phone's built-in image viewer.

This is so straightforward and convenient, I might as well share the details for the process here.

Post

July 29, 2015, 5:07 p.m.

Vector Meshes

(Polygon meshes in the style of std::vector)

I remember that when the STL came out, vectors were quite a surprising new way to approach dynamic arrays. In this post we'll look at some of the implications of applying the vector 'paradigm' to a more involved data structure, the dynamic mesh.

We do a load of mesh processing at PathEngine, both during content-processing and at SDK run-time.

Sometimes this involves modifying the mesh, and we need dynamic mesh data structures to support this.

We started out using a traditional pointer-based, 'list-like' mesh data structure, but switched to a 'vector-like' approach, based on contiguous buffers, and moved from iterators to direct indexing.
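
Here's a minimal sketch of the general shape of this (hypothetical layout, not the actual PathEngine classes): elements live in contiguous vectors, references are plain integer indices, and adding elements amortises just like std::vector push_back.

    #include <cstdint>
    #include <vector>

    // Sketch only: a 'vector-like' triangle mesh, with elements referenced by index.
    class VectorMesh
    {
    public:
        int32_t addVertex(float x, float y)
        {
            vertexX_.push_back(x);
            vertexY_.push_back(y);
            return static_cast<int32_t>(vertexX_.size()) - 1;
        }
        int32_t addTriangle(int32_t v0, int32_t v1, int32_t v2)
        {
            triangleVertices_.push_back(v0);
            triangleVertices_.push_back(v1);
            triangleVertices_.push_back(v2);
            return static_cast<int32_t>(triangleVertices_.size()) / 3 - 1;
        }
        float vertexX(int32_t v) const { return vertexX_[v]; }
    private:
        std::vector<float> vertexX_, vertexY_;
        std::vector<int32_t> triangleVertices_; // three vertex indices per triangle
    };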

Post, Comments

Feb. 26, 2015, 5:32 p.m.

Vectors are great when adding or removing elements at the end of a sequence, but not so hot when deleting elements at arbitrary positions.

If that's a requirement, you might find yourself reaching for a pointer-based list.

Not so fast!

Memory locality is important, contiguous buffers are a really good thing, and a standard vector will often out-perform pointer-based lists even where you perform non-contiguous, list-style modifications such as arbitrary element deletion.

And we can 'host' a list within a vector to get the advantages of a contiguous buffer at the same time as O(1) complexity for these kinds of manipulations.
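
To make this concrete, here's a minimal sketch (hypothetical names) of a doubly-linked list hosted in a vector: elements stay in one contiguous buffer, and erasing an element just relinks next and prev indices, in O(1), without moving anything else.

    #include <cstdint>
    #include <vector>

    // Sketch only: list nodes live contiguously in a vector, linked by index.
    struct HostedList
    {
        struct Node
        {
            int value;
            int32_t prev; // index of previous node, or -1
            int32_t next; // index of next node, or -1
        };
        std::vector<Node> nodes;
        int32_t head = -1;
        int32_t tail = -1;

        int32_t pushBack(int value)
        {
            int32_t i = static_cast<int32_t>(nodes.size());
            nodes.push_back({value, tail, -1});
            if(tail != -1)
                nodes[tail].next = i;
            else
                head = i;
            tail = i;
            return i;
        }
        void erase(int32_t i) // O(1): just relink the neighbours
        {
            Node& n = nodes[i];
            if(n.prev != -1) nodes[n.prev].next = n.next; else head = n.next;
            if(n.next != -1) nodes[n.next].prev = n.prev; else tail = n.prev;
            // (the freed slot could also go on a free list, for reuse)
        }
    };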

Post, Comments

Oct. 20, 2014, 1:30 p.m.

Cryptocurrencies are great for decentralised payments. Once you have bitcoin you can pay arbitrary destinations without the need to trust in, or get the approval of, any third party.

But what about if you want to exchange your bitcoin for something else, say another cryptocurrency?

This normally means registering with a third party exchange service, and trusting that third party service not to lose your funds during the exchange. It's not ideal, and doesn't really fit with the whole decentralised thing!

It turns out that proper atomic exchange between two cryptocurrencies isn't fundamentally hard, as long as you have the right building blocks, and this is something I think should be standardised across different cryptocurrencies.

Post, Comments

Sept. 26, 2014, 11:38 a.m.

In PathEngine, we do a lot of stuff on the surfaces of ground meshes.

One example is tracking the ground position for a moving agent, which is implemented as traversal along a 2D line through a mesh.

When coding stuff like this, the need to check and handle special case conditions like vertex crossings, or traversal exactly along an edge, can add a lot of complexity.

In this post I talk about a technique we use in PathEngine, based on an infinitesimally offset mesh model, to eliminate these kinds of special case conditions and simplify our mesh based geometry.

Post

July 1, 2014, 3:44 p.m.

Another custom container post for my Vectors and Vector Based Containers series.

In this post I'll look at an alternative implementation of the humble bit vector, designed specifically for optimising zero fill operations.

It's not a new technique, and not particularly complicated to implement, but it's not completely obvious, and it's something that gave us some noticeable speedups when applied to PathEngine, so I think it's worth sharing here.
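
The post has the details; as a taster, here's a minimal sketch of one well-known way to make zero fill cheap (not necessarily the exact approach described in the post): stamp each word with a generation counter, so that 'clearing' the whole vector is just a counter increment.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Sketch only: words whose generation stamp is stale read as all zeros,
    // and get lazily reset before the next write. (Counter wraparound ignored here.)
    class ZeroFillBitVector
    {
    public:
        explicit ZeroFillBitVector(std::size_t bits)
            : words_((bits + 31) / 32, 0), generations_((bits + 31) / 32, 0) {}
        void set(std::size_t i)
        {
            std::size_t w = i / 32;
            if(generations_[w] != generation_)
            {
                words_[w] = 0; // lazily reset a stale word before writing to it
                generations_[w] = generation_;
            }
            words_[w] |= uint32_t(1) << (i % 32);
        }
        bool get(std::size_t i) const
        {
            std::size_t w = i / 32;
            if(generations_[w] != generation_)
                return false; // stale word: reads as all zeros
            return (words_[w] >> (i % 32)) & 1;
        }
        void clearAll() { ++generation_; } // O(1) 'zero fill'
    private:
        std::vector<uint32_t> words_;
        std::vector<uint32_t> generations_;
        uint32_t generation_ = 1;
    };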

Post, Comments

June 20, 2014, 2:54 p.m.

(Another post about bitcoin RPC from Python.)

For certain bitcoin applications you'll need to perform some kind of processing on all of the transactions coming through on the bitcoin block chain. In a bitcoin wallet application, for example, you need to check each new transaction to identify any outputs which are spendable by the wallet, and add the corresponding amounts to the wallet balance.

In this post we'll look at how to do this with bitcoin RPC calls.

There's one obvious way to achieve this, but the obvious approach requires a full transaction index, and also works out to be quite inefficient. I'll show you a less obvious, more efficient way to do this, without the transaction index requirement.

Post, Comments

April 7, 2014, 10:02 a.m.

The reference bitcoin client includes a powerful API and RPC interface.

In this post I show you how to call into this from Python (which is something that turns out to be almost trivially easy to set up).

Python can work well as a kind of (overpowered) scripting language for automating complicated tasks through the bitcoin reference client, but this is also a great way to get started if you're interested in writing more involved software that works with bitcoin transactions or the bitcoin blockchain.

Post, Comments

Jan. 21, 2014, 4:55 p.m.

(Number 6 in a series of posts about Vectors and Vector based containers.)

A few posts back I talked about the idea of 'rolling your own' STL-style vector class, based on my experiences with this at PathEngine.

In that original post and these two follow-ups I talked about the general approach and also some specific performance tweaks that actually helped in practice for our vector use cases.

I haven't talked about custom memory allocation yet, however. This is something that's been cited in a number of places as a key reason for switching away from std::vector, so I'll come back now and look at the approach we took (which is pretty simple, but nonstandard, and also pre-C++11), and assess some of the implications of using this kind of nonstandard approach.

I approach this from the point of view of a custom vector implementation, but I'll be talking about some issues with memory customisation that also apply more generally.

Post, Comments

Dec. 22, 2013, 4:41 p.m.

(Number 5 in a series of posts about Vectors and Vector based containers.)

This is a response to comments on a previous post, roll your own vector, and has also been rewritten and updated fairly significantly since first posted.

In roll your own vector I talked about a change we made to the initialisation semantics for PathEngine's custom vector class. In my first follow-up post I looked more closely at possibilities for replacing resize() with reserve() (which can avoid the initialisation issue in many cases), but so far I've been concentrating pretty much exclusively on zero initialisation for built-in types. In this post I come back to look at the issue of initialisation semantics for class element types.

Post, Comments

Dec. 20, 2013, 4:39 p.m.

Avoid resize()?

(Uninitialised buffers and std::vector)

(Number 4 in a series of posts about Vectors and Vector based containers.)

This post is essentially a response to feedback to this previous post.

In that post I talked about a change we made to the initialisation semantics for PathEngine's custom vector class, and described a specific use case where this can make a difference, with that use case involving calls to the vector resize() method.

In the comments for that post, Herb Sutter says:

Thomas, I think the point is that "reserve + push_back/emplace_back" is the recommended style and the one people do use. Really resize() is not used often in the code I've seen; you use resize() only when you want that many extra default-constructed elements, since that's what it's for.

In this post I'll be looking into our use case in a bit more detail, and in particular whether or not a resize() call is actually required.
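
For concreteness, here's a minimal sketch of the two styles under discussion: resize() value-initialises all the elements up front (zero fill, for built-in types), while reserve() plus push_back only allocates, so each element gets written exactly once.

    #include <cstddef>
    #include <vector>

    // Sketch only: the same buffer built in the two styles being compared.
    std::vector<int> buildWithResize(std::size_t count)
    {
        std::vector<int> result;
        result.resize(count); // every element zero-initialised here
        for(std::size_t i = 0; i != count; ++i)
            result[i] = static_cast<int>(i * 2); // then written a second time
        return result;
    }

    std::vector<int> buildWithReserve(std::size_t count)
    {
        std::vector<int> result;
        result.reserve(count); // allocation only, no element initialisation
        for(std::size_t i = 0; i != count; ++i)
            result.push_back(static_cast<int>(i * 2)); // each element written once
        return result;
    }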

Post, Comments

Dec. 15, 2013, 12:14 p.m.

(Third in a series of posts about Vectors and Vector based containers.)

STL vectors offer a great combination of dynamic resizing convenience and low-level contiguous buffer access performance, but std::vector is not the only option, and custom implementations of this container can sometimes be a better fit for your specific requirements.

Post, Comments

Nov. 22, 2013, 12:45 p.m.

(Second in a series of posts about Vectors and Vector based containers.)

STL style vectors are convenient because they hide the details of internal buffer management, and present a simplified interface, but sometimes convenience can be a trap!

In my previous post I touched briefly on STL vectors with non-simple element types, and mentioned the 'vector of vectors' construct in particular as a specific source of memory woes.

In this post I'll talk about this construct in more detail, explain a bit about why there is an issue, and go on to suggest a fairly straightforward alternative for many situations.
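
As a taster, here's a minimal sketch of one such alternative (not necessarily exactly the construct proposed in the post): concatenate all the rows into a single flat buffer, with a second vector of row start offsets, so there's one allocation per container instead of one per row.

    #include <cstddef>
    #include <vector>

    // Sketch only: a flattened replacement for vector<vector<int>>.
    struct FlattenedRows
    {
        std::vector<int> values;        // all rows, concatenated
        std::vector<std::size_t> start; // start[i] is where row i begins

        FlattenedRows() : start(1, 0) {}

        void addRow(const std::vector<int>& row)
        {
            values.insert(values.end(), row.begin(), row.end());
            start.push_back(values.size());
        }
        std::size_t rowSize(std::size_t i) const
        {
            return start[i + 1] - start[i];
        }
        int value(std::size_t row, std::size_t indexInRow) const
        {
            return values[start[row] + indexInRow];
        }
    };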

Post, Comments

Nov. 18, 2013, 10:39 a.m.

(First in a series of posts about Vectors and Vector based containers.)

STL style vectors are a pretty useful construct.

I remember coming to C++ (and the STL) from a background in lower level programming, and finding STL vectors a bit surprising, with the idea of amortized constant time complexity and so on.

And then it was actually something of a minor revolution just to be able to set up a contiguous data buffer, without any significant programming effort, and without knowing the maximum required buffer size in advance.

Before we had C++ and the STL, I remember a common source of errors being dynamically generated data overflowing fixed buffers, which is something we just don't see any more, thanks in no small part to the humble vector (and its brethren).

But convenience can come at a cost, and there's some stuff you need to be careful about when applying STL vectors in performance critical situations.

In this post I'll discuss the main ways in which STL style vectors are applied in PathEngine, and look at some of the key performance and memory points coming out of this.

Post

Nov. 18, 2013, 10 a.m.