The Conseptizer programming language was originally called Polliwog, which stood for “POlliwog’s Like LIsp WithOut Gc”. Instead of a GC, it was supposed to use explicit region-management. It quickly drifted away a bit from Lisp by becoming stack-oriented, and finally I decided that there’s some things which are just so much more convenient with fully automatic memory management – especially when it comes to mutable state, and I certainly didn’t want to go the pure way.
I wrote an implementation that used a Mark&Sweep GC. But that hinders multi-threading and real-time applications (like games). Unless it’s a conservative GC, it also requires permanent attention in the implementation.
I wrote another implementation, this time using reference counting. But it still requires permanent attention in the C-code everywhere. Since extending the implementation (e.g. by adding library bindings or performance-critical application parts) is a common thing to do and messing it up can happen really easiely, which results in really ugly bugs, I was not satisfied.
Now I have decided to go back to the roots, for the sake of simplicity and the sake of trying something that as far as I know, nobody has really tried so far (because we all seem to not care that much about simplicity?). Region-based management has its advantages and disadvantages. In this article, I want to talk about both.
But first of all: What exactly are those regions I am using now? A region is a place where we can allocate memory. An object allocated in a pool can not be deallocated on its own. Instead, we deallocate the region as a whole as soon as we know that we don’t need any of those objects anymore. So region-based memory management is a form of semi-automatic memory management.
The advantages. First of all, allocation is cheap: We basically just have to increment a pointer (and check whether we’re at the end of a block, in which case we add a new one to the region). Deallocation is ultra-cheap: We just have to add the blocks of a region to a list of free blocks.
The implementation of this is trivial, every programmer can understand it (think about how it compares to the GC in a modern JVM in terms of complexity). There’s no problem with supporting soft real-time or multithreading. For global state, it’s like normal manual memory management (so global state is being discouraged by the language), but most of the time, you won’t have to think about memory management, since it is semi-automatic: As long as your code is purely functional or only has some local side-effects – which should be most of the time – it’s technically simple: All you need to do is insert a call to a certain subroutine in strategic places; this subroutine creates a new region, uses it while running a user-supplied piece of code and deallocates the region afterwards. Oh, maybe it also copies the return value of the user-supplied code back to the previous region (but that will also compact it like a Copying-GC, so that’s okay).
Regions increase locality of reference: When we allocate two new objects, they are guaranteed to be next to each other in memory (except when we’re at the end of a block, but that’s unavoidable). In this case, “next to each other” is to be taken literally – there just is no management overhead involved, while malloc() usually involves a rather large header before objects (which is mostly unavoidable if you want to be able to deallocate objects of different sizes individually).
If regions wouldn’t have any disadvantes, they would certainly be used more widely (though I think they deserve more attention than they’re getting usually). Since it is only semi-automatic, it requires some manual work and thus it’s possible to get it wrong. It can create both leaks and dangling pointers, though the latter can only happen when using mutable state crossing the boundaries of regions. In principle, there’s region-inference to avoid such issues, which can be done by compilers, but that has its own problems I won’t dive into here, as its clearly at odds with our philosophy of extreme simplicity anyway.
Regions can also increase memory consumption, as we have to pre-allocate a full block for each region in use. This will waste space when using lots of small regions, so you have to find the right places to insert the region-management in your code.
So we can see that regions are not the ultimate solution to our memory management headaches, but they are very simple and very efficient (actually they tend to be the most efficient way of all). I think it is worthwhile to try using explicity regions in a language supporting higher-order programming; this way, we might get a good compromise of convenience on the outside and simplicity on the inside.