Allocating Memory in the Compiler

So you want to work on the JIT compiler in OpenJ9 but it isn’t obvious how to use the Compiler’s Memory Manager? Then this is the blog post for you!

Background

It is important to note that because OpenJ9 is built as an extension of Eclipse OMR (henceforth referred to as OMR), the Compiler Memory Manager in OpenJ9 is also an extension of the Compiler Memory Manager in OMR. It is worth reading the documentation here and here. You might have noticed the term “Compiler Memory Manager” as opposed to “JVM Memory Manager” or just “Memory Manager”. This is because the Compiler manages its own memory.

There are a few reasons why the Compiler does so. Using an allocator provided by the Port Library, such as j9mem_allocate_memory, is expensive for the Compiler’s allocation patterns. Because the Port Library is not in the same shared library as the Compiler, making too many of these cross-component function calls can have a noticeable negative impact on compilation time. Additionally, there are too many participants for informal trust, book-keeping can get expensive, and attempting to optimize the Memory Manager for divergent component use-cases is difficult or even impossible (GC vs VM vs JIT memory allocation patterns are wildly different), which negatively affects memory footprint.

Thus, in order to minimize compilation time and memory footprint, the Compiler manages its own memory, only allocating large segments as and when it needs to. However, in order to ensure that all memory in the JVM is accounted for, as well as to minimize the likelihood of memory leaks, the OpenJ9 Compiler Memory Manager uses the Port Library APIs to allocate the memory it manages, instead of directly using OS APIs.
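To make the "allocate large segments, hand out small pieces" idea concrete, here is a minimal sketch. It is not the OMR implementation: ToySegmentArena, backingAllocate and the 64 KB segment size are all hypothetical, and std::malloc merely stands in for an expensive cross-component allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Counts calls to the (hypothetical) expensive backing allocator.
static std::size_t g_backingCalls = 0;

// Stand-in for a costly cross-component allocation. std::malloc is an
// assumption made for this sketch; it is NOT the Port Library API.
static void *backingAllocate(std::size_t bytes)
   {
   ++g_backingCalls;
   return std::malloc(bytes);
   }

// Hypothetical arena: grabs big segments, then bump-allocates inside them.
class ToySegmentArena
   {
public:
   explicit ToySegmentArena(std::size_t segmentSize)
      : _segmentSize(segmentSize), _cursor(nullptr), _remaining(0) {}

   ~ToySegmentArena()
      {
      for (void *segment : _segments)
         std::free(segment);
      }

   ToySegmentArena(const ToySegmentArena &) = delete;
   ToySegmentArena &operator=(const ToySegmentArena &) = delete;

   void *allocate(std::size_t bytes)
      {
      // Round up so every returned pointer stays suitably aligned.
      const std::size_t align = alignof(std::max_align_t);
      bytes = (bytes + align - 1) / align * align;

      if (bytes > _remaining)
         {
         // Only now do we pay for a call to the backing allocator.
         std::size_t size = bytes > _segmentSize ? bytes : _segmentSize;
         void *segment = backingAllocate(size);
         if (!segment)
            return nullptr;
         _segments.push_back(segment);
         _cursor = static_cast<char *>(segment);
         _remaining = size;
         }

      void *result = _cursor;
      _cursor += bytes;
      _remaining -= bytes;
      return result;
      }

private:
   std::size_t _segmentSize;
   char *_cursor;
   std::size_t _remaining;
   std::vector<void *> _segments;
   };

// Performs 1000 small allocations and reports how many backing calls it took.
std::size_t countBackingCallsForSmallAllocations()
   {
   g_backingCalls = 0;
   ToySegmentArena arena(64 * 1024);
   for (int i = 0; i < 1000; ++i)
      {
      void *p = arena.allocate(16);
      assert(p != nullptr);
      (void)p;
      }
   return g_backingCalls;
   }
```

In this toy, a thousand 16-byte requests fit in a single segment, so the expensive allocator is hit once rather than a thousand times.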

Scratch vs Persistent Memory

The OpenJ9 JIT can be, loosely speaking, split into two main components: the JIT Compiler and the JIT Runtime. The JIT Compiler component deals with compiling Java bytecodes into native assembly whereas the JIT Runtime component deals with compilation control, runtime assumptions, code/data cache management, memory management, etc. The data these two components work with have different lifetimes; therefore, conceptually, the Compiler Memory Manager provides the ability to allocate two different kinds of memory:

  1. Scratch Memory. This refers to memory required to perform the compilation (e.g., representing IL, CFG, Instructions, etc.). This memory is allocated by the JIT Compiler component, and is released at the end of a compilation. There is a Scratch Space Limit specified for each compilation; reaching this limit aborts the compilation, which may then be retried at a lower optimization level. There are two sub-types of Scratch Memory:
    1. Heap Memory is just dynamically allocated memory, analogous to new or malloc.
    2. Stack Memory is NOT memory on the C/C++ stack. It is also dynamically allocated memory, but memory that is allocated using Stack Mark / Stack Release Semantics (described below).
  2. Persistent Memory. This refers to memory that persists throughout the lifetime of the JVM (eg. Class Hierarchy Table entries, IProfiler data, etc.). This memory is generally allocated by the JIT Runtime component.
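As a rough illustration of the Scratch Space Limit behaviour described above, here is a toy sketch of "abort, then retry at a lower optimization level". Everything in it is hypothetical: ToyScratchBudget, ToyScratchLimitExceeded, the made-up cost model in toyScratchNeeded, and the use of a C++ exception to signal the abort are assumptions for the example, not how the OpenJ9 compilation control logic is actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical signal that a compilation ran out of scratch space.
struct ToyScratchLimitExceeded : std::runtime_error
   {
   ToyScratchLimitExceeded() : std::runtime_error("scratch space limit reached") {}
   };

// Hypothetical per-compilation scratch budget.
class ToyScratchBudget
   {
public:
   explicit ToyScratchBudget(std::size_t limit) : _limit(limit), _used(0) {}

   void charge(std::size_t bytes)
      {
      if (bytes > _limit - _used)
         throw ToyScratchLimitExceeded();
      _used += bytes;
      }

private:
   std::size_t _limit;
   std::size_t _used;
   };

// Made-up cost model: higher optimization levels need more scratch memory.
static std::size_t toyScratchNeeded(int optLevel)
   {
   return static_cast<std::size_t>(1024) << optLevel;
   }

// Returns the level at which the toy compilation succeeded, or -1 if none did.
int compileWithRetry(int startLevel, std::size_t scratchLimit)
   {
   for (int level = startLevel; level >= 0; --level)
      {
      ToyScratchBudget budget(scratchLimit); // each attempt starts with fresh scratch memory
      try
         {
         budget.charge(toyScratchNeeded(level));
         return level;
         }
      catch (const ToyScratchLimitExceeded &)
         {
         // This attempt is abandoned; the loop retries at the next lower level.
         }
      }
   return -1;
   }
```

With a 3000-byte limit, the toy compilation fails at levels 3 and 2 and succeeds at level 1.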

If you take a look at TRMemory.hpp in OMR, you will see a bunch of #defines. These are used to add a bunch of placement new overrides into various classes, as well as a per-class version of jitPersistentAlloc/jitPersistentFree. These defines are added to a class via the TR_ALLOC macro, which takes a TR_MemoryBase::ObjectType parameter. This is used to keep track of persistent memory allocations, which helps debug memory leaks. There also exists the TR_Memory class; as described here, it is the old way through which memory in the Compiler was allocated.

Thus, you might see (old) code where memory is allocated using TR_Memory::allocateMemory as well as calls to per-class versions of jitPersistentAlloc/jitPersistentFree. The problem with these approaches is that they require you to manually allocate memory and then construct objects or cast data onto the memory. The aforementioned #defines do, however, provide placement new overrides to minimize the need for a malloc-like API.

In general, however, new code should use C++ Standard Template Library (STL) containers; in keeping with C++ best practices, explicitly allocating memory should be avoided as much as possible.

Allocating Scratch Memory

There are four main ways of allocating Scratch Memory:

Compilation Allocator

This approach is used to allocate memory from the pool of memory that will exist until the end of the compilation. However, because this approach does not yield a C++ Allocator, it needs to be wrapped in a TR::typed_allocator via a call to getTypedAllocator. This extra step exists because of the CS2 structures present in the code; these structures need the use of the CS2 Allocators. Eventually, all this should be simplified to just use a TR::Region. To get an instance of this allocator, do:

TR::Allocator compAllocator = comp()->allocator();

To initialize a C++ container with this allocator, you would do something like:

_jniCallSites(getTypedAllocator<TR_Pair<TR_ResolvedMethod,TR::Instruction> *>(TR::comp()->allocator()));

Note the need for getTypedAllocator.

Heap Memory Region

This approach is also used to allocate memory from the pool of memory that exists until the end of the compilation. The difference is that this approach yields a TR::Region, which has an automatic conversion to a TR::typed_allocator, so there’s no need to wrap it explicitly when using C++ containers. The trade-off is that, in order to get an instance, you will need to do something like:

TR::Region &heapMemoryRegion = comp()->region();

which is less canonical than comp()->allocator().

To initialize a C++ container with this allocator, you would do something like:

_jniCallSites(comp()->region());

As an aside, comp()->trMemory()->heapMemoryRegion() is equivalent to comp()->region().
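If you are wondering how a region can be handed straight to a container constructor, the trick is a templated conversion operator. The sketch below shows the general C++ pattern with stand-in types; ToyRegion and ToyTypedAllocator are hypothetical and only mimic the shape of TR::Region and TR::typed_allocator, they are not the OMR classes.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

template <typename T> class ToyTypedAllocator;

// Hypothetical region: hands out memory and frees all of it on destruction.
class ToyRegion
   {
public:
   ToyRegion() {}
   ~ToyRegion()
      {
      for (void *block : _blocks)
         ::operator delete(block);
      }

   ToyRegion(const ToyRegion &) = delete;
   ToyRegion &operator=(const ToyRegion &) = delete;

   void *allocate(std::size_t bytes)
      {
      void *block = ::operator new(bytes);
      _blocks.push_back(block);
      return block;
      }

   void deallocate(void *) {} // no-op: memory is reclaimed with the region

   // The "automatic conversion": lets a region be passed where a typed
   // C++ allocator is expected.
   template <typename T>
   operator ToyTypedAllocator<T>() { return ToyTypedAllocator<T>(*this); }

private:
   std::vector<void *> _blocks;
   };

// Hypothetical typed wrapper satisfying the minimal C++11 allocator interface.
template <typename T>
class ToyTypedAllocator
   {
public:
   typedef T value_type;

   explicit ToyTypedAllocator(ToyRegion &region) : _region(&region) {}

   template <typename U>
   ToyTypedAllocator(const ToyTypedAllocator<U> &other) : _region(other.region()) {}

   T *allocate(std::size_t n) { return static_cast<T *>(_region->allocate(n * sizeof(T))); }
   void deallocate(T *p, std::size_t) { _region->deallocate(p); }
   ToyRegion *region() const { return _region; }

private:
   ToyRegion *_region;
   };

template <typename T, typename U>
bool operator==(const ToyTypedAllocator<T> &a, const ToyTypedAllocator<U> &b)
   { return a.region() == b.region(); }

template <typename T, typename U>
bool operator!=(const ToyTypedAllocator<T> &a, const ToyTypedAllocator<U> &b)
   { return !(a == b); }

int sumWithRegionBackedVector()
   {
   ToyRegion region;
   // The region converts implicitly to ToyTypedAllocator<int> here.
   std::vector<int, ToyTypedAllocator<int> > numbers(region);
   numbers.push_back(1);
   numbers.push_back(2);
   numbers.push_back(3);
   int sum = 0;
   for (int n : numbers)
      sum += n;
   return sum;
   }
```

Note that the container is declared after the region, so it is destroyed first; nothing in the container outlives the memory it points to.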

TR::Region

This is used when you want Region Semantics. TR::Region is an object that defines a region from which memory can be allocated; when the TR::Region object is destroyed (either explicitly, or more commonly, when it goes out of scope), all memory that was allocated through it gets freed. The above TR::Compilation::region() is actually just a TR::Region that does not get destroyed until the end of a compilation. As mentioned before, TR::Region has an automatic conversion operator which means it can be used as a C++ allocator. In order to create a new region, simply pass in a reference to an existing region. Usually this “existing region” will just be TR::Compilation::region():

TR::Region myRegion(comp()->region());

However, there’s nothing stopping you from building your own nested regions, though if you’re going to do so, you should make sure the lifetimes of the objects allocated in the various nested levels are extremely well understood.

To initialize a C++ container with this allocator, you would do something like:

TR::Region myRegion(comp()->region());
_jniCallSites(myRegion);
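For readers new to Region Semantics, the following toy sketch shows the lifetime rule in isolation: everything allocated through a region is freed together when the region is destroyed, and a nested region can die earlier than its enclosing one. ToyRegion and ToyBackingStore are hypothetical stand-ins (backed by std::malloc) and are not the OMR classes; the live-byte counter exists only so the example can observe the frees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical bookkeeping shared by all regions in this sketch.
struct ToyBackingStore
   {
   std::size_t liveBytes = 0;
   };

class ToyRegion
   {
public:
   explicit ToyRegion(ToyBackingStore &store) : _store(store) {}

   // Creates a NEW, empty region that draws from the same backing store as an
   // existing region. Despite its copy-constructor shape, nothing is copied.
   explicit ToyRegion(ToyRegion &existing) : _store(existing._store) {}

   ToyRegion &operator=(const ToyRegion &) = delete;

   ~ToyRegion()
      {
      // Region Semantics: everything allocated here is freed together.
      for (const Block &block : _blocks)
         {
         std::free(block.ptr);
         _store.liveBytes -= block.size;
         }
      }

   void *allocate(std::size_t bytes)
      {
      void *ptr = std::malloc(bytes);
      if (!ptr)
         return nullptr;
      _blocks.push_back(Block{ptr, bytes});
      _store.liveBytes += bytes;
      return ptr;
      }

private:
   struct Block
      {
      void *ptr;
      std::size_t size;
      };

   ToyBackingStore &_store;
   std::vector<Block> _blocks;
   };

std::size_t liveBytesAfterInnerRegionDies()
   {
   ToyBackingStore store;
   ToyRegion outer(store);
   outer.allocate(100);
      {
      ToyRegion inner(outer);
      inner.allocate(40);
      assert(store.liveBytes == 140);
      } // inner goes out of scope: its 40 bytes are released in one go
   return store.liveBytes; // only the outer region's 100 bytes remain
   }
```

This is also why lifetimes matter with nested regions: a pointer into the inner region handed to something that lives as long as the outer region would dangle as soon as the inner scope ends.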

TR::StackMemoryRegion

This is used when you want Stack Semantics (Stack Mark / Stack Release); for more details, see this. TR::StackMemoryRegion extends TR::Region, and so has all the same properties. To perform a Stack Mark, simply create a new TR::StackMemoryRegion:

TR::StackMemoryRegion stackMemoryRegion(comp()->trMemory());

A Stack Release occurs automatically when the TR::StackMemoryRegion goes out of scope. TR::StackMemoryRegion exists to aid in creating nested regions, which can be particularly useful if you wish to have a scratch space for memory needed during some optimization. For example:

int32_t TR_AmazingNewOpt::perform()
   {
   // Stack Mark
   TR::StackMemoryRegion stackMemoryRegion(comp()->trMemory());

   // Do optimization
   ...

   // Automatic Stack Release
   return ...;
   }

All memory allocated during TR_AmazingNewOpt is limited to the optimization. Any data that needs to persist longer than the optimization can just be allocated using the Heap Memory Region.
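Stack Mark / Stack Release can be pictured as remembering the top of a bump-allocated stack and later rewinding to it. The sketch below is a deliberately simplified model of that idea; ToyStackMemory and ToyStackMark are hypothetical names, the fixed-size buffer is an assumption, and alignment is ignored. It is not how TR::StackMemoryRegion is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump-allocated stack of scratch memory (alignment ignored).
class ToyStackMemory
   {
public:
   explicit ToyStackMemory(std::size_t capacity) : _buffer(capacity), _top(0) {}

   void *allocate(std::size_t bytes)
      {
      if (bytes > _buffer.size() - _top)
         return nullptr; // out of scratch space
      void *result = _buffer.data() + _top;
      _top += bytes;
      return result;
      }

   std::size_t mark() const { return _top; }
   void release(std::size_t mark) { _top = mark; }
   std::size_t bytesInUse() const { return _top; }

private:
   std::vector<unsigned char> _buffer;
   std::size_t _top;
   };

// RAII helper: Stack Mark on construction, Stack Release on destruction.
class ToyStackMark
   {
public:
   explicit ToyStackMark(ToyStackMemory &stack) : _stack(stack), _mark(stack.mark()) {}
   ~ToyStackMark() { _stack.release(_mark); }

   ToyStackMark(const ToyStackMark &) = delete;
   ToyStackMark &operator=(const ToyStackMark &) = delete;

private:
   ToyStackMemory &_stack;
   std::size_t _mark;
   };

// Stand-in for an optimization pass that needs temporary scratch space.
std::size_t runToyOptimization(ToyStackMemory &stack)
   {
   ToyStackMark mark(stack); // Stack Mark
   stack.allocate(256);
   stack.allocate(128);
   return stack.bytesInUse(); // usage while the pass is still running
   }                          // automatic Stack Release

bool stackReleaseRestoresMark()
   {
   ToyStackMemory stack(1024);
   stack.allocate(64); // data that must outlive the pass
   std::size_t peak = runToyOptimization(stack);
   return peak == 448 && stack.bytesInUse() == 64;
   }
```

The 64 bytes allocated before the mark survive the pass, while the 384 bytes allocated inside it are reclaimed the moment the pass returns.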

To get the current Stack Memory Region:

TR::StackMemoryRegion &currentStackRegion = comp()->trMemory()->currentStackRegion();

Allocating Persistent Memory

There are two ways to get an allocator for persistent memory when using a C++ container.

Persistent Allocator

TR::PersistentAllocator should be used when trying to allocate persistent objects whose class does not have TR_ALLOC in its definition. To get an instance of this allocator, do:

TR::PersistentAllocator &allocator = TR::Compiler->persistentAllocator();

This allocator also contains an automatic conversion operator which allows it to be used as a C++ allocator.

Typed Persistent Allocator

TR_TypedPersistentAllocator should be used when trying to allocate persistent objects whose class does have TR_ALLOC or TR_ALLOC_SPECIALIZED in its definition (see this for more details). The difference between the two macros (specifically with respect to TR_TypedPersistentAllocator) is that the former adds the amount of memory allocated to the UnknownType bucket, whereas the latter adds it to the bucket of the type specified by the macro. The reason for this distinction is to provide an easy way of measuring the JIT Shared Library Memory Footprint cost of switching from TR_ALLOC to TR_ALLOC_SPECIALIZED (since the latter defines a template class; take a look at TRMemory.hpp if you’re curious).

TR_TypedPersistentAllocator is a wrapper around TR_PersistentMemory, which enables tracking of persistent allocations per type (the jitPersistentAlloc/jitPersistentFree APIs are also wrappers around TR_PersistentMemory). It also contains an automatic conversion operator which allows it to be used as a C++ allocator. To get an instance, you would do something like:

MyClass::TrackedPersistentAllocator allocator = MyClass::getPersistentAllocator();

Usually you won’t need to explicitly assign the allocator to some variable, eg:

std::list<MyClass, MyClass::TrackedPersistentAllocator> myClassList(MyClass::getPersistentAllocator());

unless you wish to track the memory allocated for something in the same bucket as the object the allocator came from, eg:

MyClass::TrackedPersistentAllocator allocator = MyClass::getPersistentAllocator();
std::list<MyClass, MyClass::TrackedPersistentAllocator> myClassList(allocator);
std::list<MyClassMetadata, MyClass::TrackedPersistentAllocator> myClassMetadata(allocator);
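To show what "tracking in the same bucket" means mechanically, here is a small sketch of a counting allocator shared by two containers. ToyBucket and ToyTrackedAllocator are hypothetical and backed by the global operator new; they only illustrate the accounting idea and are not TR_TypedPersistentAllocator or TR_PersistentMemory.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <new>

// Hypothetical per-type accounting bucket.
struct ToyBucket
   {
   std::size_t liveBytes = 0;
   };

// Hypothetical allocator that charges every allocation to one bucket.
template <typename T>
class ToyTrackedAllocator
   {
public:
   typedef T value_type;

   explicit ToyTrackedAllocator(ToyBucket &bucket) : _bucket(&bucket) {}

   // Rebinding keeps the bucket, so container-internal nodes are charged too.
   template <typename U>
   ToyTrackedAllocator(const ToyTrackedAllocator<U> &other) : _bucket(other.bucket()) {}

   T *allocate(std::size_t n)
      {
      T *p = static_cast<T *>(::operator new(n * sizeof(T)));
      _bucket->liveBytes += n * sizeof(T);
      return p;
      }

   void deallocate(T *p, std::size_t n)
      {
      _bucket->liveBytes -= n * sizeof(T);
      ::operator delete(p);
      }

   ToyBucket *bucket() const { return _bucket; }

private:
   ToyBucket *_bucket;
   };

template <typename T, typename U>
bool operator==(const ToyTrackedAllocator<T> &a, const ToyTrackedAllocator<U> &b)
   { return a.bucket() == b.bucket(); }

template <typename T, typename U>
bool operator!=(const ToyTrackedAllocator<T> &a, const ToyTrackedAllocator<U> &b)
   { return !(a == b); }

bool bothListsChargeTheSameBucket()
   {
   ToyBucket bucket;
   bool charged = false;
      {
      ToyTrackedAllocator<int> intAllocator(bucket);
      ToyTrackedAllocator<double> doubleAllocator(intAllocator); // same bucket
      std::list<int, ToyTrackedAllocator<int> > ids(intAllocator);
      std::list<double, ToyTrackedAllocator<double> > weights(doubleAllocator);

      ids.push_back(1);
      std::size_t afterFirst = bucket.liveBytes;
      weights.push_back(2.0);
      charged = afterFirst > 0 && bucket.liveBytes > afterFirst;
      } // both lists are destroyed here and return their memory
   return charged && bucket.liveBytes == 0;
   }
```

One design difference worth noting: the toy converts the allocator to the second container's element type, whereas the example above reuses MyClass::TrackedPersistentAllocator as is.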

Explicit Allocations

As previously mentioned, explicitly allocating memory is not the preferred approach. Interactions with legacy code might require doing so, but attempts should be made in all new code to use C++ containers, or at the very least, minimize or localize the use of explicit allocations/frees. However, for the sake of completeness, and in order to give a sense of what you might find in older code, this section outlines how to explicitly allocate both Scratch and Persistent Memory.

Scratch Memory

To explicitly allocate memory for an object that has the TR_ALLOC macro specified in its class’ definition, you would do something like:

TR_MyClass *myClass = new (comp()->trHeapMemory()) TR_MyClass(...);

or

TR_MyClass *myClass = new (comp()->trMemory()) TR_MyClass(...);

or

TR_MyClass *myClass = new (comp()->region()) TR_MyClass(...);

or

TR::Region myRegion(comp()->region());
TR_MyClass *myClass = new (myRegion) TR_MyClass(...);

or, for Stack Memory allocations:

TR_MyClass *myClass = new (comp()->trStackMemory()) TR_MyClass(...);

or

TR::StackMemoryRegion stackMemoryRegion(comp()->trMemory());
TR_MyClass *myClass = new (stackMemoryRegion) TR_MyClass(...);

or

TR_MyClass *myClass = new (comp()->trMemory()->currentStackRegion()) TR_MyClass(...);

If the class does not have the TR_ALLOC macro, then you would do something like:

void *storage = comp()->trHeapMemory().allocate(sizeof(TR_MyClass));
TR_MyClass *myClass = new (storage) TR_MyClass(...);

or

void *storage = comp()->trMemory()->allocateHeapMemory(sizeof(TR_MyClass));
TR_MyClass *myClass = new (storage) TR_MyClass(...);

or

void *storage = comp()->trMemory()->allocateMemory(sizeof(TR_MyClass), heapAlloc);
TR_MyClass *myClass = new (storage) TR_MyClass(...);

or

TR::Region myRegion(comp()->region());
void *storage = myRegion.allocate(sizeof(TR_MyClass));
TR_MyClass *myClass = new (storage) TR_MyClass(...);

or, for Stack Memory allocations:

void *storage = comp()->trStackMemory().allocate(sizeof(TR_MyClass));
TR_MyClass *myClass = new (storage) TR_MyClass(...);

or

void *storage = comp()->trMemory()->allocateStackMemory(sizeof(TR_MyClass));
TR_MyClass *myClass = new (storage) TR_MyClass(...);

or

void *storage = comp()->trMemory()->allocateMemory(sizeof(TR_MyClass), stackAlloc);
TR_MyClass *myClass = new (storage) TR_MyClass(...);

or

TR::StackMemoryRegion stackMemoryRegion(comp()->trMemory());
void *storage = stackMemoryRegion.allocate(sizeof(TR_MyClass));
TR_MyClass *myClass = new (storage) TR_MyClass(...);

or

void *storage = comp()->trMemory()->currentStackRegion().allocate(sizeof(TR_MyClass));
TR_MyClass *myClass = new (storage) TR_MyClass(...);

Technically these APIs (both the placement new overrides and the function calls) also take a default parameter of type TR_MemoryBase::ObjectType. However, because currently per-type Scratch Memory allocations are not tracked, the type can be omitted.
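The two explicit styles above (a class-provided placement new versus "allocate raw storage, then construct") can be compared side by side in plain C++. In the sketch below, ToyRegion, ToyNode and ToyPlain are hypothetical; the hand-written operator new overload merely approximates the kind of override the text attributes to TR_ALLOC, and is not what the macro actually expands to.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical region: frees everything it handed out when it is destroyed.
class ToyRegion
   {
public:
   ToyRegion() {}
   ~ToyRegion()
      {
      for (void *block : _blocks)
         ::operator delete(block);
      }

   ToyRegion(const ToyRegion &) = delete;
   ToyRegion &operator=(const ToyRegion &) = delete;

   void *allocate(std::size_t bytes)
      {
      void *block = ::operator new(bytes);
      _blocks.push_back(block);
      return block;
      }

private:
   std::vector<void *> _blocks;
   };

// A class that opts into region allocation via its own placement new.
class ToyNode
   {
public:
   explicit ToyNode(int value) : _value(value) {}
   int value() const { return _value; }

   static void *operator new(std::size_t size, ToyRegion &region)
      { return region.allocate(size); }

   // Matching form; only called if the constructor throws.
   static void operator delete(void *, ToyRegion &) {}

private:
   int _value;
   };

// A class with no such override: storage must be obtained separately.
struct ToyPlain
   {
   explicit ToyPlain(int v) : value(v) {}
   int value;
   };

int sumOfRegionAllocatedObjects()
   {
   ToyRegion region;

   // Style 1: the class supplies placement new, so one expression suffices.
   ToyNode *node = new (region) ToyNode(40);

   // Style 2: allocate raw storage, then construct into it.
   void *storage = region.allocate(sizeof(ToyPlain));
   ToyPlain *plain = new (storage) ToyPlain(2);

   // Both types are trivially destructible, so no destructor calls are
   // needed before the region releases the memory.
   return node->value() + plain->value;
   }
```

Either way, nothing is freed individually; the memory goes away with the region, which is why objects with non-trivial destructors need extra care in this style.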

Persistent Memory

To explicitly allocate memory for an object that has the TR_ALLOC macro specified in its class’ definition, you would do something like:

TR_MyClass *myClass = new (PERSISTENT_NEW) TR_MyClass(...);

or, within TR_MyClass:

TR_SomeData *someData = (TR_SomeData *)jitPersistentAlloc(sizeof(TR_SomeData));

If the class does not have the TR_ALLOC macro, then you would do something like:

void *storage = comp()->trMemory()->allocateMemory(sizeof(TR_MyClass), persistentAlloc);
TR_MyClass *myClass = new (storage) TR_MyClass(...);

or

void *storage = TR_Memory::jitPersistentAlloc(sizeof(TR_MyClass));
TR_MyClass *myClass = new (storage) TR_MyClass(...);

There exist TR_Memory and per-class versions of jitPersistentAlloc/jitPersistentFree. The TR_Memory version will increment, in the UnknownType bucket, the amount of memory allocated, unless the TR_MemoryBase::ObjectType is specified, eg:

void *storage = TR_Memory::jitPersistentAlloc(sizeof(TR_MyClass), TR_MemoryBase::ObjectTypeOfMyClass);
TR_MyClass *myClass = new (storage) TR_MyClass(...);

The per-class version will increment the type as specified in the parameter of TR_ALLOC. Any call to jitPersistentAlloc/jitPersistentFree from within a class that has TR_ALLOC specified in its definition will automatically use the per-class versions.

Conclusion

And that’s it! Hopefully this post helped you better understand memory allocation in the Compiler. There are more advanced topics, such as how to create a TR::Region outside a compilation (for example if you wished to create a region using persistent memory), and what all the other memory structures are for (eg, TR::SegmentAllocator, TR::SegmentProvider, etc), but that’s best left for another blog post. If you have any questions, concerns, ideas for improvements, etc., feel free to start a conversation in the OpenJ9 Slack Instance, or via the OpenJ9 Mailing List.
