Arraylets: What are They?

Garbage Collectors (GCs) are responsible for allocating and deallocating objects making sure the running application run as smoothly as possible. There are many types of GC configurations out there, mark sweep, generational, region based, concurrent, parallel, reference counting [1] and a combination of these. The way we represent objects in each of these configurations, also known as policies in OpenJ9, can highly affect the JVM performance. In OpenJ9 we have generational collectors: gencon and balanced; mark sweep collectors: optthruput and optavgpause; and region based collectors: balanced and metronome. The purpose of this blog is to explain how we represent arrays in region based collectors.

In most GC configurations, objects are normally allocated in a single contiguous block of memory. With the exception of arrays, objects tend to be small, between 32 and 100 bytes. Arrays, on the other hand, scale with the number of elements in the array, and can reach kilobytes or more. In most GC configurations we would still allocate such large arrays in a contiguous fashion; however, in the case of balanced and metronome, we came up with clever way to store arrays. As mentioned before, these two GC policies are region based GCs, meaning, the heap is divided into logical regions where objects are allocated. If the application (mutators) tries to allocate an object larger than a region an OutOfMemoryError (OOM) is thrown. To support large arrays the GC divides the array into region-sized chunks to accommodate the allocation. This is where arraylets come in.

Every array, in either balanced or metronome policies, is represented as an arraylet.  An arraylet’s layout depends on its size. If the entire arraylet can fit within a region, then it will have a contiguous layout. If the arraylet cannot fit into a single region, then it will have either a hybrid or discontiguous layout.

Note that I’m assuming we’re running in compressed references (class and object references are 32 bits long) and using a 64bit architecture. A contiguous arraylet layout looks like the following:

Figure 1: Contiguous Arraylet Layout

Contiguous arraylet are the most common type of array and it’s header is composed of a class field, which contains object information and size field which contains the number of elements in the array. Its data elements come right after the header as depicted in the figure. Now, if the data associated with the array is too large to fit into a single region, we divide the data into region-sized chunks. Each of these chunks is called an arraylet leaf, which are referenced by arrayoids (references that points to arraylet leaves). While contiguous arraylets contain the elements alongside its header, non-contiguous layout contains pointers to the array data instead.

The first type of chunked arraylet is the discontiguous arraylet layout which is illustrated below:

Figure 2: Discontiguous Arraylet Layout

The discontiguous arraylet header consists of the fields: class, mustBeZero, size, and padding if necessary (required in compressed references and not required for full references where class field is 64 bits). The mustBeZero field is used to differentiate between contiguous and non-contiguous arraylets.  The mustBeZero field is at the same location as the size field of a continuous array allowing us to easily distinguish the type of array: a zero sized array means we are dealing with a non-continuous array and need to look at the second size field to determine the real size.

The size of a discontiguous arraylet data in bytes must be a multiple of the region size; meaning, the entire arraylet data fits perfectly in the regions: arraylet_data_size_in_bytes % region_size == 0. E.g. region_size = 64KB, and arraylet_data_size_in_bytes = 192KB, results in an arraylet that can occupy three 64KB regions with no leftover data. As you can observe, in the figure 2, the size of the arraylet was large enough to occupy three regions, resulting in 3 arrayoids that points to its respective arraylet leaves.

What happens if the array data is larger but not a multiple of region size? That results in a third layout called hybrid arraylet layout:

Figure 3: Hybrid Arraylet Layout

Hybrid arraylets are an optimization over discontiguous arraylet for the case where data does not fit perfectly into a number of regions. In this case, there are some leftover data, which instead of allocating an entire region for the leftover, we put it alongside the arraylet header. E.g. region_size = 64KB, and arraylet_data_size_in_bytes = 200KB, results in an arraylet that can fully occupy three 64KB regions and the leftover 8KB is placed right after the arrayoids as depicted in the above picture (arraylet_data_size_in_bytes % region_size != 0). Instead of occupying an entire region with the leftover data, we place it right after the header. Also note that we have 4 arrayoids in this example because the last arrayoid always points to the data which resides alongside the header .

Lastly, we have a special case, of zero sized arrays:

Figure 4: Zero sized Arraylet

You may ask, why zero sized arraylets have the same layout as hybrid and discontiguous arraylets? That’s because of the mustBeZero field. If this field is zero we treat the arraylet as discontiguous which leaves no choice for the zero sized arraylets but to be represented as a discontgiuous layout, in which case both mustBeZero and size fields would be equal to zero as depicted in the above figure.

In summary, OpenJ9 region based collectors, balanced and metronome, uses a clever way to store arrays called arraylets. Depending on its size, arraylets can have different layouts, they are: contiguous, hybrid and discontiguous. If an array is small enough to fit within a region, then the arraylet will have a contiguous layout illustrated in figure 1. On the other hand, if the array is large enough that it cannot fit withing one single region, the arraylet will have wither an hybrid or discontiguous layout (figures 2 and 3 respectively). The main different between these two, is that if the entire data of array fits perfectly into a number of regions, arraylet_data_size_in_bytes % region_size == 0 , then the layout would be discontiguous, and all other cases hybrid. Lastly, there’s a special case where zero sized arraylets must be treated as discontiguous arraylet because of the way we use the field mustBeZero to distinguish between contgiuous and discontiguous layouts.

We also recommend checking this blog post, where we discuss how we can further improve large array object management through double mapping.

Leave a Reply