Balanced GC performance improvements: Eden + heap sizing improvements

Open J9’s Balanced GC policy is a region based, generational collector, which aims to provide a bound on worst case pause times, by eliminating the need for frequent global garbage collections. A more detailed description of how Balanced GC policy works, along with some of its advantages over the Gencon GC policy, can be found here (this reading serves as important background information for this blog).


This blog post outlines the overhaul of the Balanced GC heap and eden sizing logic, as well as discusses the new command line option for Balanced GC, -Xgc:targetPauseTime, which allows users to set a soft pause time goal. Many of the details outlined in this post can be found in the following github issue: https://github.com/eclipse-openj9/openj9/issues/12054 – This article will provide a high level overview of the design, the implemented solution, and finally, what this all means for OpenJ9 users.

Background

Gencon

While the Balanced GC policy does have several advantages over Gencon, Gencon’s heap sizing logic, for both nursery and tenure areas, was previously far superior to Balanced GC. Gencon dynamically adjusts the size of the nursery and tenure areas to maximize the performance of the application. The size of nursery (red and blue regions in figure 1) is adjusted so that between 1% and 5% of the application runtime is spent in local GC (Scavenge) pauses; tenure space on the other hand is sized in order to be large enough to store the application’s live set, along with some extra room for tenured objects. These heuristics work very well in practice, and help make Gencon a premier collector. One weakness of Gencon’s heap sizing logic, is the fact that nursery cannot be resized if the heap has reached its maximum size. When the heap has reached its maximum size (-Xmx), Gencon can no longer resize the nursery or tenure areas of the heap to meet any of the GC overhead goals (the exact reasons why are somewhat beyond the scope of this article, but in essence, the mechanics of doing this are almost impossible). While not being able to resize nursery when the heap is fully expanded typically isn’t detrimental to performance, it can potentially leave a lot of performance gains on the table – this is where the new Balanced GC logic sees significant improvements over Gencon.

Generational heap - https://developer.ibm.com/articles/garbage-collection-tradeoffs-and-tuning-with-openj9/
Figure 1. Gencon GC heap layout

Balanced

The existing Balanced GC heap sizing logic is relatively straightforward. First, the GC attempts to resize the heap to meet -Xminf and -Xmaxf targets (between 30% and 60% of the heap free). Once these targets have been met, the heap sizing logic attempts to respect -Xmint and -Xmaxt goals (between 5% and 13% of time spent in GC pauses). This can cause several problems, all of which are outlined in the following github design issue https://github.com/eclipse-openj9/openj9/issues/12054.
Eden sizing is even simpler – 1/4 of the heap is set to be the eden size (unless specified by -Xmn/-Xmns/-Xmnx). This means that eden size is not calculated by its own set of heuristics, but rather, is highly dependant on the total heap sizing logic. In most cases, this works just fine, but, this leaves a lot of potential performance gains on the table.

Balanced GC uses a region based heap layout (see figure 2). Region based heaps are “logical” divisions of the heap, and are very flexible in nature. On one GC cycle, a specific region might be used as “eden”, while that same region might be used as “Survivor” on the next GC. Having the ability to dynamically increase/decrease the amount of regions used as “eden” regions, can have a significant impact on performance. This flexibility gives the Balanced GC policy several advantages in terms of heap/eden sizing, specifically when the heap is fully expanded – an area in which Gencon can’t compete due to technical limitations. These implications will be discussed in the following sections.

The diagram is explained in the surrounding text
Figure 2. Balanced GC heap layout

Case study

Before diving too deep into the work that was done for Balanced GC, it is important to understand some general performance tradeoffs when using copying collectors. In copying collectors (such as Gencon’s “scavenge” and Balanced GCs Partial Garbage Collection (PGC)), there is a very significant relationship between the size of the “New” space (“nursery” for Gencon, and “eden” for Balanced) and overall GC performance. “New” space, refers to an area in the heap where new objects are being allocated, and this is extremely important to GC performance due to the “Weak Generational Hypothesis”. A few key metrics for GC performance include: average GC pause time, the overall amount of time spent in GC pauses, and interval between GC pauses.

In order to better understand the relationship between “New” heap size, and GC performance, a set of experiments were performed with both the Gencon and Balanced GC policies. These experiments tested the effects of setting -Xmn to various sizes, while using very large -Xmx/-Xms values in order to remove any noise due to global collections. Figure 3 shows a small sample of the findings for SPECjbb® 2015* (see footnote) test runs with Balanced GC. It is worth noting that several additional tests were performed – this is just a small sample of the results in order to illustrate key relationships. The results below are taken from the steady state of a SPECjbb® 2015 run.

Eden size (-Xmn)Avg. PGC Pause time (ms)% of time in GC pausesAvg. Interval between PGCs (s)
1G2127.882.70
2G3626.755.41
8G6313.0221.64
Figure 3: Relationship between eden size, Average Local GC pause time, and Total GC pause time

The key relationships we can observe from these experiments are as follows:

  • As the eden size increases, the average local GC pause time also increases. The relationship here is logarithmic.
  • As the eden size increases, the total % of time spent in GC pauses decreases.
  • As the eden size increases, the interval between local GCs increases. This relationship is linear (very tight fit).

It is important to note that the increase in PGC pause time per increase in eden size, is highly dependant on the application. For example, the HyperAlloc GC benchmark only sees a very small increase in PGC pause time if eden is doubled, but sees a dramatic improvement in total time spent in GC pauses (about 50% less time). In SPECjbb® 2015 on the other hand (as illustrated in Figure 3), increasing eden size by a factor of 2, led to an increase in average PGC pause time by ~60%, but this also led to a significant reduction in total time spent in GC pauses. These relationships are very key to understanding the upcoming sections.

The new eden/heap sizing approach in Balanced GC

The work that was performed for Balanced GC is outlined in depth in this github issue. A high level summary of the changes is as follows:

  • Eden Sizing: Allow eden to be resized by stats relevant to eden size, including: PGC duration, percent of time spent in PGC pauses, and interval between PGCs.
    • When the heap IS NOT fully expanded, a certain set of heuristics should be used which assume that memory is “unlimited”.
    • When the heap IS fully expanded, a slightly different set of heuristics should be used, since there are now additional constraints (notably, a finite amount of free heap space)
  • Non-Eden Sizing: Allow non-eden/”tenure” to be resized based on a blend of Global Mark Phase (GMP) overhead and free memory percentage. This allows the total heap size to remain more consistent/stable after GMP points, where lots of memory is reclaimed.

Since the major performance enhancements are mostly related to the eden sizing logic changes, this will be the primary focus of the remainder of the article (along the newly introduced -Xgc:targetPauseTime command line option).
The total heap (i.e. non-eden) sizing logic will be left to the github design issue, which outlines the changes in more depth.

Why add target pause time?

As outlined in the case study, changing the eden size can massively reduce the percentage of time in PGC pauses, however, this can also result in very long average PGC pause times – which is not desirable. With that in mind, the need for a target pause time was necessary so that PGC pauses do not become too long. That being said, Balanced GC now has support for -Xgc:targetPauseTime option, which sets a “soft” pause time target. Adding this pause time target, means that Balanced GC should try to respect a pause time goal, all while still attempting to minimize the percentage of time spent in GC pauses. If the percentage of time spent in GC pauses (referred to as “GC overhead” from this point on) is too high, then the pause time target may be sacrificed a little bit in order to help the GC overhead get closer to its own target (between 2%-5%).


The default target pause time for Balanced GC is 200ms.

NOTE: -Xgc:targetPauseTime is also used for metronome GC policy, where it provides a hard pause time goal.

Eden sizing when the heap is not fully expanded

When the heap has not reached it’s max size (-Xmx), the eden sizing logic is relatively straightforward. If PGC overhead (ratio of time spent doing PGC vs total time) is above -Xgc:dnssExpectedTimeRatioMaximum (5% default), eden will grow; if it is below -Xgc:dnssExpectedTimeRatioMinimum (2% default), eden will shrink. This is very similar to what Gencon does. Where Balanced GC differs from Gencon is in the fact that it also tries to respect a target pause time goal. If PGC times are observed to be close to the target pause time, and the PGC overhead is high, then the eden sizing logic will make a decision on if the target pause time should be respected strictly, or if it should continue growing eden to try to get PGC overhead closer to -Xgc:dnssExpectedTimeRatioMaximum. The underlying logic is outlined in depth in the github design issue but here are a few examples of the logic in action (Figure 4. assumes a 200ms pause time, and a 2%-5% PGC overhead target).

PGC cpu overheadAvg pgc timeResultReason
10%100msExpand edenOverhead is too high, and we are below the pause time target; growing eden is fine
15%200msExpand edenOverhead is way too high; Expand eden a bit to try to reduce overhead, even though this will violate the pause time target
8%230msDo nothingDespite being above the pause time target, overhead is too high; do not change eden. This is a happy balance between the pause target and overhead goals.
10%300msContract edenDespite overhead being too high, PGC pause time is far above the target; shrink eden to try and get closer to target pause time (200ms)
3%170msDo nothingBoth the overhead and PGC pause time goals are being met
4%12msDo nothingPGC pause time is well below the target, and overhead goal is being met
1%100msContract edenOverhead is too low; shrinking eden will reduce memory footprint
Figure 4: Examples of eden sizing recommendations

A few key notes about this logic:

  • PGC pause times far below the target pause time will never cause eden to shrink
  • The reason we do not strictly adhere to the target pause time is because it may cause large increases to the PGC overhead. By balancing the satisfaction of these two criteria, the GC makes sure to get a good blend of both, rather than satisfying one goal at the potential expense of the other.
  • When eden grows/shrinks, the non-eden size remains essentially* the same . This causes the total memory footprint to increase/decrease accordingly.

*A very small change in non-eden size will likely occur to account for change in objects surviving PGC collections

Eden sizing when the heap IS fully expanded

When the heap has reached its maximum size (-Xmx), there are now additional constraints that eden must be aware of – notably, that it can no longer expand infinity due to a limit in max heap size. What the GC will do in this case, is size eden in order to minimize total GC (GMP + PGC) cost, while respecting the target pause time. In this situation, total memory is limited, so if eden grows, it will steal regions from non-eden, which keeps the total heap size the same. If eden grows too much (causing non-eden/”tenure” to be very small), GMP cycles will be triggered very frequently, leading to overall poor GC performance. On the other hand, if eden is too small, PGCs will be frequent, and relatively expensive, which leaves lots of potential performance gains on the table. Let’s dive a bit deeper into how the GC decides whether eden should grow or contract.

There are 3 key pieces of information that are used to predict which eden size will give the best total GC performance. These relationships can be extrapolated from the experiments from figure 3.

  1. The time interval between 2 consecutive PGCs (figure 7): This is directly proportional to the size of eden. If eden doubles in size, then the time interval between PGCs will also double (assuming steady allocation rate).
  2. The number of PGCs per GMP cycle (figure 6): Assuming a relatively constant tenuring/survival rate of objects being copied from Eden (which tends to be the case in practice), if Eden doubles in size (in other words, “non-eden” size is halved), then there will be approximately 2x fewer PGCs per GMP cycle. As previously mentioned, this is because eden will steal regions from the non-eden space if eden grows (effectively shrinking tenure).
  3. The average PGC time (figure 8): This is by far the most complex variable to analyze. For some applications, increasing eden may result in significant increases in pause times, whereas for others, it might only result in very small increases in pause times. The new eden sizing logic is able to predict how much the average PGC pause time will change for a specific change in eden size. This model/prediction is self-tuning, and is adjusted throughout the lifetime of the application.

Combining these 3 pieces of information, the following model was created, which predicts how the total GC overhead (PGC + GMP costs) will change, with respect to a change in Eden size. With this model, the optimal eden size can be found, which will minimize total GC cost. This formula is described in figure 5 below:

image
figure 5: relationship between change in eden size (X), and predicted GC overhead (O(x)). tglobal is the time taken for GMP work to be done

In short, the model from figure 5 is “predicted total time spent doing GC work” / “predicted total time interval”.

The model from figure 5 uses multiple utility functions, which are shown in figure 6-8, used to predict how PGC time, interval between PGCs, and PGC/GMP ratio will change as a function of change in eden size.

Figure 6: relationship between change in eden size (x), and predicted number of PGC collections per GMP cycle. nobsLocalGc represents the current number of PGCs per GMP cycle, and sfreeTenure represents the current amount of free tenure/non-eden memory

Figure 7: relationship between change in eden size, and time interval between successive PGCs. iobsLocalGc represents the current interval between PGCs, and seden represents current eden size.

Finally, the most complex relationship to analyze, is how PGC pause time will be affected by a change in eden size. This model is given by the function in Figure 6. Note that seden is the current size of eden, tminEden is the minimum amount of for PGC collection when eden is as small as possible (5ms), and tlocal is the last observed PGC time.

Figure 8: tlocalGCPred (x) is the relationship between a change in eden size (x) and the predicted PGC pause time.

If we graph the equation given in figure 8, figure 9 is the result. Note: figure 9 is showing eden as being 1.15GB, current PGC pause time is 130ms, and 1.9GB of free tenure.

Figure 9. PGC time (y axis, in ms) as a function on change in eden size (x axis, in GB’s)

We can see that as eden size increases, the PGC time increases at a slower and slower rate. If we shrink eden by 1GB (making eden ~150Mb), the average PGC time will be ~25ms. On the other hand, if we grow eden by 1Gb (making eden 2.15GB), the average PGC time will be ~180ms. This logarithmic relationship is a significant reason behind total GC overhead decreasing if eden size increases.


With all of these relationships, we can graph the equation from figure 5, which produces figure 10.
Note: the graph below is configured with eden currently at 1.15GB in size, and 1.9GB of free tenure space – the exact shape of this graph is highly dependant on the current PGC time, and interval, as well as GMP cost as per the equation in figure 5.

Figure 10: GC overhead as a function of change in eden size. X axis represents change in eden size (in GB), while the Y axis represents GC overhead.

If we look at the graph in figure 10, it is telling us that the “ideal” change eden size (the size which will minimize total GC overhead, ie y’ = 0), is to increase eden by 1GB! In this case, the total GC overhead will be reduced from 8% (current) to 7% (predicted).

A few key notes about this model:

  • As the change in eden size approaches -1GB (in this case, causing eden to be very tiny), GC overhead skyrockets. When eden is very very small, PGC happens very frequently, and is quite costly.
  • As the change in eden size approaches +1.9GB (exhausting all free heap space), GC overhead skyrockets. This is because GMP will become so frequent, that it incurs a massive overhead cost.

Similar to the eden sizing logic when the heap is not fully expanded, this logic will also keep in mind the target pause time, and attempt to respect it. If the GC overhead is far away from the target, then the target pause time may be sacrificed a bit in order to reduce total GC overhead. The exact mechanics of this are discussed in the github design issue .

What does this mean for Openj9 users?

These new changes mean that users can expect a few performance changes when using the Balanced GC policy. The degree to which the following points actually vary/improve is application dependant.

  • Less total time spent in GC. This is dependant on the application, but anywhere up to 50% reduction in total GC time can be expected. The Hyperalloc GC benchmark saw total pause time reductions by 50-70%, with a very similar memory footprint. For SPECjbb® 2015, reductions in total GC overhead (depending on configuration), can be between 10%-30%.
  • Better performance out of the box. As with most generational collectors, it takes a bit of time to fine tune the -xmn/-xmns/-xmnx options to optimize GC performance. If eden size is too big, then GMP becomes very costly. On the other hand, if eden is too small, then PGC will be very expensive, causing the overall GC performance to not be optimal. These changes mean that the eden tuning is done automatically, and can even react to changes in the applications live set and allocation rate.
  • The ability to fine tune your average GC pause time length in Balanced GC through -Xgc:targetPauseTime.
    If you have a high tolerance for longer pauses, and want to focus on minimizing total time spent in GC (i.e., maximizing throughput), increasing the target pause time will help achieve that goal. Conversely, if you want to shorten your target pause times, this can also be specified (possibly at the expense of some throughput)
  • More efficient memory use: In many cases, smaller heaps can be used, and give similar (or better) performance. In cloud environments, smaller heaps (aka, memory usage), can lead to lower costs.

Example performance improvements

Examples of performance improvements can be seen in github design issue, (SPECjbb® 2015) https://github.com/eclipse-openj9/openj9/issues/12054#issuecomment-786897105 and (HyperAlloc GC Benchmark) https://github.com/eclipse-openj9/openj9/issues/12054#issuecomment-788205951

Here is a brief summary of SPECjbb® 2015 results.

SPECjbb® 2015: Baseline (old logic)

Figure 11: SPECjbb® 2015 Baseline run in PRESET mode, 4000IR

SPECjbb® 2015: New Logic, -Xgc:targetPauseTime=200

image
Figure 12: SPECjbb® 2015 with new heap/eden sizing logic, PRESET mode, 4000IR, and 200ms pause time target

SPECjbb® 2015: New Logic -Xgc:targetPauseTime=500

image
Figure 13: SPECjbb® 2015 with new heap/eden sizing logic, PRESET mode, 4000IR, and 500ms pause time target

Results Summary

statisticBaseline200ms pause target500ms pause target
% of time spent in GC pauses11.21% 11.02%8.76%
% of time spent in GC pauses (steady state)10%-12% 9.75%~7.3%
Average pause time in steady state180-220ms230ms450ms
Heap size in steady state3.5G – 5.5G3.5G6G
Eden size in steady state0.8G – 1.1G1.3G3.5G
Number of PGCs727651315
Number of GMP cycles13127
Number of GMP collections684820
Figure 14: SPECjbb® 2015 PRESET mode, 4000IR summary table

By specifying a 200ms targetPauseTime, the eden sizing logic decided to compromise pause times by about 30ms, in order to reduce the percentage of time spent in GC pauses. Not only did this run slightly improve the percentage of time spent in GC pauses, but it sized the heap to a steady 3.5G – a reduction by ~1G (25%) compared to the baseline.

When 500ms was specified as the targetPauseTime, eden sizing logic drove eden to increase to about 3.5G (about 3x larger than in the baseline), which helped reduce the time spent in GC pauses by ~25%.

Heapothesys tests provided even better results, which are briefly summarized here. Extensive performance summary for Heapothesys can be found at https://github.com/eclipse-openj9/openj9/issues/10721

Summary

The new approach for eden sizing in Balanced GC is a significant improvement to the old logic. Not only does it stabilize the total heap size in many cases, but it reduces GC overhead, and makes more efficient use of heap space. Additionally, the introduction of the -Xgc:targetPauseTime option for Balanced GC gives OpenJ9 users an extra tool when it comes to fine tuning the performance of their java applications.

Special thanks to Aleksander Micic for his guidance, suggestions, and key ideas throughout the entire design and development process, as well as to Dmitri Pivkine for sharing his wealth of GC expertise throughout the development process.



*All SPEC references in the above article are referring to products owned by Standard Performance Evaluation Corporation (SPEC) http://spec.org/


Leave a Reply