The Quarkus framework has gained tremendous popularity in the last few months. This success is due in part to the runtime characteristics expressed as Supersonic, Subatomic Java on the project’s website. In other words, Quarkus is extremely fast to start up and extremely light in terms of memory footprint when run in “native mode” (native mode is essentially when the Quarkus application is compiled ahead of time into a statically compiled native executable). Even when run in traditional JVM mode, comparisons between Quarkus and a SpringBoot cloud native stack showed that Quarkus was much faster to start up and far more memory efficient.
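For readers who want to try native mode themselves, a native executable can typically be produced from a standard Quarkus Maven project with a one-liner (this sketch assumes a GraalVM installation and the native profile that Quarkus generates into new projects):

```sh
# Build a native executable from a Quarkus Maven project; requires GraalVM.
./mvnw package -Pnative
```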
In this blog post, we show that Quarkus run in JVM mode retains its substantial performance advantages over a SpringBoot cloud native stack when run on the Eclipse OpenJ9 JVM (since OpenJ9 has no native image mode, the comparisons are limited to JVM mode) across different hardware platforms. In our experiments we also added a shared classes cache to the Docker image in order to show the impact of this OpenJ9 technology on startup time and memory footprint.



Recently, multi-layered shared classes cache functionality was added to OpenJ9 specifically to map onto Docker’s layered architecture for applications. This feature allows users to right-size their shared classes cache and add it to the Docker layer of their choice without impacting any other Docker layer in any way.
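As a rough sketch of how this can look in practice (the cache name, size, and directory below are illustrative, not the values from our experiments), each Docker image layer can contribute its own cache layer:

```sh
# In a lower Docker layer (e.g. the base image): create cache layer 0
# and populate it with JDK/framework classes.
java -Xshareclasses:name=app,cacheDir=/opt/scc -Xscmx80m -version

# In a higher Docker layer: add a new cache layer on top for the
# application's classes (in practice, populated by a warm-up run of the
# application), leaving the lower cache layer(s) untouched.
java -Xshareclasses:name=app,cacheDir=/opt/scc,createLayer -version

# Inspect the resulting layered cache.
java -Xshareclasses:name=app,cacheDir=/opt/scc,printStats
```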
All the graphs in this blog post show only relative performance (for both startup time and memory footprint) between the different configurations, since the absolute numbers can vary widely depending on the specific hardware being used. All of the presented data was collected by pinning to a single core on the different platforms.
Startup time
For the purposes of this blog post, startup time is defined as the time taken for the server application to respond to the first request. In the graphs, the startup times have all been normalized and are therefore not absolute startup time values.
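Our measurement harness is not shown here, but a minimal sketch of this definition could look like the following (the image name, port, and endpoint are placeholders); note that the container is pinned to a single core, as described above:

```sh
# Measure the time from container launch until the first successful
# HTTP response; the container is pinned to a single core.
start=$(date +%s%N)
docker run -d --rm --name app --cpuset-cpus 0 -p 8080:8080 quarkus-openj9
until curl -sf http://localhost:8080/hello > /dev/null; do sleep 0.01; done
end=$(date +%s%N)
echo "time to first response: $(( (end - start) / 1000000 )) ms"
docker stop app
```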
The results below show a substantial (more than 2X) startup time advantage for Quarkus over a SpringBoot cloud native stack for a simple REST application on OpenJ9, whether or not a shared classes cache is used. OpenJ9’s shared classes cache feature provides a 20-30% reduction in Quarkus startup time across the different platforms. In these runs, a shared classes cache was generated by running the application once and was then packaged (i.e. copied) into the Docker image for the application.
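A sketch of that packaging flow is shown below (the base image, jar name, and paths are assumptions; note that the cache must be created with the same JVM level that runs in the container):

```sh
# 1) A warm-up run on the build machine populates a shared classes cache
#    in ./scc; the application is stopped once it has served a request.
java -Xshareclasses:name=app,cacheDir="$PWD/scc" -Xscmx80m \
     -jar target/app-runner.jar

# 2) The cache is then copied into the application's Docker image.
cat > Dockerfile <<'EOF'
FROM adoptopenjdk:8-jre-openj9
COPY target/app-runner.jar /app/app.jar
COPY scc /opt/scc
CMD ["java", "-Xshareclasses:name=app,cacheDir=/opt/scc", "-jar", "/app/app.jar"]
EOF
docker build -t quarkus-openj9 .
```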

The results below also show a substantial (more than 2X) startup time advantage for Quarkus over a SpringBoot cloud native stack for a simple REST + CRUD application on OpenJ9, whether or not a shared classes cache is used.

Further, we show that the picture on the Power Systems and IBM Z platforms is very similar to what is seen on X86, reinforcing the point that Quarkus runs just as well on those platforms as it does on X86. The OpenJ9 JVM provides leading-edge performance on Power Systems and IBM Z, is used by many customers in production, and should be an excellent choice for Quarkus users on those platforms.




Docker behaviour
During our experimentation to collect these results, we noticed a couple of behaviours related to Docker that we think are worth calling out for Java users. Startup time for a framework as lightweight as Quarkus inside Docker can be substantially higher than on bare metal. This is due to two distinct factors that slow down startup inside Docker containers:
- Time taken to start up the Docker container itself (independent of starting the Java application): this time varied quite a bit across the different server machines we tried, e.g. between 0.5 seconds and 1.4 seconds in our experiments, but on a given server machine it was relatively stable from run to run. For a Quarkus application running inside Docker, the time to start up the Docker container can be a substantial proportion of the overall startup time experienced by the end user, especially for simple applications.
- Slower disk I/O due to Docker’s layered filesystem compared with accessing a host directory mounted into the container: a shared classes cache is stored as a memory-mapped file that gets mapped into shared memory. There are two typical ways to set up the shared cache within a Docker environment (both are sketched after this list):
- Packaging the shared cache in the Docker image: the cache is always available regardless of which host the container is scheduled to run on, since it is included in the Docker image. The cache is accessed via Docker’s layered filesystem.
- Accessing the shared cache via a mounted host directory/volume: the shared cache may not be available unless it is placed on the host filesystem before the container is scheduled to run. The file is not part of the Docker image, so it does not need to be accessed via Docker’s layered filesystem.
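A sketch of the two setups (image name and paths are illustrative):

```sh
# Option 1: cache packaged in the image (as in the Dockerfile sketch
# earlier); cache reads go through Docker's layered filesystem.
docker run --cpuset-cpus 0 -p 8080:8080 quarkus-openj9

# Option 2: cache kept in a host directory and bind-mounted into the
# container; cache reads bypass the layered filesystem entirely.
docker run --cpuset-cpus 0 -p 8080:8080 \
  -v /var/opt/scc:/opt/scc \
  quarkus-openj9
```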
We have found that there can be quite a difference in the effectiveness of the OpenJ9 shared classes cache, from a startup time perspective, depending on how the cache is accessed by the application. Packaging the cache into the application Docker image provides only about half the benefit of accessing it from an externally mounted host directory/volume. This is unfortunate, since packaging the cache into the Docker image is the preferred way to use the shared classes cache in containers: it offers a more portable solution, in that the cache is always available regardless of which node the container is scheduled to run on. Of course, the slower disk I/O from the layered filesystem does not affect only the OpenJ9 shared classes cache; other application files (e.g. configuration) stored in the Docker image, and therefore accessed via the layered filesystem, can also slow down startup substantially, since these on-disk artifacts are typically heavily accessed early in the application run.
Memory Footprint
Quarkus is very memory efficient compared to other traditional (e.g. SpringBoot) cloud native stacks, and this can translate into better application density in the cloud, leading in turn to much higher cumulative throughput for a given memory envelope. Efficient utilization of resources, whether memory or CPU, is the name of the game from a cloud economics perspective, so it is easy to see why Quarkus is particularly attractive as a cloud runtime.
OpenJ9’s default behaviour is also quite conservative from both a Java heap and a native memory perspective, due to its internal heuristics for Java heap growth and its efficient management of the memory segments used for JIT compilations.
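As an aside for readers who want to lean further in that direction, OpenJ9 also offers well-known tuning options aimed at constrained environments (shown here as an optional sketch, not as the configuration used in our experiments):

```sh
# -Xtune:virtualized trades a little peak throughput for a smaller
# footprint and reduced CPU usage; -XX:+IdleTuningGcOnIdle lets the JVM
# run a GC and release free memory when it detects that it is idle.
java -Xtune:virtualized -XX:+IdleTuningGcOnIdle -jar /app/app.jar
```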
The results below show a solid memory footprint advantage for Quarkus over a SpringBoot cloud native stack when running on OpenJ9, for the two simple applications used in this blog post. In the graphs, the memory footprints have all been normalized and are therefore not absolute memory footprint values.


As was the case with the startup time comparison, the memory footprint picture on Power Systems and IBM Z is again very similar to the picture on X86.




Conclusion
We shared some interesting Docker behaviour that may be of particular interest to performance engineers comparing startup time in Docker against bare metal. The main take-away, however, is that Quarkus delivers significant startup and memory advantages which, coupled with the strengths of Eclipse OpenJ9, can power your Java applications across X86, Power Systems and IBM Z.
Special thanks to Joshua Dettinger and Marius Pirvu, who contributed to this content.