Save Money with JITServer on the Cloud – an AWS Experiment

With contributions from Marius Pirvu.

What is JITServer Technology?

JITServer Technology is an Eclipse OpenJ9 JVM technology that decouples the JIT compiler from the JVM and lets the JIT compiler run remotely in its own process. This technique keeps the advantages of JIT compilations while eliminating their negative effects such as CPU and memory spikes, which could result in spurious out-of-memory events. JITServer improves application density by reducing the peak memory usage of the JVM, resulting in smaller container sizes, and thus reduced overall costs. Moreover, JITServer substantially simplifies the provisioning of applications, allowing the user to focus on the CPU and memory needs of the Java application. This mechanism increases the robustness of Java applications, as an intermittent crash in the JIT compiler will not bring down the JVM.

Why Should You Use JITServer in a Cloud Environment?

JITServer’s advantages are most visible in constrained environments such as the cloud, where your application and the JIT compiler must share limited resources. You might be wondering how JITServer can save you money in the cloud; this will become apparent as we take you through performance experiments comparing the JITServer configuration with the Default configuration on the AWS cloud platform.

JITServer on Amazon Web Services (AWS)

We decided to run our experiments on Red Hat OpenShift Service on AWS (ROSA), a fully managed OpenShift service that allows users to quickly build an OpenShift cluster [1]. We chose AWS because it is the most broadly adopted cloud platform [2] and because doing so illustrates that JITServer is not bound to IBM Cloud.

First, we will walk you through the steps needed to provision an OpenShift Cluster using ROSA, and then we’ll describe the experiments.

Prerequisites to use ROSA in AWS:

Before you install the OpenShift cluster, complete the following prerequisites:

  1. Create an AWS account
    • If you are planning to use an AWS organization account, you will need a Service Control Policy (SCP) applied to the AWS account. You can view the minimum required SCP here.
    • Keep note of the following information from your AWS account: AWS IAM User, AWS Secret Access Key, and AWS Access Key ID.
  2. Create a Red Hat account
    • You can create a Red Hat account here.
  3. Install the ROSA CLI and OpenShift CLI
    • Install the ROSA CLI by clicking here.
    • Install the oc CLI by running rosa download oc in the terminal or by following the instructions here.
  4. Install and configure the AWS CLI
    • You can configure the AWS CLI by running aws configure in the terminal. Then enter the AWS Access Key ID and AWS Secret Access Key that you noted from your AWS account. For example:
root@flood1:~# aws configure
AWS Access Key ID [None]: AKXXXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: BXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Default region name [None]: us-east-2
Default output format [None]: table
  5. Log in to your Red Hat account using the ROSA CLI
    • You can log in by typing rosa login in the terminal. This will direct you to a web page where you can log in, copy the token, and paste it into the CLI prompt.
  6. Verify your Red Hat and AWS credentials
    • To verify your credentials, enter rosa whoami in the terminal; you should see output similar to the following:
root@flood1:~# rosa whoami
AWS Account ID:               XXXXXXXXXXXX
AWS Default Region:           us-east-2
AWS ARN:                      arn:aws:iam::XXXXXXXXXXXX:user/Eman.Elsabban@domain.com
OCM API:                      https://api.openshift.com
OCM Account ID:               1tXXXXXXXXXXXXXXXXXXXXXXXXXX
OCM Account Name:             Eman Elsabban
OCM Account Username:         emanelsabban
OCM Account Email:            eman.elsabban@domain.com
OCM Organization ID:          1XXXXXXXXXXXXXXXXXXXXXXXXX
OCM Organization Name:        Eman Elsabban
OCM Organization External ID: XXXXXXXX

Now that the prerequisites are set, we can move to provisioning the OpenShift Cluster.

Steps to provision an OpenShift cluster using ROSA:

  1. Enable the ROSA service
    • To enable the use of the ROSA service within your AWS account you have to log in to the AWS console, look for Red Hat OpenShift Service under Services and click on Enable OpenShift. You should see a green bar at the top, indicating that your service is enabled.
  2. Verify SCP policies, quota and validate your AWS credentials
    • This step verifies whether your AWS account is ready for deploying the cluster. Enter the rosa init command to check that the SCP policies, AWS quota, IAM permissions, and the command line interfaces for OpenShift and AWS are all set up and that ROSA has been enabled. You should see output similar to the following:
root@flood1:~# rosa init
I: Logged in as 'emanelsabban' on 'https://api.openshift.com'
I: Validating AWS credentials...
I: AWS credentials are valid!
I: Validating SCP policies...
I: AWS SCP policies ok
I: Validating AWS quota...
I: AWS quota ok
I: Ensuring cluster administrator user 'osdCcsAdmin'...
I: Admin user 'osdCcsAdmin' already exists!
I: Validating SCP policies for 'osdCcsAdmin'...
I: AWS SCP policies ok
I: Validating cluster creation...
I: Cluster creation valid
I: Verifying whether OpenShift command-line tool is available...
I: Current OpenShift Client Version: 4.7.13

Ensure there are no errors in the output above before moving to create a cluster.

  3. Create the OpenShift cluster
    • To create the OpenShift cluster, use the command rosa create cluster --interactive. The --interactive flag is recommended as it shows you what information is needed. It will take about 30-40 minutes for the cluster to be created. You can view the cluster state using rosa list clusters. Once the cluster is in the ‘ready’ state, you can describe the cluster to get the API URL needed to connect to OpenShift using the oc CLI.
root@flood1:~# rosa describe cluster -c jitserver
Name:                       jitserver
ID:                         XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
External ID:                XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
OpenShift Version:          4.7.13
Channel Group:              stable
DNS:                        jitserver.xmaq.p1.openshiftapps.com
AWS Account:                XXXXXXXXXXXX
API URL:                    https://api.jitserver.xmaq.p1.openshiftapps.com:6443
Console URL:                https://console-openshift-console.apps.jitserver.xmaq.p1.openshiftapps.com
Region:                     us-east-2
Multi-AZ:                   false
Nodes:
 - Master:                  3
 - Infra:                   2
 - Compute:                 3
Network:
 - Service CIDR:            172.30.0.0/16
 - Machine CIDR:            10.0.0.0/16
 - Pod CIDR:                10.128.0.0/14
 - Host Prefix:             /23
State:                      ready
Private:                    No
Created:                    Jun  7 2021 15:43:18 UTC
Details Page:               https://cloud.redhat.com/openshift/details/1XXXXXXXXXXXXXXXXXXXXXXXXXXXXX8
  4. Lastly, create an admin user
    • To be able to access your cluster right away, you will need a cluster-admin user. To create one, run the following command:
root@flood1:~# rosa create admin --cluster=jitserver
W: It is recommended to add an identity provider to login to this cluster. See 'rosa create idp --help' for more information.
I: Admin account has been added to cluster 'jitserver'.
I: Please securely store this generated password. If you lose this password you can delete and recreate the cluster admin user.
I: To login, run the following command:

   oc login https://api.jitserver.xmaq.p1.openshiftapps.com:6443 --username cluster-admin --password XXXXX-XXXXX-XXXXX-XXXXX

I: It may take up to a minute for the account to become active.

You can now use oc login https://api.jitserver.xmaq.p1.openshiftapps.com:6443 --username cluster-admin --password XXXXX-XXXXX-XXXXX-XXXXX to log in to the cluster.
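As a small illustration of the "describe the cluster to get the API URL" step, the sketch below parses text shaped like the rosa describe cluster output shown above and builds the matching oc login argument list. The function names parse_describe and login_command are hypothetical helpers introduced here, and the parsing relies only on the "Key: value" layout visible in the sample output; it is not part of the ROSA tooling.

```python
def parse_describe(text):
    """Collect top-level 'Key: value' pairs from 'rosa describe cluster' style output."""
    fields = {}
    for line in text.splitlines():
        # Skip indented list entries such as ' - Master: 3' and lines without a separator.
        if line.startswith(" ") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def login_command(describe_text, username, password):
    """Build the 'oc login' argument list once the cluster reports the 'ready' state."""
    fields = parse_describe(describe_text)
    if fields.get("State") != "ready":
        raise RuntimeError("cluster is not ready yet: " + repr(fields.get("State")))
    return ["oc", "login", fields["API URL"],
            "--username", username, "--password", password]


# Excerpt of the output shown in the article (values are the article's own).
SAMPLE = """Name:                       jitserver
API URL:                    https://api.jitserver.xmaq.p1.openshiftapps.com:6443
Nodes:
 - Master:                  3
State:                      ready
"""

print(" ".join(login_command(SAMPLE, "cluster-admin", "XXXXX-XXXXX-XXXXX-XXXXX")))
```

The argument list can be handed to subprocess.run on a machine where the oc CLI is installed; here it is only printed.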

Performance Experiments on OpenShift in AWS

In this section, we will walk you through the experimental setup, the container images used, JITServer options configuration, how JITServer increased the application density, and the throughput results of JITServer configuration compared to Default configuration.

Experimental setup

The setup of the experiments consists of the following:

  • OCP Cluster with the following VM configurations:
    • Three Master nodes of type m5.xlarge (4 vCPU, 16GB)
    • Two Infra nodes of type r5.xlarge (4 vCPU, 32GB)
    • Three Worker nodes of type c5.2xlarge (8 vCPU, 16GB)
  • Four different Java applications:
    • AcmeAir Monolithic (can be found in the Monolithic repository)
    • AcmeAir Microservices (can be found in the Microservices repository)
    • PetClinic (can be found in the PetClinic repository)
    • Quarkus (can be found in the Quarkus repository)
  • A c5.xlarge VM used by JMeter to apply load to the Java applications

Container images used in the experiments

The experiments used the following images to deploy containers in ROSA.

  • JITServer image: docker.io/emanelsabban/acmeairmicrosvc:jitserver-java8-20210326
  • Java application Images
    • AcmeAir Monolithic image: docker.io/emanelsabban/acmeair:openj9-java8-build20210326
    • PetClinic image: docker.io/emanelsabban/petclinic:openj9_8
    • Quarkus image: docker.io/emanelsabban/rest-crud-quarkus:openj9_8
    • AcmeAir Microservices images:
      • Main service: docker.io/emanelsabban/acmeairmicrosvc:mainsvcnightly
      • Auth service: docker.io/emanelsabban/acmeairmicrosvc:authsvcnightly
      • Booking service: docker.io/emanelsabban/acmeairmicrosvc:bookingsvcnightly
      • Customer service: docker.io/emanelsabban/acmeairmicrosvc:customersvcnightly
      • Flight service: docker.io/emanelsabban/acmeairmicrosvc:flightsvcnightly
  • Database Images
    • Postgres image: docker.io/postgres:10.5 (database used by Quarkus application)
    • Mongodb image: docker.io/emanelsabban/mongodb:mongo_acmeair (used by AcmeAir Monolithic and AcmeAir Microservices applications)
  • JMeter Images
    • JMeter image for AcmeAir Monolithic: docker.io/emanelsabban/acmeair-jmeter:jmeter_20210608
    • JMeter image for AcmeAir Microservices: docker.io/emanelsabban/acmeairmicroservices-jmeter:jmeter_20210608
    • JMeter image for PetClinic: docker.io/emanelsabban/petclinic-jmeter:jmeter_20210608
    • JMeter image for Quarkus: docker.io/emanelsabban/quarkus-jmeter:jmeter_20210608

Configuration of JITServer options

Below are the options passed to the client JVMs (the Java applications) through the environment variable JAVA_OPTIONS in both the Default and JITServer configurations, as well as the options passed to JITServer through the environment variable OPENJ9_JAVA_OPTIONS in the JITServer configuration.

Default configuration:

In the Default configuration, JITServer was not used, so options were passed only to the client JVMs.

  • -Xjit:verbose={compilePerformance},vlog=/tmp/vlog.txt enables verbose logging on the client side.

JITServer configuration:

In the JITServer configuration, the client JVMs connected to JITServer, so the options needed to connect to JITServer were added to the clients.

  • On JITServer the following options were used:
    • -Xjit:verbose={compilePerformance|JITServer},vlog=/tmp/vlog.txt enables verbose logging on JITServer and directs the output to the file /tmp/vlog.txt.
    • -XX:+JITServerShareROMClasses allows the read-only part of the classes cached at the server to be shared among the multiple connected client JVMs, thus reducing memory used by JITServer.
  • On the client JVMs the following options were used:
    • -XX:+UseJITServer to start the JVM in client mode.
    • -XX:JITServerAddress=<JITServer’s hostname or IP address> provides the address of the server. In this case, it is the name of the JITServer service in OpenShift.
    • -Xjit:verbose={compilePerformance|JITServer},vlog=/tmp/vlog.txt enables verbose logging on the client.
    • -Xjit:enableJITServerHeuristics allows the client to guess which compilations are cheap (in terms of resources required) and perform them locally, instead of making a compilation request to the server. If this option is enabled, the client will try to maximize the benefits of using JITServer while minimizing the networking overhead.
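To show what the resulting environment variable values might look like, here is a minimal sketch that assembles the option strings listed above. The helper names are hypothetical, and the service name "jitserver" in the usage line is a placeholder for the JITServer service in OpenShift. One assumption to note: the sketch folds the -Xjit suboptions into a single comma-separated -Xjit option, on the assumption that this is how several suboptions are passed together; the original deployments may have been written differently.

```python
VLOG = "vlog=/tmp/vlog.txt"  # verbose log location used in the article


def xjit(*suboptions):
    """Join JIT suboptions into one comma-separated -Xjit option (assumed form)."""
    return "-Xjit:" + ",".join(suboptions)


def default_client_options():
    """JAVA_OPTIONS value for a client JVM in the Default configuration."""
    return xjit("verbose={compilePerformance}", VLOG)


def jitserver_server_options():
    """OPENJ9_JAVA_OPTIONS value for the JITServer process."""
    return " ".join([
        xjit("verbose={compilePerformance|JITServer}", VLOG),
        "-XX:+JITServerShareROMClasses",
    ])


def jitserver_client_options(server_address):
    """JAVA_OPTIONS value for a client JVM that offloads compilations to JITServer."""
    return " ".join([
        "-XX:+UseJITServer",
        f"-XX:JITServerAddress={server_address}",
        xjit("verbose={compilePerformance|JITServer}", VLOG,
             "enableJITServerHeuristics"),
    ])


# "jitserver" is a hypothetical OpenShift service name.
print("JAVA_OPTIONS=" + jitserver_client_options("jitserver"))
print("OPENJ9_JAVA_OPTIONS=" + jitserver_server_options())
```

Keeping the strings in one place makes it easier to confirm that the Default and JITServer deployments differ only in the JITServer-related options.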

How JITServer enables higher application density

The JIT compiler is a complex piece of technology that can consume a lot of memory during its operation. This creates transient footprint spikes that force the user to employ larger containers (in terms of memory) in order to avoid native out-of-memory scenarios. In contrast, when compilations are offloaded to JITServer the footprint spikes due to JIT activity disappear, and the high watermark of memory consumption at the client JVMs is both lower and more predictable. This allows the user to reduce the memory limits for containers, which in turn opens the door for cost savings. Even after we consider the memory used by JITServer, the overall system memory usage is expected to be lower because spikes of peak memory usage from the bulk of client JVMs are unlikely to align perfectly. But let’s look at an example.

For each deployment type we experimentally determined the minimum container memory limits needed to avoid Out-of-Memory events for both configurations, with JITServer and without. As can be seen in Table 1, for any of the applications considered, JITServer is quite effective at reducing the peak memory consumption and thus decreasing container memory limits. Comparing the total amount of memory needed by the JVMs, the JITServer configuration uses 8700 MB less than the Default configuration, which represents savings of about 42%. Even after accounting for the memory needed by the databases (which is the same in both configurations) and by the JITServer instances, the JITServer configuration still uses less memory, yielding 6300 MB savings, or 25% in relative terms.

                                   Memory limit without JITServer  Memory limit with JITServer
Application         Replica count  Per replica     Total           Per replica     Total
AcmeAir Monolithic  8              500 MB          4000 MB         250 MB          2000 MB
AcmeAir Main        2              200 MB          400 MB          150 MB          300 MB
AcmeAir Auth        2              350 MB          700 MB          250 MB          500 MB
AcmeAir Booking     8              550 MB          4400 MB         400 MB          3200 MB
AcmeAir Customer    4              550 MB          2200 MB         350 MB          1400 MB
AcmeAir Flight      6              450 MB          2700 MB         250 MB          1500 MB
PetClinic           8              450 MB          3600 MB         250 MB          2000 MB
Quarkus             8              350 MB          2800 MB         150 MB          1200 MB
Total JVMs          46                             20800 MB                        12100 MB
MongoDB             4              1000 MB         4000 MB         1000 MB         4000 MB
Postgres            1              600 MB          600 MB          600 MB          600 MB
JITServer           2              0 MB            0 MB            1200 MB         2400 MB
Final Total         53                             25400 MB                        19100 MB
Table 1. Comparing memory limits of Java applications with and without JITServer
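The totals and percentages quoted for Table 1 can be rechecked with a few lines of arithmetic. The sketch below is only a verification aid: the replica counts and per-replica limits are copied from the table, while the names JVMS, OTHER, totals, and savings are introduced here for illustration.

```python
# (name, replicas, MB per replica without JITServer, MB per replica with JITServer)
JVMS = [
    ("AcmeAir Monolithic", 8, 500, 250),
    ("AcmeAir Main", 2, 200, 150),
    ("AcmeAir Auth", 2, 350, 250),
    ("AcmeAir Booking", 8, 550, 400),
    ("AcmeAir Customer", 4, 550, 350),
    ("AcmeAir Flight", 6, 450, 250),
    ("PetClinic", 8, 450, 250),
    ("Quarkus", 8, 350, 150),
]
OTHER = [
    ("MongoDB", 4, 1000, 1000),
    ("Postgres", 1, 600, 600),
    ("JITServer", 2, 0, 1200),  # JITServer pods exist only in the JITServer configuration
]


def totals(rows):
    """Return (total MB without JITServer, total MB with JITServer) for the given rows."""
    without = sum(replicas * mb for _, replicas, mb, _ in rows)
    with_js = sum(replicas * mb for _, replicas, _, mb in rows)
    return without, with_js


def savings(without, with_js):
    """Return (MB saved, percentage saved relative to the Default configuration)."""
    saved = without - with_js
    return saved, 100.0 * saved / without


jvm_saved, jvm_pct = savings(*totals(JVMS))
all_saved, all_pct = savings(*totals(JVMS + OTHER))
print(f"JVMs only:  {jvm_saved} MB saved ({jvm_pct:.1f}%)")
print(f"Everything: {all_saved} MB saved ({all_pct:.1f}%)")
```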

Since the user pays by the node, let’s compute how many worker nodes are needed for each configuration. Our worker nodes are equipped with 16 GB of RAM, but due to OS and OpenShift overhead, only ~12.3 GB is available for the workloads. The JITServer configuration’s total of 19100 MB fits within the ~24.6 GB offered by two nodes, while the Default configuration’s 25400 MB does not, so it needs three nodes. From the user’s point of view, this represents a 33% reduction in cost.
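The node count follows from a simple ceiling division, sketched below. It treats ~12.3 GB as 12300 MB (an assumption about units) and ignores how individual pods pack onto nodes, so it gives a lower bound on the number of nodes rather than a scheduling guarantee. The function name min_nodes is hypothetical.

```python
import math

USABLE_MB_PER_NODE = 12300  # ~12.3 GB usable per worker node, taken as 12300 MB (assumption)


def min_nodes(total_limit_mb, usable_mb=USABLE_MB_PER_NODE):
    """Lower bound on worker nodes needed to hold the summed memory limits."""
    return math.ceil(total_limit_mb / usable_mb)


default_nodes = min_nodes(25400)    # Final Total without JITServer (Table 1)
jitserver_nodes = min_nodes(19100)  # Final Total with JITServer (Table 1)
reduction_pct = 100 * (default_nodes - jitserver_nodes) / default_nodes

print(f"Default: {default_nodes} nodes, JITServer: {jitserver_nodes} nodes, "
      f"cost reduction: {reduction_pct:.0f}%")
```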

Figure 1 illustrates how the OpenShift scheduler distributed the pods across the nodes for the two configurations. The colored boxes represent different pods, and the numbers inside correspond to the memory limits set for these pods. For instance, “AM 500” is a pod for the AcmeAir Monolithic application and it uses a memory limit of 500 MB (in the Default configuration). The boxes are drawn to scale, so a bigger box means a pod with a proportionally larger memory limit. This helps us see at a glance that the pods for the Default configuration cannot fit into just two nodes, so the user has to pay for a third node to accommodate all these deployments.

Figure 1. Comparing pod distribution in Default configuration versus JITServer configuration

Throughput Experiments

Higher application density doesn’t mean much without a comparable level of throughput and therefore we decided to study the behavior of the two configurations, Default and JITServer, under load. In our experiments, three different levels of load were applied: low load (about 9% CPU utilization), medium load (about 17.5% CPU utilization), and high load (about 33% CPU utilization). These levels of load reflect the overall machine (worker nodes) utilization at steady state (not including the 7% CPU overhead imposed by the OpenShift infrastructure), and they were chosen based on conditions seen in practice [3]. A stagger delay of 2 minutes was used between applying load on an application and the next, and the load was applied in order from biggest to smallest application in terms of memory (AcmeAir Microservices -> AcmeAir Monolithic -> PetClinic -> Quarkus). Figures 2-4 compare the throughput of the two configurations for the three selected levels of load.

Figure 2. Throughput graphs comparing JITServer vs. Baseline (Default config.) for medium load (17.5%)

Figure 3. Throughput graphs comparing JITServer vs. Baseline for high load (33%)

Figure 4. Throughput graphs comparing JITServer vs. Baseline for low load (9%)

The important thing to note is that the JITServer configuration achieves the same level of steady-state throughput as the Baseline configuration, irrespective of load, for all the applications considered. From a ramp-up (time to reach peak throughput) point of view, the JITServer configuration experiences a slight lag, more visible in the high load scenario and almost indiscernible under low load. This is due to the smaller amount of CPU resources available in the JITServer configuration (two nodes instead of three).

On the graphs, you can also notice some dips in throughput, more pronounced for AcmeAir Monolithic and for high load scenarios. These are due to interference between applications, the so-called noisy neighbor effect. As explained above, since in practice applications are not likely to be loaded at the exact same time, in these experiments we apply load to the four applications in a staggered fashion, two minutes apart. The throughput dips correspond to these two-minute intervals, when the next application starts being exercised and causes a flurry of JIT compilations. If you pay close attention, you’ll observe that the Baseline configuration is also affected by the noisy neighbor effect, but to a lesser extent because Baseline has 50% more CPUs at its disposal (3 nodes vs 2).
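To make the timing concrete, the following sketch computes when load starts on each application and, from that, the minutes at which a dip could be expected in the applications already running. The load order and the two-minute stagger come from the description above; the function names are hypothetical, and the sketch says nothing about how deep a dip will be.

```python
STAGGER_MINUTES = 2
# Order in which load was applied, from biggest to smallest application.
LOAD_ORDER = ["AcmeAir Microservices", "AcmeAir Monolithic", "PetClinic", "Quarkus"]


def load_start_times(order, stagger):
    """Map each application to the minute at which load starts being applied to it."""
    return {app: index * stagger for index, app in enumerate(order)}


def expected_dip_times(order, stagger):
    """Minutes at which a later application starts and may disturb those already running."""
    starts = load_start_times(order, stagger)
    return [starts[app] for app in order[1:]]


for app, minute in load_start_times(LOAD_ORDER, STAGGER_MINUTES).items():
    print(f"t+{minute} min: start load on {app}")
print("possible dips at minutes:", expected_dip_times(LOAD_ORDER, STAGGER_MINUTES))
```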

Conclusion

The experiments conducted on Amazon Web Services (AWS) demonstrate that JITServer technology can increase container density in the cloud without sacrificing long-term throughput, and can therefore reduce the operational costs of Java applications by 20 to 30%. Ramp-up can be slightly affected in high density scenarios due to limited computing resources; however, the extent of this depends on the amount of load (as seen with the three levels of load) and the number of pods concurrently active. For low levels of load, JITServer achieves the same ramp-up characteristics as the Baseline but at a lower cost. Overall, JITServer is an effective cost-saving technology for Java applications running in the cloud or in any resource-constrained environment.

1 Reply to “Save Money with JITServer on the Cloud – an AWS Experiment”

  1. That sounds interesting. Where can we learn more about it? Is this specific to OpenJ9, or can it be enabled with HotSpot as well?
