Unprivileged OpenJ9 CRIU Support

Overview

The previous blog post introduced OpenJ9 CRIU Support, but the application had to be run with elevated privileges. This post outlines how to use OpenJ9 CRIU Support with minimal privileges. We tested a UBI8 container on RHEL 8.6 and an Ubuntu 22.04 container on Ubuntu 22.04, both with the latest kernel updates (this matters because those updates fix an issue that prevented a successful restore); your mileage may vary on other operating systems.

Prerequisites

  • A kernel that supports the CAP_CHECKPOINT_RESTORE Linux capability. This capability was introduced in kernel version 5.9 but has been backported to RHEL kernel versions used in RHEL 8.6.
  • The latest version of podman for your Linux distribution. At the time of this blog post, docker does not yet support CAP_CHECKPOINT_RESTORE (though support appears to be present in the docker project's development stream).
  • podman configured to use crun or runc.
  • If using runc, the version needs to be 1.1.3 or higher to have the recent fix which enables mounting /proc/sys/kernel/ns_last_pid.
  • The custom seccomp profile (renamed to criuseccompprofile.json).
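
The kernel prerequisite above can be checked with a few lines of shell. This is a minimal sketch rather than an official check: version_ge is a hypothetical helper name, it relies on the version sort (sort -V) from GNU coreutils, and a kernel older than 5.9 may still work if the vendor has backported the capability, as RHEL 8.6 has.

```shell
#!/usr/bin/env bash
# Sketch only: version_ge is a hypothetical helper, not part of podman or CRIU.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Kernel release looks like "5.15.0-56-generic"; keep the leading numeric part.
kernel="$(uname -r | sed 's/[^0-9.].*$//')"

if version_ge "$kernel" 5.9; then
    echo "kernel $kernel: 5.9 or newer, CAP_CHECKPOINT_RESTORE expected"
else
    echo "kernel $kernel: older than 5.9, a vendor backport is required"
fi
```

The same helper can compare a runc version against 1.1.3, for example version_ge 1.1.4 1.1.3.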

Unlike docker, podman does not use a daemon with root authority, so the user who launches the container must have the authority to grant it the necessary Linux capabilities. This blog assumes the podman commands are run as the root user.

Get the containerfile

The following section uses an Ubuntu 22.04 container. Click here to jump to using a UBI8 container.

Ubuntu 22.04

To get started, clone the InstantOnStartupGuide repo:

git clone https://github.com/ibmruntimes/InstantOnStartupGuide.git
cd InstantOnStartupGuide

Next, build the container image. The build first acquires the prerequisites for running Eclipse OpenJ9, then acquires the CRIU prerequisites and builds CRIU from source. Note that the CRIU source in this container file comes from here, as it includes changes not yet in the headstream that are needed to perform an unprivileged restore.

podman build -f Containerfiles/Containerfile.ubuntu22.unprivileged -t instantondemo .

The next section goes over using a UBI8 containerfile. Click here to jump to trying the demo.

UBI8

To get started, clone the InstantOnStartupGuide repo:

git clone https://github.com/ibmruntimes/InstantOnStartupGuide.git
cd InstantOnStartupGuide

Next, build the container image. The build first acquires the prerequisites for running Eclipse OpenJ9, then acquires the CRIU prerequisites as well as the CRIU binary and shared library. Note that the CRIU binary pulled into this container file is built from this source level, as it includes changes not yet in the headstream that are needed to perform an unprivileged restore.

podman build -f Containerfiles/Containerfile.ubi8.unprivileged -t instantondemo .

It is worth noting that this containerfile is different from its privileged counterpart in that instead of deriving from registry.access.redhat.com/ubi8/ubi:latest, it derives from icr.io/appcafe/open-liberty:beta-instanton which comes from the Open Liberty Instant On blog post. In order to build CRIU from source, one needs to have a subscription to a Red Hat Entitlement Server; this blog post does not assume this of the reader.

Trying the demo

Launch the container image

podman run --user 1001 --privileged -it --name checkpoint_run instantondemo

Note, we’re running with --privileged as this container will only be used to run the application to build the checkpoint. Typically this step is expected to occur during the build phase of a deployment pipeline, and therefore will not be executed in a production environment where it may be susceptible to external threats. However, the restore step will not be run with --privileged. Therefore, we need to take a few steps to facilitate this.

The first step is already done, namely running with the --user 1001 option above. This is needed because in order for an unprivileged restore to occur, the process needs to have been run as an unprivileged user. Next, once inside the container, open the HelloInstantOn.java file and

  1. Uncomment the line
    //checkPointJVM("checkpointData");
  2. Update
    .setLogFile("logs")
    with
    .setLogFile("logs").setUnprivileged(true)

The first change checkpoints the application, and the second informs CRIU that the restore will be unprivileged. Next, create the directory that will be used to store the checkpoint data

mkdir checkpointData

Compile and run the code

javac HelloInstantOn.java
java -XX:+EnableCRIUSupport HelloInstantOn 1>out 2>/dev/null </dev/null

At the time of this blog post, a process can only be restored without elevated privileges if STDIN, STDOUT, and STDERR are not tied to the terminal, so they have to be redirected to files. As a result, this command prints nothing to the terminal. You can verify that the checkpoint succeeded either by listing the checkpointData directory or by printing the contents of the out file with cat.

$ cat out 
Start
Load and initialize classes
....
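
Listing the checkpointData directory can also be scripted. The following is a minimal sketch: checkpoint_present is a hypothetical helper name, and it assumes the checkpoint is written as CRIU's usual *.img image files.

```shell
# Sketch only: checkpoint_present is a hypothetical helper name.
# Succeeds when the directory exists and holds at least one CRIU image file.
# Assumes CRIU's usual *.img naming for dumped image files.
checkpoint_present() {
    dir="$1"
    [ -d "$dir" ] || return 1
    for f in "$dir"/*.img; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        [ -e "$f" ] && return 0
    done
    return 1
}

if checkpoint_present checkpointData; then
    echo "checkpoint images found in checkpointData"
else
    echo "no checkpoint images found in checkpointData"
fi
```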

Next, exit the container and commit it to create a new image

podman commit checkpoint_run restorerun

This is necessary because the container that we used to create the checkpoint was privileged. Next, start an unprivileged container.

podman run \
  --rm \
  -it \
  --user 1001 \
  --cap-add=CHECKPOINT_RESTORE \
  --cap-add=NET_ADMIN \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=criuseccompprofile.json \
  -v /proc/sys/kernel/ns_last_pid:/proc/sys/kernel/ns_last_pid \
  restorerun

This command is considerably more involved than the privileged case, so let us go through the command line switches and why they are needed:

  • The --user 1001 ensures we start the container as an unprivileged user.
  • The --cap-add options grant the container three Linux capabilities that CRIU needs.
  • The --security-opt option uses the criuseccompprofile.json file (which is based on the default Docker seccomp profile) to unblock system calls used by CRIU.
  • The -v /proc/sys/kernel/ns_last_pid:/proc/sys/kernel/ns_last_pid option is needed to give CRIU access to ns_last_pid.

When running on a kernel with the clone3 system call, mounting /proc/sys/kernel/ns_last_pid is unnecessary, and in fact can cause the podman run command to fail unless podman is configured to use crun, or the version of runc installed on the machine is at least 1.1.3. To check what container runtime is configured for your podman, run

podman info

and look at the ociRuntime section. Additionally, depending on the operating system, specifying --security-opt seccomp=criuseccompprofile.json may be unnecessary as the podman installed may grant the necessary system calls by default to the containers it launches.
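
The runtime rule described above (any crun, or runc 1.1.3 and later) can be written as a small shell check. This is a sketch under stated assumptions: version_ge and ns_last_pid_mount_ok are hypothetical helper names, and the commented podman and runc commands are one possible way to obtain the inputs whose output format may differ between versions.

```shell
# Sketch only: version_ge and ns_last_pid_mount_ok are hypothetical helpers.

# Succeeds when version $1 >= version $2 (uses GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# $1 = OCI runtime name, $2 = its version (only consulted for runc).
# Succeeds when the ns_last_pid bind mount is expected to work:
# any crun, or runc 1.1.3 and later. Unknown runtimes are rejected.
ns_last_pid_mount_ok() {
    case "$1" in
        crun) return 0 ;;
        runc) version_ge "$2" 1.1.3 ;;
        *)    return 1 ;;
    esac
}

# Possible way to fill in the inputs (assumed field names and output format):
#   runtime="$(podman info --format '{{.Host.OCIRuntime.Name}}')"
#   version="$(runc --version | awk 'NR==1 {print $3}')"

report() {
    if ns_last_pid_mount_ok "$1" "$2"; then
        echo "$1 $2: ns_last_pid mount should be fine"
    else
        echo "$1 $2: ns_last_pid mount may make podman run fail"
    fi
}

report crun ""
report runc 1.1.3
report runc 1.0.2
```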

Getting back to the demo, once inside the container, run the restore command

time criu restore --unprivileged -D ./checkpointData --file-locks --shell-job -v4 --log-file=restore.log

Note that the --unprivileged flag must be passed to CRIU for it to perform the unprivileged restore. Because all output was redirected to files, cat the out file again to verify that the application did resume:

$ cat out 
Start
Load and initialize classes
....
Application ready!
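
Because the restored application writes to the out file rather than the terminal, a script that needs to know when the application is ready can poll that file. This is a minimal sketch: wait_for_line is a hypothetical helper name, and the one second polling interval is an arbitrary choice.

```shell
# Sketch only: wait_for_line is a hypothetical helper name.
# Polls file $1 once per second until it contains the fixed string $2,
# giving up after $3 seconds. Succeeds only if the string was found.
wait_for_line() {
    file="$1"
    text="$2"
    remaining="$3"
    while :; do
        if [ -f "$file" ] && grep -qF -- "$text" "$file"; then
            return 0
        fi
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

For example, wait_for_line out "Application ready!" 10 waits up to ten seconds for the message shown above.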

As in the privileged scenario, you should observe that the application resumes after the “checkPointJVM” method and that it is ready roughly 10 times faster than when it was run from the start, with the added benefit of running as an unprivileged user with the minimal set of required capabilities.

Setting up container restore

The previous steps demonstrate how to create a checkpoint and restore it. In production you’d ideally want to save the checkpointed image as part of your build phase and restore it at deployment. To accomplish this one can construct a container image with the checkpointed image.

The following section uses an Ubuntu 22.04 container. Click here to jump to using a UBI8 container.

Ubuntu 22.04

To start, first exit the container to return to the InstantOnStartupGuide repo, then launch the build script.

bash Scripts/unprivileged/build.sh

The script runs all the steps in the previous section and saves the CRIU checkpoint image files into a container image. Once it completes, the only remaining step is to launch the restore container. The following script launches it for you:

bash Scripts/unprivileged/restoreContainer.sh

Because STDIN, STDOUT, and STDERR are redirected to files, these scripts run the HelloInstantOn.java program in a mode that keeps the program running indefinitely but periodically prints out a message. To verify the program has resumed, in another terminal run

podman exec -it restore_run tail -f out

You should see:

Start
Load and initialize classes
....
Application ready!
Heartbeat
Heartbeat
Heartbeat
...

The next section goes over using a UBI8 containerfile. Click here to jump to the Conclusion.

UBI8

To start, first exit the container to return to the InstantOnStartupGuide repo, then launch the build script.

bash Scripts/unprivileged/buildUBI.sh

The script runs all the steps in the previous section and saves the CRIU checkpoint image files into a container image. Once it completes, the only remaining step is to launch the restore container. The following script launches it for you:

bash Scripts/unprivileged/restoreContainerUBI.sh

Because STDIN, STDOUT, and STDERR are redirected to files, these scripts run the HelloInstantOn.java program in a mode that keeps the program running indefinitely but periodically prints out a message. To verify the program has resumed, in another terminal run

podman exec -it restore_run tail -f out

You should see:

Start
Load and initialize classes
....
Application ready!
Heartbeat
Heartbeat
Heartbeat
...

Conclusion

This blog post went over how to use OpenJ9 CRIU Support without requiring elevated privileges on the Ubuntu 22.04 and UBI 8 platforms. However, this demo only covered how to run containers on a local machine using podman. See Deploying on Kubernetes and OpenShift with OpenJ9 CRIU Support to learn about deploying restore containers on Kubernetes and OpenShift Container Platform.
