Overview
The previous blog post introduced OpenJ9 CRIU Support. However, the application there had to be run with elevated privileges. This blog post outlines how to use OpenJ9 CRIU Support with minimal privileges. We have tested running a UBI8 container on RHEL 8.6 and an Ubuntu 22.04 container on Ubuntu 22.04, both with the latest kernel updates. The updates matter because one of them fixed an issue that was preventing a successful restore. Your mileage may vary on other operating systems.
Prerequisites
- A kernel that supports the CAP_CHECKPOINT_RESTORE Linux capability. This capability was introduced in kernel version 5.9 but has been backported to the kernel versions used in RHEL 8.6.
- The latest version of podman for your Linux distribution. At the time of this blog post, docker does not yet support CAP_CHECKPOINT_RESTORE (though this support appears to be present in the development stream of the docker project).
- podman configured to use crun or runc. If using runc, the version needs to be 1.1.3 or higher to include the recent fix that enables mounting /proc/sys/kernel/ns_last_pid.
- The custom seccomp profile (renamed to criuseccompprofile.json).
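As a rough first check of the kernel prerequisite, the Java sketch below compares the running kernel's release string, as reported by the os.version system property, against 5.9. Treat it as a heuristic only: RHEL 8 kernels report a 4.18-based version yet carry the backported capability, so a false result there is not conclusive. KernelCheck is a hypothetical name, not part of the InstantOnStartupGuide repo.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not from the InstantOnStartupGuide repo): a heuristic check that
// the kernel release is at least 5.9, where CAP_CHECKPOINT_RESTORE was introduced.
// Kernels that backport the capability (such as RHEL 8.6) report an older version,
// so a false result is not conclusive.
final class KernelCheck {
    // Matches the leading "major.minor" of a release string such as "5.15.0-46-generic".
    private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

    private KernelCheck() {}

    /** Returns true if the release string's major.minor is at least major.minor. */
    static boolean atLeast(String release, int major, int minor) {
        Matcher m = MAJOR_MINOR.matcher(release);
        if (!m.find()) {
            return false; // Unparseable release strings are treated as "not known to qualify".
        }
        int relMajor = Integer.parseInt(m.group(1));
        int relMinor = Integer.parseInt(m.group(2));
        return relMajor > major || (relMajor == major && relMinor >= minor);
    }

    /** Applies the 5.9 heuristic to the kernel this JVM is running on. */
    static boolean likelyHasCheckpointRestoreCap() {
        return atLeast(System.getProperty("os.version", ""), 5, 9);
    }
}
```

On the Ubuntu 22.04 host described above, whose stock kernel is from the 5.15 series, likelyHasCheckpointRestoreCap() would be expected to return true; on RHEL 8.6 you would need to rely on the backport instead.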
Unlike docker, podman does not use a daemon with root authority, so the user who launches the container must have the authority to grant it the necessary Linux capabilities. This blog post assumes the podman commands are run as the root user.
Get the containerfile
The following section uses an Ubuntu 22.04 container. If you are using UBI8, skip ahead to the UBI8 section.
Ubuntu 22.04
To get started, clone the InstantOnStartupGuide repo
git clone https://github.com/ibmruntimes/InstantOnStartupGuide.git
cd InstantOnStartupGuide
Next, build the container image. This first acquires the prerequisites for running Eclipse OpenJ9, then acquires the CRIU prerequisites and builds CRIU from source. Note that the CRIU source used by this containerfile is not the upstream CRIU source: it includes changes, not yet merged upstream, that are needed to perform an unprivileged restore.
podman build -f Containerfiles/Containerfile.ubuntu22.unprivileged -t instantondemo .
The next section covers the UBI8 containerfile. If you built the Ubuntu 22.04 image, skip ahead to Trying the demo.
UBI8
To get started, clone the InstantOnStartupGuide repo
git clone https://github.com/ibmruntimes/InstantOnStartupGuide.git
cd InstantOnStartupGuide
Next, build the container image. This first acquires the prerequisites for running Eclipse OpenJ9, then acquires the CRIU prerequisites as well as the CRIU binary and shared library. Note that the CRIU binary pulled into this containerfile is built from a CRIU source level that includes changes, not yet merged upstream, that are needed to perform an unprivileged restore.
podman build -f Containerfiles/Containerfile.ubi8.unprivileged -t instantondemo .
This containerfile differs from its privileged counterpart: instead of deriving from registry.access.redhat.com/ubi8/ubi:latest, it derives from icr.io/appcafe/open-liberty:beta-instanton, which comes from the Open Liberty Instant On blog post. Building CRIU from source requires a subscription to a Red Hat Entitlement Server, which this blog post does not assume the reader has.
Trying the demo
Launch the container image
podman run --user 1001 --privileged -it --name checkpoint_run instantondemo
Note that we are running with --privileged because this container is only used to run the application and create the checkpoint. This step is typically expected to occur during the build phase of a deployment pipeline, so it will not be executed in a production environment where it may be exposed to external threats. The restore step, however, will not be run with --privileged, so a few extra steps are needed to make that possible.
The first step is already done: running with the --user 1001 option above. This is needed because an unprivileged restore is only possible if the process was originally run as an unprivileged user. Next, once inside the container, open the HelloInstantOn.java file and make two changes
- Uncomment the line //checkPointJVM("checkpointData");
- Replace .setLogFile("logs") with .setLogFile("logs").setUnprivileged(true)
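After both edits, the checkpoint request configures CRIU roughly as sketched below. This is not the code from the repository: the class and method names are hypothetical, setShellJob(true) and setFileLocks(true) are assumed so that the checkpoint matches the --shell-job and --file-locks flags used at restore time, and the CRIUSupport methods are called through reflection purely so the sketch compiles on any JDK. On an OpenJ9 build with CRIU support, HelloInstantOn.java calls them directly in a chain, as in .setLogFile("logs").setUnprivileged(true).

```java
import java.lang.reflect.InvocationTargetException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch (not the repo's HelloInstantOn code) of an unprivileged checkpoint
// request through OpenJ9's org.eclipse.openj9.criu.CRIUSupport API. Reflection is used only
// so this file compiles on JDKs without that class; OpenJ9 code would chain the calls directly.
final class UnprivilegedCheckpoint {
    private UnprivilegedCheckpoint() {}

    /**
     * Returns false if CRIU support is unavailable or not enabled. Otherwise requests a
     * checkpoint; after a successful restore, execution continues from the checkpointJVM call.
     */
    static boolean checkPointJVM(String imageDir) {
        final Class<?> criu;
        try {
            criu = Class.forName("org.eclipse.openj9.criu.CRIUSupport");
        } catch (ClassNotFoundException e) {
            return false; // Not an OpenJ9 JVM with the CRIU API.
        }
        try {
            // CRIUSupport.isCRIUSupportEnabled(): false unless started with -XX:+EnableCRIUSupport.
            boolean enabled = (Boolean) criu.getMethod("isCRIUSupportEnabled").invoke(null);
            if (!enabled) {
                return false;
            }
            Object support = criu.getConstructor(Path.class).newInstance(Paths.get(imageDir));
            // Each setter returns the same CRIUSupport instance, so return values are ignored.
            criu.getMethod("setShellJob", boolean.class).invoke(support, true);  // assumed: matches --shell-job at restore
            criu.getMethod("setFileLocks", boolean.class).invoke(support, true); // assumed: matches --file-locks at restore
            criu.getMethod("setLogFile", String.class).invoke(support, "logs");
            criu.getMethod("setUnprivileged", boolean.class).invoke(support, true); // the restore will be unprivileged
            criu.getMethod("checkpointJVM").invoke(support);
            return true;
        } catch (InvocationTargetException e) {
            throw new IllegalStateException("CRIUSupport call failed", e.getCause());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("CRIUSupport API not as expected", e);
        }
    }
}
```

On a JVM without the CRIU API, or one started without -XX:+EnableCRIUSupport, the sketch simply returns false, which mirrors the usual pattern of guarding the checkpoint call with isCRIUSupportEnabled().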
The first change makes the application take a checkpoint, and the second tells CRIU that the restore will be unprivileged. Next, create the directory that will be used to store the checkpoint data
mkdir checkpointData
Compile and run the code
javac HelloInstantOn.java
java -XX:+EnableCRIUSupport HelloInstantOn 1>out 2>/dev/null </dev/null
At the time of writing, restoring a process without elevated privileges requires that STDIN, STDOUT, and STDERR are not tied to the terminal, so they have to be redirected, here to the out file and /dev/null. As a result, running this command shows no output. You can still verify that the checkpoint succeeded by listing the checkpointData directory or by running cat on the out file.
$ cat out
Start
Load and initialize classes
....
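The same checkpoint run and check can be scripted from Java, for instance from a build step. The sketch below mirrors the java -XX:+EnableCRIUSupport HelloInstantOn 1>out 2>/dev/null </dev/null command with ProcessBuilder redirections, then treats the checkpoint as present if checkpointData contains at least one .img file, on the assumption that CRIU writes its image files with that extension. CheckpointRun and its methods are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper mirroring the manual steps:
//   java -XX:+EnableCRIUSupport HelloInstantOn 1>out 2>/dev/null </dev/null
//   ls checkpointData
final class CheckpointRun {
    private CheckpointRun() {}

    /** Builds (but does not start) the checkpoint run with the redirections described above. */
    static ProcessBuilder checkpointCommand(File workDir) {
        ProcessBuilder pb = new ProcessBuilder("java", "-XX:+EnableCRIUSupport", "HelloInstantOn");
        pb.directory(workDir);
        pb.redirectOutput(new File(workDir, "out"));       // 1>out
        pb.redirectError(ProcessBuilder.Redirect.DISCARD); // 2>/dev/null
        pb.redirectInput(new File("/dev/null"));           // </dev/null
        return pb;
    }

    /** True if the directory holds at least one file matching *.img (assumed CRIU image naming). */
    static boolean hasCheckpointImages(Path checkpointDir) {
        if (!Files.isDirectory(checkpointDir)) {
            return false;
        }
        try (DirectoryStream<Path> images = Files.newDirectoryStream(checkpointDir, "*.img")) {
            return images.iterator().hasNext();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Inside the container, starting checkpointCommand(new File(".")), waiting for the process, and then calling hasCheckpointImages on the checkpointData path would reproduce the manual steps; ProcessBuilder.Redirect.DISCARD requires Java 9 or later.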
Next, exit the container and commit it to create a new image
podman commit checkpoint_run restorerun
This is necessary because the container that we used to create the checkpoint was privileged. Next, start an unprivileged container.
podman run \
--rm \
-it \
--user 1001 \
--cap-add=CHECKPOINT_RESTORE \
--cap-add=NET_ADMIN \
--cap-add=SYS_PTRACE \
--security-opt seccomp=criuseccompprofile.json \
-v /proc/sys/kernel/ns_last_pid:/proc/sys/kernel/ns_last_pid \
restorerun
This command is considerably more involved than in the privileged case, so let us go through the command-line switches and why they are needed.
- The --user 1001 option ensures we start the container as an unprivileged user.
- The --cap-add options grant the container the three Linux capabilities that CRIU needs: CHECKPOINT_RESTORE, NET_ADMIN, and SYS_PTRACE.
- The --security-opt option uses the criuseccompprofile.json file (which is based on the default Docker seccomp profile) to unblock system calls used by CRIU.
- The -v /proc/sys/kernel/ns_last_pid:/proc/sys/kernel/ns_last_pid option gives CRIU access to ns_last_pid.
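To confirm from inside the restore container that the --cap-add options took effect, one approach is to read the capability masks in /proc/self/status. The sketch below checks the bounding set (CapBnd) rather than the effective set, on the assumption that, because the container runs as user 1001, an ordinary process may not show the added capabilities in its effective set while the bounding set still reflects what the container was allowed. It uses the standard Linux capability numbers (NET_ADMIN is 12, SYS_PTRACE is 19, CHECKPOINT_RESTORE is 40); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalLong;

// Hypothetical helper: inspects capability masks in /proc/self/status.
final class CapabilityCheck {
    // Standard Linux capability numbers from linux/capability.h.
    static final int CAP_NET_ADMIN = 12;
    static final int CAP_SYS_PTRACE = 19;
    static final int CAP_CHECKPOINT_RESTORE = 40;

    private CapabilityCheck() {}

    /** Parses a hex mask line such as "CapBnd:\t000001ffffffffff"; empty if the field is absent. */
    static OptionalLong mask(List<String> statusLines, String field) {
        String prefix = field + ":";
        for (String line : statusLines) {
            if (line.startsWith(prefix)) {
                return OptionalLong.of(Long.parseUnsignedLong(line.substring(prefix.length()).trim(), 16));
            }
        }
        return OptionalLong.empty();
    }

    /** True if every listed capability bit is set in the mask. */
    static boolean hasAll(long mask, int... caps) {
        for (int cap : caps) {
            if ((mask & (1L << cap)) == 0) {
                return false;
            }
        }
        return true;
    }

    /** True if this process's bounding set includes the three capabilities added with --cap-add. */
    static boolean restoreCapsInBoundingSet() {
        try {
            List<String> status = Files.readAllLines(Paths.get("/proc/self/status"));
            OptionalLong bnd = mask(status, "CapBnd");
            return bnd.isPresent()
                    && hasAll(bnd.getAsLong(), CAP_CHECKPOINT_RESTORE, CAP_NET_ADMIN, CAP_SYS_PTRACE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```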
When running on a kernel with the clone3 system call, mounting /proc/sys/kernel/ns_last_pid is unnecessary, and can in fact cause the podman run command to fail unless podman is configured to use crun or the version of runc installed on the machine is at least 1.1.3. To check which container runtime your podman is configured to use, run podman info and look at the ociRuntime section. Additionally, depending on the operating system, specifying --security-opt seccomp=criuseccompprofile.json may be unnecessary, as the installed podman may already allow the necessary system calls by default in the containers it launches.
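The mount decision described above can be captured in a few lines. In this hypothetical sketch, whether the kernel provides clone3 and the runtime name and version (for example, as reported in the ociRuntime section of podman info) are inputs you gather yourself; versions are compared numerically on their dotted components, and pre-release suffixes are ignored.

```java
// Hypothetical sketch of the ns_last_pid mount decision described above.
final class NsLastPidMount {
    enum Decision { SKIP_MOUNT, MOUNT, UPGRADE_RUNTIME }

    private NsLastPidMount() {}

    /** Compares dotted versions numerically ("1.1.12" > "1.1.3"); missing parts count as 0. */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? leadingInt(pa[i]) : 0;
            int y = i < pb.length ? leadingInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Uses only the leading digits of a component, so a suffix such as "-rc1" is ignored.
    private static int leadingInt(String s) {
        int end = 0;
        while (end < s.length() && Character.isDigit(s.charAt(end))) {
            end++;
        }
        return end == 0 ? 0 : Integer.parseInt(s.substring(0, end));
    }

    /** Mounting ns_last_pid works with crun, or with runc 1.1.3 or newer. */
    static boolean runtimeSupportsMount(String runtimeName, String runtimeVersion) {
        if ("crun".equals(runtimeName)) {
            return true;
        }
        return "runc".equals(runtimeName) && compareVersions(runtimeVersion, "1.1.3") >= 0;
    }

    static Decision decide(boolean kernelHasClone3, String runtimeName, String runtimeVersion) {
        if (kernelHasClone3) {
            return Decision.SKIP_MOUNT; // With clone3, the -v ns_last_pid mount is unnecessary.
        }
        return runtimeSupportsMount(runtimeName, runtimeVersion) ? Decision.MOUNT : Decision.UPGRADE_RUNTIME;
    }
}
```

Returning UPGRADE_RUNTIME rather than silently mounting makes the one failing combination explicit: a kernel that needs the mount paired with a runc older than 1.1.3.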
Getting back to the demo, once inside the container, run the restore command
time criu restore --unprivileged -D ./checkpointData --file-locks --shell-job -v4 --log-file=restore.log
Note the --unprivileged flag passed to CRIU; it is required to perform the unprivileged restore. Because all output was redirected to files, you can verify that the application did indeed resume by running cat on the out file again
$ cat out
Start
Load and initialize classes
....
Application ready!
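If you want to script this check instead of reading the file by eye, the sketch below treats the restore as successful when "Application ready!" appears after "Start" in the out file. That matches the output shown above, where the checkpoint run stopped before printing that line. ResumeCheck is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper: decides from the redirected "out" file whether the restored run resumed.
final class ResumeCheck {
    private ResumeCheck() {}

    static boolean resumed(Path outFile) {
        if (!Files.exists(outFile)) {
            return false;
        }
        try {
            List<String> lines = Files.readAllLines(outFile);
            int start = indexOfLine(lines, "Start");
            int ready = indexOfLine(lines, "Application ready!");
            // In the checkpoint run, output stopped before "Application ready!", so seeing it
            // after "Start" indicates the restored process continued.
            return start >= 0 && ready > start;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Index of the first line equal to text (ignoring surrounding whitespace), or -1.
    private static int indexOfLine(List<String> lines, String text) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).trim().equals(text)) {
                return i;
            }
        }
        return -1;
    }
}
```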
As in the privileged scenario, you should observe that the application resumes after the checkpointJVM call and is ready roughly 10 times faster than when it is run from the start, with the added benefit of running as an unprivileged user with the minimal set of required capabilities.
Setting up container restore
The previous steps demonstrate how to create a checkpoint and restore it. In production, you would ideally save the checkpoint as part of your build phase and restore it at deployment. To accomplish this, you can build a container image that contains the checkpoint image files.
The following section uses an Ubuntu 22.04 container. If you are using UBI8, skip ahead to the UBI8 section.
Ubuntu 22.04
First, exit the container to return to the InstantOnStartupGuide repo, then launch the build script.
bash Scripts/unprivileged/build.sh
The script runs all the steps from the previous section and saves the CRIU checkpoint image files into a container image. Once it completes, the only remaining step is to launch the restore container, which the following helper script does for you
bash Scripts/unprivileged/restoreContainer.sh
Because STDIN, STDOUT, and STDERR are redirected to files, these scripts run the HelloInstantOn.java program in a mode that keeps it running indefinitely while periodically printing a message. To verify that the program has resumed, run the following in another terminal
podman exec -it restore_run tail -f out
You should see
Start
Load and initialize classes
....
Application ready!
Heartbeat
Heartbeat
Heartbeat
...
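For a scripted alternative to watching tail -f, for example in a smoke test that can read the out file, the sketch below polls the file until a given number of Heartbeat lines appear or a timeout expires. The class name, the exact line it matches, and the polling approach are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.stream.Stream;

// Hypothetical helper: a scripted stand-in for watching `tail -f out`.
final class HeartbeatWatch {
    private HeartbeatWatch() {}

    /** Counts lines that read exactly "Heartbeat" (ignoring surrounding whitespace). */
    static long countHeartbeats(Path outFile) {
        if (!Files.exists(outFile)) {
            return 0;
        }
        try (Stream<String> lines = Files.lines(outFile)) {
            return lines.filter(line -> line.trim().equals("Heartbeat")).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Polls until at least `expected` heartbeats are present; returns false once the timeout passes. */
    static boolean awaitHeartbeats(Path outFile, long expected, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (countHeartbeats(outFile) < expected) {
            if (Instant.now().isAfter(deadline)) {
                return false;
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }
}
```

Re-reading the whole file on each poll keeps the sketch simple; for a long-running container with a large out file, tracking the last read offset would be cheaper.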
The next section covers the UBI8 containerfile. If you used the Ubuntu 22.04 scripts, skip ahead to the Conclusion.
UBI8
First, exit the container to return to the InstantOnStartupGuide repo, then launch the build script.
bash Scripts/unprivileged/buildUBI.sh
The script runs all the steps from the previous section and saves the CRIU checkpoint image files into a container image. Once it completes, the only remaining step is to launch the restore container, which the following helper script does for you
bash Scripts/unprivileged/restoreContainerUBI.sh
Because STDIN, STDOUT, and STDERR are redirected to files, these scripts run the HelloInstantOn.java program in a mode that keeps it running indefinitely while periodically printing a message. To verify that the program has resumed, run the following in another terminal
podman exec -it restore_run tail -f out
You should see
Start
Load and initialize classes
....
Application ready!
Heartbeat
Heartbeat
Heartbeat
...
Conclusion
This blog post went over how to use OpenJ9 CRIU Support without requiring elevated privileges on the Ubuntu 22.04 and UBI8 platforms. However, this demo only covered running containers on a local machine using podman. See Deploying on Kubernetes and OpenShift with OpenJ9 CRIU Support to learn about deploying restore containers on Kubernetes and OpenShift Container Platform.