The "Deploying on Kubernetes and OpenShift with OpenJ9 CRIU Support" blog post (now out of date) showed how to deploy on OpenShift Container Platform (OCP) using OpenJ9 CRIU Support. Since then, however, OCP has released Version 4.13, which uses Red Hat CoreOS (RHCOS) 4.13, which in turn is based on RHEL 9. This means that the kernel now has the clone3() system call, obviating the need to mount /proc/sys/kernel/ns_last_pid as a volume. Additionally, IBM now provides IBM Semeru container images with the OpenJ9 CRIU Support feature. Finally, the OpenJ9 CRIU Support feature has been improved to require an even smaller set of Linux capabilities on restore. This blog post goes over how to deploy on OCP 4.13 to take advantage of all these improvements.
Build the Restore Image
The following assumes you are running as root.
Clone the InstantOnStartupGuide repo
git clone https://github.com/ibmruntimes/InstantOnStartupGuide.git
cd InstantOnStartupGuide
Run the script to build the image (note that this script uses podman)
bash Scripts/unprivileged/buildUBI8RH9.sh
This will generate an image with the tag localhost/restorerun:latest. The application in the image is the same as in previous blogs (i.e. the HelloInstantOn.java application). Tag and push the image appropriately.
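As a rough sketch of the tag-and-push step, the snippet below derives a remote reference from the local tag and prints the podman commands it would run. The registry path registry.example.com/criu and the helper names remote_ref and run are hypothetical; substitute the registry your cluster can pull from.

```shell
# Hypothetical registry path (an assumption); replace with your own.
REGISTRY="${REGISTRY:-registry.example.com/criu}"
LOCAL_IMAGE="localhost/restorerun:latest"

# Derive the remote reference by swapping the localhost/ prefix for the registry.
remote_ref() {
    printf '%s/%s\n' "$1" "${2#localhost/}"
}

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

REMOTE_IMAGE="$(remote_ref "$REGISTRY" "$LOCAL_IMAGE")"

run podman tag "$LOCAL_IMAGE" "$REMOTE_IMAGE"
run podman push "$REMOTE_IMAGE"
```

Running it with DRY_RUN=0 performs the tag and push for real; the default only echoes the commands so they can be reviewed first.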
It is important that the image be built on RHCOS 4.13 (or an OS with an equivalent kernel level) to ensure that the checkpoint image can be restored via clone3.
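A quick way to sanity-check the build machine is to compare its kernel release against a minimum version. The sketch below assumes the relevant threshold is upstream Linux 5.5, the release in which clone3() gained the set_tid field for choosing PIDs; that threshold and the helper name kernel_at_least are assumptions, and distribution kernels may backport features, so treat the result as a hint rather than proof.

```shell
# Succeed if a kernel release string (as printed by `uname -r`) is at least
# MAJOR.MINOR. Usage: kernel_at_least RELEASE MAJOR MINOR
kernel_at_least() {
    release="$1"; want_major="$2"; want_minor="$3"
    major="${release%%.*}"          # text before the first dot
    rest="${release#*.}"            # text after the first dot
    minor="${rest%%[!0-9]*}"        # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

# Assumed threshold: upstream 5.5 (clone3() with set_tid).
if kernel_at_least "$(uname -r)" 5 5; then
    echo "kernel $(uname -r): clone3() with set_tid should be available"
else
    echo "kernel $(uname -r): older than 5.5; check whether clone3() is backported"
fi
```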
Deploy the Restore Image
Create a new project so the deployment can be appropriately scoped
oc new-project criu
oc project criu
Create a service account
oc create sa criusvcacct
Create the appropriate Security Context Constraint (SCC) to allow a restore to occur with minimal privileges. This SCC is based on the restricted SCC. Additionally, create a new Role that uses this SCC.
oc apply -f YAMLs/ocp/scc-cap-cr-4.13.yaml
oc apply -f YAMLs/ocp/role-custom-scc-cap-cr-my-app-criu.yaml
Create a new Role Binding to bind the Role to the Service Account.
oc apply -f YAMLs/ocp/rolebinding-criusvcacct-my-app-criu.yaml
Launch the deployment. Note that the image field in my-app-criu-4.13.yaml should be updated with the name of the container image that contains the checkpointed application.
oc apply -f YAMLs/common/my-app-criu-4.13.yaml
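One way to set the image field without editing the file by hand is a small sed filter. This is a sketch only: the helper name set_image, the example image reference, and the inline YAML fragment (including the container name) are hypothetical, and it assumes the manifest stores the image on a single "image:" line.

```shell
# Hypothetical image reference (an assumption); use the image you pushed.
IMAGE="${IMAGE:-registry.example.com/criu/restorerun:latest}"

# Rewrite the value of every `image:` key read from stdin, keeping the
# indentation and any leading "- " list marker intact.
set_image() {
    sed -E "s|^([[:space:]]*(- )?image:[[:space:]]*).*|\1$1|"
}

# Demonstration on an inline fragment (assumed layout, not the repo's file).
printf '%s\n' \
    'containers:' \
    '  - name: my-app-criu' \
    '    image: PLACEHOLDER' | set_image "$IMAGE"
```

Against the real manifest, the filtered output could be piped straight to the cluster with: set_image "$IMAGE" < YAMLs/common/my-app-criu-4.13.yaml | oc apply -f -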
Inspect the out file to view the application output.
oc get pods
oc exec <POD_NAME_FROM_PREVIOUS_CMD> -- tail -f out
What Changed?
Reduced SCC Capabilities
The cap-cr-scc SCC now needs only the CHECKPOINT_RESTORE and SETPCAP capabilities to perform a restore.
Separate CRIU Binaries
The icr.io/appcafe/ibm-semeru-runtimes:open-17-jdk-ubi image provides two separate criu binaries. The checkpoint phase uses the /usr/local/sbin/criu binary, which requires the CHECKPOINT_RESTORE, SYS_PTRACE, and SETPCAP capabilities. The restore phase uses the /opt/criu/criu binary, which requires only the CHECKPOINT_RESTORE and SETPCAP capabilities.
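To see which of these capabilities a process actually holds, the effective capability mask in /proc/self/status can be decoded. The sketch below is illustrative: the helper names has_cap and report are hypothetical, while the bit positions are the ones defined in the kernel's linux/capability.h header.

```shell
# Bit positions from <linux/capability.h>.
CAP_SETPCAP=8
CAP_SYS_PTRACE=19
CAP_CHECKPOINT_RESTORE=40

# Succeed if bit $2 is set in the hexadecimal capability mask $1.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Print whether capability NAME (bit BIT) is present in MASK.
report() {
    if has_cap "$1" "$3"; then echo "$2: present"; else echo "$2: absent"; fi
}

# Decode the effective set (CapEff) when running on Linux.
if [ -r /proc/self/status ]; then
    mask="$(awk '/^CapEff:/ {print $2}' /proc/self/status)"
    report "$mask" CAP_CHECKPOINT_RESTORE "$CAP_CHECKPOINT_RESTORE"
    report "$mask" CAP_SETPCAP "$CAP_SETPCAP"
    report "$mask" CAP_SYS_PTRACE "$CAP_SYS_PTRACE"
fi
```

Run inside the restore container, this would be expected to report SYS_PTRACE as absent if the SCC grants only the two restore capabilities.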
Simpler Deployment YAML
The deployment YAML no longer needs to specify volumeMounts and volumes.
Using Semeru Base Image
The Containerfile used to build the checkpoint/restore image no longer uses an OpenLiberty image as the base image; instead it uses icr.io/appcafe/ibm-semeru-runtimes:open-17-jdk-ubi.
What’s Next?
At the time of this blog post, the custom SCC still needs to force the deployment to run as user 1001:
runAsUser:
  type: MustRunAs
  uid: 1001
We are exploring how to remove this limitation in the future, thus further improving the user experience for OpenJ9 CRIU Support in OCP.
Great article. Regarding ‘It is important that the image be built on RHCOS 4.13’: do you mean the base image used in the build file (in this case the ibm-semeru-runtimes:open-17-jdk-ubi image), or the actual machine where the build command is run?
Thanks