JVM debug mode
Developers need to be able to use debuggers such as Eclipse, NetBeans, and JDB to debug a Java program running in debug mode. Under the hood, debugging features such as setting breakpoints, single stepping, and inspecting variable values are implemented through the JVM Tool Interface (JVMTI). JVMTI is the lowest-level programming interface used by debugging tools; it allows the debugger to inspect the state of the VM and control its behavior. More detailed information about how to use the JVMTI can be found here.
JVMTI agents are loaded when a JVM starts up. Before the Java application itself starts running, a startup routine asks the JVM for capabilities, and the JVM enables the corresponding hooks for each capability it supports. The Just-In-Time (JIT) compiler then checks which hooks are enabled to decide whether JIT code needs to be compiled specially in order to support them. This process is illustrated in the diagram below. This blog focuses on why JIT code needs special compilation for debugging and how it is achieved in Eclipse OpenJ9.
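The startup handshake can be pictured with a small toy model. This is only a sketch: it does not call the real JVMTI (which is a C interface), and `ToyVM`, `request_capabilities`, and `choose_compile_mode` are hypothetical names invented for illustration. The capability strings are borrowed from JVMTI capability flags, but the way hooks are mapped to a compile mode is deliberately simplified and is not OpenJ9's actual logic.

```python
# Toy model of the startup handshake; NOT the real JVMTI, which is a C API.
# Capabilities this toy VM is assumed to support.
SUPPORTED = {
    "can_generate_breakpoint_events",
    "can_generate_single_step_events",
    "can_access_local_variables",
}


class ToyVM:
    def __init__(self):
        self.enabled_hooks = set()

    def request_capabilities(self, requested):
        """Enable a hook for each supported capability and return what was granted."""
        granted = set(requested) & SUPPORTED
        self.enabled_hooks |= granted
        return granted


def choose_compile_mode(vm):
    """The JIT checks the enabled hooks before it compiles any method."""
    return "debug" if vm.enabled_hooks else "normal"


vm = ToyVM()
granted = vm.request_capabilities(
    ["can_generate_breakpoint_events", "made_up_unsupported_capability"]
)
print(sorted(granted))
print(choose_compile_mode(vm))
```

The point of the sketch is the ordering: capabilities are negotiated first, and only afterwards does the compiler decide how methods must be compiled.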
JIT code supporting debug mode
The debugging features are supported on the VM side and work in the interpreter. However, running the entire program in the interpreter slows the application down significantly: 20x compared to running JIT code in non-debug mode. It would be a very frustrating experience if a developer set a breakpoint and the program took 20x longer just to reach it. OpenJ9 speeds this up by letting a Java program start off running JIT code and transition to the interpreter only when debugging events are actually triggered and performance is no longer a concern, for example when a breakpoint is hit and the debugger starts single stepping and inspecting variable values. Achieving this transition requires special mechanics, because the VM’s bytecode stack state must be recreated from the values held in registers and on the stack by the JIT method implementation. It is important to note that transitions can only happen at yield points such as checks and helper calls, where JIT code calls into the VM, which then takes over execution of the program. The OpenJ9 JIT compiler supports this transition in two modes: Full Speed Debug (FSD) and Mimic Interpreter Frame Shape (MIFS).
Debug modes in OpenJ9 JIT compiler
Mimic Interpreter Frame Shape (MIFS)
In MIFS, JIT code mimics the behavior of the interpreter and makes sure the JIT method frame is the same as the interpreter method frame at every yield point in the program. Whenever debugging events are triggered, the interpreter can take over and simply continue execution as if the program had been running interpreted all along. Performance is not ideal in this mode because many optimizations, including inlining, have to be disabled to make sure local variable values are always in the stack slot matching the interpreter. However, MIFS is less complicated and can therefore be used for diagnosing bugs in FSD. MIFS is also the default debugging mode when advanced compiler features such as AOT are enabled, because combining FSD with these features would add more complexity.
Full Speed Debug (FSD)
FSD is based on On Stack Replacement (OSR), a mechanism that replaces a JIT method frame with an interpreter method frame at any yield point, even with various optimizations (involving code motion and elimination) turned on. FSD performs much better than MIFS because these optimizations remain enabled.
On Stack Replacement (OSR)
OSR is the cornerstone of FSD, as well as of many other technologies in the JIT compiler, because it allows JIT code to optionally transition execution back to the VM at yield points in the program. The JIT can execute the most optimized version of the code while debugging events are turned off, and transition to the VM only when a debugging event is triggered. The current OSR implementation is very complex and supports a number of different execution models. This section discusses the basic ideas behind how OSR works in OpenJ9.
Involuntary and Voluntary OSR
OSR can operate in two modes: voluntary and involuntary.
Under involuntary OSR, the VM controls when execution will transition from JIT code to the bytecode interpreter loop – this mode is used to implement FSD. In this mode, any operation which may yield control to the VM may allow the VM to trigger an OSR transition – we view this as involuntary OSR because the JIT code does not control when a transition may occur.
Under voluntary OSR, the JIT code controls when to trigger an OSR transition to the VM. This mode is used in a number of ways by the JIT; the most representative example is nextGenHCR.
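The difference between the two modes comes down to who makes the decision. The sketch below is a conceptual model only, not OpenJ9 code: `ToyVM`, `run_involuntary`, and `run_voluntary` are hypothetical names, and each loop iteration stands in for compiled code reaching one yield point.

```python
class ToyVM:
    def __init__(self):
        # Set by the VM, for example when a debugging event fires.
        self.osr_requested = False


def run_involuntary(vm, steps, request_at):
    """The VM decides: compiled code only offers yield points."""
    for step in range(steps):
        if step == request_at:
            vm.osr_requested = True  # happens outside the JIT code's control
        # Yield point: control passes to the VM, which may trigger OSR.
        if vm.osr_requested:
            return f"interpreter takes over at step {step}"
    return "finished in JIT code"


def run_voluntary(assumption_holds, steps):
    """The JIT code decides: it tests a guard that it emitted itself."""
    for step in range(steps):
        if not assumption_holds(step):
            return f"interpreter takes over at step {step}"
    return "finished in JIT code"


print(run_involuntary(ToyVM(), steps=5, request_at=2))
print(run_voluntary(lambda step: step < 3, steps=5))
```

In the involuntary case the compiled code must be prepared for a transition at every yield point, whereas in the voluntary case only the guarded locations need a way out.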
OSR representation in IR
To reconstruct the state of the interpreter at a given yield point in JIT code, some kind of OSR mapping is required to describe which JIT stack slots and registers hold the values of the interpreter’s autos. In OpenJ9, this mapping is expressed in the Intermediate Representation (IR) in the form of a translation block: a set of stores that copy values from the JIT code into the auto stack slots.
There is a single translation block per method. Inlined callees are handled by linking a callee’s translation block to its caller’s. If an OSR transition happens inside an inlined callee, the program executes the callee’s translation block first and then jumps to its caller’s translation block, continuing until it has executed every translation block in the call chain and reconstructed the stack slots for all the interpreter frames.
An OSR transition is treated as an exception, and the translation block is connected to the rest of the CFG through an exception edge. Exception edges can exit a basic block at any point, jumping to the start of the destination block. This allows one basic block to contain several OSR transitions without being split, which would disrupt optimization. Analyses such as liveness and reaching definitions treat these edges conservatively to ensure correctness.
The benefit of using IL to represent OSR mappings is that all optimizations see the translation blocks just as they see the other blocks in the IL that represent the original program, and they respect the restrictions the translation blocks impose. Therefore, the mappings are updated automatically by any transformations done to the trees, and no extra maintenance is needed.
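The chaining of translation blocks across inlined calls can be sketched as a linked list that is walked from the innermost callee outwards. This is a simplified model under stated assumptions, not the OpenJ9 data structure: `TranslationBlock`, `run_osr`, and the register and spill names are all hypothetical, and each store is reduced to "copy this JIT value into that interpreter slot".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranslationBlock:
    method: str
    # interpreter stack slot -> name of the JIT register or slot to copy from
    stores: dict
    # translation block of the method this one was inlined into, if any
    caller: Optional["TranslationBlock"] = None


def run_osr(block, jit_values):
    """Rebuild one interpreter frame per method, innermost callee first."""
    frames = []
    while block is not None:
        frame = {slot: jit_values[src] for slot, src in block.stores.items()}
        frames.append((block.method, frame))
        block = block.caller  # jump to the caller's translation block
    return frames


outer = TranslationBlock("outer", {0: "r1", 1: "spill0"})
inner = TranslationBlock("inner", {0: "r2"}, caller=outer)

jit_values = {"r1": 10, "r2": 20, "spill0": 30}
frames = run_osr(inner, jit_values)
print(frames)
```

Because `inner` was inlined into `outer`, a transition inside `inner` has to produce two interpreter frames even though only one JIT frame exists.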
OSR Liveness analysis and side data structure
In addition to the explicit IR, side data structures are needed because two auto symbols in JIT code can map to the same interpreter stack slot. In JIT code, the same memory location holding values of different types is represented by different symbols. At each bytecode index, the compiler needs to decide which value should be copied to the interpreter stack slot; this is equivalent to asking which symbol would be live if the program were executing in the interpreter. Running liveness analysis right after IL generation answers this question, because liveness information in unoptimized JIT code is the same as in the interpreter. At each bytecode index, the side data structure maps each interpreter stack slot to the JIT symbol that is live at that point. The following graph shows how the entire process works for an auto stack slot shared by Address type data and Primitive type data. In the JIT method frame, the two types of data use different stack slots. When OSR happens, the values are copied to an intermediate buffer. The side data structure is then consulted: at bytecode index 4, the value of the Address type data is copied to the interpreter method frame, while at bytecode index 10 the value of the Primitive type data is copied.
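The lookup described above can be modeled as a table keyed by bytecode index. This is an illustrative sketch only: the table layout, the symbol names `addr_sym` and `prim_sym`, and the function `rebuild_slots` are assumptions made for the example, and the actual OpenJ9 structure is more involved.

```python
# Side table, conceptually built right after IL generation:
# bytecode index -> {interpreter stack slot -> JIT symbol live at that index}
live_symbol_at = {
    4: {0: "addr_sym"},   # slot 0 holds Address type data here
    10: {0: "prim_sym"},  # the same slot holds Primitive type data here
}


def rebuild_slots(bytecode_index, jit_frame):
    """Copy JIT values to a buffer, then pick the live one for each slot."""
    buffer = dict(jit_frame)  # intermediate buffer holding every JIT value
    mapping = live_symbol_at[bytecode_index]
    return {slot: buffer[symbol] for slot, symbol in mapping.items()}


# In the JIT frame the two symbols occupy different stack slots.
jit_frame = {"addr_sym": "ref@0x1f", "prim_sym": 42}

print(rebuild_slots(4, jit_frame))
print(rebuild_slots(10, jit_frame))
```

The same interpreter slot receives a different value depending on the bytecode index at which the transition happens, which is exactly the information the explicit stores alone cannot express.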
Summary of OSR representation
The most notable advantage of the OpenJ9 OSR representation is that it isn’t invalidated by compiler optimizations. The translation blocks are treated no differently from other IL trees, and no significant changes to existing optimizations are needed to guarantee correctness. The side data structure represents the fixed state in the interpreter and doesn’t change during compilation, so it needs no extra maintenance either.
OSR in FSD
OSR is used in FSD to transition from JIT code to the interpreter, for instance when the debugger starts single stepping. It is also used to switch from a faster version of JIT code to a slower one, which delays the overhead of a debugging feature until it is actually needed. Field watch is an example: the debugger needs to report all field reads and writes. With field watch, a method is first compiled and executed without any helper calls to report field accesses. Later, when the watch is turned on, methods executing on the call stacks do OSR transitions to get to the interpreter. This allows a slower version of the JIT code, with helper calls that report field accesses, to be compiled and executed.
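The field watch lifecycle can be summarized as three states of one method. The sketch below is a conceptual model, not OpenJ9's implementation: the class `WatchedMethod`, its state names, and the `reports` list standing in for debugger notifications are all hypothetical.

```python
class WatchedMethod:
    def __init__(self):
        self.watch_enabled = False
        self.version = "fast"  # compiled without reporting helper calls
        self.reports = []      # stands in for events sent to the debugger

    def enable_watch(self):
        """Turning the watch on forces running frames out of the fast code."""
        self.watch_enabled = True
        self.version = "interpreter"  # reached through an OSR transition

    def recompile(self):
        """A later compilation includes helper calls that report accesses."""
        if self.watch_enabled:
            self.version = "slow"

    def read_field(self, obj, name):
        # Only the interpreter and the slow version report the access.
        if self.version != "fast":
            self.reports.append(("read", name))
        return obj[name]


method = WatchedMethod()
point = {"x": 1}

method.read_field(point, "x")  # fast version: nothing is reported
method.enable_watch()
method.read_field(point, "x")  # interpreter: access is reported
method.recompile()
method.read_field(point, "x")  # slow JIT version: access is reported
print(method.version, method.reports)
```

The cost of reporting is paid only after the watch is enabled; until then the method runs the version without helper calls.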
Performance Results and Future work
An industry-standard benchmark, Daytrader 3 running with WebSphere Liberty, is used to compare the performance of FSD, MIFS, and non-debugging mode. The score is normalized, and higher is better. The process is pinned to 4 cores on a Skylake machine. The results show that FSD based on OSR is around 50% better than MIFS. There is still a gap between FSD and non-debugging mode.
Future work will focus on further closing the gap between FSD and non-debugging mode. There are two known problems. The first is that the OSR exception path introduces extra control flow, which can be improved by handling translation blocks specially in dataflow analyses. The second is the cost of calling VM helpers for debugging hooks, which can be addressed in a similar way to the field watch approach mentioned above.
This blog introduced what happens behind the scenes in OpenJ9 debugging mode and some basic ideas of how OSR works. Hopefully you will have a chance to try debugging a Java program with OpenJ9 and see the improvement from MIFS to FSD yourself!