Eclipse OpenJ9 uses the omrintrospect library in Eclipse OMR to incorporate native stack traces in javacores (diagnostic files). This library collects the call stacks of the threads in a process by suspending the threads and iterating over each of them. On macOS, all stacks are generated using the libunwind library, a simple and efficient API for analyzing the registers in a call chain. For each stack frame, the resulting call stack contains the symbol name + offset, the instruction pointer (IP), and the module name + offset. This functionality is useful for quickly obtaining diagnostic information after a crash or during a particular state of program execution.
How omrintrospect sets up the stack collection
There are a few steps to take before the call stacks of the process can be obtained using libunwind:
- The procedure starts with a call to omrintrospect_threads_startDo_with_signal (when a signal context is available; signal handler frames are then ignored) or omrintrospect_threads_startDo (when there is no signal context). These functions construct the walk state and walk data that are preserved throughout the thread iteration.
- A call to suspendAllPreemptive installs the signal handler from which each call stack will be generated, then records and suspends all threads in the process except the current thread.
- This list of threads is iterated starting with the current thread in setupNativeThread, which allocates a container to store the thread information produced by direct calls to the introspect_backtrace functions.
- For the rest of the threads, the user must call omrintrospect_threads_nextDo repeatedly until it returns NULL, which signals either completion or an error and resumes the suspended threads. Each of these threads is also set up by setupNativeThread, but the introspect_backtrace functions are not called directly by the current thread. Instead, a signal is sent to the iterated thread; this invokes the previously installed signal handler on that thread, and the handler calls the backtrace functions.
Backtracing using libunwind
libunwind is a library that is included in clang and available on macOS 10.6 and later. It is an efficient C API for determining the call stack of the currently executing thread. This method of backtracing was chosen over the BSD backtrace function because it is thread-safe and signal-safe, and because it provides more flexibility when handling signal handler frames. This flexibility matters because macOS modifies the stack when it calls a signal handler function.
Here is an example of output produced by BSD backtrace. Note that the frames interrupted by the signal are missing; only an unknown frame appears between _sigtramp and monitor_wait:
_sigtramp+0x1d (0x00007FFF6FA2E5FD [libsystem_platform.dylib+0x35fd])
(0x0000000000000000 [<unknown>+0x0])
monitor_wait+0x126e (0x000000000C9966CE [libj9thr29.dylib+0x76ce])
monitorWaitImpl+0x17b (0x000000000C67A54B [libj9vm29.dylib+0x7a54b])
_ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x171fc (0x000000000C6A2E5C [libj9vm29.dylib+0xa2e5c])
bytecodeLoopCompressed+0x8d (0x000000000C68BC4D [libj9vm29.dylib+0x8bc4d])
Here is the same native thread stack produced with libunwind:
_sigtramp+0x1d (0x00007FFF6FA2E5FD [libsystem_platform.dylib+0x35fd])
__psynch_cvwait+0xa (0x00007FFF6F979882 [libsystem_kernel.dylib+0x3882])
_pthread_cond_wait+0x2ba (0x00007FFF6FA3A425 [libsystem_pthread.dylib+0x6425])
monitor_wait+0x126e (0x000000000719A6CE [libj9thr29.dylib+0x76ce])
monitorWaitImpl+0x17b (0x0000000006E7E54B [libj9vm29.dylib+0x7a54b])
_ZN32VM_BytecodeInterpreterCompressed3runEP10J9VMThread+0x171fc (0x0000000006EA6E5C [libj9vm29.dylib+0xa2e5c])
bytecodeLoopCompressed+0x8d (0x0000000006E8FC4D [libj9vm29.dylib+0x8bc4d])
The backtrace function unw_backtrace(void **array, uintptr_t size) proceeds as follows:
- A call to unw_getcontext(&uc) initializes a unw_context_t structure. This saves a snapshot of the CPU state.
- unw_init_local(&cursor, &uc) initializes an unwind cursor of type unw_cursor_t. Based on the unw_context_t structure initialized earlier, this cursor starts off pointing to the current frame (that is, the function performing the backtrace).
- The cursor iterates up the call chain using repeated calls to unw_step(&cursor); the instruction pointer (IP) of each frame is obtained by calling unw_get_reg(&cursor, UNW_REG_IP, &ip).
- memcpy is used to save a copy of the cursor at the frame just iterated, so the name of the final iterated frame can still be checked with this saved cursor after the walk ends. If the stack walk is finished and the final frame iterated (the earliest frame in the call chain) is _sigtramp, special handling is required.
There are two cases where libunwind fails to capture frames past the signal handler _sigtramp (the function that finds the real handler and calls it). The first is a segfault caused by a call to an invalid address, which leaves the IP itself invalid. The second is when a signal is received in kernel mode: the IP points to a libc syscall wrapper function, but the registers have been overwritten by the kernel. In either case, _sigtramp is the final frame recorded, and register consistency must be restored before the call stack can be iterated further. How the registers are restored is platform dependent. On x64, for example, the _sigtramp procedure stores in rbx a pointer to a ucontext_t structure that describes the frames prior to _sigtramp. Restoring the base, stack, and instruction pointers (rbp, rsp, and rip) from this context allows libunwind to continue the stack walk.
Native stack produced by omrintrospect and libunwind
The native stacks obtained through omrintrospect can be used to quickly determine the source of a crash, so that the investigation can be delegated to the team responsible for the code. omrintrospect can also take a snapshot of the state of a program, providing useful information through the native stacks of all threads.
Below is an example native stack of a crashing thread, produced by libunwind and printed to standard output. From the console output alone, one can quickly tell that the source of the problem is the function Java_jdk_internal_misc_ScopedMemoryAccess_closeScope0.
OpenJ9 stores the native stack traces in javacores. By default, segfaults generate these diagnostic files. Additionally, the -Xdump option can be used to specify the events that generate a javacore. For example, -Xdump:java:events=user generates a javacore whenever the VM is sent the SIGQUIT signal. This is useful for collecting diagnostic information when the Java process hangs or is deadlocked.
In summary, the omrintrospect library uses libunwind on macOS to generate native stack traces for all threads in a process. The libunwind API comes preinstalled on macOS 10.6 and later; it is simple and flexible, and it handles cases that alternative backtrace methods cannot. Native stack information can be used to broadly identify the source of a crash or to obtain call chain information during a particular state of a program's execution.
- For technical documentation on libunwind, see https://www.nongnu.org/libunwind/man/libunwind(3).html.
- For more information on the -Xdump option, see https://www.eclipse.org/openj9/docs/xdump.
- For the omrintrospect library's design discussion and its implementation on macOS, see https://github.com/eclipse/omr/issues/3506 and https://github.com/eclipse/omr/pull/6267, respectively.