This is blog 4 in the OpenJ9 locking/synchronization series (blog 1). It assumes familiarity with system-monitors (blog 2), GC-spinlocks (blog 2), and object-monitors (blog 3), so please read those blogs first.
Three-tier spinning makes lock acquisition more efficient when lock contention is low. Each tier waits incrementally longer between attempts to acquire the lock. The three tiers rely upon two kinds of locking structures/calls: lightweight and heavyweight. Initially, the lightweight structures/calls are used for a specific number of spins. When the spin counts are exhausted, the heavyweight structures/calls are used; these put the thread into a suspended state that does not consume CPU.
For the lightweight structures, atomic compare-and-swap (CAS) operations are used. For the heavyweight structures, synchronization primitives such as mutexes, condition variables and semaphores are used.
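As a rough illustration of the lightweight path, here is a minimal sketch of a CAS-based try-enter written with C11 atomics. The names light_lock_t, light_try_enter, and light_exit are hypothetical and the lock-word layout is an assumption made for this example; it is not OpenJ9's actual lock word or API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: 0 means free, any other value is the owner's id. */
typedef struct {
    _Atomic uintptr_t owner;
} light_lock_t;

/* One non-blocking attempt: succeeds only if the word changes from 0 to self. */
static bool light_try_enter(light_lock_t *lock, uintptr_t self)
{
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock->owner, &expected, self,
        memory_order_acquire,   /* on success: later reads see the previous owner's writes */
        memory_order_relaxed);  /* on failure: nothing was acquired */
}

/* Release the lock by publishing 0 with release ordering. */
static void light_exit(light_lock_t *lock)
{
    atomic_store_explicit(&lock->owner, 0, memory_order_release);
}
```

A failed light_try_enter never suspends the thread, which is what makes it cheap enough to retry in a spin loop before falling back to a mutex or condition variable.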
The three tiers are:
- Tier 1: yieldCPU + busy wait by decrementing a counter (nops).
- Tier 2: omrthread_yield.
- Tier 3: Block using the heavyweight structures.
The just-in-time (JIT) compiler can inline a quick check on the lightweight structures and then fall back on the VM to handle the contended cases, which is an effective approach.
Some threads may use the lightweight structures while others use the heavyweight structures. The lightweight and heavyweight structures therefore need to interact in order to maintain a consistent state of the lock across all the threads.
There are two variants for the three tiers. The pseudocode for variant 1 is as follows:
three-tier-spin-variant-1: {
    for (spin3) {
        for (spin2) {
            if (lightweight_monitor_enter()) { goto success; }
            yieldCPU(); /* CPU yield. */
            for (spin1) { /* Busy wait, decrementing a counter. */ }
        }
        omrthread_yield(); /* Relinquish CPU. */
    }
    fail: heavyweight_monitor_enter();
    success: lock_acquired;
}
The pseudocode for variant 2 is as follows:
three-tier-spin-variant-2: {
    for (spin3) {
        for (spin2) {
            if (lightweight_monitor_enter()) { goto success; }
        }
        omrthread_yield(); /* Relinquish CPU. */
    }
    fail: heavyweight_monitor_enter();
    success: lock_acquired;
}
Variant 1 is used by default in OpenJ9 on most platforms. Variant 2 omits tier 1, which reduces CPU utilization by avoiding busy waiting but also reduces the time spent spinning. Overall, variant 2 reduces resource utilization while preserving performance when the majority of locks have short hold times.
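To make the two variants concrete, the following is a hedged C sketch that follows the pseudocode above rather than the OpenJ9 sources. The names spin_params_t, try_lock, and three_tier_spin are hypothetical; POSIX sched_yield() stands in for omrthread_yield(), the platform-specific yieldCPU() hint is left out, and the function returns false where the real code would enter the heavyweight path.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical parameter block mirroring the pseudocode's loop counts. */
typedef struct {
    unsigned spin1;  /* tier 1: busy-wait decrements after each failed attempt */
    unsigned spin2;  /* lock attempts per round */
    unsigned spin3;  /* rounds, each ending in a yield (tier 2) */
    bool nested;     /* true: variant 1, false: variant 2 (no tier 1) */
} spin_params_t;

/* Lightweight attempt: CAS the lock word from 0 (free) to 1 (held). */
static bool try_lock(atomic_int *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

/* Returns true if the lock was taken while spinning; false means the spin
 * budget is exhausted and the caller should block (tier 3). */
static bool three_tier_spin(atomic_int *lock, const spin_params_t *p)
{
    for (unsigned round = p->spin3; round > 0; round--) {
        for (unsigned attempt = p->spin2; attempt > 0; attempt--) {
            if (try_lock(lock)) {
                return true;
            }
            if (p->nested) {
                /* Tier 1: burn a few cycles; volatile keeps the loop alive. */
                for (volatile unsigned n = p->spin1; n > 0; n--) { }
            }
        }
        sched_yield(); /* Tier 2: let another runnable thread use the CPU. */
    }
    return false;
}
```

Keeping the variant choice in a single flag shows that the only structural difference between the two is the inner busy-wait loop.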
Also, there are two features to control the amount of spinning:
- Concurrency restriction: An upper bound is enforced on the number of threads allowed to spin on a lock simultaneously. The same upper bound applies to all locks. The command line option -Xthr:maxSpinThreads=<X> changes the upper bound, and setting maxSpinThreads to 0 removes it. The default value of maxSpinThreads varies based upon the number of CPUs on a machine. The code to select the default maxSpinThreads value is shown here.
- Adaptive spinning: Enables/disables spinning for a lock based upon heuristics.
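The concurrency restriction can be pictured as a counter that a thread must increment before it is allowed to spin. The sketch below is one possible way to do that with C11 atomics and is not taken from OpenJ9; spin_gate_t, spin_gate_enter, and spin_gate_exit are hypothetical names, and the limit is stored next to the counter only to keep the example self-contained.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-lock gate; max_spinners plays the role of maxSpinThreads. */
typedef struct {
    atomic_uint spinners;   /* threads currently spinning on this lock */
    unsigned max_spinners;  /* 0 means the limit is disabled */
} spin_gate_t;

/* Returns true if the caller may spin; false means it should block instead. */
static bool spin_gate_enter(spin_gate_t *g)
{
    if (g->max_spinners == 0) {
        return true; /* no limit configured */
    }
    unsigned cur = atomic_load(&g->spinners);
    while (cur < g->max_spinners) {
        /* On failure, cur is reloaded with the latest count and re-checked. */
        if (atomic_compare_exchange_weak(&g->spinners, &cur, cur + 1)) {
            return true;
        }
    }
    return false;
}

/* Must be called once for every successful spin_gate_enter. */
static void spin_gate_exit(spin_gate_t *g)
{
    if (g->max_spinners != 0) {
        atomic_fetch_sub(&g->spinners, 1);
    }
}
```

A thread that is refused by the gate would skip the spin loops and go straight to the heavyweight path, so at most max_spinners threads compete for the lock word at a time.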
Control Parameters
The level of three-tier spin for the object and system monitors can be controlled by the following command line options:
Lock Type | State | Tier | Command Line Option |
Object Monitor | Flat | Decrementing a counter. | spin1 |
Object Monitor | Flat | lightweight_monitor_enter; yieldCPU. | spin2 |
Object Monitor | Flat | omrthread_yield. | yield |
Object Monitor | Inflated | Decrementing a counter. | tryEnterSpin1 |
Object Monitor | Inflated | lightweight_monitor_enter; yieldCPU. | tryEnterSpin2 |
Object Monitor | Inflated | omrthread_yield. | tryEnterYield |
System-monitor | N/A | Decrementing a counter. | threeTierSpinCount1 |
System-monitor | N/A | lightweight_monitor_enter; yieldCPU. | threeTierSpinCount2 |
System-monitor | N/A | omrthread_yield. | threeTierSpinCount3 |
Object Monitor | Flat | Enable three-tier variant 1 [n] or variant 2 [noN] in spinOnFlatLock. | [n/noN]estedSpinning |
Object Monitor | Inflated | Enable three-tier variant 1 [t] or variant 2 [noT] in spinOnTryEnter. | [t/noT]ryEnterNestedSpinning |
Example:
java -Xthr:spin1=32,spin2=64,yield=8,...
The default spin values vary by platform and are as follows:
Values | All other platforms | z/OS | AIX or Linux PPC | Linux with sched_compat_yield=0 |
spin1 | 256 | 1 | 96 | 256 or 96 (PPC) |
spin2 | 32 | 8 | 32 | 32 |
yield | 45 | 128 | 45 | 270 |
tryEnterSpin1 | 256 | 1 | 256 | 256 |
tryEnterSpin2 | 32 | 8 | 32 | 32 |
tryEnterYield | 45 | 128 | 45 | 270 |
threeTierSpinCount1 | 256 | 256 | N/A | 256 |
threeTierSpinCount2 | 32 | 32 | N/A | 32 |
threeTierSpinCount3 | 45 | 1 | N/A | 270 |
[n/noN]estedSpinning | on | off | on | on |
[t/noT]ryEnterNestedSpinning | on | off | on | on |
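To get a feel for what these counts mean, the sketch below computes the worst-case spin budget implied by the variant 1 pseudocode: the lock is attempted spin2 times per round, there are yield rounds, and each failed attempt is followed by spin1 busy-wait decrements. This is arithmetic on the pseudocode only, assuming the lock is never acquired while spinning; spin_budget_t and spin_budget are hypothetical names.

```c
#include <stdint.h>

/* Worst-case work done before falling back to the heavyweight path. */
typedef struct {
    uint64_t attempts;         /* lightweight lock attempts */
    uint64_t busy_iterations;  /* tier-1 counter decrements */
    uint64_t yields;           /* tier-2 thread yields */
} spin_budget_t;

/* Budget for nested spinning (variant 1) with the given option values. */
static spin_budget_t spin_budget(uint64_t spin1, uint64_t spin2, uint64_t yield)
{
    spin_budget_t b;
    b.attempts = spin2 * yield;             /* inner loop runs once per round */
    b.busy_iterations = spin1 * b.attempts; /* one busy wait per failed attempt */
    b.yields = yield;                       /* one yield closes each round */
    return b;
}
```

With the first column of defaults (spin1=256, spin2=32, yield=45), a thread makes at most 1,440 lock attempts, 368,640 counter decrements, and 45 yields before it blocks.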
Code
- vmthread.c: Contains the code for parsing the above command line options for the spin control parameters.
Four locations where three-tier spinning is utilized:
- spinOnFlatLock (object-monitor; control parameters: spin1, spin2, yield, [n/noN]estedSpinning.)
- spinOnTryEnter (object-monitor; control parameters: tryEnterSpin1, tryEnterSpin2, tryEnterYield, [t/noT]ryEnterNestedSpinning.)
- omrthread_spinlock_acquire (system-monitor; control parameters: threeTierSpinCount1, threeTierSpinCount2, threeTierSpinCount3; non-nested spinning is unavailable.)
- omrgc_spinlock_acquire (GC-spinlock; no control parameters; fixed spin counts; non-nested spinning is unavailable.)
Other important components:
Suggested Readings
The next blog in the OpenJ9 locking/synchronization series covers OpenJ9’s adaptive spinning (blog 5) strategy, which dynamically enables/disables lock spinning using heuristics in order to avoid the negative impact of lock spinning.
[Note] Not all of the command line options mentioned above may have customer support. Some options are only available for experimental work.