Ahead Of Time Compilation: Validation

This post finally gets to the much-alluded-to topic of Validation. As mentioned in a previous post, Validation is one of the two actions the JVM must perform to generate and execute AOT code; the other is Relocation, which is described here.

What is Validation?

In the context of AOT, Validation is the process of ensuring that an AOT compiled method loaded from the Shared Classes Cache (SCC) is compatible with the environment of the JVM performing the load; as mentioned previously, the JVM instance that loads the method is, in general, not the same instance in which the method was compiled. Validations are performed before Relocations: a failed validation can imply[1] that the method being validated is not going to be compatible with the current JVM instance, and so performing relocation would be:

  1. unnecessary, since the code isn’t going to be executed, and
  2. unsafe, as it might bring down the JVM while querying something using invalid indices (e.g. into a constant pool).

Thus, if validation succeeds, the compiler continues with performing relocation; if validation fails, the compiler aborts the AOT load and proceeds to do a regular JIT compilation.
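The control flow just described can be sketched as follows. This is a minimal Python sketch, not OpenJ9 code. `load_aot_method`, `LoadResult`, and the callable parameters are hypothetical stand-ins for the compiler's internals, and the header checks are reduced to booleans.

```python
from enum import Enum, auto


class LoadResult(Enum):
    AOT_LOADED = auto()
    FALLBACK_TO_JIT = auto()


def load_aot_method(header_ok, method_header_ok, validation_records, relocate):
    """Hypothetical sketch of the order of operations during an AOT load.

    header_ok / method_header_ok: outcomes of the AOT Header and AOT Method
    Header validations, reduced to booleans for this sketch.
    validation_records: callables that return True if the record validates.
    relocate: callable that patches the code; only reached if every
    validation succeeded.
    """
    if not header_ok or not method_header_ok:
        return LoadResult.FALLBACK_TO_JIT
    for validate in validation_records:
        if not validate():
            # Relocating now would be unnecessary (the code won't run)
            # and unsafe (it might use indices that are invalid here).
            return LoadResult.FALLBACK_TO_JIT
    relocate()
    return LoadResult.AOT_LOADED
```

Putting every validation strictly before `relocate()` mirrors the two reasons listed above. The sketch never touches the code buffer on a path that ends in a JIT fallback.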

How to perform Validation?

At a high level, there are three main phases of AOT Validation.

AOT Header Validation

The very first validation the compiler performs is the AOT Header Validation. This is a one-time validation that ensures that certain basic properties of the environment of the current JVM instance (the one loading from the SCC) are consistent with the environment of the JVM that populated the SCC. This involves verifying that properties such as the JVM version and the GC policy, as well as the Feature Flags, are the same; the Feature Flags indicate which aspects of the environment affect compiled code. The AOT Header Validation also ensures that the AOT code to be loaded is compatible with the processor the current JVM instance is running on. If you’re interested in the specifics of the AOT Header Validation, take a look at the code here.
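A minimal sketch of the kind of comparison the AOT Header Validation makes follows. It assumes a header reduced to a handful of hypothetical fields; `AOTHeader`, `validate_aot_header`, and the field names are illustrative, not the real header layout. The processor check is modeled, as an assumption, as "every processor feature the stored code relies on is available here".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AOTHeader:
    # Hypothetical fields; the real header records more than this.
    jvm_version: str
    gc_policy: str
    feature_flags: int              # bit set of environment properties
    processor_features: frozenset   # features the stored code relies on


def validate_aot_header(stored: AOTHeader, current: AOTHeader) -> bool:
    """One-time check that the SCC was populated by a compatible JVM."""
    if stored.jvm_version != current.jvm_version:
        return False
    if stored.gc_policy != current.gc_policy:
        return False
    if stored.feature_flags != current.feature_flags:
        return False
    # Assumption: the stored code may only use processor features that
    # the current processor also provides (a subset test).
    return stored.processor_features <= current.processor_features
```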

AOT Method Header Validation

The first per-method validation the compiler performs is the AOT Method Header Validation, which compares the flags set in the TR_AOTMethodHeader structure. This ensures that certain properties of the environment that affected only the method being loaded still hold[2]. If you’re interested in the specifics of the AOT Method Header Validation, take a look at the code here.
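As a rough illustration only, assume the method header stores a bit set of environment properties the method was compiled against. The check then reduces to a bitmask test. The flag names below are hypothetical; the real flags live in TR_AOTMethodHeader. This sketch makes no claim about whether the real check is a subset test or an exact comparison.

```python
from enum import IntFlag


class MethodHeaderFlags(IntFlag):
    # Hypothetical flags: each bit records an environment property that
    # the compiled body of this one method was generated against.
    NONE = 0
    ASSUMED_PROPERTY_A = 1
    ASSUMED_PROPERTY_B = 2


def validate_method_header(stored_flags, current_env_flags):
    """Assumed semantics: every property the method relied on must hold now."""
    return (stored_flags & current_env_flags) == stored_flags
```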

Validation Records

The majority of the validation is performed using Validation Records. These are implemented as Relocation Records, so one creates Validation Records the same way one creates Relocation Records. The actual validation is performed in the applyRelocation method (called by TR_RelocationRecord::applyRelocationAtAllOffsets; refer to Ahead Of Time Compilation: Relocation for a refresher) of the child class of TR_RelocationRecord that implements the relocation or, in the case of Validation Records, the validation.
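The relationship between Relocation Records and Validation Records can be sketched with a small class hierarchy. This is a hypothetical Python analogue, not OpenJ9's C++ code. `RelocationRecord`, `apply_relocation`, and `apply_relocation_at_all_offsets` only loosely mirror TR_RelocationRecord, applyRelocation, and applyRelocationAtAllOffsets, and the environment is a plain dictionary.

```python
class RelocationRecord:
    """Hypothetical Python analogue of TR_RelocationRecord."""

    def __init__(self, offsets):
        self.offsets = offsets  # positions in the code buffer to patch

    def apply_relocation(self, env, code, offset):
        """Return 0 on success, non-zero on failure."""
        raise NotImplementedError

    def apply_relocation_at_all_offsets(self, env, code):
        for offset in self.offsets:
            rc = self.apply_relocation(env, code, offset)
            if rc != 0:
                return rc
        return 0


class ClassAddressRelocation(RelocationRecord):
    """A regular relocation: patch in the current address of a class."""

    def __init__(self, offsets, class_name):
        super().__init__(offsets)
        self.class_name = class_name

    def apply_relocation(self, env, code, offset):
        address = env["addresses"].get(self.class_name)
        if address is None:
            return 1
        code[offset] = address
        return 0


class ClassChainValidation(RelocationRecord):
    """A validation record: same machinery, but it only checks."""

    def __init__(self, class_name, expected_chain):
        # Patches no code; a single placeholder offset makes the base
        # class call apply_relocation exactly once (a sketch simplification).
        super().__init__([None])
        self.class_name = class_name
        self.expected_chain = expected_chain

    def apply_relocation(self, env, code, offset):
        actual = env["chains"].get(self.class_name)
        return 0 if actual == self.expected_chain else 1
```

Because both kinds of record share one driver, the loader can process validations and relocations uniformly. Only the subclass decides whether a record patches code or merely checks the environment.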

So what’s so complicated about all this?

Having read the previous sections, you might think there is nothing particularly complicated about validations; they are, after all, basically the same as Relocation Records, right? Well, the complexity and subtlety come entirely from the dynamic nature of Java.

Class Loading

Java has the notion of Class Loading and Unloading. All classes are loaded by a Class Loader; the primordial Java Classes are loaded by the Bootstrap Class Loader in the VM[3]. Thus, depending on the class loader used, a class with a given name can differ in its shape, its inheritance, or the interfaces it implements.

Multiple Class Loaders

Java also supports multiple class loaders. Two different class loaders can load the same .class file, and the result is two different Java Classes. Thus, if one were to instantiate an object of type A_Loaded_By_CL1 and check whether it was an instanceof A_Loaded_By_CL2, the answer would be false.
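A toy model makes the point concrete. If a class's identity is the pair (name, defining loader), two loaders reading the same bytes produce two distinct classes. Everything here (`ClassLoader`, `JavaClass`, `JavaObject`, `is_instance_of`) is a hypothetical Python model, not JVM code, and it ignores superclasses and interfaces.

```python
class JavaClass:
    """Toy model: a class is identified by (name, defining loader)."""

    def __init__(self, name, defining_loader):
        self.name = name
        self.defining_loader = defining_loader


class ClassLoader:
    def __init__(self, name):
        self.name = name
        self._defined = {}

    def load_class(self, class_name):
        # Each loader defines its own class object for a given name,
        # even when the bytes come from the same .class file.
        if class_name not in self._defined:
            self._defined[class_name] = JavaClass(class_name, self)
        return self._defined[class_name]


class JavaObject:
    def __init__(self, cls):
        self.cls = cls


def is_instance_of(obj, cls):
    # Class identity, not name equality (superclasses ignored here).
    return obj.cls is cls
```

With `a1 = ClassLoader("CL1").load_class("A")` and `a2 = ClassLoader("CL2").load_class("A")`, both classes are named `A`. Even so, `is_instance_of(JavaObject(a1), a2)` is `False`.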

Takeaway

Thus, when we say we need to validate a method before relocating and executing it, we mean that we need to ensure that all classes, methods, fields, etc. that are both referred to in the Java code and introspected by the compiler are the same in the current JVM’s environment as they were in the environment of the JVM where the compilation occurred.

Background

RAM Class vs ROM Class

A RAM Class completely describes a loaded Java Class in a given JVM instance. A ROM Class roughly corresponds to the contents of a .class file; it is not a complete class because it references its superclass and implemented interfaces only by name[4]. The JVM uses the SCC for class sharing by storing ROM Classes in the SCC. This is relevant for AOT because, in order to talk about a class across different JVM instances, the compiler needs to be able to “remember” the class. That’s where Class Chains come into play.

Class Chains

The OpenJ9 compiler uses something called Class Chains to determine whether the shape of a class in the current instance is the same as it was in the instance where the method being loaded was compiled. Because a RAM Class is built from a hierarchy of ROM Classes (its own, plus those of its superclasses and interfaces), a Class Chain is essentially a chain of ROM Classes that gets stored to the SCC. During an AOT compilation, the compiler stores the class chains of all classes it acquired via various queries of the JVM environment. During an AOT load, the compiler can compare the stored class chain against the class chain of a candidate RAM Class; if the class chains are the same, then the compiler can be certain the shape of the class is the same. You can read more about Class Chains here.
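The sketch below models ROM Classes, RAM Classes, and Class Chains under two simplifying assumptions. First, a ROM Class is a frozen value, so value equality stands in for "same ROM Class in the SCC". Second, the chain is the class's ROM Class, then its superclasses' ROM Classes, then those of its directly declared interfaces. The names and the ordering are illustrative, not OpenJ9's actual layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ROMClass:
    # Roughly the contents of a .class file: other classes by name only.
    name: str
    superclass_name: Optional[str]
    interface_names: Tuple[str, ...] = ()
    fields: Tuple[str, ...] = ()


@dataclass(eq=False)
class RAMClass:
    # A loaded class: its ROM class plus resolved references to other classes.
    rom: ROMClass
    superclass: Optional["RAMClass"] = None
    interfaces: Tuple["RAMClass", ...] = ()


def class_chain(ram: RAMClass) -> Tuple[ROMClass, ...]:
    """ROM classes of the class and its superclasses, then its interfaces."""
    chain = []
    cls = ram
    while cls is not None:
        chain.append(cls.rom)
        cls = cls.superclass
    # Simplification: only the directly declared interfaces are included.
    chain.extend(i.rom for i in ram.interfaces)
    return tuple(chain)


def same_shape(stored_chain: Tuple[ROMClass, ...], candidate: RAMClass) -> bool:
    return stored_chain == class_chain(candidate)
```

The interesting case is a class whose own ROM Class is unchanged but whose superclass differs. Comparing only the class's own ROM Class would miss the change, while comparing the whole chain catches it.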

How to validate?

With that background, we are finally able to talk about how the compiler performs validation during an AOT load.

Traditional Validation

Traditional validation refers to the method of validation used up until OpenJ9 v0.11.0[5]. The generated Validation Record contains not only the information required to materialize the class in a different instance, but also the class chain of the class being validated. This ensures that 1. the compiler is able to materialize the class, and 2. the class has the same shape it had in the JVM instance where the method was compiled. Three main types of validation records are generated.

Instance and Static Fields

Instance Field Validations and Static Field Validations are used to ensure that the assumptions the compiler makes during an AOT compilation regarding the static or instance field of a class are consistent between JVM instances.

Class From Constant Pool

Class From Constant Pool Validations are used to ensure that a class acquired from the constant pool of some other class is consistent between JVM instances.
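A Traditional Class From Constant Pool Validation can be sketched as "re-materialize the class from the constant pool, then compare class chains". The record layout and helper names here are hypothetical, the constant pool is reduced to a lookup function, and class chains are simplified to tuples of class names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Chain = Tuple[str, ...]  # simplified class chain: class names, not ROM classes


@dataclass(frozen=True)
class ClassFromCPRecord:
    # Hypothetical layout of a traditional validation record.
    cp_index: int          # enough to re-materialize the class in the load run
    expected_chain: Chain  # shape of the class at compile time


def validate_class_from_cp(
    record: ClassFromCPRecord,
    resolve_cp_class: Callable[[int], Optional[str]],
    chain_of: Callable[[str], Chain],
) -> bool:
    # 1. Materialize: ask the relevant constant pool for the class.
    cls = resolve_cp_class(record.cp_index)
    if cls is None:
        # Unresolved or missing; the sketch cannot tell which, so it fails.
        return False
    # 2. Compare shapes via class chains.
    return chain_of(cls) == record.expected_chain
```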

Arbitrary Classes

Arbitrary Class Validations are used to validate classes that were acquired arbitrarily, for example via profiling information. This validation is a bit more involved: though the compiler can verify that the shape is the same, it cannot ensure that the candidate RAM Class in the current environment is the same class as in the JVM instance where the method was compiled. However, arbitrary classes are only used[6] in scenarios such as inlining a profiled class’s implementation of some method. Choosing the wrong[7] class when relocating the guard that surrounds such inlined code therefore results in functionally correct but inefficient code (because the inlined code will execute less frequently).

The compiler tries to guess what the best candidate class is by using the Class Loader that loaded the arbitrary class being validated. However, Class Loaders are themselves Java Objects, which cannot be persisted. Thus, the Validation Record for an Arbitrary Class contains not just the class chain of the class being validated, but also the class chain of the first class loaded by the class loader that loaded the class being validated. Perhaps an example will make that twisted statement more straightforward:

If a Class Loader CL loads the classes A, B, C, and D (in that order), and the compiler decides to inline D’s implementation of some method, then an Arbitrary Class Validation for D will contain 1. the class chain of D and 2. the class chain of A.

The motivation is that the order in which classes are loaded is unlikely to change from run to run. For a more in depth description, take a look at the documentation for Class Chains.
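The loader-matching heuristic from the example above can be sketched as follows. This is a hypothetical Python model. Each live class loader is represented by the list of (class name, class chain) pairs it loaded, in load order, and chains are tuples of names. If two loaders share the same first class, the first match wins, which is the "wrong but functionally correct" case described in footnote 7.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Chain = Tuple[str, ...]           # simplified class chain
Loader = List[Tuple[str, Chain]]  # (class name, chain) pairs in load order


@dataclass(frozen=True)
class ArbitraryClassRecord:
    class_chain: Chain         # chain of the class being validated (D)
    loader_first_chain: Chain  # chain of the first class its loader loaded (A)


def find_candidate(record: ArbitraryClassRecord,
                   loaders: List[Loader]) -> Optional[Tuple[int, str]]:
    """Return (loader index, class name) of a plausible candidate, or None."""
    for index, loaded in enumerate(loaders):
        # Identify the loader by the chain of the first class it loaded.
        if not loaded or loaded[0][1] != record.loader_first_chain:
            continue
        # Within that loader, a class with a matching chain has the right
        # shape, though it may not be the very same class.
        for name, chain in loaded:
            if chain == record.class_chain:
                return index, name
    return None
```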

Symbol Validation Manager

The Symbol Validation Manager (SVM) is the method of validation used from OpenJ9 v0.11.0[5] onward. As described in the documentation, it addresses the problem of determining the provenance of any class, method, or (more generally) opaque symbol acquired from the JVM environment by making the answer to the question “where does this whatever come from?” explicit and unambiguous.

The SVM assigns IDs to the values returned by queries made to the JVM environment. These IDs are environment-agnostic, so they are easily added to Validation Records. How the compiler interprets these IDs depends on the Validation Record. The gist of the process is:

  1. The SVM assigns IDs to the value returned by a JVM environment query, unless one has already been assigned to it; the SVM maintains a mapping between symbols and IDs.
  2. In the compile run, the compiler generates Validation Records, which generally contain the ID of the symbol to be validated along with other IDs that correspond to the data that is needed (by the JVM environment query) in order to acquire the value being validated.
  3. In the load run, as the compiler goes through the Validation Records, the SVM materializes the symbol based on the information in the Validation Record.
    • If the symbol has never been seen before, the SVM assumes that the symbol is valid, and associates the ID being validated with the symbol.
    • If the symbol has been seen before, then the SVM checks to see if the ID from the Validation Record matches the ID that has already been associated with the symbol; a mismatch implies an inconsistent environment and the compilation is aborted.
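The ID bookkeeping in the steps above can be sketched as a small Python class. This is a toy model, not OpenJ9's SymbolValidationManager API. Symbols are plain strings, and the only query is a hypothetical "look up a class by name from a beholder class". Keeping the mapping one-to-one in both directions is an assumption of this sketch.

```python
class SymbolValidationManager:
    """Toy model of the SVM's ID bookkeeping; not OpenJ9's actual API."""

    def __init__(self):
        self._id_to_symbol = {}
        self._symbol_to_id = {}
        self._next_id = 1

    def get_or_assign_id(self, symbol):
        """Compile run: give each newly seen query result a fresh ID."""
        if symbol not in self._symbol_to_id:
            self._symbol_to_id[symbol] = self._next_id
            self._id_to_symbol[self._next_id] = symbol
            self._next_id += 1
        return self._symbol_to_id[symbol]

    def symbol_for(self, symbol_id):
        """Load run: look up a symbol validated by an earlier record."""
        return self._id_to_symbol[symbol_id]

    def validate(self, symbol_id, symbol):
        """Load run: check a freshly materialized symbol against its ID."""
        if symbol is None:
            return False  # the query failed in this environment
        known_id = self._symbol_to_id.get(symbol)
        if known_id is None:
            if symbol_id in self._id_to_symbol:
                return False  # ID already bound to a different symbol
            self._symbol_to_id[symbol] = symbol_id
            self._id_to_symbol[symbol_id] = symbol
            return True
        return known_id == symbol_id  # seen before: the IDs must agree


def compile_class_by_name(svm, lookup, beholder, name):
    """Compile run: query, then emit a record made of IDs and a name."""
    beholder_id = svm.get_or_assign_id(beholder)
    cls = lookup(beholder, name)
    return (svm.get_or_assign_id(cls), beholder_id, name)


def load_class_by_name(svm, lookup, record):
    """Load run: replay the same query and validate the answer."""
    class_id, beholder_id, name = record
    return svm.validate(class_id, lookup(svm.symbol_for(beholder_id), name))
```

Because the load run replays the queries in the same order, a record's beholder ID always refers to a symbol that an earlier record has already validated. In this sketch, that is why the root class must be validated before any class-by-name record.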

This approach works because the validation process effectively repeats the queries the compiler performed during the AOT compilation, in the same chronological order. A mismatch means that the same query returned different answers in the AOT compile run and the AOT load run, which means that the method being loaded is not compatible with the current environment.

The SVM is not a replacement for Traditional Validation, but rather a refinement of it. Class Chains still play a critical role, and Profiled Class Validations (which are validations for classes acquired from profiling information) are essentially the same as Arbitrary Class Validations. However, by making the provenance of any symbol explicit, the SVM makes it possible for anything that can be done in a JIT compilation to be done in an AOT compilation. For a full understanding of how the SVM works, take a look at the documentation. For a list of validation records, take a look at this.

Conclusion

As promised, AOT Validations are subtle and complicated. However, hopefully this post has given you a better appreciation and understanding of how Validations are performed in OpenJ9, as well as a glimpse into what any dynamic language runtime might require in order to implement AOT Compilation.


1. A failed validation does not necessarily mean that the method being validated is never going to be compatible with the current JVM instance. If some Java code has never run before, Classes might not be initialized, or Constant Pool entries might not be resolved. For example, suppose a Validation Record involves getting the J9Class of some Java Class via some query, and the clinit of that Java Class hasn’t run yet. The validation would fail, because the NULL returned by the query isn’t descriptive: NULL could mean that the class doesn’t exist in the current instance or that it hasn’t been initialized yet. However, the same query could very well succeed later, so had the validation occurred later, it would likely have succeeded.

2. Technically, most (or all) of the AOT Method Header validations could be performed using Validation Records; the distinction exists mainly because of legacy code. Anyone interested in converting these validations into Validation Records (before I eventually, fingers crossed, get around to them) is more than welcome to do so.

3. This is only true in Java 8 and before. From Java 9 onward, java.* classes can be loaded by the bootstrap, platform, or application loader.

4. https://github.com/eclipse/openj9/blob/openj9-0.11.0/doc/compiler/aot/AOTClassChains.md#background

5. Actually, at the time of this post, Traditional Validation is still performed for AOT compilations during Application Startup or at optimization level cold. This is because validation using the Symbol Validation Manager generates many more Validation Records; for an application using a small SCC, this can result in fewer AOT methods being stored, which hurts startup time. However, this is being actively worked on so that AOT Validations are consistent everywhere.

6. Actually, prior to the Symbol Validation Manager, Arbitrary Class Validations were (incorrectly) used for classes acquired by name, as well as a whole range of other queries. This incorrectness is part of the reason the Symbol Validation Manager was created.

7. Wrong here means the class chosen has the same shape, but is not the same as the class chosen during the compilation; this can happen if multiple class loaders load the same class.
