Inline caches / CacheIR

CacheIR is the core of our speculative optimization strategy. It is a simple linear bytecode with only a few types of instructions:

Guards, which verify some property. If a guard fails, the entire stub fails. Examples: GuardShape, GuardToObject
Simple idempotent operations with no side-effects. Examples: LoadObject, LoadWrapperTarget
Result operations, which return a result value, and are the only operations that can have side-effects. There can only be one result operation in a CacheIR sequence, and it must be the last operation.

CacheIR ops have three different kinds of arguments, each of which comes in a variety of types:

Ids (eg ValueId, ObjId, NumberId): either an IC input or the output of a previous CacheIR op. The value is unknown until the IC executes.
Fields (eg ObjectField, StringField, WeakShapeField): a value that is fixed for this stub, but will usually vary between ICs: for example, the shape of a GuardShape, or the slot index of a LoadFixedSlotResult. The values of these fields are stored separately from the CacheIR sequence itself. The value of a field is known, but we may choose not to bake it in to improve code sharing. See below.
Imms (eg BoolImm, Int32Imm): An immediate that is baked directly into the CacheIR. The value of this field is known at compile time.

In baseline, opcodes with dynamic behaviour (property access, arithmetic, calls, etc…) simply generate a call into an IC chain. Initially, the only entry in an IC chain is a fallback stub. The fallback stub will call into C++ code, which in turn will do two things:

Perform the required operation, using the same code as the C++ interpreter
Using the appropriate CacheIRGenerator for this op, try to generate a CacheIR sequence that can handle the current inputs. If this succeeds, we will use the BaselineCacheIRCompiler to create an IC stub, and attach that stub at the beginning of the IC chain, meaning that the next time this IC is reached, we will try that stub before calling into the fallback. For example, a JSOp::GetProp IC might generate the CacheIR sequence GuardToObject / GuardShape / LoadFixedSlotResult, in which case the BaselineCacheIRCompiler will generate a stub that will guard that the input is an object of the correct shape, then load a property out of a particular fixed slot and return it.

Each fallback stub tracks the state of the corresponding IC chain: Specialized, Megamorphic, or Generic. ICs start in the Specialized state. After attaching 6 ICs, the fallback code will throw away existing ICs and transition to the next state. We also transition if we repeatedly fail to attach any stub. CacheIRGenerators will generate more general code for the megamorphic state, which is generally less efficient but will succeed on a larger variety of inputs. In Generic mode, we will stop trying to attach ICs entirely. Getting stuck in Generic mode is usually a sign that we lack CacheIR support for some operation; if it is happening in hot code that we care about, we generally want to fix that. In some places in CacheIRGenerators, we also check isFirstStub, so that we can emit even more optimistic CacheIR if this is the first time we’ve tried to attach a stub here. Our goal is to attach the most precise/efficient stub that handles all inputs that we see.

The first operation in every IC stub is to increment its entry count. We reset entry counts for existing stubs when prepending a new stub to the chain. Therefore, when WarpOracle is creating a snapshot, if we see an IC chain where the second stub has entry count 0, we know that the most recently attached stub has always succeeded. (Note that the second stub will often be the fallback stub.) In this case, WarpOracle speculatively assumes that the CacheIR sequence associated with that stub will continue to succeed, and take a CacheIRSnapshot of that stub. WarpBuilder will use WarpCacheIRTranspiler to turn the CacheIR for that stub into MIR, effectively inlining that stub into the bytecode.

This design allows a single source of truth for most of our speculative/dynamic optimizations: a CacheIRGenerator can specify the correctness conditions for an operation in a single place, and the rest of the pipeline only needs to understand each op in isolation. It also allows for mechanical rewriting of CacheIR sequences: see, for example, “Stub Folding”, which folds multiple stubs with the same CacheIR that only differ in a single GuardShape stub field together into a single new stub using a GuardMultipleShape op.

In baseline ICs, stub fields are stored in the ICStub itself (along with a pointer to the jitcode, a pointer to the next stub, the entry count, and so on). This means that we can generate a single piece of jitcode for a given CacheIR sequence and reuse it with many different values of stub fields, significantly reducing our compilation overhead. Ion ICs bake stub field values directly into the generated code; this theoretically allows us to generate faster code, but is probably a mistake. In the long run, we would like to share Ion ICs with the same CacheIR in the same way as baseline ICs.

CacheIR ops are defined in CacheIROps.yaml, which is consumed by GenerateCacheIRFiles.py to generate all of the per-op boilerplate. Adding a new CacheIR op only requires a) changing the yaml file, b) implementing the op in either CacheIRCompiler.cpp (shared) or BaselineCacheIRCompiler + IonCacheIRCompiler.cpp (non-shared), c) implementing the op in WarpCacheIRTranspiler.cpp, and d) using the op in a CacheIRGenerator in CacheIR.cpp. If the op can’t be implemented using existing LIR/MIR nodes, it may also require changes in MIROps.yaml, Lowering.cpp, and CodeGenerator.cpp. If the generated jitcode is sufficiently complicated, it may be worthwhile to define it once in a helper in MacroAssembler.cpp and use it in the CacheIRCompiler / CodeGenerator code.