Files files are the mechanism by which tree metadata (notably the build configuration) is defined.

Directories in the tree contain files which declare functionality for their respective part of the tree. This includes things such as the list of C++ files to compile, where to find tests, etc. files are actually Python scripts. However, their execution is governed by special rules. This is explained below. Python Sandbox

As mentioned above, files are Python scripts. However, they are executed in a special Python sandbox that significantly changes and limits the execution environment. The environment is so different, it’s doubtful most files would execute without error if executed by a vanilla Python interpreter (e.g. python

The following properties make execution of files special:

  1. The execution environment exposes a limited subset of Python.

  2. There is a special set of global symbols and an enforced naming convention of symbols.

  3. Some symbols are inherited from previously-executed files.

The limited subset of Python is actually an extremely limited subset. Only a few symbols from __builtin__ are exposed. These include True, False, None, sorted, int, and set. Global functions like import, print, and open aren’t available. Without these, files can do very little. This is by design.

The execution sandbox treats all UPPERCASE variables specially. Any UPPERCASE variable must be known to the sandbox before the script executes. Any attempt to read or write to an unknown UPPERCASE variable will result in an exception being raised. Furthermore, the types of all UPPERCASE variables is strictly enforced. Attempts to assign an incompatible type to an UPPERCASE variable will result in an exception being raised.

The strictness of behavior with UPPERCASE variables is a very intentional design decision. By ensuring strict behavior, any operation involving an UPPERCASE variable is guaranteed to have well-defined side-effects. Previously, when the build configuration was defined in Makefiles, assignments to variables that did nothing would go unnoticed. files fix this problem by eliminating the potential for false promises.

After a file has completed execution, only the UPPERCASE variables are used to retrieve state.

The set of variables and functions available to the Python sandbox is defined by the mozbuild.frontend.context module. The data structures in this module are consumed by the mozbuild.frontend.reader.MozbuildSandbox class to construct the sandbox. There are tests to ensure that the set of symbols exposed to an empty sandbox are all defined in the context module. This module also contains documentation for each symbol, so nothing can sneak into the sandbox without being explicitly defined and documented.

Reading and Traversing Files

The process for reading files roughly consists of:

  1. Start at the root (<topsrcdir>/

  2. Evaluate the file in a new sandbox.

  3. Emit the main context and any sub-contexts from the executed sandbox.

  4. Extract a set of files to execute next.

  5. For each additional file, goto #2 and repeat until all referenced files have executed.

From the perspective of the consumer, the output of reading is a stream of mozbuild.frontend.reader.context.Context instances. Each Context defines a particular aspect of data. Consumers iterate over these objects and do something with the data inside. Each object is essentially a dictionary of all the UPPERCASE variables populated during its execution.


Historically, there was only one context per file. As the number of things tracked by files grew and more and more complex processing was desired, it was necessary to split these contexts into multiple logical parts. It is now common to emit multiple contexts per file.

Build System Reading Mode

The traditional mode of evaluation of files is what’s called build system traversal mode. In this mode, the CONFIG variable in each sandbox is populated from data coming from config.status, which is produced by configure.

During evaluation, files often make decisions conditional on the state of the build configuration. e.g. only compile foo.cpp if feature X is enabled.

In this mode, traversal of files is governed by variables like DIRS and TEST_DIRS. For example, to execute a child directory, foo, you would add DIRS += ['foo'] to a file and foo/ would be evaluated.

Filesystem Reading Mode

There is an alternative reading mode that doesn’t involve the build system and doesn’t use DIRS variables to control traversal into child directories. This mode is called filesystem reading mode.

In this reading mode, the CONFIG variable is a dummy, mostly empty object. Accessing all but a few special variables will return an empty value. This means that nearly all if CONFIG['FOO']: branches will not be taken.

Instead of using content from within the evaluated file to drive traversal into subsequent files, the set of files to evaluate is controlled by the thing doing the reading.

A single file is not guaranteed to be executable in isolation. Instead, we must evaluate all parent files first. For example, in order to evaluate /foo/, one must execute / and have its state influence the execution of /foo/

Filesystem reading mode is utilized to power the Files Metadata feature.

Technical Details

The code for reading files lives in mozbuild.frontend.reader. The Python sandboxes evaluation results (mozbuild.frontend.context.Context) are passed into mozbuild.frontend.emitter, which converts them to classes defined in Each class in this module defines a domain-specific component of tree metadata. e.g. there will be separate classes that represent a JavaScript file vs a compiled C++ file or test manifests. This means downstream consumers of this data can filter on class types to only consume what they are interested in.

There is no well-defined mapping between file instances and the number of classes derived from each. Depending on the content of the file, there may be 1 object derived or 100.

The purpose of the emitter layer between low-level sandbox execution and metadata representation is to facilitate a unified normalization and verification step. There are multiple downstream consumers of the data and many will perform the same actions. This logic can be complicated, so we have a component dedicated to it.

mozbuild.frontend.reader.BuildReader` and mozbuild.frontend.reader.TreeMetadataEmitter` have a stream-based API courtesy of generators. When you hook them up properly, the classes are emitted before all files have been read. This means that downstream errors are raised soon after sandbox execution.

Lots of the code for evaluating Python sandboxes is applicable to non-Mozilla systems. In theory, it could be extracted into a standalone and generic package. However, until there is a need, there will likely be some tightly coupled bits.