Models management
Prepare a model for Firefox
Models that can be used with Firefox should have ONNX weights at different quantization levels.
To make sure we are compatible with Transformers.js, we use the conversion script provided by that project, which checks that the model architecture is supported and has been tested.
To do this, follow these steps:
make sure your model is published on Hugging Face with PyTorch or safetensors weights.
clone https://github.com/xenova/transformers.js and check out the v3 branch
go into scripts/
create a virtualenv there and install requirements from the local requirements.txt file
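The setup steps above roughly correspond to the following shell session (a sketch; the virtualenv tool and paths are illustrative, any equivalent works):

```sh
# Get the transformers.js conversion script (v3 branch).
git clone https://github.com/xenova/transformers.js
cd transformers.js
git checkout v3
cd scripts

# Create an isolated Python environment and install the script's dependencies.
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```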
Then you can run:
python convert.py --model_id organizationId/modelId --quantize --modes fp16 q8 q4 --task the-inference-task
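For example, a run for the mozilla/distilvit image-to-text model referenced later on this page could look like this (the task name follows the Transformers.js task list):

```sh
python convert.py --model_id mozilla/distilvit --quantize --modes fp16 q8 q4 --task image-to-text
```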
You will get a new directory in models/organizationId/modelId that includes an onnx directory and other files. Upload everything to Hugging Face.
Congratulations! You have a Firefox-compatible model. You can now try it in about:inference.
Note that for encoder-decoder models with two ONNX files, you may need to rename decoder_model_quantized.onnx to decoder_model_merged_quantized.onnx, and make the same change for the fp16 and q4 variants. The encoder files do not need to be renamed.
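Assuming the default names produced by the conversion script (an illustration; the exact names can vary per model), the renames would look like:

```sh
cd models/organizationId/modelId/onnx

# Only the decoder files need renaming; encoder files keep their names.
mv decoder_model_quantized.onnx decoder_model_merged_quantized.onnx
mv decoder_model_fp16.onnx decoder_model_merged_fp16.onnx
mv decoder_model_q4.onnx decoder_model_merged_q4.onnx
```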
Lifecycle
When Firefox uses a model, it will
read metadata stored in Remote Settings
download model files from our hub
store the files in IndexedDB
1. Remote Settings
We have two collections in Remote Settings:
ml-onnx-runtime: provides all the WASM files we need to run the inference platform.
ml-inference-options: provides, for each task, the options to run it with, such as the modelId.
Running the inference API will download the WASM files if needed, then look up the task's entry in ml-inference-options to grab its options. That allows us to set the default running options for each task.
This is also how we can update a model without changing Firefox’s code: setting a new revision for a model in Remote Settings will trigger a new download for our users.
Records in ml-inference-options are uniquely identified by featureId. When it is not provided, the taskName is used instead. This collection provides all the options required for that feature.
For example, the PDF.js image-to-text record is:
{
  "featureId": "pdfjs-alt-text",
  "dtype": "q8",
  "modelId": "mozilla/distilvit",
  "taskName": "image-to-text",
  "processorId": "mozilla/distilvit",
  "tokenizerId": "mozilla/distilvit",
  "modelRevision": "v0.5.0",
  "processorRevision": "v0.5.0"
}
If you are adding a new inference call to Firefox, create a new unique featureId in FEATURES and add a record with the task settings in ml-inference-options.
By doing this, you will be able to create an engine with this simple call:
const engine = await createEngine({featureId: "pdfjs-alt-text"});
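From there, running inference is a matter of calling the engine with a request. The snippet below is a sketch: the run method and the shape of its request ({ args: [...] }) are assumptions based on the image-to-text task, so check the inference API reference for the exact signature.

```js
// A placeholder input; an image-to-text task takes an image (e.g. a URL).
const imageURL = "https://example.com/image.png";

// All running options (modelId, dtype, revisions, etc.) come from the
// "pdfjs-alt-text" record in ml-inference-options.
const engine = await createEngine({ featureId: "pdfjs-alt-text" });

// Assumed request shape ({ args: [...] }); verify against the actual API.
const result = await engine.run({ args: [imageURL] });
```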
2. Model Hub
Our model hub follows the same structure as Hugging Face: each file of a model is under a unique URL:
https://model-hub.mozilla.org/<organization>/<model>/<revision>/<path>
Where:
- organization and model together form the model id, for example "mozilla/distilvit"
- revision is the branch or version
- path is the path to the file
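For example, with the distilvit record shown earlier, the q8 decoder weights would live at a URL like this (the file path within the repository is illustrative):

https://model-hub.mozilla.org/mozilla/distilvit/v0.5.0/onnx/decoder_model_merged_quantized.onnx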
Model files downloaded from the hub are stored in IndexedDB so users don't need to download them again.
Model files
A model consists of several files, including its configuration, tokenizer, training metadata, and weights.
Below are the most common files you’ll encounter:
1. Model Weights
pytorch_model.bin
: Contains the model's weights for PyTorch models. It is a serialized file that holds the parameters of the neural network.

tf_model.h5
: TensorFlow's version of the model weights.

flax_model.msgpack
: For models built with the Flax framework, this file contains the model weights in a format used by JAX and Flax.

onnx
: A subdirectory containing ONNX weight files at different quantization levels. These are the ones our platform uses.
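For an encoder-decoder model such as distilvit, the onnx subdirectory could look roughly like this (an illustrative listing; the actual contents depend on the model and the quantization modes used):

```
onnx/
  encoder_model.onnx
  encoder_model_fp16.onnx
  encoder_model_quantized.onnx
  decoder_model_merged.onnx
  decoder_model_merged_fp16.onnx
  decoder_model_merged_quantized.onnx
```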
2. Model Configuration
The config.json
file contains all the necessary configurations for the model architecture,
such as the number of layers, hidden units, attention heads, activation functions, and more.
This allows the Hugging Face library to reconstruct the model exactly as it was defined.
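As an illustration, a trimmed config.json for a BERT-style model contains fields like these (values are examples only):

```json
{
  "model_type": "bert",
  "num_hidden_layers": 12,
  "num_attention_heads": 12,
  "hidden_size": 768,
  "hidden_act": "gelu"
}
```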
3. Tokenizer Files
vocab.txt or vocab.json
: Vocabulary files that map tokens (words, subwords, or characters) to IDs. Different tokenizers (BERT, GPT-2, etc.) will have different formats.

tokenizer.json
: Stores the full tokenizer configuration and mappings.

tokenizer_config.json
: Contains settings that are specific to the tokenizer used by the model, such as whether it is case-sensitive or the special tokens it uses (e.g., [CLS], [SEP]).
4. Preprocessing Files
special_tokens_map.json
: Maps the special tokens (like padding, CLS, SEP, etc.) to the token IDs used by the tokenizer.

added_tokens.json
: If any additional tokens were added beyond the original vocabulary (like custom or domain-specific tokens), they are stored in this file.
5. Training Metadata
training_args.bin
: Contains the arguments that were used during training, such as learning rates, batch size, and other hyperparameters. This file allows for easier replication of the training process.

trainer_state.json
: Captures the state of the trainer, such as epoch information and optimizer state, which can be useful for resuming training.

optimizer.pt
: Stores the optimizer's state for PyTorch models, allowing training to resume from where it left off.
6. Model Card
README.md or model_card.json
: The model card provides documentation about the model, including details about its intended use, training data, performance metrics, ethical considerations, and any limitations. It can either be a plain README.md or structured as a model_card.json.
7. Tokenization and Feature Extraction Files
merges.txt
: For byte pair encoding (BPE) tokenizers, this file contains the merge operations used to split words into subwords.

preprocessor_config.json
: Contains configuration details for any pre-processing or feature extraction steps applied to the input before passing it to the model.
Versioning
The revision field is used to determine what version of the model should be downloaded from the hub. You can start by serving the main branch, but once you publish your model, you should version it.
The version scheme we use is pretty loose. It can be main or a version following an extended semver:
[v]MAJOR.MINOR[.PATCH][.(alpha|beta|pre|post|rc|)NUMBER]
We don’t provide any sorting function.
Examples:
v1.0
v2.3.4
1.2.1
1.0.0-beta1
1.0.0.alpha2
1.0.0.rc1
To version a model, push a tag to Hugging Face using git tag v1.0 && git push --tags, and on the GCP bucket, create a new directory where you copy the model files.
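As a sketch, assuming the bucket mirrors the hub layout described above (the bucket name and paths are illustrative):

```sh
# Tag the model repository on Hugging Face.
git tag v1.0
git push --tags

# Copy the model files to a matching version directory on the GCP bucket.
gsutil -m cp -r ./* gs://<bucket>/mozilla/distilvit/v1.0/
```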