How to perftest a model
For each model running inside Firefox, we want to determine its performance in terms of speed and memory usage and track it over time.
To do so, we use the Perfherder infrastructure to gather the performance metrics.
Adding a new performance test is done in two steps: 1. making it work locally, 2. adding it to Perfherder.
Run locally
To test the performance of a model, you can add a new file under tests/browser with the following structure and adapt it to your needs:
"use strict";
// unfortunately we have to write a full static structure here
// see https://bugzilla.mozilla.org/show_bug.cgi?id=1930955
const perfMetadata = {
owner: "GenAI Team",
name: "ML Test Model",
description: "Template test for latency for ml models",
options: {
default: {
perfherder: true,
perfherder_metrics: [
{ name: "pipeline-ready-latency", unit: "ms", shouldAlert: true },
{ name: "initialization-latency", unit: "ms", shouldAlert: true },
{ name: "model-run-latency", unit: "ms", shouldAlert: true },
{ name: "pipeline-ready-memory", unit: "MB", shouldAlert: true },
{ name: "initialization-memory", unit: "MB", shouldAlert: true },
{ name: "model-run-memory", unit: "MB", shouldAlert: true },
{ name: "total-memory-usage", unit: "MB", shouldAlert: true },
],
verbose: true,
manifest: "perftest.toml",
manifest_flavor: "browser-chrome",
try_platform: ["linux", "mac", "win"],
},
},
};
requestLongerTimeout(120);
add_task(async function test_ml_generic_pipeline() {
const options = {
taskName: "feature-extraction",
modelId: "Xenova/all-MiniLM-L6-v2",
modelHubUrlTemplate: "{model}/{revision}",
modelRevision: "main",
};
const args = ["The quick brown fox jumps over the lazy dog."];
await perfTest("example", options, args);
});
Then add the file in perftest.toml and rebuild with ./mach build.
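Registering the file is a one-line addition to the manifest. As a sketch, assuming the template above was saved as `browser_ml_example_perf.js` (a hypothetical name), the mochitest-style TOML manifest lists each test file as a bare table header:

```toml
# toolkit/components/ml/tests/browser/perftest.toml
# (sketch; the file name below is a hypothetical example)
["browser_ml_example_perf.js"]
```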
The test loads the models it uses from the local disk, so you need to prepare them first. We provide a script to automate this:
$ mach python toolkit/components/ml/tests/tools/create_local_hub.py --list-models
Available git-based models from the YAML:
- xenova-all-minilm-l6-v2 -> path-prefix: onnx-models/Xenova/all-MiniLM-L6-v2/main/
- mozilla-ner -> path-prefix: onnx-models/Mozilla/distilbert-uncased-NER-LoRA/main/
- mozilla-intent -> path-prefix: onnx-models/Mozilla/mobilebert-uncased-finetuned-LoRA-intent-classifier/main/
- mozilla-autofill -> path-prefix: onnx-models/Mozilla/tinybert-uncased-autofill/main/
- distilbart-cnn-12-6 -> path-prefix: onnx-models/Mozilla/distilbart-cnn-12-6/main/
- qwen2.5-0.5-instruct -> path-prefix: onnx-models/Mozilla/Qwen2.5-0.5B-Instruct/main/
- mozilla-smart-tab-topic -> path-prefix: onnx-models/Mozilla/smart-tab-topic/main/
- mozilla-smart-tab-emb -> path-prefix: onnx-models/Mozilla/smart-tab-embedding/main/
(Use `--model <key>` to clone one of these repositories.)
You can then use `--model` to download a model locally, specifying the local MOZ_FETCHES_DIR directory either via the environment variable or via the `--fetches-dir` command-line argument:
$ mach python toolkit/components/ml/tests/tools/create_local_hub.py --model mozilla-smart-tab-emb --fetches-dir ~/ml-fetches
Found existing file /Users/tarekziade/Dev/fetches/ort-wasm-simd-threaded.jsep.wasm, verifying checksum...
Existing file's checksum matches. Skipping download.
Updated Git hooks.
Git LFS initialized.
Cloning https://huggingface.co/Mozilla/smart-tab-embedding into '/Users/tarekziade/Dev/fetches/onnx-models/Mozilla/smart-tab-embedding/main'...
Cloning in '/Users/tarekziade/Dev/fetches/onnx-models/Mozilla/smart-tab-embedding/main'...
Checked out revision '2278e76f67ada584cfd3149fd2661dad03674e4d' in '/Users/tarekziade/Dev/fetches/onnx-models/Mozilla/smart-tab-embedding/main'.
Once done, you should be able to run the test locally with:
$ MOZ_FETCHES_DIR=~/ml-fetches ./mach perftest toolkit/components/ml/tests/browser/browser_ml_engine_perf.js --mochitest-extra-args=headless
Note that MOZ_FETCHES_DIR must be an absolute path to the root fetches directory.
Add to the CI
To add the test to the CI, add an entry with a unique name that starts with `ml-perf` in each of the following files:

- taskcluster/kinds/perftest/linux.yml
- taskcluster/kinds/perftest/windows11.yml
- taskcluster/kinds/perftest/macos.yml
Example for Linux:
ml-perf:
  fetches:
    fetch:
      - ort.wasm
      - ort.jsep.wasm
      - ort-training.wasm
      - xenova-all-minilm-l6-v2
  description: Run ML Models Perf Tests
  treeherder:
    symbol: perftest(linux-ml-perf)
    tier: 2
  attributes:
    batch: false
    cron: false
  run-on-projects: [autoland, mozilla-central]
  run:
    command: >-
      mkdir -p $MOZ_FETCHES_DIR/../artifacts &&
      cd $MOZ_FETCHES_DIR &&
      python3 python/mozperftest/mozperftest/runner.py
      --mochitest-binary ${MOZ_FETCHES_DIR}/firefox/firefox-bin
      --flavor mochitest
      --output $MOZ_FETCHES_DIR/../artifacts
      toolkit/components/ml/tests/browser/browser_ml_engine_perf.js
You also need to make the models your test uses (like the ones you've downloaded locally) available in the CI, by adding entries in taskcluster/kinds/fetch/onnxruntime-web-fetch.yaml and referencing them in the task's fetches section.
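As a sketch of such a fetch entry, assuming it follows the git-type fetch schema used elsewhere in taskcluster (the field layout below is an assumption; the repository URL, revision, and path-prefix are taken from the smart-tab-embedding example shown earlier):

```yaml
# taskcluster/kinds/fetch/onnxruntime-web-fetch.yaml (sketch, not verbatim)
mozilla-smart-tab-emb:
  description: smart-tab-embedding model for ML perf tests
  fetch:
    type: git
    repo: https://huggingface.co/Mozilla/smart-tab-embedding
    revision: 2278e76f67ada584cfd3149fd2661dad03674e4d
    path-prefix: onnx-models/Mozilla/smart-tab-embedding/main/
```

The key (here `mozilla-smart-tab-emb`) is what you list under `fetches.fetch` in the `ml-perf` task entry.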
Once this is done, try it out with:
$ ./mach try perf --single-run --full --artifact
You should then see the results in Treeherder.