WebExtensions AI API
====================

.. note::

  The extension developer is responsible for complying with `Mozilla's add-on policies <https://extensionworkshop.com/documentation/publish/add-on-policies/>`_
  as well as regulatory rules when providing AI features, such as the `EU AI Act <https://www.europarl.europa.eu/thinktank/en/document/EPRS_BRI(2021)698792>`_.

The Firefox AI Platform API can be used from web extensions via a trial API added in Firefox 134.

This API is enabled by default in Nightly. For Beta and Release, toggle the following flags in `about:config`:

- `browser.ml.enable` → true
- `extensions.ml.enabled` → true

WebExtensions that request the `trialML` optional permission can use the API.
Add the permission to your `manifest.json` file as follows:

.. code-block:: json

  {
    "optional_permissions": ["trialML"]
  }

The WebExtensions inference API wraps the Firefox AI API and exposes four endpoints under the `browser.trial.ml` namespace:

- **createEngine**: creates an inference engine.
- **runEngine**: runs an inference engine.
- **onProgress**: listener for engine events.
- **deleteCachedModels**: deletes the cached model files.

Below is a full example of using the engine to summarize a text:

.. code-block:: javascript

  // 1. Initialize the event listener
  browser.trial.ml.onProgress.addListener(progressData => {
    console.log(progressData);
  });

  // 2. Create the engine, may trigger downloads.
  await browser.trial.ml.createEngine({
    modelHub: "huggingface",
    taskName: "summarization",
  });

  // 3. Call the engine
  const text =
    'The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, ' +
    'and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. ' +
    'During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest ' +
    'man-made structure in the world, a title it held for 41 years until the Chrysler Building in New ' +
    'York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to ' +
    'the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the ' +
    'Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second ' +
    'tallest free-standing structure in France after the Millau Viaduct.';

  const res = await browser.trial.ml.runEngine({
    args: [text],
  });

  // 4. Get the results.
  console.log(res[0]["summary_text"]);

  // 5. Delete the downloaded model files
  await browser.trial.ml.deleteCachedModels();

The `createEngine` call triggers downloads if the model files are not already cached in IndexedDB.
This means that the first call to `createEngine` may take a while, which needs to be taken into account
when building the web extension. Subsequent calls will be much faster.

Engine arguments
----------------

When calling `createEngine`, the object you pass to it can contain the following arguments
(a subset of the arguments of the platform API):

- **taskName**: The name of the task the pipeline is configured for. MANDATORY
- **modelHub**: The model hub to use, can be huggingface or mozilla. When used, modelHubRootUrl and modelHubUrlTemplate are ignored.
- **modelId**: The identifier for the specific model to be used by the pipeline.
- **modelRevision**: The revision for the specific model to be used by the pipeline.
- **tokenizerId**: The identifier for the tokenizer associated with the model, used for pre-processing inputs.
- **tokenizerRevision**: The revision for the tokenizer associated with the model, used for pre-processing inputs.
- **processorId**: The identifier for any processor required by the model, used for additional input processing.
- **processorRevision**: The revision for any processor required by the model, used for additional input processing.
- **dtype**: The quantization level.
- **device**: The device to use (wasm or gpu).

All arguments other than `taskName` are optional, and the API will pick sane defaults.

Note that model files can be very large, and it’s recommended to use quantized versions to reduce the size of the downloads.
We also have not activated all tasks for this first version because we have not yet implemented a streaming API
for the inference tasks, making it impractical to run tasks that operate on audio, video or large amounts of data.

Default models
--------------

Below is a list of supported tasks and the default models that will be picked if you don't provide one.

- **text-classification**: Xenova/distilbert-base-uncased-finetuned-sst-2-english
- **token-classification**: Xenova/bert-base-multilingual-cased-ner-hrl
- **question-answering**: Xenova/distilbert-base-cased-distilled-squad
- **fill-mask**: Xenova/bert-base-uncased
- **summarization**: Xenova/distilbart-cnn-6-6
- **translation**: Xenova/t5-small
- **text2text-generation**: Xenova/flan-t5-small
- **text-generation**: Xenova/gpt2
- **zero-shot-classification**: Xenova/distilbert-base-uncased-mnli
- **image-to-text**: Mozilla/distilvit
- **image-classification**: Xenova/vit-base-patch16-224
- **image-segmentation**: Xenova/detr-resnet-50-panoptic
- **zero-shot-image-classification**: Xenova/clip-vit-base-patch32
- **object-detection**: Xenova/detr-resnet-50
- **zero-shot-object-detection**: Xenova/owlvit-base-patch32
- **document-question-answering**: Xenova/donut-base-finetuned-docvqa
- **image-to-image**: Xenova/swin2SR-classical-sr-x2-64
- **depth-estimation**: Xenova/dpt-large
- **feature-extraction**: Xenova/all-MiniLM-L6-v2
- **image-feature-extraction**: Xenova/vit-base-patch16-224-in21k

Any model on Hugging Face that is compatible with Transformers.js should work. You can browse them
using `this link <https://huggingface.co/models?library=transformers.js&sort=trending>`_.

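
As an illustration of the engine arguments and default models described above, the sketch below builds
a `createEngine` options object that pins the model explicitly instead of relying on the default.
The helper name `buildEngineOptions` is hypothetical and not part of the API, and the `dtype` value `"q8"`
is an assumption (check which quantization levels the chosen model actually ships).

.. code-block:: javascript

  // Hypothetical helper (not part of browser.trial.ml): builds the object
  // passed to createEngine and rejects a missing taskName early, since
  // taskName is the only mandatory argument.
  function buildEngineOptions(taskName, overrides = {}) {
    if (typeof taskName !== "string" || taskName.length === 0) {
      throw new Error("taskName is mandatory");
    }
    // taskName is spread last so that overrides cannot replace it.
    return { modelHub: "huggingface", ...overrides, taskName };
  }

  const options = buildEngineOptions("feature-extraction", {
    modelId: "Xenova/all-MiniLM-L6-v2", // the documented default, pinned explicitly
    dtype: "q8",                        // assumed quantization level
    device: "wasm",
  });

  // Inside an extension that holds the trialML permission:
  // await browser.trial.ml.createEngine(options);

Validating the options before calling `createEngine` keeps a configuration mistake from surfacing only
after a potentially long model download has started.
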
Once the engine is created, the `runEngine` API runs it. To know what arguments to pass to `args` and `options`,
you can refer to the `Transformers.js documentation <https://huggingface.co/docs/transformers.js/index#tasks>`_.

In practice, `args` is the first argument passed to the Transformers.js pipeline API, and `options` the second.

So the example below:

.. code-block:: javascript

  const gen = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
  const output = await gen(text, {max_new_tokens: 100});

Becomes:

.. code-block:: javascript

  await browser.trial.ml.createEngine({
    modelHub: "huggingface",
    taskName: "summarization",
    modelId: "Xenova/distilbart-cnn-6-6"
  });

  const output = await browser.trial.ml.runEngine({
    args: [text],
    options: { max_new_tokens: 100 },
  });

Limitations
-----------

This trial API comes with a few limitations.

Besides restricting a few tasks, Firefox will not authorize web extensions to download any model that is not
in our model hub or in one of the allowed Hugging Face organizations.

The two organizations currently allowed on Hugging Face are `Mozilla <https://huggingface.co/Mozilla>`_ and
`Xenova <https://huggingface.co/Xenova>`_, which provide over a thousand models to play with.

We are planning to add more organizations in the future and to provide a process for web extension developers
to ask for their models to be added to our list.

Extensions are also not able to run several engines in parallel, to avoid resource conflicts.
This means that if you want to run different tasks, they need to be run in sequence.
This limitation might be relaxed in the future as well.

Last, but not least, if the device's memory resources are getting too low, an engine running in an extension
might be deleted and an error will be thrown.

Full example
------------

We've implemented a full example that leverages our `image-to-text model` to generate a caption on right click.
:ref:`See the README <Trial Inference API Extension Example>`.
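
Separately from the README example, the sequential-engine and low-memory limitations described above can be
handled with a small wrapper. The sketch below is a minimal illustration, not the documented way to do it:
the helper `runInSequence`, its `ml` parameter (expected to be `browser.trial.ml`) and the shape of `jobs`
are all hypothetical, and it assumes that calling `createEngine` again replaces the previous engine,
which this page does not confirm.

.. code-block:: javascript

  // Hypothetical helper: runs several tasks one after another, because
  // extensions cannot run several engines in parallel.
  // `ml` is expected to be browser.trial.ml (passed in so it can be mocked).
  async function runInSequence(ml, jobs) {
    const results = [];
    for (const { engineOptions, args } of jobs) {
      try {
        // Assumption: creating a new engine replaces the previous one.
        await ml.createEngine(engineOptions);
        const value = await ml.runEngine({ args });
        results.push({ ok: true, value });
      } catch (error) {
        // Covers, for example, the error thrown when the engine is deleted
        // because device memory is getting too low.
        results.push({ ok: false, error });
      }
    }
    return results;
  }

  // Inside an extension that holds the trialML permission:
  // const results = await runInSequence(browser.trial.ml, [
  //   { engineOptions: { taskName: "summarization" }, args: [text] },
  //   { engineOptions: { taskName: "text-classification" }, args: [text] },
  // ]);

Collecting a result per job, instead of letting the first error abort the loop, lets the extension retry or
skip a single failed task without losing the results it already has.
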