Machine Learning

This component is an experimental local inference engine for machine learning, built on Transformers.js and the ONNX Runtime.

The example below converts an image to text using the image-to-text task.

const { PipelineOptions, EngineProcess } = ChromeUtils.importESModule("chrome://global/content/ml/EngineProcess.sys.mjs");

// First we create a pipeline options object, which contains the task name
// and any other options needed for the task
const options = new PipelineOptions({ taskName: "image-to-text" });

// Next, we create an engine parent object via EngineProcess
const engineParent = await EngineProcess.getMLEngineParent();

// We then create the engine object, using the options
const engine = engineParent.getEngine(options);

// At this point we are ready to do some inference.

// We need to get the image as an array buffer and wrap it into a request object
const response = await fetch("https://huggingface.co/datasets/mishig/sample_images/resolve/main/football-match.jpg");
const buffer = await response.arrayBuffer();
const mimeType = response.headers.get("Content-Type");
const request = {
  data: buffer,
  mimeType,
};

// Finally, we run the engine with the request object
const res = await engine.run(request);

// The result contains the text generated from the image
console.log(res);
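
The fetch-and-wrap step in the example above can be factored into a small helper. The following is a minimal sketch, not part of the engine API: fetchImageRequest is a hypothetical name, and the fetchFn parameter is an assumption added so the helper can be exercised with a stub instead of a real network call. It only builds the request object with the data and mimeType fields used in the example, and adds a check that the download succeeded.

```javascript
// Hypothetical helper (not part of the ML engine API): downloads an image
// and wraps it into the { data, mimeType } request shape used above.
// `fetchFn` defaults to the global fetch and can be replaced by a stub.
async function fetchImageRequest(url, fetchFn = fetch) {
  const response = await fetchFn(url);

  // Fail early instead of passing an error page to the engine as image data.
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`);
  }

  const data = await response.arrayBuffer();
  const mimeType = response.headers.get("Content-Type");
  return { data, mimeType };
}
```

With the engine object from the example, the helper could be used as engine.run(await fetchImageRequest(url)).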

Supported Inference Tasks

The following tasks are supported by the machine learning engine:

imageToText(request, model, tokenizer, processor)

Converts an image to text using a machine learning model.

Arguments:
  • request (object) – The request object containing image data.

  • request.imageUrl (string) – The URL of the image to process. Either imageUrl or data must be provided, but not both.

  • request.data (ArrayBuffer) – The raw image data to process. Either data or imageUrl must be provided, but not both.

  • request.mimeType (string) – The MIME type of the image data.

  • model (object) – The model used for inference.

  • tokenizer (object) – The tokenizer used for decoding.

  • processor (object) – The processor used for preparing image data.

Returns:

Promise.<object> – The result object containing the processed text.
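
The rule that exactly one of imageUrl or data must be provided can be checked before a request is sent. The sketch below illustrates that check under stated assumptions: buildImageToTextRequest is a hypothetical helper, not part of the engine, and the engine's own handling of invalid requests is not described here.

```javascript
// Hypothetical helper (not part of the ML engine API): builds a request
// object and enforces that exactly one of imageUrl or data is provided.
function buildImageToTextRequest({ imageUrl, data, mimeType } = {}) {
  const hasUrl = typeof imageUrl === "string" && imageUrl.length > 0;
  const hasData = data !== undefined && data !== null;

  // Reject both "neither given" and "both given".
  if (hasUrl === hasData) {
    throw new TypeError("Provide exactly one of imageUrl or data.");
  }

  const request = hasUrl ? { imageUrl } : { data };
  if (mimeType !== undefined) {
    request.mimeType = mimeType;
  }
  return request;
}
```

Both forms described in the argument list go through the same helper: buildImageToTextRequest({ imageUrl }) for a URL, or buildImageToTextRequest({ data, mimeType }) for raw image data.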