Content Classifier Service

The Content Classifier Service (toolkit/components/content-classifier/) is the anti-tracking component that classifies network channels against adblock-format filter lists delivered through Remote Settings. It is a parallel classification path layered alongside the older URL Classifier and its safebrowsing-format hash tables: same set of features (trackers, social trackers, fingerprinters, cryptominers, email trackers, plus allow-list/exception features and dedicated test_block / test_annotate features), but driven by full adblock syntax rules evaluated by a Rust engine wrapping the adblock crate.

This page is a reference for how the service is wired up internally: where list bytes live, how they get turned into engines, how a channel classification request flows through it, and which invariants the code depends on.

Components

File: role

  • nsIContentClassifierService.idl: XPCOM contract surfaced to JS, covering onListsChanged(updated, removed), getFeatureNames(), and the test-only NS_CONTENT_CLASSIFIER_FILTER_LISTS_LOADED_TOPIC observer notification that fires after every rebuild.

  • nsIContentClassifierRemoteSettingsClient.idl: JS-side contract with init, shutdown, and getListBytes(listName).

  • ContentClassifierService.{h,cpp}: The singleton C++ service. Owns the feature table, the per-feature engine map, the four mode-keyed active-engine lists, the mutex, pref/Nimbus observers, async-shutdown blocker, and the build thread.

  • ContentClassifierRemoteSettingsClient.sys.mjs: Wraps the content-classifier-lists Remote Settings collection. Owns the on-disk attachment cache, registers a sync listener, and pulls bytes on demand.

  • content_classifier_engine/ (Rust crate): Wraps the adblock crate (v0.12.1, full-regex-handling + single-thread features) behind a small FFI: engine_from_rules, check_network_request_preparsed, engine_destroy, plus init/teardown for an nsIEffectiveTLDService-backed domain resolver.

  • ContentClassifierEngine.{h,cpp}: Thread-safe refcounted C++ wrapper around the Rust FFI engine. Extracts request metadata from an nsIChannel-derived ContentClassifierRequest and calls into Rust.

  • components.conf, moz.build: Component registration and build setup (cbindgen generates content_classifier_ffi.h).

Features and prefs

The static kFeatures[] table (ContentClassifierService.cpp) is the single source of truth for which feature names exist, which Remote Settings list IDs roll up into each feature’s engine, and how matches are reported to the channel. Each entry carries:

  • mName: the identifier used in prefs.

  • mListIds: one or more Remote Settings record names whose attachments are concatenated into the feature’s engine rules.

  • mClassificationFlag: the nsIClassifiedChannel::ClassificationFlags bit set on the channel for an annotation match.

  • mLoadedState / mReplacedState / mAllowedState: the nsIWebProgressListener STATE_LOADED_* / STATE_REPLACED_* / STATE_ALLOWED_* values logged into the content blocking log. mLoadedState == 0 denotes an annotate-without-notify feature.

  • mBlockingErrorCode: the NS_ERROR_*_URI passed to UrlClassifierCommon::SetBlockedContent for a cancellation; NS_OK means the feature has no blocking variant and is only ever an annotation.

  • mExceptionOnly: true if the feature contains only allowlist / exception rules. Such a feature must be last in a list of features; a console warning is logged if it is not.
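As a rough illustration of the entry shape and the ordering rule described above, the following sketch models one table entry and a check that exception-only features come last. The names FeatureEntry and ExceptionOnlyFeaturesAreLast, the plain integer field types, and any sample values are hypothetical; the real kFeatures[] table uses Gecko types and may be laid out differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one kFeatures[] entry. Field names follow the
// text above; the types are simplified placeholders, not the Gecko types.
struct FeatureEntry {
  std::string mName;
  std::vector<std::string> mListIds;
  uint32_t mClassificationFlag;
  uint32_t mLoadedState;        // 0 models "annotate without notify"
  uint32_t mBlockingErrorCode;  // 0 models NS_OK, i.e. annotation only
  bool mExceptionOnly;
};

// Returns true when no regular feature appears after an exception-only one.
inline bool ExceptionOnlyFeaturesAreLast(
    const std::vector<FeatureEntry>& aOrdered) {
  bool seenExceptionOnly = false;
  for (const FeatureEntry& feature : aOrdered) {
    if (feature.mExceptionOnly) {
      seenExceptionOnly = true;
    } else if (seenExceptionOnly) {
      return false;  // a regular feature follows an exception-only one
    }
  }
  return true;
}
```

A check along these lines is one plausible place to emit the console warning mentioned above.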

Enable switches (per mode):

  • privacy.trackingprotection.content.protection.enabled

  • privacy.trackingprotection.content.annotation.enabled

Feature selection (comma-separated feature names):

  • privacy.trackingprotection.content.protection.engines

  • privacy.trackingprotection.content.protection.engines.pbmode

  • privacy.trackingprotection.content.annotation.engines

  • privacy.trackingprotection.content.annotation.engines.pbmode
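A minimal sketch of how a comma-separated engines pref value could be split into an ordered list of feature names. ParseEngineList is a hypothetical helper, and trimming whitespace and skipping empty entries are assumptions made for the example rather than documented behavior of the service.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits "a, b,c" into {"a", "b", "c"}, preserving
// the order in which names appear in the pref value.
inline std::vector<std::string> ParseEngineList(const std::string& aPrefValue) {
  std::vector<std::string> names;
  std::size_t start = 0;
  while (start <= aPrefValue.size()) {
    std::size_t comma = aPrefValue.find(',', start);
    if (comma == std::string::npos) {
      comma = aPrefValue.size();
    }
    const std::string token = aPrefValue.substr(start, comma - start);
    // Assumption: surrounding spaces/tabs are ignored, empty entries dropped.
    const std::size_t first = token.find_first_not_of(" \t");
    if (first != std::string::npos) {
      const std::size_t last = token.find_last_not_of(" \t");
      names.push_back(token.substr(first, last - first + 1));
    }
    start = comma + 1;
  }
  return names;
}
```

Order is preserved because engine evaluation order follows the order of names in the pref.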

Test-only lists fetched over HTTP (used by the test_block / test_annotate features so tests don’t need a live Remote Settings collection):

  • privacy.trackingprotection.content.protection.test_list_urls

  • privacy.trackingprotection.content.annotation.test_list_urls

All of the above prefs are mapped onto Nimbus feature variables in toolkit/components/nimbus/FeatureManifest.yaml.

Threading model

Three thread types appear in this code, and the rebuild and classify paths both deliberately move work between them:

  • Main thread. All init, pref observers, Remote Settings sync callbacks, and final channel-side decisions (MaybeCancelChannel, MaybeAnnotateChannel) run here.

  • mBuildThread (an nsISerialEventTarget task queue, created in Init). The CPU-heavy half of an engine rebuild runs here: Engine::from_rules calls (the actual adblock parser) happen with no lock held, and the lock-protected InstallEngine / PopulateAllActiveEnginesFromPreferenceSnapshot / PruneInactiveEngines steps run here too, just briefly under mLock.

  • URL-classifier worker thread. ClassifyForCancel and ClassifyForAnnotate run here, called from netwerk/url-classifier/AsyncUrlChannelClassifier.cpp. Both acquire mLock briefly to snapshot the active-engine list pointer and then release it before crossing the FFI.

The mozilla::Mutex mLock is non-recursive. Reacquiring it while already held will deadlock the calling thread. The header enforces this by:

  • Marking mInitPhase, mEngines, mFeatureVersions, mUpdateGeneration, and the four active-engine arrays as MOZ_GUARDED_BY(mLock).

  • Annotating InstallEngine, PopulateAllActiveEnginesFromPreferenceSnapshot, and PruneInactiveEngines with MOZ_REQUIRES(mLock).

  • Releasing mLock before any call into the engine FFI (so a long classification cannot stall a rebuild and vice versa).

It may be tempting to replace mLock with an RWLock. The gain would be smaller than it looks, because there is effectively only one classifying thread. It is also unconfirmed whether engine lookup is thread-safe, so that would have to be verified first.
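The snapshot-then-release discipline described in this section can be sketched with standard library types. This is an illustration only: std::mutex and std::shared_ptr stand in for mozilla::Mutex and the refcounted active-engine list, and ClassifierSketch, Engine, and EngineList are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for an engine: returns true when the URL matches.
using Engine = std::function<bool(const std::string&)>;
using EngineList = std::vector<Engine>;

class ClassifierSketch {
 public:
  // Rebuild side: swap in a new list while holding the lock.
  void SetActiveEngines(std::shared_ptr<const EngineList> aEngines) {
    std::lock_guard<std::mutex> guard(mLock);
    mActive = std::move(aEngines);
  }

  // Classify side: hold the lock only long enough to copy the pointer, then
  // run the potentially slow engine checks with no lock held.
  bool Classify(const std::string& aUrl) {
    std::shared_ptr<const EngineList> snapshot;
    {
      std::lock_guard<std::mutex> guard(mLock);
      snapshot = mActive;
    }
    if (!snapshot) {
      return false;
    }
    for (const Engine& engine : *snapshot) {
      if (engine(aUrl)) {
        return true;
      }
    }
    return false;
  }

 private:
  std::mutex mLock;  // non-recursive, like mozilla::Mutex
  std::shared_ptr<const EngineList> mActive;
};
```

Because the snapshot keeps the old list alive, a rebuild can install a new list while a classification is still iterating the previous one.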

List load and engine rebuild

A rebuild is triggered by any of:

  • Initial InitRSClient() (first time the service sees an active RS feature).

  • A Remote Settings sync push (onSync in the JS client).

  • A pref change: master enable, an engines selection pref, or one of the test_list_urls prefs.

onListsChanged(updated, removed) on the main thread calls ProcessListChanges, which takes a fresh EnginesPrefsSnapshot of the current pref state, walks the active features named in that snapshot, and selects every feature that either has no engine yet or whose mListIds overlap updated ∪ removed. That set goes to UpdateFeatures.
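The selection rule can be sketched as follows. SelectFeaturesToRebuild and FeatureLists are hypothetical names, and std::set is used as a simplified stand-in for whatever containers the service actually uses.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of an active feature and its list IDs.
struct FeatureLists {
  std::string mName;
  std::vector<std::string> mListIds;
};

// Picks a feature when it has no engine yet, or when any of its list IDs
// appears in the updated or removed sets.
inline std::vector<std::string> SelectFeaturesToRebuild(
    const std::vector<FeatureLists>& aActiveFeatures,
    const std::set<std::string>& aHasEngine,
    const std::set<std::string>& aUpdated,
    const std::set<std::string>& aRemoved) {
  std::vector<std::string> selected;
  for (const FeatureLists& feature : aActiveFeatures) {
    bool needsRebuild = aHasEngine.count(feature.mName) == 0;
    for (const std::string& id : feature.mListIds) {
      if (aUpdated.count(id) != 0 || aRemoved.count(id) != 0) {
        needsRebuild = true;
        break;
      }
    }
    if (needsRebuild) {
      selected.push_back(feature.mName);
    }
  }
  return selected;
}
```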

UpdateFeatures (main thread) bumps mUpdateGeneration (global) and the per-feature mFeatureVersions entry for every feature it’s about to rebuild — both under mLock. It then fires FetchEngineDataForFeature to get the rule lists. The MozPromise<> returned by each fetch is collected via MozPromise::AllSettled; when all of them resolve, the collected rule arrays plus the captured generation and per-feature versions are dispatched onto mBuildThread.

On mBuildThread, with no lock held, we build the rule engines.

The same mBuildThread task then reacquires mLock and performs the install / populate / prune step under it:

  • For each freshly built engine, compare the captured per-feature version to the current mFeatureVersions entry. If a newer rebuild has been issued since this one was dispatched, the captured version is stale and the engine is dropped on the floor. Otherwise it’s stored into mEngines via InstallEngine.

  • After all installs, compare the captured mUpdateGeneration to the current one. Only if it’s still the latest do we run PopulateAllActiveEnginesFromPreferenceSnapshot (rebuild the four per-mode active-engine arrays from mEngines, in pref order) and PruneInactiveEngines (drop entries from mEngines not referenced by any active-engine array).

This versioning-and-recheck pattern is the safety invariant for concurrent rebuilds: two rebuilds racing through mBuildThread can never have the older one’s snapshot overwrite the newer one’s results, because the older one’s captured generation no longer matches by the time it tries to commit.
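A minimal sketch of the versioning-and-recheck idea, assuming one feature per rebuild for brevity (the service bumps the global generation once and the versions of every feature in the batch). RebuildTracker, Ticket, ShouldInstall, and ShouldPopulate are hypothetical names; in the service the install and populate checks happen under a single hold of mLock rather than two separate ones.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

class RebuildTracker {
 public:
  // Values captured when a rebuild is issued and carried to the build task.
  struct Ticket {
    uint64_t mGeneration;
    uint64_t mFeatureVersion;
  };

  // Bump the global generation and the feature's version under the lock.
  Ticket BeginRebuild(const std::string& aFeature) {
    std::lock_guard<std::mutex> guard(mLock);
    return Ticket{++mUpdateGeneration, ++mFeatureVersions[aFeature]};
  }

  // Install step: accept the engine only if no newer rebuild of this
  // feature was issued after the ticket was captured.
  bool ShouldInstall(const std::string& aFeature, const Ticket& aTicket) {
    std::lock_guard<std::mutex> guard(mLock);
    auto it = mFeatureVersions.find(aFeature);
    return it != mFeatureVersions.end() &&
           it->second == aTicket.mFeatureVersion;
  }

  // Populate/prune step: only the most recent rebuild may refresh the
  // active-engine arrays.
  bool ShouldPopulate(const Ticket& aTicket) {
    std::lock_guard<std::mutex> guard(mLock);
    return aTicket.mGeneration == mUpdateGeneration;
  }

 private:
  std::mutex mLock;
  uint64_t mUpdateGeneration = 0;
  std::map<std::string, uint64_t> mFeatureVersions;
};
```

Keeping the two counters separate means a rebuild of one feature does not invalidate a still-current engine for another feature; it only prevents the older task from running the populate and prune steps.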

Finally, a small task is dispatched back to the main thread to fire NS_CONTENT_CLASSIFIER_FILTER_LISTS_LOADED_TOPIC (test-only, gated on the privacy.trackingprotection.content.testing pref), which is how the browser tests await rebuild completion. Because one notification fires after every rebuild, these notifications need to be debounced.

Channel classification

A channel classification request enters from netwerk/url-classifier/AsyncUrlChannelClassifier.cpp on the URL-classifier worker thread. The caller has already constructed a ContentClassifierRequest on the main thread that extracts the URL, the schemeless site and source schemeless site (via nsIEffectiveTLDService), the request type (mapped from ExtContentPolicyType to an adblock type string), the third-party flag (via mozIThirdPartyUtil), and the PBM flag.

ClassifyForCancel and ClassifyForAnnotate both acquire mLock, pick the appropriate active-engine array based on PBM and mode, snapshot it, and release the lock before calling ClassifyWithEngines, so that no engine FFI call runs under mLock.

ClassifyWithEngines takes an aIndependentEngines flag that controls how engine evaluation chains:

  • Cancel (aIndependentEngines = false). Threads a matchedSoFar flag through every CheckNetworkRequest call so exception-only engines see the propagated matched_rule. Stops iterating when the aggregated status reaches ImportantHit or ImportantException — either of those pins the outcome and further engines can’t change it — but otherwise continues so a trailing exception can still demote an earlier hit.

  • Annotate (aIndependentEngines = true). Each engine sees previously_matched_rule = false, so each evaluates its own rules in isolation and MaybeAnnotateChannel can attribute matches to every feature whose rules fired.

ContentClassifierEngine::CheckNetworkRequest short-circuits to a Miss for first-party requests before crossing the FFI. For genuine third-party requests, it builds the preparsed request fields once and calls content_classifier_engine_check_network_request_preparsed. The Rust side constructs an adblock::Request via Request::preparsed, calls Engine::check_network_request_subset(req, previously_matched_rule, false), and writes back matched, important, and an optional exception rule string.

Each per-engine result is folded into a ContentClassifierResult via Accumulate. The status enum is ordered (Miss < Hit < Exception < ImportantHit < ImportantException), and Accumulate promotes monotonically: any Exception promotes the aggregate over a Hit, and any Important value pins the status against later non-Important results. Really, the status enum only matters for annotation.
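The chaining and folding rules can be sketched together. This is a simplified model, not the service's code: Accumulate is modelled as taking the maximum of the ordered enum, EngineCheck is a hypothetical stand-in for an engine, the rule for updating matchedSoFar (set after a Hit or ImportantHit) is an assumption, and only the aggregate is returned, whereas the service also keeps per-engine results.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Ordered as in the text: later enumerators outrank earlier ones.
enum class Status { Miss, Hit, Exception, ImportantHit, ImportantException };

// Assumption: monotonic promotion is modelled as taking the maximum.
inline Status Accumulate(Status aAggregate, Status aNext) {
  return std::max(aAggregate, aNext);
}

// Stand-in for an engine check: receives the "previously matched" flag and
// returns this engine's own status.
using EngineCheck = std::function<Status(bool)>;

inline Status ClassifyWithEngines(const std::vector<EngineCheck>& aEngines,
                                  bool aIndependentEngines) {
  Status aggregate = Status::Miss;
  bool matchedSoFar = false;
  for (const EngineCheck& check : aEngines) {
    // Annotate mode evaluates every engine in isolation.
    const Status result = check(aIndependentEngines ? false : matchedSoFar);
    aggregate = Accumulate(aggregate, result);
    if (!aIndependentEngines) {
      // Assumption: any blocking match counts as a prior match for later
      // engines, so a trailing exception-only engine can demote it.
      matchedSoFar = matchedSoFar || result == Status::Hit ||
                     result == Status::ImportantHit;
      if (aggregate >= Status::ImportantHit) {
        break;  // pinned: later engines cannot change the outcome
      }
    }
  }
  return aggregate;
}
```

With a blocking engine followed by an exception-only engine, cancel mode lets the exception see the earlier hit, while annotate mode evaluates the exception-only engine with no prior match.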

The worker thread dispatches the result back to the main thread, which then calls either MaybeCancelChannel (consults ChannelClassifierUtils::IsAllowListed, finds the first matched feature whose mBlockingErrorCode is non-NS_OK, hands off to ChannelClassifierUtils::MaybeBlockChannel) or MaybeAnnotateChannel (iterates the engine-result list and calls ChannelClassifierUtils::AnnotateChannel for each matched feature with a non-zero mLoadedState, applying the feature’s classification flag and loaded state to the channel).
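The feature-selection part of the cancel path can be sketched as below. PickBlockingFeature and MatchedFeature are hypothetical names, the allow-list decision is passed in as a plain boolean instead of consulting ChannelClassifierUtils::IsAllowListed, and an error code of 0 models NS_OK.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified view of a feature whose engine matched.
struct MatchedFeature {
  std::string mName;
  uint32_t mBlockingErrorCode;  // 0 models NS_OK (annotation-only feature)
};

// Returns the first matched feature that has a blocking variant, or nullptr
// when the channel is allow-listed or no matched feature can block.
inline const MatchedFeature* PickBlockingFeature(
    const std::vector<MatchedFeature>& aMatched, bool aIsAllowListed) {
  if (aIsAllowListed) {
    return nullptr;
  }
  for (const MatchedFeature& feature : aMatched) {
    if (feature.mBlockingErrorCode != 0) {
      return &feature;
    }
  }
  return nullptr;
}
```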