Content Classifier Service
The Content Classifier Service (toolkit/components/content-classifier/) is
the anti-tracking component that classifies network channels against
adblock-format filter lists delivered through Remote Settings. It is a
parallel classification path layered alongside the older URL Classifier and
its safebrowsing-format hash tables: same set of features (trackers, social
trackers, fingerprinters, cryptominers, email trackers, plus
allow-list/exception features and dedicated test_block / test_annotate
features), but driven by full adblock syntax rules evaluated by a Rust
engine wrapping the adblock crate.
This page is a reference for how the service is wired up internally: where list bytes live, how they get turned into engines, how a channel classification request flows through it, and which invariants the code depends on.
Components
| File | Role |
|---|---|
| | XPCOM contract surfaced to JS: |
| | JS-side contract: |
| | The singleton C++ service. Owns the feature table, the per-feature engine map, the four mode-keyed active-engine lists, the mutex, pref/Nimbus observers, async-shutdown blocker, and the build thread. |
| | Wraps the |
| | Wraps the |
| | Thread-safe refcounted C++ wrapper around the Rust FFI engine. Extracts request metadata from an |
| | Component registration and build setup (cbindgen generates |
Features and prefs
The static kFeatures[] table (ContentClassifierService.cpp) is the single
source of truth for which feature names exist, which Remote Settings list
IDs roll up into each feature’s engine, and how matches are reported to the
channel. Each entry carries:
- mName: the identifier used in prefs.
- mListIds: one or more Remote Settings record names whose attachments are concatenated into the feature’s engine rules.
- mClassificationFlag: the nsIClassifiedChannel::ClassificationFlags bit set on the channel for an annotation match.
- mLoadedState / mReplacedState / mAllowedState: nsIWebProgressListener STATE_LOADED_* / STATE_REPLACED_* / STATE_ALLOWED_* values logged into the content blocking log. mLoadedState == 0 denotes an annotate-without-notify feature.
- mBlockingErrorCode: the NS_ERROR_*_URI passed to UrlClassifierCommon::SetBlockedContent for a cancellation; NS_OK means the feature has no blocking variant and is only ever an annotation.
- mExceptionOnly: true if the feature contains only allowlist / exception rules. Such a feature must come last in a list of features; a console warning is logged if it does not.
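To make the shape of an entry and the ordering rule for mExceptionOnly concrete, here is a minimal sketch. FeatureSketch and ExceptionOnlyFeaturesAreLast are hypothetical names, the field types are plain standard-library stand-ins for the Gecko types, and the real check in ContentClassifierService.cpp may be structured differently.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one kFeatures[] entry. The real struct uses
// Gecko types; plain integers are used here so the sketch compiles alone.
struct FeatureSketch {
  std::string mName;
  std::vector<std::string> mListIds;
  uint32_t mClassificationFlag;
  uint32_t mLoadedState;        // 0 => annotate-without-notify
  uint32_t mBlockingErrorCode;  // 0 stands in for NS_OK (no blocking variant)
  bool mExceptionOnly;
};

// Returns true when no regular feature follows an exception-only feature,
// i.e. all exception-only features sit at the end of the list.
bool ExceptionOnlyFeaturesAreLast(const std::vector<FeatureSketch>& aFeatures) {
  bool seenExceptionOnly = false;
  for (const FeatureSketch& feature : aFeatures) {
    if (feature.mExceptionOnly) {
      seenExceptionOnly = true;
    } else if (seenExceptionOnly) {
      return false;  // a regular feature came after an exception-only one
    }
  }
  return true;
}
```

A list that fails this check is the situation in which the console warning mentioned above would be expected.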
Enable switches (per mode):
- privacy.trackingprotection.content.protection.enabled
- privacy.trackingprotection.content.annotation.enabled
Feature selection (comma-separated feature names):
- privacy.trackingprotection.content.protection.engines
- privacy.trackingprotection.content.protection.engines.pbmode
- privacy.trackingprotection.content.annotation.engines
- privacy.trackingprotection.content.annotation.engines.pbmode
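As an illustration of how a comma-separated engines pref value maps to an ordered list of feature names, consider the sketch below. ParseEngineList is a hypothetical helper, and trimming whitespace and skipping empty entries are assumptions of this sketch rather than documented behaviour of the service.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a comma-separated pref value into feature names, preserving order.
// Assumption: surrounding spaces/tabs are trimmed and empty entries skipped.
std::vector<std::string> ParseEngineList(const std::string& aPrefValue) {
  std::vector<std::string> names;
  std::size_t start = 0;
  while (start <= aPrefValue.size()) {
    std::size_t comma = aPrefValue.find(',', start);
    if (comma == std::string::npos) {
      comma = aPrefValue.size();
    }
    const std::string token = aPrefValue.substr(start, comma - start);
    const std::size_t first = token.find_first_not_of(" \t");
    if (first != std::string::npos) {
      const std::size_t last = token.find_last_not_of(" \t");
      names.push_back(token.substr(first, last - first + 1));
    }
    start = comma + 1;
  }
  return names;
}
```

Order matters here because the active-engine arrays are rebuilt in pref order.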
Test-only lists fetched over HTTP (used by the test_block /
test_annotate features so tests don’t need a live Remote Settings
collection):
- privacy.trackingprotection.content.protection.test_list_urls
- privacy.trackingprotection.content.annotation.test_list_urls
All of the above prefs are mapped onto Nimbus feature variables in
toolkit/components/nimbus/FeatureManifest.yaml.
Threading model
Three thread types appear in this code, and the rebuild and classify paths both deliberately move work between them:
- Main thread. All init, pref observers, Remote Settings sync callbacks, and final channel-side decisions (MaybeCancelChannel, MaybeAnnotateChannel) run here.
- mBuildThread (an nsISerialEventTarget task queue, created in Init). The CPU-heavy half of an engine rebuild runs here: Engine::from_rules calls (the actual adblock parser) happen with no lock held, and the lock-protected InstallEngine / PopulateAllActiveEnginesFromPreferenceSnapshot / PruneInactiveEngines steps run here too, just briefly under mLock.
- URL-classifier worker thread. ClassifyForCancel and ClassifyForAnnotate run here, called from netwerk/url-classifier/AsyncUrlChannelClassifier.cpp. Both acquire mLock briefly to snapshot the active-engine list pointer and then release it before crossing the FFI.
The mozilla::Mutex mLock is non-recursive. Reacquiring it while
already held will deadlock the calling thread. The header enforces this
by:
- Marking mInitPhase, mEngines, mFeatureVersions, mUpdateGeneration, and the four active-engine arrays as MOZ_GUARDED_BY(mLock).
- Annotating InstallEngine, PopulateAllActiveEnginesFromPreferenceSnapshot, and PruneInactiveEngines with MOZ_REQUIRES(mLock).
- Releasing mLock before any call into the engine FFI (so a long classification cannot stall a rebuild and vice versa).
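The snapshot-then-release discipline described above can be sketched with standard-library primitives. ServiceSketch, Publish, and Classify are hypothetical names, std::mutex and std::shared_ptr stand in for mozilla::Mutex and the refcounted engine list, and the size lookup stands in for the FFI call.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using EngineList = std::vector<std::string>;  // stand-in for engine references

class ServiceSketch {
 public:
  // Rebuild side: build the new list outside the lock, swap it in under it.
  void Publish(EngineList aEngines) {
    auto fresh = std::make_shared<const EngineList>(std::move(aEngines));
    std::lock_guard<std::mutex> lock(mLock);
    mActive = std::move(fresh);
  }

  // Classify side: copy the pointer under the lock, then work on the copy.
  std::size_t Classify() {
    std::shared_ptr<const EngineList> snapshot;
    {
      std::lock_guard<std::mutex> lock(mLock);
      snapshot = mActive;
    }  // lock released here, before the slow work below
    return snapshot ? snapshot->size() : 0;  // stands in for the FFI call
  }

 private:
  std::mutex mLock;
  std::shared_ptr<const EngineList> mActive;
};
```

Because the classifier keeps its own reference to the snapshot, a rebuild that swaps in a new list does not invalidate a classification already in flight.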
You may be tempted to use a RWLock. It buys less than it appears to, because there is effectively only one classifying thread. It is also unconfirmed whether the engine lookup is thread-safe, so concurrent readers would have to be verified first.
List load and engine rebuild
A rebuild is triggered by any of:
- Initial InitRSClient() (first time the service sees an active RS feature).
- A Remote Settings sync push (onSync in the JS client).
- A pref change: master enable, an engines selection pref, or one of the test_list_urls prefs.
onListsChanged(updated, removed) on the main thread calls
ProcessListChanges, which takes a fresh EnginesPrefsSnapshot of the
current pref state, walks the active features named in that snapshot, and
selects every feature that either has no engine yet or whose
mListIds overlap updated ∪ removed. That set goes to
UpdateFeatures.
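The selection rule in ProcessListChanges (no engine yet, or list IDs overlapping updated ∪ removed) can be sketched as follows. ActiveFeature and SelectFeaturesToRebuild are hypothetical names, and the inputs are simplified to standard containers.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical view of one active feature taken from the pref snapshot.
struct ActiveFeature {
  std::string mName;
  std::vector<std::string> mListIds;
  bool mHasEngine;
};

// Returns the names of features that need a rebuild: those without an engine
// and those with at least one list ID in the updated or removed set.
std::vector<std::string> SelectFeaturesToRebuild(
    const std::vector<ActiveFeature>& aActive,
    const std::set<std::string>& aUpdated,
    const std::set<std::string>& aRemoved) {
  std::vector<std::string> selected;
  for (const ActiveFeature& feature : aActive) {
    bool needsRebuild = !feature.mHasEngine;
    for (const std::string& id : feature.mListIds) {
      if (aUpdated.count(id) > 0 || aRemoved.count(id) > 0) {
        needsRebuild = true;
      }
    }
    if (needsRebuild) {
      selected.push_back(feature.mName);
    }
  }
  return selected;
}
```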
UpdateFeatures (main thread) bumps mUpdateGeneration (global) and the
per-feature mFeatureVersions entry for every feature it’s about to
rebuild — both under mLock. It then fires
FetchEngineDataForFeature to get the rule lists.
The MozPromise<> returned by each fetch is
collected via MozPromise::AllSettled; when all of them resolve, the
collected rule arrays plus the captured generation and per-feature
versions are dispatched onto mBuildThread.
On mBuildThread, with no lock held, we build the rule engines.
The same mBuildThread task then reacquires mLock and performs the
install / populate / prune step under it:
- For each freshly built engine, compare the captured per-feature version to the current mFeatureVersions entry. If a newer rebuild has been issued since this one was dispatched, the captured version is stale and the engine is dropped on the floor. Otherwise it’s stored into mEngines via InstallEngine.
- After all installs, compare the captured mUpdateGeneration to the current one. Only if it’s still the latest do we run PopulateAllActiveEnginesFromPreferenceSnapshot (rebuild the four per-mode active-engine arrays from mEngines, in pref order) and PruneInactiveEngines (drop entries from mEngines not referenced by any active-engine array).
This versioning-and-recheck pattern is the safety invariant for concurrent
rebuilds: two rebuilds racing through mBuildThread can never have the
older one’s snapshot overwrite the newer one’s results, because the
older one’s captured generation no longer matches by the time it tries
to commit.
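A minimal sketch of the versioning-and-recheck pattern follows. RebuildStateSketch, Ticket, BeginUpdate, and Commit are hypothetical names, and std::mutex stands in for mLock; only the counter comparison is modelled, not the engine build itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <vector>

class RebuildStateSketch {
 public:
  // What a rebuild captures when it is issued.
  struct Ticket {
    uint64_t mGeneration = 0;
    std::map<std::string, uint64_t> mVersions;
  };

  // Main-thread side: bump the global generation and per-feature versions.
  Ticket BeginUpdate(const std::vector<std::string>& aFeatures) {
    std::lock_guard<std::mutex> lock(mLock);
    Ticket ticket;
    ticket.mGeneration = ++mUpdateGeneration;
    for (const std::string& name : aFeatures) {
      ticket.mVersions[name] = ++mFeatureVersions[name];
    }
    return ticket;
  }

  // Build-thread side: install only engines whose captured version is still
  // current, and report whether the populate/prune step may run.
  std::size_t Commit(const Ticket& aTicket, bool* aMayRepopulate) {
    std::lock_guard<std::mutex> lock(mLock);
    std::size_t installed = 0;
    for (const auto& entry : aTicket.mVersions) {
      if (mFeatureVersions[entry.first] == entry.second) {
        mInstalled[entry.first] = entry.second;  // stands in for InstallEngine
        ++installed;
      }
    }
    *aMayRepopulate = (aTicket.mGeneration == mUpdateGeneration);
    return installed;
  }

 private:
  std::mutex mLock;
  uint64_t mUpdateGeneration = 0;
  std::map<std::string, uint64_t> mFeatureVersions;
  std::map<std::string, uint64_t> mInstalled;
};
```

If two updates are issued back to back and the older one commits later, its engines for any feature the newer update also touched are dropped, and it does not repopulate the active-engine arrays.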
Finally a small task is dispatched back to the main thread to fire
NS_CONTENT_CLASSIFIER_FILTER_LISTS_LOADED_TOPIC (test-only, gated on the
privacy.trackingprotection.content.testing pref), which is how the
browser tests await rebuild completion. These notifications need to be debounced.
Channel classification
A channel classification request enters from
netwerk/url-classifier/AsyncUrlChannelClassifier.cpp on the URL-classifier
worker thread. The caller has already constructed a ContentClassifierRequest
on the main thread that extracts the URL, the schemeless site and source
schemeless site (via nsIEffectiveTLDService), the request type (mapped
from ExtContentPolicyType to an adblock type string), the third-party
flag (via mozIThirdPartyUtil), and the PBM flag.
ClassifyForCancel and ClassifyForAnnotate both acquire mLock, pick
the appropriate active-engine array based on PBM and mode, release the
lock, and then call ClassifyWithEngines on that snapshot, so no engine FFI
call runs under mLock.
ClassifyWithEngines takes an aIndependentEngines flag that controls
how engine evaluation chains:
- Cancel (aIndependentEngines = false). Threads a matchedSoFar flag through every CheckNetworkRequest call so exception-only engines see the propagated matched_rule. Stops iterating when the aggregated status reaches ImportantHit or ImportantException, since either of those pins the outcome and further engines can’t change it; otherwise it continues so a trailing exception can still demote an earlier hit.
- Annotate (aIndependentEngines = true). Each engine sees previously_matched_rule = false, so each evaluates its own rules in isolation and MaybeAnnotateChannel can attribute matches to every feature whose rules fired.
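The difference between the two modes comes down to what each engine is told about earlier matches. The sketch below models only that flag propagation; EngineFn and RunEngines are hypothetical names, each engine is reduced to a callable, and the early stop on Important results is deliberately left out.

```cpp
#include <functional>
#include <vector>

// Hypothetical engine: receives the previously-matched flag and returns
// whether its own rules matched.
using EngineFn = std::function<bool(bool)>;

// Runs the engines in order and returns the per-engine match results.
// Chained mode passes the running matchedSoFar flag to each engine;
// independent mode always passes false.
std::vector<bool> RunEngines(const std::vector<EngineFn>& aEngines,
                             bool aIndependentEngines) {
  std::vector<bool> perEngine;
  bool matchedSoFar = false;
  for (const EngineFn& engine : aEngines) {
    const bool previouslyMatched = aIndependentEngines ? false : matchedSoFar;
    const bool matched = engine(previouslyMatched);
    perEngine.push_back(matched);
    matchedSoFar = matchedSoFar || matched;
  }
  return perEngine;
}
```

With a blocking engine followed by an exception-only engine, the second engine is told about the earlier hit only in the chained (cancel) mode.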
ContentClassifierEngine::CheckNetworkRequest short-circuits to a Miss
for first-party requests before crossing the FFI. For genuine
third-party requests, it builds the preparsed request fields once and
calls content_classifier_engine_check_network_request_preparsed. The
Rust side constructs an adblock::Request via Request::preparsed,
calls Engine::check_network_request_subset(req, previously_matched_rule, false), and writes back matched, important, and an optional
exception rule string.
Each per-engine result is folded into a ContentClassifierResult via
Accumulate. The status enum is ordered (Miss < Hit < Exception <
ImportantHit < ImportantException), and Accumulate promotes
monotonically: any Exception promotes the aggregate over a Hit, and any
Important value pins the status against later non-Important results.
In practice, the aggregated status only matters for cancellation: annotation
evaluates each engine independently and reads the per-engine results.
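One simple way to get monotonic promotion over an ordered enum is to take the maximum of the aggregate and the new result. The sketch below assumes exactly that; the real Accumulate may carry more state (for example the exception rule string), and IsPinned is a hypothetical helper for the stop condition.

```cpp
#include <algorithm>

// Ordered as in the text: Miss < Hit < Exception < ImportantHit <
// ImportantException.
enum class Status { Miss, Hit, Exception, ImportantHit, ImportantException };

// Assumption of this sketch: promotion is a plain maximum, so the aggregate
// can only move up the ordering, never down.
Status Accumulate(Status aAggregate, Status aNext) {
  return std::max(aAggregate, aNext);
}

// True once the aggregate has reached an Important value, at which point
// later non-Important results cannot change it.
bool IsPinned(Status aStatus) { return aStatus >= Status::ImportantHit; }
```

Under this ordering a trailing Exception demotes an earlier Hit to "not blocked", while an ImportantHit survives any later plain Exception.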
The worker thread dispatches the result back to the main thread, which
then calls either MaybeCancelChannel (consults
ChannelClassifierUtils::IsAllowListed, finds the first matched feature
whose mBlockingErrorCode is non-NS_OK, hands off to
ChannelClassifierUtils::MaybeBlockChannel) or MaybeAnnotateChannel
(iterates the engine-result list and calls
ChannelClassifierUtils::AnnotateChannel for each matched feature with a
non-zero mLoadedState, applying the feature’s classification flag and
loaded state to the channel).
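The two main-thread selection rules can be sketched as plain filters over the per-engine results. EngineResultSketch, FirstBlockingMatch, and FeaturesToAnnotate are hypothetical names, integer 0 stands in for NS_OK, and the allow-list check via ChannelClassifierUtils::IsAllowListed is not modelled.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-engine result joined with its feature's table entry.
struct EngineResultSketch {
  std::string mFeatureName;
  bool mMatched;
  uint32_t mBlockingErrorCode;  // 0 stands in for NS_OK
  uint32_t mLoadedState;        // 0 => annotate-without-notify
};

// Cancel path: the first matched feature that has a blocking variant.
const EngineResultSketch* FirstBlockingMatch(
    const std::vector<EngineResultSketch>& aResults) {
  for (const EngineResultSketch& result : aResults) {
    if (result.mMatched && result.mBlockingErrorCode != 0) {
      return &result;
    }
  }
  return nullptr;
}

// Annotate path: every matched feature with a non-zero loaded state.
std::vector<std::string> FeaturesToAnnotate(
    const std::vector<EngineResultSketch>& aResults) {
  std::vector<std::string> names;
  for (const EngineResultSketch& result : aResults) {
    if (result.mMatched && result.mLoadedState != 0) {
      names.push_back(result.mFeatureName);
    }
  }
  return names;
}
```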