Rust Component Observability
There are several systems in place to help teams gain observability into Rust components. This document describes how to set them up.
Component Errors
Components are encouraged to use the error_support crate to help handle Rust errors.
error_support expects crates to define 2 different error enums:
The
Errorenum is used internally by the crate, which means it’s used for most of the code.The
ApiErrorenum is returned from the public API. This consists of a thin layer that forwards to an internal function and convertsErrortoApiError. Public errors generally have fewer variants and fields, since they only expose data that consumers care about.
Crates should implement the error_support::GetErrorHandling trait
which defines how internal errors get converted to public errors.
This conversion is also an opportunity for crates to log errors and/or report them to our error tracking system.
error_support also provides the handle_error macro, which can be used to auto-convert the error types.
The example crate shows how this looks in practice:
error.rs defines the
ErrorandApiErrorenums and has theGetErrorHandlingimplementation.lib.rs defines the
ExampleComponenttype which shows howhandle_errorcan be used for top-level public functions. The bodies of each function evaluate to theResult<T, Error>type and that macro converts this toResult<T, ApiError>
Where do error reports go?
Applications have flexibility in how these reports are handled.
The mobile clients previously used Sentry, but we are in the process of moving all platforms to report these errors using the
rust-component-errors Glean ping
During this transition process, the Glean error ping is currently only wired up on Android, errors will not be reported for iOS or Desktop. Also, Android is also sending error reports to both Sentry and the Glean. Work is currently in-progress to fix all of this.
Reporting errors directly
Use error_support::report_error! to record errors directly, outside the error conversion process.
It uses the same formatting as println!.
Metrics
Use Glean to record metrics for your components. Unfortunately, Glean metrics can not be recorded from Rust directly, but there is currently work in progress to fix this. In the meantime, you’ll need to use a workaround to record your metrics.
On Kotlin, this usually means creating wrapper code in application-services (example). On Swift, the wrapper code lives in firefox-ios (example) On Desktop, we don’t have a great story at the moment.
Defining new metrics
Create a
metrics.yamlfile if it doesn’t exist and add new entries to it. See the Glean book for details. It’s often helpful to use an existingmetrics.yamlfile from another component as a template.Open a PR to get data review for your metrics.
If you created a new
metrics.yamlfile, you may need to hook it up toprobe-scraperso that it’s available there. See https://bugzilla.mozilla.org/show_bug.cgi?id=2019535 for discussion. This issue will probably go away when we move the application-services code into moz-central.
Dashboards
The generate-rust-dashboards tool can be used to generate dashboards for your team’s Rust components. The dashboards can track both component errors and metrics in a single page.
Rust code setup
Add your component as a variant of the
Componentenum. Update the methods below and add a case for your component variant.Add your team config to the all_dashboards(). Use the existing
TeamConfiginstances as a template for your team.
bigquery-etl
Before metrics can be graphed on a dashboard, you’ll need to add them to our bigquery-etl configuration.
This greatly improves performance by aggregating metrics on a daily basis to a new table
The dashboard code uses the aggregate tables in it’s FROM statements,
so you won’t see anything until you complete this step.
Clone the bigquery-etl repository
Update rust_component_metrics/init.py and add entries for your metrics
Open a
bigquery-etlPR and wait for it to be merged. On success, aggregate tables will start being populated the following day. On failure, the sync team will get an Airflow failure notification and should be able to help you fix the error.
Creating a dashboard on Yardstick
Ensure you have a yardstick account by going to Yardstick and logging in using Mozilla SSO. You should have “editor” access and can create, edit, and delete dashboards and alerts. If not, go to the Yardstick Grafana User Guide.
Run
cargo generate-rust-dashboards [team-slug] [output-dir]. This will generate a set of JSON files for each dashboard page in[output-dir]Log in to https://yardstick.mozilla.org/
Create a folder in yardstick for your team (optional, but highly recommended)
From that folder, use the
Newbutton, selectImport, and upload each of the generated JSON files to create the dashboards.
Congratulations, you now have a dashboard for your team’s components!
In the future you can edit the Rust code and re-import the JSON files to update your dashboards. Grafana will ensure this modifies the current dashboards rather than creating a new one.
Currently supported metrics
Dashboards currently support the following metrics:
counter and labeled counter
distribution and labeled distribution
event
If you want to use another metric please open a bugzilla ticket in Application services / General.
It’s likely that it will be easy to implement, we just haven’t done it yet.