Crash Reporter

Overview

The crash reporter is a subsystem to record and manage application crash data.

While the subsystem is known as the crash reporter, it helps to think of it as a process dump manager. The heart of the subsystem is managing process dump files, and these files are created not only from process crashes but also from hangs and other exceptional events.

The crash reporter subsystem is composed of a number of pieces working together.

Breakpad

Breakpad is a library and set of tools that make it easy to collect process information (notably dumps from crashes). Breakpad is a third-party project (originally developed by Google) that is imported into the tree.

Dump files

Breakpad produces files called dump files that hold process data (stacks, heap data, etc.).

Crash Reporter Client

The crash reporter client is a standalone executable that is launched to handle dump files. This application optionally submits crashes to Mozilla (or the configured server).

Minidump Analyzer

The minidump analyzer is a standalone executable that is launched by the crash reporter client or by the browser itself to extract stack traces from the dump files generated during a crash. It appends the stack traces to the .extra file associated with the crash dump.

Ping Sender

The ping sender is a standalone executable that is launched by the crash reporter client to deliver a crash ping to our telemetry servers. The ping sender is used to speed up delivery of the crash ping which would otherwise have to wait for Firefox to be restarted in order to be sent.

How Main-Process Crash Handling Works

The crash handler is hooked up very early in the Gecko process lifetime. It all starts in XREMain::XRE_mainInit() from nsAppRunner.cpp. Assuming crash reporting is enabled, this startup function registers an exception handler for the process and tells the crash reporter subsystem about basic metadata such as the application name and version.

The registration of the crash reporter exception handler doubles as initialization of the crash reporter itself. This happens in CrashReporter::SetExceptionHandler() from nsExceptionHandler.cpp. The crash reporter figures out what application to use for reporting dumped crashes and where to store these dump files on disk. The Breakpad exception handler (really just a mechanism for dumping process state) is initialized as part of this function. The Breakpad exception handler is a google_breakpad::ExceptionHandler instance and it’s stored as gExceptionHandler.

As the application runs, various other systems may write annotations or notes to the crash reporter to indicate state of the application, help with possible reasons for a current or future crash, etc. These are performed via CrashReporter::RecordAnnotation*(), CrashReporter::RegisterAnnotation*() functions and CrashReporter::AppendAppNotesToCrashReport() from nsExceptionHandler.h.

For well-behaved applications, this is all that happens. However, if a crash or similar exceptional event (such as a hang) occurs, we need to write a crash report.

When an event worthy of writing a dump occurs, the Breakpad exception handler is invoked and Breakpad does its thing. When Breakpad has finished, it calls back into CrashReporter::MinidumpCallback() from nsExceptionHandler.cpp to tell the crash reporter about what was written.

MinidumpCallback() performs a number of actions once a dump has been written. It writes a file with the time of the crash so other systems can easily determine the time of the last crash. It supplements the dump file with an extra file containing Mozilla-specific metadata. This data includes the annotations set via CrashReporter::AnnotateCrashReport() as well as time since last crash, whether garbage collection was active at the time of the crash, memory statistics, etc.
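The "time of last crash" bookkeeping described above can be illustrated with a small sketch. This is not the Gecko implementation (which is C++ and has to run in a crashed process); the marker file path, its one-line text format, and the function name `record_crash_time` are assumptions made purely for illustration.

```python
import time
from pathlib import Path
from typing import Optional


def record_crash_time(marker: Path, now: Optional[float] = None) -> Optional[float]:
    """Write the crash time to `marker` and return seconds since the previous crash.

    Returns None when no earlier crash time could be read.
    The marker file name and format are hypothetical.
    """
    now = time.time() if now is None else now
    previous = None
    if marker.exists():
        try:
            previous = float(marker.read_text().strip())
        except ValueError:
            previous = None  # unreadable marker: treat as "no earlier crash"
    # Overwrite the marker so other systems can read the latest crash time.
    marker.write_text(f"{int(now)}\n")
    return None if previous is None else now - previous
```

The returned value corresponds to the "time since last crash" item that the text says is stored in the extra file.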

If the crash reporter client is enabled, MinidumpCallback() invokes it. It simply tries to create a new crash reporter client process (e.g. crashreporter.exe) with the path to the written minidump file as an argument.

The crash reporter client performs a number of roles. There’s a lot going on, so you may want to look at main() in crashreporter.cpp.

First, stack traces are extracted from the dump via the minidump analyzer tool. The resulting traces are appended to the .extra file of the crash together with the SHA256 hash of the minidump file. Once this is done, a crash ping is assembled holding the same information as the one generated by the `CrashManager`, and it’s sent to the telemetry servers via the ping sender program. The UUID of the ping is then stored in the extra file; the `CrashManager` will later pick it up and generate a new ping with the same UUID so that the telemetry server can deduplicate both pings.

Then, the crash reporter client verifies that the dump data is sane. If it isn’t (e.g. required metadata is missing), the dump data is ignored. If the dump data looks sane, it is moved into the pending directory for the configured data directory (defined via the MOZ_CRASHREPORTER_DATA_DIRECTORY environment variable or from the UI).

Once this is done, the main crash reporter UI is displayed via UIShowCrashUI(). The crash reporter UI is platform specific: there are separate versions for Windows, OS X, and various *NIX presentation flavors (such as GTK). In short, a dialog is displayed and the user has the opportunity to submit the dump data to a remote server.
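The hashing, sanity check, and move into the pending directory can be sketched roughly as follows. This is a simplified illustration, not the logic in crashreporter.cpp: it assumes the extra file is JSON, and the function names and the `required_keys` parameter are hypothetical, since the text does not say which metadata is required.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_dump(dump: Path, extra: Path, data_dir: Path, required_keys) -> bool:
    """Move a dump and its extra file into <data_dir>/pending if the metadata is sane.

    Assumption: the extra file is a JSON object. Returns False (and leaves
    the files where they are) when the metadata is unreadable or incomplete.
    """
    try:
        annotations = json.loads(extra.read_text())
    except (OSError, ValueError):
        return False
    if not isinstance(annotations, dict):
        return False
    if any(key not in annotations for key in required_keys):
        return False
    pending = data_dir / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    shutil.move(str(dump), str(pending / dump.name))
    shutil.move(str(extra), str(pending / extra.name))
    return True
```

In this sketch a dump that fails the check is simply left in place; how the real client disposes of ignored dump data is not covered here.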

If a dump is submitted via the crash reporter, the raw dump files are removed from the pending directory and a file containing the crash ID from the remote server for the submitted dump is created in the submitted directory.

If the user chooses not to submit a dump in the crash reporter UI, the dump files are deleted.
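The two outcomes above (submitted or declined) might be modeled like this. It is a sketch under assumptions: the function name, the `<crash_id>.txt` file name, and the file's contents are hypothetical; the text only states that a file containing the server's crash ID is created in the submitted directory.

```python
from pathlib import Path
from typing import Optional


def resolve_pending(pending: Path, submitted: Path, uuid: str,
                    crash_id: Optional[str]) -> None:
    """Remove a crash's files from `pending`; record the crash ID if it was submitted.

    `crash_id` is the ID returned by the remote server, or None when the
    user chose not to submit the dump.
    """
    # Collect first, then delete: covers uuid.dmp, uuid.extra and uuid-<other>.dmp.
    targets = list(pending.glob(f"{uuid}.*")) + list(pending.glob(f"{uuid}-*.dmp"))
    for path in targets:
        path.unlink()
    if crash_id is not None:
        submitted.mkdir(parents=True, exist_ok=True)
        # Hypothetical naming: one small text file per submitted crash.
        (submitted / f"{crash_id}.txt").write_text(crash_id + "\n")
```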

And that’s pretty much what happens when a crash/dump is written!

Plugin and Child Process Crashes

Crashes in plugin and child processes are also managed by the crash reporting subsystem.

Child process crashes are handled by the mozilla::dom::CrashReporterParent class defined in dom/ipc. When a child process crashes, the top-level IPDL actor should check for it by calling TakeMinidump in its ActorDestroy method: see mozilla::plugins::PluginModuleParent::ActorDestroy and mozilla::plugins::PluginModuleParent::ProcessFirstMinidump. That method is responsible for calling mozilla::dom::CrashReporterParent::GenerateCrashReportForMinidump with appropriate crash annotations specific to the crash. All child-process crashes are annotated with a ProcessType annotation, such as “content” or “plugin”.

Once the minidump file has been generated, the mozilla::dom::CrashReporterHost is notified of the crash. It first tries to extract the stack traces from the minidump file using the minidump analyzer. The stack traces are then stored in the extra file together with the rest of the crash annotations, and finally the crash is recorded by calling `CrashService.addCrash()`. This last step adds the crash to the `CrashManager` database and automatically sends a crash ping with information about the crash.

Submission of child process crashes is handled by application code. This code prompts the user to submit crashes in context-appropriate UI and then submits the crashes using CrashSubmit.sys.mjs.

Memory Reports

When a process detects that it is running low on memory, a memory report is saved. If the process crashes, the memory report will be included with the crash report. nsThread::SaveMemoryReportNearOOM() checks to see if the process is low on memory every 30 seconds at most and saves a report every 3 minutes at most. Since a child process cannot actually save to the hard drive, it instead notifies its parent process, which saves the report for it. If a crash does occur, the memory report is moved to the pending directory with the other dump data and an annotation is added to indicate the presence of the report. This happens in nsExceptionHandler.cpp, but occurs in different functions depending on what process crashed. When the main process crashes, this happens in MinidumpCallback(). When a child process crashes, it happens in OnChildProcessDumpRequested(), with the annotation being added in WriteExtraData().
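The two rate limits mentioned above (check at most every 30 seconds, save at most every 3 minutes) can be illustrated with a small throttle. This is a sketch of the general technique, not the code in nsThread::SaveMemoryReportNearOOM(); the class name and the injected `is_low_memory`, `save_report`, and `clock` callables are hypothetical.

```python
import time


class NearOomReportThrottle:
    """Check for low memory at most every 30 s and save at most every 180 s."""

    CHECK_INTERVAL = 30.0   # seconds between low-memory checks
    SAVE_INTERVAL = 180.0   # seconds between saved reports

    def __init__(self, is_low_memory, save_report, clock=time.monotonic):
        self._is_low_memory = is_low_memory
        self._save_report = save_report
        self._clock = clock
        self._last_check = None
        self._last_save = None

    def poll(self) -> bool:
        """Return True if a memory report was saved during this call."""
        now = self._clock()
        if self._last_check is not None and now - self._last_check < self.CHECK_INTERVAL:
            return False  # checked too recently
        self._last_check = now
        if not self._is_low_memory():
            return False
        if self._last_save is not None and now - self._last_save < self.SAVE_INTERVAL:
            return False  # saved too recently
        self._last_save = now
        self._save_report()
        return True
```

Injecting the clock keeps the throttle testable without sleeping; a child process would pass a `save_report` callable that notifies its parent instead of writing to disk.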

Plugin Hangs

Plugin hangs are handled as crash reports. If a plugin doesn’t respond to an IPC message after 60 seconds, the plugin IPC code will take minidumps of all of the processes involved and then kill the plugin.

In this case, there will be only one .extra file with the crash report metadata, but there will be multiple dump files: at least one for the browser process and one for the plugin process. All of these files are submitted together as a unit. Before submission, the filenames of the files are linked:

  • uuid.extra - annotations, includes the `additional_minidumps` annotation holding a comma-separated list of the additional minidumps

  • uuid.dmp - plugin process dump file

  • uuid-<other>.dmp - other process dump file as listed in additional_minidumps
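As a rough illustration of the naming scheme above, the following sketch derives the set of file names that belong to one hang report from its UUID and its `additional_minidumps` annotation. The function name and the sample values are hypothetical.

```python
def dump_files_for(uuid: str, annotations: dict) -> list:
    """List the file names submitted together for one crash or hang report."""
    names = [f"{uuid}.extra", f"{uuid}.dmp"]
    # additional_minidumps is a comma-separated list of extra dump names.
    additional = annotations.get("additional_minidumps", "")
    for name in additional.split(","):
        name = name.strip()
        if name:
            names.append(f"{uuid}-{name}.dmp")
    return names
```

For example, with a UUID of `abc` and an annotation value of `browser`, this yields `abc.extra`, `abc.dmp`, and `abc-browser.dmp`.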

about:crashes

If the crash reporter subsystem is enabled, the about:crashes page will be registered with the application. This page provides information about previous and submitted crashes.

It is also possible to submit crashes from about:crashes.

Environment variables affecting crash reporting

The behavior of the exception handler and crash reporter client can be altered by setting certain environment variables. Some of these variables are used for testing, but quite a few have only internal users.

User-specified environment variables

  • MOZ_CRASHREPORTER - The opposite of MOZ_CRASHREPORTER_DISABLE: forces crash reporting on even if it is disabled in application.ini. You must use this to enable crash reporting on debug builds.

  • MOZ_CRASHREPORTER_DISABLE - Disable Breakpad crash reporting completely in non-debug builds. You can use this if you would rather use the JIT debugger on Windows with the symbol server, for example.

  • MOZ_CRASHREPORTER_FULLDUMP - Store full application memory in the minidump, so you can open it in a Microsoft debugger. Don’t submit it to the server. (Windows only.)

  • MOZ_CRASHREPORTER_NO_DELETE_DUMP - Don’t delete the crash report dump file after submitting it to the server. Minidumps will still be moved to the “Crash Reports/pending” directory.

  • MOZ_CRASHREPORTER_NO_REPORT - Save the minidump file but don’t launch the crash reporting UI or send the report to the server. Minidumps will be stored in the user’s profile directory, in a subdirectory named “minidumps”.

  • MOZ_CRASHREPORTER_SHUTDOWN - Save the minidump and then force the application to close. This is useful for content crashes that don’t normally close the chrome (main application) processes. Setting this variable causes the application to close as well.

  • MOZ_CRASHREPORTER_URL - Sets the URL that the crash reporter will submit reports to.
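A test harness might prepare these variables before launching the application. The sketch below only builds the environment dictionary; the function name, its parameters, and the use of "1" as the value that switches a variable on are assumptions, since the list above does not specify values.

```python
import os


def crash_test_env(base=None, *, force_enable=False, no_report=False,
                   server_url=None) -> dict:
    """Return a copy of the environment with crash reporter variables applied."""
    env = dict(os.environ if base is None else base)
    if force_enable:
        # MOZ_CRASHREPORTER is the opposite of MOZ_CRASHREPORTER_DISABLE,
        # so drop the latter to avoid a contradictory setup.
        env.pop("MOZ_CRASHREPORTER_DISABLE", None)
        env["MOZ_CRASHREPORTER"] = "1"  # assumed value
    if no_report:
        env["MOZ_CRASHREPORTER_NO_REPORT"] = "1"  # assumed value
    if server_url is not None:
        env["MOZ_CRASHREPORTER_URL"] = server_url
    return env
```

The result can be passed as the `env` argument of `subprocess.Popen` when starting the browser under test.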

Environment variables used internally

  • MOZ_CRASHREPORTER_AUTO_SUBMIT - When set, causes the crash reporter client to skip the UI flow and submit the crash report directly.

  • MOZ_CRASHREPORTER_DATA_DIRECTORY - Platform-dependent data directory; the pending crash reports will be stored in a subdirectory of this path. This overrides the default one generated by the client’s code.

  • MOZ_CRASHREPORTER_DUMP_ALL_THREADS - When set to 1, stack traces for all threads are generated and sent in the crash ping; when not set, only the trace for the crashing thread is generated.

  • MOZ_CRASHREPORTER_EVENTS_DIRECTORY - Path of the directory holding the crash event files.

  • MOZ_CRASHREPORTER_PING_DIRECTORY - Path of the directory holding the pending crash ping files.

  • MOZ_CRASHREPORTER_RESTART_ARG_<n> - Each of these variables specifies one of the arguments that were passed to the application, starting with the first after the executable. The crash reporter client uses them to restart the application.

  • MOZ_CRASHREPORTER_RESTART_XUL_APP_FILE - If a XUL app file was specified when starting the app, it has to be stored in this variable so that the crash reporter client can restart the application.

  • MOZ_CRASHREPORTER_STRINGS_OVERRIDE - Overrides the path used to load the .ini file holding the strings used in the crash reporter client UI.
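To illustrate how the MOZ_CRASHREPORTER_RESTART_ARG_<n> variables could be turned back into an argument list, here is a minimal sketch. It is not the client's actual code: the function name is hypothetical, and the sketch deliberately makes no assumption about which index the numbering starts at; it just orders the values by their numeric suffix.

```python
import re

# Matches MOZ_CRASHREPORTER_RESTART_ARG_<n> and captures the numeric suffix.
_RESTART_ARG = re.compile(r"^MOZ_CRASHREPORTER_RESTART_ARG_(\d+)$")


def restart_arguments(env) -> list:
    """Collect the restart arguments from `env`, ordered by their index <n>."""
    found = []
    for key, value in env.items():
        match = _RESTART_ARG.match(key)
        if match:
            found.append((int(match.group(1)), value))
    # Sort numerically so that ARG_10 comes after ARG_2.
    return [value for _, value in sorted(found)]
```

Sorting on the parsed integer rather than on the variable name avoids the lexicographic trap where "ARG_10" would sort before "ARG_2".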

Other topics