JIT profiling with samply
TL;DR: On Linux and macOS, if you have a local Firefox or SpiderMonkey shell build with default options, then you can use samply to get profiles like this shell profile or this browser profile (but with working source and assembly views) by running the following commands:
Firefox:
samply record PERF_SPEW_DIR=/tmp IONPERF=src MOZ_DISABLE_CONTENT_SANDBOX=1 MOZ_USE_PERFORMANCE_MARKER_FILE=1 JIT_OPTION_onlyInlineSelfHosted=true python3 ./mach run
JS shell:
samply record PERF_SPEW_DIR=/tmp IONPERF=src ~/code/obj-shell/dist/bin/js --enable-ic-frame-pointers --only-inline-selfhosted index.js
Motivation
For JIT profiling, the Gecko profiler currently has some shortcomings:
It’s not available in the JS shell.
It doesn’t label all JIT frames correctly, such as IC frames or trampolines.
It doesn’t allow accessing the assembly code of jitted functions.
It doesn’t allow accessing the JavaScript source code or IR code.
samply is an alternative to the Gecko profiler which can address these shortcomings, at least on Linux and macOS, until these features are available in the Gecko profiler. It does this by leveraging SpiderMonkey’s support for Jitdump, and it presents profiles in the familiar Firefox Profiler UI.
Additionally, on Linux, the perf tool can be used to get even more control. For example, perf can sample performance counters such as cache misses, instructions executed, or page faults. It can also unwind with frame pointers, whereas samply currently always uses DWARF unwinding.
To record and view a profile, you have three options:
samply record: easiest to use, Firefox Profiler UI
perf record + samply import perf.data: more control + Firefox Profiler UI
perf record + perf report / perf annotate: if you’re already familiar with perf’s UI and don’t mind it
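The second option could look roughly like the sketch below for the JS shell. This is an illustration under assumptions, not a verified recipe: record_with_perf is a hypothetical helper name, and the perf flags (-g for call graphs, -k 1 for the monotonic clock that perf's jitdump support expects, -o for the output file) are standard perf record options that you should check against your perf version.

```shell
# Sketch of "perf record + samply import perf.data" for the JS shell.
# record_with_perf is a hypothetical helper, not part of perf or samply.
# Assumption: -g and -k 1 are appropriate for your perf version.
record_with_perf() {
    js_bin=$1    # path to the JS shell binary
    script=$2    # JavaScript file to run

    # Record with jitdump output enabled via the required env variables.
    PERF_SPEW_DIR=/tmp IONPERF=src \
        perf record -g -k 1 -o perf.data \
        "$js_bin" --enable-ic-frame-pointers "$script" || return 1

    # Convert the recording and open it in the Firefox Profiler UI.
    samply import perf.data
}
```

For example, record_with_perf ~/code/obj-shell/dist/bin/js index.js would record one run of index.js and then open the result.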
Setup
To install samply, follow the installation instructions in samply’s README file.
To install perf (optional), install linux-perf-tools with your distribution’s package manager.
Environment Variables and flags
Environment variables that must be defined for perf JIT profiling:
PERF_SPEW_DIR: Location of jitdump output files. Making this directory a tmpfs filesystem could help reduce overhead.
IONPERF: Valid options include: func, src, ir, ir-ops.
IONPERF=func will disable all annotation and only function names will be available. It is the fastest option.
IONPERF=ir will enable IR annotation.
IONPERF=ir-ops will enable IR annotation with operand support. Requires --enable-jitspew and adds additional overhead to “ir”.
IONPERF=src will enable source code annotation. With samply, this works well in most cases. When using perf annotate, it only works if perf can read the source file locally, so it only really works well in the JS shell.
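As a rough sketch, the two required variables could be set up in a shell session as follows. The function names is_valid_ionperf and setup_jit_profiling_env are hypothetical helpers for illustration; only PERF_SPEW_DIR, IONPERF, and the four mode names come from the list above. The defaults (/tmp and src) mirror the TL;DR commands.

```shell
# Check an IONPERF mode against the four documented values.
# (Hypothetical helper, not part of SpiderMonkey or samply.)
is_valid_ionperf() {
    case "$1" in
        func|src|ir|ir-ops) return 0 ;;
        *) return 1 ;;
    esac
}

# Export the two required variables for the current shell session.
# $1: jitdump output directory (defaults to /tmp)
# $2: IONPERF mode (defaults to src)
setup_jit_profiling_env() {
    dir=${1:-/tmp}
    mode=${2:-src}
    if ! is_valid_ionperf "$mode"; then
        echo "unknown IONPERF mode: $mode" >&2
        return 1
    fi
    # Make sure the output directory exists before profiling starts.
    mkdir -p "$dir" || return 1
    export PERF_SPEW_DIR="$dir"
    export IONPERF="$mode"
}
```

Calling setup_jit_profiling_env with no arguments selects /tmp and src; passing a directory on a tmpfs mount as the first argument is one way to follow the overhead hint above.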
The following environment variables and flags are useful, too:
JIT_OPTION_onlyInlineSelfHosted=true and --only-inline-selfhosted (for the browser and the shell, respectively) make it so that all function calls show up in the profile, by disabling inlining of everything except self-hosted functions. In the future, we hope to have inlining information in Jitdump so that we can see inlined JS functions without disabling inlining.
MOZ_DISABLE_CONTENT_SANDBOX=1 is needed when profiling the browser, so that content processes can write the Jitdump file to the disk.
MOZ_USE_PERFORMANCE_MARKER_FILE=1 can be used to get PerformanceUserTiming markers into the profile, which is useful when profiling Speedometer.
Usage
samply record launches a process, records it, and waits for it to finish. Once the process has shut down, it opens the profile.
So, with the browser, you use it as follows:
samply record [env-vars] python3 ./mach run [flags]
Run the workload you want to profile in your Firefox build.
Shut down Firefox.
Wait for the profile to appear.
Interact with the profiler.
When done, press Ctrl+C on the terminal to stop the samply server.
And with the JS shell:
samply record [env-vars] obj/dist/bin/js [flags] index.js
Wait for the shell to finish running index.js.
Wait for the profile to appear.
Interact with the profiler.
When done, press Ctrl+C on the terminal to stop the samply server.
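If you profile the JS shell often, the steps above can be wrapped in a small function. The sketch below is only an illustration: profile_shell_script, JS_BIN, and DRY_RUN are hypothetical names introduced here, while the samply command line itself is the one from the TL;DR.

```shell
# Hypothetical wrapper for the JS shell workflow.
# JS_BIN and DRY_RUN are illustrative variables, not samply or
# SpiderMonkey options.
profile_shell_script() {
    script=$1
    js_bin=${JS_BIN:-obj/dist/bin/js}

    # Build the command line as positional parameters so that
    # arguments containing spaces survive intact.
    set -- samply record PERF_SPEW_DIR=/tmp IONPERF=src \
        "$js_bin" --enable-ic-frame-pointers --only-inline-selfhosted "$script"

    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"         # returns once you stop the samply server with Ctrl+C
    fi
}
```

Running DRY_RUN=1 profile_shell_script index.js prints the command without recording anything, which is a cheap way to check the flags before a long run.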
Hints
On macOS, samply cannot record shell scripts or system applications because of signing restrictions. So samply record ./mach run will not work. But if python3 is installed from Homebrew, then samply record python3 works. This is why the commands above use samply record python3 ./mach run rather than samply record ./mach run.
Get more samples: In the source view and especially in the assembly view, you’ll often want more samples than what you get from a single run.
In the browser, try running your workload multiple times.
In the shell, run your workload multiple times by using samply record --iteration-count 10 --reuse-threads.