Performance work on GPUs is harder than on CPUs because GPUs are asynchronous and massively parallel.
PIX - Can do timing of Direct3D calls. Works reasonably well with Firefox. See also Debugging With PIX.
NVIDIA PerfHUD - Last I checked, it required a special build to be used.
NVIDIA Parallel Nsight - Haven’t tried.
AMD GPU ShaderAnalyzer - Compiles a shader, shows the machine code, and gives static pipeline estimates. Not that useful for Firefox because all of our shaders are pretty simple.
AMD GPU PerfStudio - I had trouble getting this to work, and can’t remember whether I ever succeeded.
Open source, works OK.
Doesn’t seem to emit traces on Android/Nexus S. Looks like it’s designed for X11-based Linux-ARM devices; OMAP3 is mentioned a lot in the docs ...
Accurately Profiling Direct3D API Calls (Direct3D 9) - Suggests avoiding normal profilers like xperf and instead measuring the time to flush the command buffer.
Sort of old, but still useful.
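The flush-based measurement can be illustrated with a toy model. The sketch below is not Direct3D code: `ToyCommandQueue`, `Timing`, and `time_submit_and_flush` are hypothetical names invented for this example, and a worker thread that sleeps stands in for the GPU. It only shows why timing the submitting call alone under-reports the cost, and why the timer should stop after the queue has been drained. In Direct3D 9 the draining step is typically done with an event query (`D3DQUERYTYPE_EVENT`) polled via `GetData` with `D3DGETDATA_FLUSH`; that part is not reproduced here.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Toy stand-in for a GPU command buffer (hypothetical, not a Direct3D API).
// A worker thread plays the GPU and "executes" a command by sleeping.
class ToyCommandQueue {
public:
    ToyCommandQueue() : worker_([this] { run(); }) {}

    ~ToyCommandQueue() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    // Returns immediately, like an API call that only records a command.
    void submit(std::chrono::milliseconds cost) {
        {
            std::lock_guard<std::mutex> lock(m_);
            pending_.push(cost);
        }
        cv_.notify_all();
    }

    // Blocks until every queued command has finished executing.
    void flush() {
        std::unique_lock<std::mutex> lock(m_);
        idle_cv_.wait(lock, [this] { return pending_.empty() && !busy_; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !pending_.empty(); });
            if (pending_.empty()) return;  // shutting down, nothing left to do
            std::chrono::milliseconds cost = pending_.front();
            pending_.pop();
            busy_ = true;
            lock.unlock();
            std::this_thread::sleep_for(cost);  // simulated GPU work
            lock.lock();
            busy_ = false;
            idle_cv_.notify_all();
        }
    }

    std::mutex m_;
    std::condition_variable cv_, idle_cv_;
    std::queue<std::chrono::milliseconds> pending_;
    bool busy_ = false;
    bool done_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};

struct Timing {
    double submit_ms;  // time spent inside submit() only
    double total_ms;   // time from submit() until flush() returns
};

// Times one command two ways: around the call alone, and around call + flush.
Timing time_submit_and_flush(std::chrono::milliseconds cost) {
    assert(cost.count() >= 0);
    using clock = std::chrono::steady_clock;
    ToyCommandQueue queue;

    clock::time_point t0 = clock::now();
    queue.submit(cost);
    clock::time_point t1 = clock::now();
    queue.flush();
    clock::time_point t2 = clock::now();

    std::chrono::duration<double, std::milli> submit = t1 - t0;
    std::chrono::duration<double, std::milli> total = t2 - t0;
    return Timing{submit.count(), total.count()};
}
```

Calling `time_submit_and_flush(std::chrono::milliseconds(50))` should give a `submit_ms` far below the simulated cost and a `total_ms` of at least that cost. Exact figures depend on the scheduler.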