Use this workflow before making strong performance claims.
- Reproduce with the local benchmark wrapper.
  ```
  scripts/run-single-benchmark.sh --module <module> --class <fqcn> --method <benchmarkMethod>
  ```
- If the benchmark moves but the cause is unclear:
  - use `--enable-jfr` for benchmark-side JFR capture
  - or use `async-profiler-java-macos` for cpu / alloc / wall evidence on macOS
- If code shape or JIT behavior is the question:
  - use `hotspot-jit-forensics`
  - capture compilation tier, inlining decisions, and method-scoped C2 evidence
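Method-scoped C2 evidence can be gathered with a HotSpot compiler directives file. A minimal sketch (the file name and the method pattern `com/example/Hot::compute` are placeholders; verify option names against your JDK version):

```json
[
  {
    "match": "com/example/Hot::compute",
    "c2": {
      "Log": true,
      "PrintInlining": true
    }
  }
]
```

Pass it with `-XX:+UnlockDiagnosticVMOptions -XX:CompilerDirectivesFile=directives.json` so the diagnostic output stays scoped to the method under investigation instead of flooding the log.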
- Build the smallest reproducible JMH or app-level benchmark.
- Capture baseline result.
- Change code shape.
- Capture candidate result with same JVM, flags, input size, and warmup assumptions.
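The baseline/candidate discipline above can be sketched with a minimal app-level harness. This is an illustrative stand-in, not a replacement for JMH: the workload, iteration counts, and class name are placeholders, and the same harness must be run unchanged for baseline and candidate.

```java
import java.util.concurrent.TimeUnit;

public class MiniBench {
    // Fixed input size so baseline and candidate see identical work.
    static final int N = 1_000_000;
    static final int WARMUP = 5, MEASURED = 5;

    // Workload under test (illustrative): sum of an int array.
    static long workload(int[] data) {
        long sum = 0;
        for (int v : data) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[N];
        for (int i = 0; i < N; i++) data[i] = i;

        long sink = 0; // keep results alive so the JIT cannot dead-code the loop
        for (int i = 0; i < WARMUP; i++) sink += workload(data);

        long best = Long.MAX_VALUE;
        for (int i = 0; i < MEASURED; i++) {
            long t0 = System.nanoTime();
            sink += workload(data);
            best = Math.min(best, System.nanoTime() - t0);
        }
        System.out.println("best run: "
            + TimeUnit.NANOSECONDS.toMicros(best) + " us (sink=" + sink + ")");
    }
}
```

Recording the best of several measured iterations after warmup reduces noise, but a single-process harness like this still cannot control for run-to-run JIT variance the way JMH forks do.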
- If the delta matters, inspect JIT evidence:
  ```
  java -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=jit.xml -XX:+PrintCompilation -jar app.jar
  ```
  If assembly or per-method diagnostics are needed, move to focused compiler directives and the `hotspot-jit-forensics` workflow.
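A quick way to pull inlining failures out of the captured log is a small scan over `jit.xml`. The `<inline_fail reason='...'>` tag and single-quoted attributes below are assumptions based on typical HotSpot `LogCompilation` output; verify against the log your JVM version actually emits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineFailScan {
    // Assumed record shape from HotSpot LogCompilation output:
    //   <inline_fail reason='callee is too large'/>
    static final Pattern INLINE_FAIL =
        Pattern.compile("<inline_fail\\s+reason='([^']*)'");

    // Collect every recorded inlining-failure reason from the log text.
    static List<String> reasons(String jitXml) {
        List<String> out = new ArrayList<>();
        Matcher m = INLINE_FAIL.matcher(jitXml);
        while (m.find()) out.add(m.group(1));
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = java.nio.file.Files.readString(java.nio.file.Path.of(args[0]));
        reasons(xml).forEach(System.out::println);
    }
}
```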
If the change introduces generated Java or runtime compilation, do not stop at a single warm benchmark.
Also capture:
- cold compile + first execute time
- warm cached execute time
- generated source size or code-shape proxy
- cache hit/miss behavior if caching is part of the design
- fallback behavior on compile failure or code-size overflow
- classloader / metaspace symptoms if repeated compilation is involved
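A minimal sketch of the cache-hit, cache-miss, and fallback bookkeeping the checklist above asks for. The `compile` and `interpret` methods are placeholders standing in for a real runtime compiler and its interpreted fallback; only the caching and counter logic is the point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

public class CodegenCache {
    final Map<String, Function<Integer, Integer>> compiled = new ConcurrentHashMap<>();
    final AtomicLong hits = new AtomicLong();
    final AtomicLong misses = new AtomicLong();
    final AtomicLong fallbacks = new AtomicLong();

    // Placeholder "compiler": fails on oversized sources to exercise the
    // code-size-overflow fallback path.
    Function<Integer, Integer> compile(String source) {
        if (source.length() > 64) throw new IllegalStateException("code-size overflow");
        return x -> x + source.length(); // stand-in for generated code
    }

    // Interpreted fallback used when compilation fails; counted so the
    // report can say whether the fallback path was exercised.
    Function<Integer, Integer> interpret(String source) {
        fallbacks.incrementAndGet();
        return x -> x + source.length();
    }

    Function<Integer, Integer> lookup(String source) {
        Function<Integer, Integer> f = compiled.get(source);
        if (f != null) { hits.incrementAndGet(); return f; }
        misses.incrementAndGet();
        try {
            f = compile(source);
            compiled.put(source, f); // cache only successful compiles
            return f;
        } catch (RuntimeException e) {
            return interpret(source); // do not cache the fallback
        }
    }
}
```

Timing `lookup` on the first call versus repeat calls gives the cold-compile versus warm-cache numbers the checklist asks for, and the counters answer the hit/miss and fallback questions directly.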
Report these five items:
- benchmark delta: throughput/latency before vs after
- allocation delta: lower / unchanged / unknown
- JIT evidence: inline success/failure, tier, bailout, intrinsic, vectorization clue, or “not inspected”
- exact command or benchmark selector
- confidence: high / medium / low
If runtime codegen is involved, also report:
- cold compile cost: measured / unknown
- warm cache behavior: hit / miss / unknown
- fallback path exercised: yes / no / unknown
Confidence levels:
- High: repeatable benchmark delta plus matching profile/JIT evidence
- Medium: repeatable benchmark delta without definitive low-level proof
- Low: one run, noisy run, or JVM explanation not verified
For runtime codegen, confidence drops if cold-start cost, cache reuse, or fallback behavior were not inspected.
Do not stop at “assembly unavailable”.
Still collect:
- `jit.xml`
- compiler directives output
- `PrintCompilation` / inlining diagnostics
- async-profiler or JFR evidence
Then state the exact missing piece: for example, `hsdis` not installed or assembly printing not enabled.