| name | high-performance-java |
|---|---|
| description | Use when writing, reviewing, or reshaping HotSpot Java where algorithmic complexity, data-structure choice, throughput, latency, allocation rate, zero-copy, lazy evaluation, non-materialization, primitive collections, performance libraries, intrinsics, SuperWord auto-vectorization, or C2 assembly matter. Also use for advanced algorithmic problem solving in Java, including dynamic programming, graph/range techniques, and cache-aware code shape. Bias toward asymptotic wins first, then specialized hot-path code, then benchmark and JIT evidence. |
Use this skill for Java hot paths and algorithm-heavy Java. Default bias: asymptotic win first, then fewer allocations, fewer copies, less polymorphism, narrower code shape, stronger evidence.
HotSpot-only v1. Baseline assumptions:
- repo baseline: JDK 21
- current local runtime may be newer
- low-level claims stay provisional until benchmark + JIT evidence agree
- algorithm/data-structure claims stay provisional until they match the actual workload constraints
- Identify the workload shape and constraints.
- Pick the algorithm and data structure that change the slope.
- Find the hot loop or hot call chain.
- Write the narrow fast path first.
- Push generic abstraction, materialization, and dispatch out of the loop.
- Benchmark before claiming improvement.
- Inspect HotSpot decisions before claiming JVM-level reasons.
- Prefer an algorithmic win over a micro win.
- Prefer data structures that fit the operation mix, memory budget, and key domain.
- Prefer primitive-friendly layouts before boxed object graphs.
- Prefer zero-copy over copy-transform-copy.
- Prefer reuse over per-item allocation.
- Prefer lazy traversal over full materialization.
- Prefer primitives, flat arrays, and tight counted loops in hot paths.
- Prefer monomorphic calls that inline away.
- Prefer specialized lambda/adaptor code for the active workload.
- Prefer one fast path plus one cold fallback over a single generalized hot path.
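As an illustration of "one fast path plus one cold fallback", the sketch below parses a decimal integer from a byte range. The class `DecimalParser` and its methods are hypothetical names chosen for this example, not part of any library. It assumes the dominant input is 1 to 9 plain ASCII digits; anything else (signs, longer values, malformed input) is routed to a fallback that is allowed to allocate.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class: narrow hot path, general cold path.
final class DecimalParser {
    private DecimalParser() {}

    /** Hot path: 1 to 9 plain ASCII digits. Nine digits cannot overflow an int. */
    static int parseInt(byte[] buf, int off, int len) {
        if (len >= 1 && len <= 9) {
            int value = 0;
            for (int i = off, end = off + len; i < end; i++) {
                int digit = buf[i] - '0';
                if (digit < 0 || digit > 9) {
                    // Unusual input leaves the fast path immediately.
                    return parseSlow(buf, off, len);
                }
                value = value * 10 + digit;
            }
            return value;
        }
        return parseSlow(buf, off, len);
    }

    /** Cold fallback: signs, 10-digit values, and error reporting. Allocates a String. */
    private static int parseSlow(byte[] buf, int off, int len) {
        return Integer.parseInt(new String(buf, off, len, StandardCharsets.US_ASCII));
    }
}
```

Whether the split pays off depends on how often real input takes the fallback, so treat it as a hypothesis until a benchmark on representative data confirms it.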
- Do not micro-optimize a fundamentally wrong algorithm.
- Do not defend a perf change with style arguments alone.
- Do not claim “faster” without a measurement path.
- Do not claim “JIT will optimize this” without checking inlining / compilation evidence.
- Do not add a specialized library until you know what property it buys: fewer allocations, fewer copies, lower contention, off-heap layout, better primitive support, or a stronger algorithm.
- Do not keep elegant-but-generic stream pipelines in verified hot loops.
- Do not pay interface / visitor / wrapper overhead inside the hottest loop unless evidence shows it disappears.
- Do not default to boxed `Map<K, V>` / `Set<T>` / `List<T>` shapes when primitive collections or flat arrays better fit the dominant path.
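A minimal sketch of the boxed-versus-flat trade-off, assuming a small dense key domain (here, byte values 0 to 255). `ByteHistogram` is a hypothetical class written for this example. Both methods count byte occurrences; the flat version fits only because the key domain is known and tiny.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example class comparing two shapes for the same counting task.
final class ByteHistogram {
    private ByteHistogram() {}

    /** Generic shape: every update goes through hashing, boxed keys, and boxed counts. */
    static Map<Byte, Integer> countBoxed(byte[] data) {
        Map<Byte, Integer> counts = new HashMap<>();
        for (byte b : data) {
            counts.merge(b, 1, Integer::sum);
        }
        return counts;
    }

    /** Flat shape: one int[256] indexed by the unsigned byte value, no per-item allocation. */
    static int[] countFlat(byte[] data) {
        int[] counts = new int[256];
        for (int i = 0; i < data.length; i++) {
            counts[data[i] & 0xFF]++;
        }
        return counts;
    }
}
```

For sparse or very large key domains a flat array stops fitting, and a primitive hash map from a specialized library may be the better comparison point.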
Ask these first:
- What are `N`, `Q`, the update/query ratio, and the memory budget?
- Is the main problem asymptotic complexity, cache locality, allocation pressure, branchiness, contention, or I/O?
- What operation dominates: membership, counting, top-k, range query, join, shortest path, DP transition, parsing, encoding?
- Can the key/value/state space stay primitive or bit-packed?
- Can the workload become offline, batched, sorted, prefix-based, or compressed?
- What allocates on the steady-state path?
- What copies bytes, chars, arrays, or collections?
- What materializes intermediate state that could stay streamed or cursor-based?
- What dispatch stays virtual or megamorphic in the inner loop?
- What loop shape blocks scalar replacement, inlining, or SuperWord vectorization?
- What “generic” branch handles cases the active workload never uses?
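One way to answer the materialization question is to replace a split-into-strings step with a cursor that only exposes offsets. The sketch below is a hypothetical `FieldCursor` over a delimited byte buffer; the names and the single-byte delimiter are assumptions made for illustration. It allocates nothing per field and, like `String.split` with a negative limit, reports empty fields.

```java
// Hypothetical example class: walks delimited fields without creating Strings or lists.
final class FieldCursor {
    private final byte[] buf;
    private final byte delimiter;
    private int start;     // start offset of the current field
    private int end = -1;  // exclusive end offset of the current field; -1 before the first advance

    FieldCursor(byte[] buf, byte delimiter) {
        this.buf = buf;
        this.delimiter = delimiter;
    }

    /** Moves to the next field. Returns false once the buffer is exhausted. */
    boolean advance() {
        if (end >= buf.length) {
            return false;
        }
        start = end + 1;
        int i = start;
        while (i < buf.length && buf[i] != delimiter) {
            i++;
        }
        end = i;
        return true;
    }

    /** Offset of the current field inside the shared buffer. */
    int start() {
        return start;
    }

    /** Length of the current field in bytes; may be zero. */
    int length() {
        return end - start;
    }
}
```

A caller loops with `while (cursor.advance())` and reads `buf` directly between `start()` and `start() + length()`, so the buffer must stay unmodified while the cursor is in use.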
- Estimate the real workload: input size, query count, mutation pattern, latency target, and memory ceiling.
- Choose the algorithm and data structure before tuning loop syntax.
- Favor contiguous, cache-friendly, primitive-heavy representations when semantics allow.
- For dynamic programming, define state, transition cost, base case, iteration order, and whether state compression is possible.
- For graph/range/string problems, look for offline transforms, prefix structures, monotonic structures, or specialized search before hand-tuning.
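The dynamic-programming checklist above can be made concrete with 0/1 knapsack, used here purely as a familiar stand-in. State: `best[c]` is the best value achievable with total weight at most `c`. Transition: take or skip the current item. Iteration order: capacity descending, which is what allows the two-dimensional table to be compressed into one primitive array. The class name `Knapsack` is hypothetical, and the sketch assumes non-negative weights and a non-negative capacity.

```java
// Hypothetical example class: 0/1 knapsack with state compressed to a single int[].
final class Knapsack {
    private Knapsack() {}

    /** Maximum total value with total weight at most capacity, each item used at most once. */
    static int maxValue(int[] weights, int[] values, int capacity) {
        int[] best = new int[capacity + 1]; // base case: no items considered, value 0 everywhere
        for (int item = 0; item < weights.length; item++) {
            int w = weights[item];
            int v = values[item];
            // Descending capacity keeps best[c - w] at its value from before this item,
            // so the item cannot be counted twice.
            for (int c = capacity; c >= w; c--) {
                int candidate = best[c - w] + v;
                if (candidate > best[c]) {
                    best[c] = candidate;
                }
            }
        }
        return best[capacity];
    }
}
```

Time is O(items × capacity) and memory is O(capacity); iterating capacity in ascending order instead would silently turn this into the unbounded variant.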
Read these only when relevant:
- references/algorithms-data-structures.md for algorithm and data-structure selection.
- references/advanced-coding-techniques.md for dynamic programming and advanced problem-solving patterns.
- Split hot and cold paths.
- Hoist invariant checks and decoding outside the loop.
- Replace generic callback stacks with narrow-path adapters.
- Reuse mutable carriers only when ownership is clear.
- Keep loop bodies predictable, contiguous, and exception-light.
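A small sketch of hoisting an invariant check out of a loop, assuming the caller passes an offset and length into a shared `int[]`. `RangeSum` is a hypothetical class; `Objects.checkFromIndexSize` is the standard JDK 9+ range check. The range is validated once, and the loop body is left as a plain counted loop over contiguous memory.

```java
import java.util.Objects;

// Hypothetical example class: validate once, then run a tight counted loop.
final class RangeSum {
    private RangeSum() {}

    /** Sums values[off .. off + len). Throws IndexOutOfBoundsException for an invalid range. */
    static long sum(int[] values, int off, int len) {
        // Invariant check hoisted out of the loop.
        Objects.checkFromIndexSize(off, len, values.length);
        long total = 0;
        int end = off + len;
        for (int i = off; i < end; i++) {
            total += values[i]; // no branching or exception logic in the body
        }
        return total;
    }
}
```

Whether HotSpot then removes the per-element bounds checks or vectorizes the loop is a JIT question, so confirm it with compilation evidence rather than assuming it.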
Detailed rules: see references/coding-rules.md.
If you are in this RDF4J repo, use the local benchmark wrapper first:
`scripts/run-single-benchmark.sh --module <module> --class <fqcn> --method <benchmarkMethod>`

If you are outside RDF4J, use JMH or an existing reproducible micro/macro benchmark.
Measurement workflow: see references/evidence-workflow.md.
When a benchmark moves, inspect what HotSpot actually did:
- compilation tier
- inlining success/failure
- intrinsic usage when relevant
- allocation pressure
- assembly / C2 logs when needed
Use sibling skill hotspot-jit-forensics for method-scoped C2 evidence. Use async-profiler-java-macos when wall/cpu/alloc evidence is needed on macOS.
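For a first look at compilation tier and inlining decisions without extra tooling, a tiny probe like the one below can be run with standard HotSpot flags. `InlineProbe` and its methods are hypothetical names for this sketch. The flags in the header comment exist in HotSpot, but the log format differs between JDK versions, and this is not a substitute for a proper benchmark harness.

```java
// Hypothetical probe class. Compile and run with HotSpot diagnostics, for example:
//   javac InlineProbe.java
//   java -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InlineProbe
final class InlineProbe {
    private InlineProbe() {}

    /** Small, static, monomorphic callee: a natural inlining candidate. */
    static int mix(int x) {
        return (x ^ (x >>> 16)) * 0x45d9f3b;
    }

    /** Hot loop whose compilation and inlining of mix() should appear in the logs. */
    static int run(int iterations) {
        int acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += mix(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        int sink = 0;
        // Repeat enough times for the loop to get hot and be compiled.
        for (int round = 0; round < 20; round++) {
            sink ^= run(1_000_000);
        }
        System.out.println(sink); // consume the result so the work is not dead code
    }
}
```

In the output, look for the lines mentioning `InlineProbe::run` and `InlineProbe::mix`; if they do not show what you expected, that is evidence against the hypothesis, not noise.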
- Prefer the JDK first when it is close enough and operationally simpler.
- Reach for specialized libraries when they remove boxing, copies, parser overhead, contention, or off-heap indirection the JDK cannot.
- Check dependency health before adding a new library.
- Benchmark the library choice against the simplest credible in-repo baseline.
Library reference: references/high-performance-java-libraries.md.
Frame conclusions as:
- hypothesis
- algorithm/data-structure choice
- benchmark result
- JIT/profile evidence
- confidence
If assembly is unavailable, say so and fall back to compilation logs, inlining diagnostics, and profile data.
Use this skill when the user asks to:
- remove allocation pressure from a parser, iterator, encoder, decoder, or query loop
- make a Java path zero-copy or lazy
- choose the right data structure for a Java workload
- solve a dynamic programming, graph, interval, ranking, or range-query problem in Java under performance constraints
- replace boxed collections with primitive or cache-friendly structures
- choose between the JDK and specialized Java performance libraries
- specialize code for one workload instead of many
- explain whether a HotSpot optimization actually happened
- ground a Java perf change in benchmark + C2 evidence
- Algorithms and data structures: references/algorithms-data-structures.md
- Advanced coding techniques: references/advanced-coding-techniques.md
- High-performance Java libraries: references/high-performance-java-libraries.md
- Coding rules: references/coding-rules.md
- Evidence workflow: references/evidence-workflow.md
- JDK version guardrails: references/jdk-21-26-notes.md
When you use this skill, the answer should usually include:
- workload model and asymptotic bottleneck
- algorithm and data-structure recommendation
- hot-path hypothesis
- concrete code-shape recommendation
- library recommendation when a library meaningfully changes the design
- benchmark command or benchmark evidence
- JIT/profile evidence or the missing prerequisite
- a confidence statement tied to the active JDK