
Commit c96c294

Merge main into develop (#5741)
2 parents 75cf8e9 + 37611ab commit c96c294

40 files changed: 3195 additions & 159 deletions
---
name: high-performance-java
description: Use when writing, reviewing, or reshaping HotSpot Java where algorithmic complexity, data-structure choice, throughput, latency, allocation rate, zero-copy, lazy evaluation, non-materialization, runtime specialization, query-engine code generation, Janino, primitive collections, performance libraries, intrinsics, SuperWord auto-vectorization, or C2 assembly matter. Also use for advanced algorithmic problem solving in Java, including dynamic programming, graph/range techniques, cache-aware code shape, and choosing between interpreted, vectorized, and compiled execution paths. Bias toward asymptotic wins first, then the right execution model, then specialized hot-path code, then benchmark and JIT evidence.
---

# High-Performance Java

Use this skill for Java hot paths, algorithm-heavy Java, and JVM-side runtime specialization. Default bias: asymptotic win first, then the right execution model, then fewer allocations, fewer copies, less polymorphism, narrower code shape, and stronger evidence.

HotSpot-only v1. Baseline assumptions:

- repo baseline: JDK 21
- current local runtime may be newer
- low-level claims stay provisional until benchmark + JIT evidence agree
- algorithm/data-structure claims stay provisional until they match the actual workload constraints
- runtime codegen claims stay provisional until cold-start cost, warm steady-state behavior, and fallback behavior are all understood

## Core loop

1. Identify the workload shape and constraints.
2. Pick the algorithm and data structure that change the slope.
3. Decide whether the workload should stay interpreted, become vectorized/batched, or justify runtime specialization/code generation.
4. Find the hot loop, hot call chain, or hot operator pipeline.
5. Write the narrow fast path first.
6. Push generic abstraction, materialization, and dispatch out of the loop.
7. Benchmark before claiming improvement.
8. Inspect HotSpot decisions before claiming JVM-level reasons.

## Default coding bias

- Prefer an algorithmic win over a micro win.
- Prefer data structures that fit the operation mix, memory budget, and key domain.
- Prefer the right execution model over reflexively adding code generation.
- Prefer primitive-friendly layouts over boxed object graphs.
- Prefer zero-copy over copy-transform-copy.
- Prefer reuse over per-item allocation.
- Prefer lazy traversal over full materialization.
- Prefer primitives, flat arrays, and tight counted loops in hot paths.
- Prefer monomorphic calls that inline away.
- Prefer specialized lambda/adaptor code for the active workload.
- Prefer one fast path plus one cold fallback over a single generalized hot path.
- Prefer Janino only when generated Java can stay simple, code size can stay bounded, and compile cost can be amortized.

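A minimal sketch of the primitive-first, tight-loop bias (class and method names are illustrative, not from the referenced files):

```java
import java.util.List;

public final class PrimitiveSum {
    // Boxed shape: every element is an Integer object, so the loop pays
    // unboxing and pointer chasing on each iteration.
    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            sum += v;
        }
        return sum;
    }

    // Primitive shape: a flat int[] walked by a tight counted loop, the
    // form HotSpot can unroll and may auto-vectorize via SuperWord.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }
}
```

Both return the same result; the difference is layout and allocation pressure, which only a benchmark on the real workload can quantify.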
## Hard rules

- Do not micro-optimize a fundamentally wrong algorithm.
- Do not defend a perf change with style arguments alone.
- Do not claim “faster” without a measurement path.
- Do not claim “JIT will optimize this” without checking inlining / compilation evidence.
- Do not add a specialized library until you know what property it buys: fewer allocations, fewer copies, lower contention, off-heap layout, better primitive support, stronger compilation/runtime specialization, or a stronger algorithm.
- Do not introduce Janino or other runtime codegen unless compile latency, cache keys, code-size limits, classloader lifetime, and fallback behavior are explicit.
- Do not compile entire query plans blindly when only a subset of operators is hot or fusible.
- Do not generate fancy modern Java syntax for Janino unless support is verified on the active Janino/runtime combination; conservative generated Java is the default.
- Do not keep elegant-but-generic stream pipelines in verified hot loops.
- Do not pay interface / visitor / wrapper overhead inside the hottest loop unless evidence shows it disappears.
- Do not default to boxed `Map<K, V>` / `Set<T>` / `List<T>` shapes when primitive collections or flat arrays better fit the dominant path.

## Design checklist

Ask these first:
- What are `N`, `Q`, the update/query ratio, and the memory budget?
- Is the main problem asymptotic complexity, cache locality, allocation pressure, branchiness, contention, I/O, or execution-model overhead?
- What operation dominates: membership, counting, top-k, range query, join, shortest path, DP transition, parsing, encoding, filter/projection evaluation, aggregation, or tuple materialization?
- Can the key/value/state space stay primitive or bit-packed?
- Can the workload become offline, batched, sorted, prefix-based, vectorized, or compressed?
- What allocates on the steady-state path?
- What copies bytes, chars, arrays, or collections?
- What materializes intermediate state that could stay streamed or cursor-based?
- What dispatch stays virtual or megamorphic in the inner loop?
- What loop shape blocks scalar replacement, inlining, or SuperWord vectorization?
- What “generic” branch handles cases the active workload never uses?
- How often will a generated shape execute, and can compile cost be amortized?
- Can compiled artifacts be cached by normalized shape, types, nullability, and algorithm choice?
- What is the fallback path for cold queries, oversized generated code, compile failure, or classloader churn?
- What method-size, class-size, or constant-pool limits could the generated code hit?
- Who owns generated classes, caches, and classloaders over time?

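One checklist answer worth showing concretely: a composite key can often stay primitive by bit-packing instead of allocating a boxed pair object. A hypothetical `(row, col)` key packed into a `long`:

```java
public final class PackedKey {
    // Pack two 32-bit ints into one 64-bit long. Masking the low word keeps
    // a negative col from smearing sign bits into the high word.
    static long pack(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    // Unsigned shift recovers the high word untouched.
    static int row(long key) {
        return (int) (key >>> 32);
    }

    // Narrowing cast recovers the low word, sign and all.
    static int col(long key) {
        return (int) key;
    }
}
```

Packed keys drop straight into a primitive `long`-keyed collection or a sorted `long[]`, avoiding one allocation and one indirection per key.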
## Workflow

### 0) Pick the algorithmic shape

- Estimate the real workload: input size, query count, mutation pattern, latency target, and memory ceiling.
- Choose the algorithm and data structure before tuning loop syntax.
- Favor contiguous, cache-friendly, primitive-heavy representations when semantics allow.
- For dynamic programming, define state, transition cost, base case, iteration order, and whether state compression is possible.
- For graph/range/string problems, look for offline transforms, prefix structures, monotonic structures, or specialized search before hand-tuning.

Read these only when relevant:
- [references/algorithms-data-structures.md](references/algorithms-data-structures.md) for algorithm and data-structure selection.
- [references/advanced-coding-techniques.md](references/advanced-coding-techniques.md) for dynamic programming and advanced problem-solving patterns.

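The dynamic-programming discipline above (state, transition, base case, iteration order, compression) can be illustrated with the classic 0/1 knapsack, a generic sketch rather than repo code:

```java
public final class Knapsack {
    // State: best[c] = max value achievable with capacity c using the items
    // seen so far. Compressed from O(n * capacity) to O(capacity).
    // Base case: the zero-filled array (no items, value 0).
    // Iteration order: c runs downward so each item is used at most once;
    // upward iteration would silently turn this into unbounded knapsack.
    static int maxValue(int[] weight, int[] value, int capacity) {
        int[] best = new int[capacity + 1];
        for (int i = 0; i < weight.length; i++) {
            for (int c = capacity; c >= weight[i]; c--) {
                // Transition: take item i on top of the best (c - w_i) state.
                best[c] = Math.max(best[c], best[c - weight[i]] + value[i]);
            }
        }
        return best[capacity];
    }
}
```

The compressed state is also the cache-friendly one: a single flat `int[]` swept in a tight counted loop.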
### 1) Choose the execution model before shaping the code

- Ask whether the path should stay interpreted, become vectorized/batched, or justify runtime code generation.
- Prefer interpretation for cold, one-shot, or highly irregular workloads when compile latency will dominate.
- Prefer vectorization/batching when cache-miss hiding, SIMD-friendly processing, or blocking operator boundaries dominate.
- Prefer runtime code generation when the same shape executes repeatedly, per-tuple overhead dominates, and generated code can stay narrow and bounded.
- In query engines, fuse straight pipelines first; split at blocking operators, large mutable state, code-size pressure, or unstable branches.
- If Janino is chosen, keep generated Java conservative, keep helper methods small, and plan for cache + fallback from the start.

Detailed guidance: see [references/codegen-and-janino.md](references/codegen-and-janino.md).

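A minimal sketch of the trade-off, with hypothetical names: the first method pays a virtual predicate call per tuple (the interpreted shape), while the second bakes the predicate into a tight counted loop over a primitive column, which is the shape that batched or generated code aims to produce:

```java
import java.util.function.IntPredicate;

public final class BatchedFilter {
    // Interpreted shape: one predicate dispatch per tuple. With several
    // different predicates at this call site, the call goes megamorphic
    // and stops inlining.
    static int countInterpreted(int[] column, IntPredicate pred) {
        int n = 0;
        for (int v : column) {
            if (pred.test(v)) n++;
        }
        return n;
    }

    // Specialized shape: the predicate is fused into the loop body, so the
    // batch runs as one branchy-but-monomorphic counted loop with no
    // per-tuple dispatch.
    static int countGreaterThan(int[] column, int threshold) {
        int n = 0;
        for (int i = 0; i < column.length; i++) {
            if (column[i] > threshold) n++;
        }
        return n;
    }
}
```

Runtime codegen earns its keep when shapes like `countGreaterThan` must be produced for predicates known only at query time, and the compile cost amortizes over many batches.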
### 2) Shape the code for HotSpot

- Split hot and cold paths.
- Hoist invariant checks and decoding outside the loop.
- Replace generic callback stacks with narrow-path adapters.
- Reuse mutable carriers only when ownership is clear.
- Keep loop bodies predictable, contiguous, and exception-light.
- For generated code, favor explicit loops, primitive locals/fields, simple helper methods, and stable call targets.

Detailed rules: see [references/coding-rules.md](references/coding-rules.md).

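The hot/cold split might look like this hypothetical UTF-8 length counter: the hot method stays small and inlinable, and the rare case jumps to a separate cold method (a sketch only; surrogate pairs are deliberately ignored):

```java
public final class Utf8Length {
    // Hot path: pure-ASCII input, one comparison per char in a counted loop.
    static int encodedLength(char[] chars) {
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] >= 0x80) {
                return encodedLengthSlow(chars, i); // rare: leave the fast path
            }
        }
        return chars.length; // all-ASCII: one byte per char
    }

    // Cold path: full multi-byte accounting, kept out of the hot method so
    // the hot method's bytecode stays tiny. Surrogate pairs are not handled
    // in this sketch.
    private static int encodedLengthSlow(char[] chars, int from) {
        int len = from; // bytes already accounted for by the fast scan
        for (int i = from; i < chars.length; i++) {
            char c = chars[i];
            if (c < 0x80) len += 1;
            else if (c < 0x800) len += 2;
            else len += 3;
        }
        return len;
    }
}
```

Keeping the slow branch in its own method also keeps the hot method under inlining size thresholds, which is part of the point of the split.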
### 3) Measure

If you are in this RDF4J repo, use the local benchmark wrapper first:

```bash
scripts/run-single-benchmark.sh --module <module> --class <fqcn> --method <benchmarkMethod>
```

If you are outside RDF4J, use JMH or an existing reproducible micro/macro benchmark.

Measurement workflow: see [references/evidence-workflow.md](references/evidence-workflow.md).

### 4) Explain with JVM evidence

When a benchmark moves, inspect what HotSpot actually did:
- compilation tier
- inlining success/failure
- intrinsic usage when relevant
- allocation pressure
- assembly / C2 logs when needed

Use sibling skill [hotspot-jit-forensics](../hotspot-jit-forensics/SKILL.md) for method-scoped C2 evidence. Use `async-profiler-java-macos` when wall/cpu/alloc evidence is needed on macOS.

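As a starting point, the standard HotSpot diagnostic flags below cover compilation tier and inlining; the benchmark class name is a placeholder, and in practice you would attach these flags to the forked JMH JVM:

```bash
# PrintInlining is a diagnostic flag, so the unlock flag must come first.
java -XX:+PrintCompilation \
     -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintInlining \
     MyBenchmark
```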
### 5) Use libraries intentionally

- Prefer the JDK first when it is close enough and operationally simpler.
- Reach for specialized libraries when they remove boxing, copies, parser overhead, contention, off-heap indirection, or runtime compilation friction the JDK cannot.
- Check dependency health before adding a new library.
- Benchmark the library choice against the simplest credible in-repo baseline.

Library reference: [references/high-performance-java-libraries.md](references/high-performance-java-libraries.md).

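To make the boxing point concrete, here is a toy of what a primitive-specialized map (for example fastutil's `Int2IntOpenHashMap`) buys over `HashMap<Integer, Integer>`: flat arrays, no per-entry node objects, no boxing. This sketch fixes capacity and omits resizing and removal, so it is illustrative only:

```java
public final class IntIntMap {
    private final int[] keys;
    private final int[] values;
    private final boolean[] used;
    private final int mask;

    // capacityPow2 must be a power of two so (hash & mask) replaces modulo.
    IntIntMap(int capacityPow2) {
        keys = new int[capacityPow2];
        values = new int[capacityPow2];
        used = new boolean[capacityPow2];
        mask = capacityPow2 - 1;
    }

    void put(int key, int value) {
        int i = mix(key) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;          // linear probing, cache-friendly
        }
        keys[i] = key;
        values[i] = value;
        used[i] = true;
    }

    int getOrDefault(int key, int dflt) {
        int i = mix(key) & mask;
        while (used[i]) {
            if (keys[i] == key) return values[i];
            i = (i + 1) & mask;
        }
        return dflt;
    }

    // Xor-shift folds high bits into low bits (as HashMap.hash does); the
    // odd multiplier adds extra avalanche for clustered keys.
    private static int mix(int h) {
        h ^= h >>> 16;
        return h * 0x7feb352d;
    }
}
```

Every probe touches three parallel primitive arrays rather than chasing `Node` pointers, which is exactly the property a real primitive-collection library provides with production-grade resizing and hashing on top.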
### 6) Report honestly

Frame conclusions as:
- hypothesis
- algorithm/data-structure choice
- execution-model choice
- benchmark result
- JIT/profile evidence
- confidence

If assembly is unavailable, say so and fall back to compilation logs, inlining diagnostics, and profile data.

## Trigger examples

Use this skill when the user asks to:
- remove allocation pressure from a parser, iterator, encoder, decoder, or query loop
- make a Java path zero-copy or lazy
- choose the right data structure for a Java workload
- solve a dynamic programming, graph, interval, ranking, or range-query problem in Java under performance constraints
- replace boxed collections with primitive or cache-friendly structures
- choose between the JDK and specialized Java performance libraries
- decide whether a query engine should stay interpreted, become vectorized, or use Janino/runtime code generation
- design generated Java for projections, filters, joins, aggregations, or expression evaluators
- specialize code for one workload instead of many
- explain whether a HotSpot optimization actually happened
- ground a Java perf change in benchmark + C2 evidence

## Reference map

- Algorithms and data structures: [references/algorithms-data-structures.md](references/algorithms-data-structures.md)
- Advanced coding techniques: [references/advanced-coding-techniques.md](references/advanced-coding-techniques.md)
- Codegen and Janino for query engines: [references/codegen-and-janino.md](references/codegen-and-janino.md)
- High-performance Java libraries: [references/high-performance-java-libraries.md](references/high-performance-java-libraries.md)
- Coding rules: [references/coding-rules.md](references/coding-rules.md)
- Evidence workflow: [references/evidence-workflow.md](references/evidence-workflow.md)
- JDK version guardrails: [references/jdk-21-26-notes.md](references/jdk-21-26-notes.md)

## Output contract

When you use this skill, the answer should usually include:
- workload model and asymptotic bottleneck
- execution-model recommendation: interpreted, vectorized/batched, Janino/runtime codegen, or another compilation path
- algorithm and data-structure recommendation
- hot-path hypothesis
- concrete code-shape recommendation
- cache/fallback plan when runtime codegen is part of the design
- library recommendation when a library meaningfully changes the design
- benchmark command or benchmark evidence
- JIT/profile evidence or the missing prerequisite
- a confidence statement tied to the active JDK
Lines changed: 4 additions & 0 deletions

interface:
  display_name: "High-Performance Java"
  short_description: "Hot-path Java, plus algorithm/perf-library, execution-model, and runtime-codegen guidance"
  default_prompt: "Use $high-performance-java to choose the right algorithm, execution model, data structure, runtime codegen strategy, library, and HotSpot-friendly code shape for a high-performance Java task."

0 commit comments

Comments
 (0)