
Commit 02fce3c
improve skill
1 parent 3d4fb79

5 files changed

Lines changed: 655 additions & 11 deletions

File tree

.codex/skills/high-performance-java/SKILL.md

Lines changed: 54 additions & 9 deletions
@@ -1,28 +1,33 @@
 ---
 name: high-performance-java
-description: Use when writing, reviewing, or reshaping HotSpot Java where throughput, latency, allocation rate, zero-copy, lazy evaluation, non-materialization, intrinsics, SuperWord auto-vectorization, or C2 assembly matter. Bias toward specialized hot-path code, then ground claims in benchmarks and JIT evidence.
+description: Use when writing, reviewing, or reshaping HotSpot Java where algorithmic complexity, data-structure choice, throughput, latency, allocation rate, zero-copy, lazy evaluation, non-materialization, primitive collections, performance libraries, intrinsics, SuperWord auto-vectorization, or C2 assembly matter. Also use for advanced algorithmic problem solving in Java, including dynamic programming, graph/range techniques, and cache-aware code shape. Bias toward asymptotic wins first, then specialized hot-path code, then benchmark and JIT evidence.
 ---
 
 # High-Performance Java
 
-Use this skill for Java hot paths. Default bias: fewer allocations, fewer copies, less polymorphism, narrower code shape, stronger evidence.
+Use this skill for Java hot paths and algorithm-heavy Java. Default bias: asymptotic win first, then fewer allocations, fewer copies, less polymorphism, narrower code shape, stronger evidence.
 
 HotSpot-only v1. Baseline assumptions:
 - repo baseline: JDK 21
 - current local runtime may be newer
 - low-level claims stay provisional until benchmark + JIT evidence agree
+- algorithm/data-structure claims stay provisional until they match the actual workload constraints
 
 ## Core loop
 
-1. Identify the workload shape.
-2. Find the hot loop or hot call chain.
-3. Write the narrow fast path first.
-4. Push generic abstraction, materialization, and dispatch out of the loop.
-5. Benchmark before claiming improvement.
-6. Inspect HotSpot decisions before claiming JVM-level reasons.
+1. Identify the workload shape and constraints.
+2. Pick the algorithm and data structure that change the slope.
+3. Find the hot loop or hot call chain.
+4. Write the narrow fast path first.
+5. Push generic abstraction, materialization, and dispatch out of the loop.
+6. Benchmark before claiming improvement.
+7. Inspect HotSpot decisions before claiming JVM-level reasons.
 
 ## Default coding bias
 
+- Prefer an algorithmic win over a micro win.
+- Prefer data structures that fit the operation mix, memory budget, and key domain.
+- Prefer primitive-friendly layouts before boxed object graphs.
 - Prefer zero-copy over copy-transform-copy.
 - Prefer reuse over per-item allocation.
 - Prefer lazy traversal over full materialization.
@@ -33,15 +38,23 @@ HotSpot-only v1. Baseline assumptions:
 
 ## Hard rules
 
+- Do not micro-optimize a fundamentally wrong algorithm.
 - Do not defend a perf change with style arguments alone.
 - Do not claim “faster” without a measurement path.
 - Do not claim “JIT will optimize this” without checking inlining / compilation evidence.
+- Do not add a specialized library until you know what property it buys: fewer allocations, fewer copies, lower contention, off-heap layout, better primitive support, or a stronger algorithm.
 - Do not keep elegant-but-generic stream pipelines in verified hot loops.
 - Do not pay interface / visitor / wrapper overhead inside the hottest loop unless evidence shows it disappears.
+- Do not default to boxed `Map<K, V>` / `Set<T>` / `List<T>` shapes when primitive collections or flat arrays better fit the dominant path.
 
 ## Design checklist
 
 Ask these first:
+- What are `N`, `Q`, the update/query ratio, and the memory budget?
+- Is the main problem asymptotic complexity, cache locality, allocation pressure, branchiness, contention, or I/O?
+- What operation dominates: membership, counting, top-k, range query, join, shortest path, DP transition, parsing, encoding?
+- Can the key/value/state space stay primitive or bit-packed?
+- Can the workload become offline, batched, sorted, prefix-based, or compressed?
 - What allocates on the steady-state path?
 - What copies bytes, chars, arrays, or collections?
 - What materializes intermediate state that could stay streamed or cursor-based?
@@ -51,6 +64,18 @@ Ask these first:
 
 ## Workflow
 
+### 0) Pick the algorithmic shape
+
+- Estimate the real workload: input size, query count, mutation pattern, latency target, and memory ceiling.
+- Choose the algorithm and data structure before tuning loop syntax.
+- Favor contiguous, cache-friendly, primitive-heavy representations when semantics allow.
+- For dynamic programming, define state, transition cost, base case, iteration order, and whether state compression is possible.
+- For graph/range/string problems, look for offline transforms, prefix structures, monotonic structures, or specialized search before hand-tuning.
+
+Read these only when relevant:
+- [references/algorithms-data-structures.md](references/algorithms-data-structures.md) for algorithm and data-structure selection.
+- [references/advanced-coding-techniques.md](references/advanced-coding-techniques.md) for dynamic programming and advanced problem-solving patterns.
+
 ### 1) Shape the code for HotSpot
 
 - Split hot and cold paths.
@@ -84,10 +109,20 @@ When a benchmark moves, inspect what HotSpot actually did:
 
 Use sibling skill [hotspot-jit-forensics](../hotspot-jit-forensics/SKILL.md) for method-scoped C2 evidence. Use `async-profiler-java-macos` when wall/cpu/alloc evidence is needed on macOS.
 
-### 4) Report honestly
+### 4) Use libraries intentionally
+
+- Prefer the JDK first when it is close enough and operationally simpler.
+- Reach for specialized libraries when they remove boxing, copies, parser overhead, contention, or off-heap indirection the JDK cannot.
+- Check dependency health before adding a new library.
+- Benchmark the library choice against the simplest credible in-repo baseline.
+
+Library reference: [references/high-performance-java-libraries.md](references/high-performance-java-libraries.md).
+
+### 5) Report honestly
 
 Frame conclusions as:
 - hypothesis
+- algorithm/data-structure choice
 - benchmark result
 - JIT/profile evidence
 - confidence
@@ -99,21 +134,31 @@ If assembly is unavailable, say so and fall back to compilation logs, inlining d
 Use this skill when the user asks to:
 - remove allocation pressure from a parser, iterator, encoder, decoder, or query loop
 - make a Java path zero-copy or lazy
+- choose the right data structure for a Java workload
+- solve a dynamic programming, graph, interval, ranking, or range-query problem in Java under performance constraints
+- replace boxed collections with primitive or cache-friendly structures
+- choose between the JDK and specialized Java performance libraries
 - specialize code for one workload instead of many
 - explain whether a HotSpot optimization actually happened
 - ground a Java perf change in benchmark + C2 evidence
 
 ## Reference map
 
+- Algorithms and data structures: [references/algorithms-data-structures.md](references/algorithms-data-structures.md)
+- Advanced coding techniques: [references/advanced-coding-techniques.md](references/advanced-coding-techniques.md)
+- High-performance Java libraries: [references/high-performance-java-libraries.md](references/high-performance-java-libraries.md)
 - Coding rules: [references/coding-rules.md](references/coding-rules.md)
 - Evidence workflow: [references/evidence-workflow.md](references/evidence-workflow.md)
 - JDK version guardrails: [references/jdk-21-26-notes.md](references/jdk-21-26-notes.md)
 
 ## Output contract
 
 When you use this skill, the answer should usually include:
+- workload model and asymptotic bottleneck
+- algorithm and data-structure recommendation
 - hot-path hypothesis
 - concrete code-shape recommendation
+- library recommendation when a library meaningfully changes the design
 - benchmark command or benchmark evidence
 - JIT/profile evidence or the missing prerequisite
 - a confidence statement tied to the active JDK
Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
 interface:
   display_name: "High-Performance Java"
-  short_description: "Concise hot-path Java coding skill"
-  default_prompt: "Use $high-performance-java to write or review a Java hot path with benchmark and HotSpot evidence."
+  short_description: "Hot-path Java plus algorithm/perf-library guidance"
+  default_prompt: "Use $high-performance-java to choose the right algorithm, data structure, library, and HotSpot-friendly code shape for a high-performance Java task."
Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
# Advanced Coding Techniques

Use this reference when the problem needs more than basic loops and collections: dynamic programming, advanced search, state compression, offline transforms, or optimization patterns that materially change runtime.

## Dynamic programming checklist

Before writing code, define:
- state: the minimum information needed to continue
- transition: how one state moves to the next
- base case: the smallest solved states
- order: top-down memoization or bottom-up tabulation
- objective: min, max, count, feasibility, reconstruction
- memory plan: full table, rolling rows, bitset, or sparse map

If any of those are fuzzy, the DP is not ready.

## DP implementation bias in Java

- Prefer flat primitive arrays over nested object graphs.
- Flatten `dp[row][col]` into one array when locality matters.
- Use sentinel values (`INF`, `-1`, impossible masks) instead of wrapper objects.
- Compress dimensions aggressively when a transition only needs prior rows or prior prefixes.
- Use iterative tabulation when recursion depth or call overhead is risky.
- Use memoization when the reachable state space is sparse or pruning is strong.

## Common DP families

### 1D DP

Use for:
- linear decisions
- prefix optimization
- classic knapsack-style transitions

Java notes:
- Often compresses to one array.
- Direction matters: reverse iterate for 0/1 knapsack; forward iterate for unbounded knapsack.
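The direction rule can be sketched with a one-array knapsack; the class and method names here are illustrative, not part of the skill:

```java
// One dp array serves both knapsack variants; iteration direction is the only difference.
final class Knapsack {
    // 0/1 knapsack: iterate capacity downward so each item is counted at most once.
    static long maxValue01(int[] w, int[] v, int cap) {
        long[] dp = new long[cap + 1];
        for (int i = 0; i < w.length; i++)
            for (int c = cap; c >= w[i]; c--)
                dp[c] = Math.max(dp[c], dp[c - w[i]] + v[i]);
        return dp[cap];
    }

    // Unbounded knapsack: iterate upward so an item may be reused within the same pass.
    static long maxValueUnbounded(int[] w, int[] v, int cap) {
        long[] dp = new long[cap + 1];
        for (int i = 0; i < w.length; i++)
            for (int c = w[i]; c <= cap; c++)
                dp[c] = Math.max(dp[c], dp[c - w[i]] + v[i]);
        return dp[cap];
    }
}
```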

### 2D grid / sequence DP

Use for:
- edit distance
- LCS variants
- path counting
- interval composition

Java notes:
- Two rolling rows often replace the full matrix.
- Keep row-major iteration consistent with memory layout.
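A minimal rolling-row sketch for edit distance (names are illustrative): two reusable `int[]` rows replace the full `(n+1) x (m+1)` matrix.

```java
// Edit distance with two rolling primitive rows instead of a full matrix.
final class RollingRows {
    static int editDistance(String a, String b) {
        int m = b.length();
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) prev[j] = j;   // distance from the empty prefix
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= m; j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) curr[j] = prev[j - 1];
                else curr[j] = 1 + Math.min(prev[j - 1], Math.min(prev[j], curr[j - 1]));
            }
            int[] t = prev; prev = curr; curr = t;  // swap rows, reuse buffers
        }
        return prev[m];
    }
}
```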

### Interval DP

Use for:
- merge cost
- matrix chain multiplication
- optimal parenthesization
- palindrome partitioning

Heuristic:
- Try increasing interval length order.
- Precompute reusable range costs.

### Tree DP

Use for:
- subtree aggregation
- rerooting
- independent set / matching variants on trees

Java notes:
- Iterative traversal can avoid stack overflow.
- Store parent/index arrays once; reuse buffers for passes.

### DAG DP

Use for:
- longest path in DAG
- path counts
- dependency-ordered optimization

Heuristic:
- Topological order first, transitions second.
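The order-then-transition heuristic, sketched for longest path in a DAG (class name and edge encoding are illustrative): a Kahn topological pass establishes the order, and edges are relaxed as vertices are popped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Longest path (in edges) of a DAG: Kahn topological order, then relax in order.
final class DagDp {
    static int longestPath(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        int[] indeg = new int[n];
        for (int v = 0; v < n; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); indeg[e[1]]++; }
        ArrayDeque<Integer> q = new ArrayDeque<>();
        for (int v = 0; v < n; v++) if (indeg[v] == 0) q.add(v);
        int[] dist = new int[n];
        int best = 0;
        while (!q.isEmpty()) {
            int u = q.poll();
            best = Math.max(best, dist[u]);
            for (int v : adj.get(u)) {
                dist[v] = Math.max(dist[v], dist[u] + 1);  // transition after ordering
                if (--indeg[v] == 0) q.add(v);
            }
        }
        return best;
    }
}
```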

### Bitmask DP

Use for:
- small `n` subset problems
- travelling-salesman-style state
- assignment and partition variants

Java notes:
- Use `int` masks up to 31 bits, `long` masks up to 63.
- Precompute subset transitions when reused heavily.
- Beware exponential memory growth; consider meet-in-the-middle.
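An assignment-variant sketch using `int` masks (illustrative names): `dp[mask]` is the cheapest way to hand the tasks in `mask` to the first `bitCount(mask)` workers.

```java
import java.util.Arrays;

// Bitmask DP over assigned tasks: dp[mask] = min cost of giving the tasks in
// `mask` to the first Integer.bitCount(mask) workers.
final class Assignment {
    static int minCost(int[][] cost) {
        int n = cost.length;
        int[] dp = new int[1 << n];
        Arrays.fill(dp, Integer.MAX_VALUE);
        dp[0] = 0;
        for (int mask = 0; mask < (1 << n); mask++) {
            if (dp[mask] == Integer.MAX_VALUE) continue;  // unreachable state
            int worker = Integer.bitCount(mask);
            if (worker == n) continue;
            for (int task = 0; task < n; task++) {
                if ((mask & (1 << task)) == 0) {
                    int next = mask | (1 << task);
                    dp[next] = Math.min(dp[next], dp[mask] + cost[worker][task]);
                }
            }
        }
        return dp[(1 << n) - 1];
    }
}
```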

### Digit DP

Use for:
- counting numbers with digit constraints
- lexicographic numeric constraints

State usually includes:
- position
- tight/limited flag
- started/leading-zero flag
- problem-specific accumulator

## DP optimization patterns

### Prefix/suffix acceleration

If a transition scans prior states, ask whether prefix minima/maxima/sums can reduce it from `O(n^2)` to `O(n)`.

### Monotonic queue optimization

Use when transitions need min/max over a sliding window.
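A sketch under an assumed transition `dp[i] = a[i] + min(dp[i-k] .. dp[i-1])` (names illustrative): a deque of indices with increasing `dp` values keeps the window minimum at the head, for amortized O(1) per step.

```java
import java.util.ArrayDeque;

// Monotonic-deque DP: dp[i] = a[i] + min over the trailing window of size k.
final class MonotonicQueueDp {
    static long minPathCost(long[] a, int k) {
        int n = a.length;
        long[] dp = new long[n];
        ArrayDeque<Integer> dq = new ArrayDeque<>();
        dp[0] = a[0];
        dq.addLast(0);
        for (int i = 1; i < n; i++) {
            while (dq.peekFirst() < i - k) dq.pollFirst();          // drop expired indices
            dp[i] = a[i] + dp[dq.peekFirst()];                      // window minimum at head
            while (!dq.isEmpty() && dp[dq.peekLast()] >= dp[i]) dq.pollLast();
            dq.addLast(i);
        }
        return dp[n - 1];
    }
}
```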

### Divide-and-conquer DP optimization

Use when the optimal split point is monotonic across rows or columns.

### Convex hull trick / Li Chao tree

Use when transitions are of the form:
- `dp[i] = min_j(m[j] * x[i] + b[j])`
- `max` variant of the same

Only use when the algebra really matches.

### Bitset DP

Use when boolean subset transitions can become word-parallel bit operations.

Examples:
- subset sum
- knapsack feasibility
- reachability layers
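The subset-sum case sketched with raw `long` words (illustrative names; `java.util.BitSet` has no shift, so the word array is manual): bit `s` of `bits` is set iff some subset sums to `s`, and each item costs one `O(target/64)` pass.

```java
// Subset-sum feasibility as word-parallel bit operations: bits |= bits << x per item.
final class BitsetSubsetSum {
    static boolean reachable(int[] nums, int target) {
        long[] bits = new long[target / 64 + 1];
        bits[0] = 1L;                                   // empty subset sums to 0
        for (int x : nums) {
            int w = x / 64, b = x % 64;
            for (int i = bits.length - 1; i >= 0; i--) {  // high-to-low: each item used once
                long shifted = (i - w >= 0) ? bits[i - w] << b : 0L;
                // carry bits from the next-lower word; guard b != 0 (shifts are mod 64)
                if (b != 0 && i - w - 1 >= 0) shifted |= bits[i - w - 1] >>> (64 - b);
                bits[i] |= shifted;
            }
        }
        return (bits[target / 64] >>> (target % 64) & 1L) == 1L;
    }
}
```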

### State compression

Reduce dimensions by:
- keeping only prior row/column
- encoding booleans into bits
- coordinate-compressing sparse values
- using ids instead of objects

## Search and optimization patterns

### Binary search on answer

Use when:
- feasibility is monotonic
- exact objective is hard but checking a threshold is easier
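A standard shape for this pattern, sketched on the ship-packages-within-days problem (illustrative names): the exact minimum capacity is hard directly, but checking one capacity is a linear scan, and feasibility is monotonic in capacity.

```java
// Binary search on the answer: smallest capacity that ships all weights within `days`.
final class ShipWithinDays {
    static int minCapacity(int[] weights, int days) {
        int lo = 0, hi = 0;
        for (int w : weights) { lo = Math.max(lo, w); hi += w; }   // answer lies in [max, sum]
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (feasible(weights, mid, days)) hi = mid; else lo = mid + 1;
        }
        return lo;
    }

    // Greedy check: can all weights ship in order within `days` at capacity `cap`?
    static boolean feasible(int[] weights, int cap, int days) {
        int used = 1, load = 0;
        for (int w : weights) {
            if (load + w > cap) { used++; load = 0; }
            load += w;
        }
        return used <= days;
    }
}
```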

### Meet-in-the-middle

Use when:
- brute force is `2^n`
- `n` is small enough to split into two `2^(n/2)` halves

### Branch and bound

Use when:
- you can compute tight upper/lower bounds
- a good heuristic ordering prunes much of the tree

### Iterative deepening

Use when:
- memory is tight
- solution depth is unknown but usually shallow

### Offline query processing

Use when:
- query order is irrelevant
- sorting queries/events lets you reuse structure updates

## Greedy and exchange-thinking

Before building DP or search, test whether a greedy proof exists:
- local choice stays globally optimal
- exchange argument repairs any non-greedy optimal solution
- matroid-like or interval-scheduling structure is present

If greedy works, it often beats DP both asymptotically and operationally.

## Range and sequence patterns

- Sliding window: monotonic boundary expansion or contraction.
- Two pointers: sorted arrays, pair/triple sums, dedup, partitioning.
- Monotonic stack: next greater/smaller, histogram, span problems.
- Difference arrays: batch range updates.
- Prefix sums / xor / hashes: cheap repeated range queries.
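The difference-array entry above can be sketched as follows (illustrative names): each range addition costs O(1), and one prefix pass materializes the final values.

```java
// Difference array: batch range adds in O(updates + n) instead of O(updates * range).
final class RangeAdds {
    // Each update is {from, toInclusive, delta}.
    static long[] apply(int n, int[][] updates) {
        long[] diff = new long[n + 1];
        for (int[] u : updates) {
            diff[u[0]] += u[2];        // start the delta at `from`
            diff[u[1] + 1] -= u[2];    // cancel it just past `toInclusive`
        }
        long[] out = new long[n];
        long run = 0;
        for (int i = 0; i < n; i++) {  // prefix pass materializes the values
            run += diff[i];
            out[i] = run;
        }
        return out;
    }
}
```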

## Java-specific implementation notes

- Avoid recursion for deep graphs, trees, or DP unless the depth bound is small.
- Replace tuple objects with parallel arrays or packed longs in hot paths.
- Pre-size arrays and reusable buffers for repeated test cases.
- Be explicit about overflow; use `long` for counts/costs unless `int` is proven safe.
- Separate correctness code from hot code paths once the algorithm is clear.
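The packed-long note above reduces to three one-liners (illustrative names): two ints share one `long`, avoiding a tuple allocation per element; the mask on `b` keeps negative low halves from clobbering the high half.

```java
// Pack an (int, int) pair into one long; no per-element object allocation.
final class Packed {
    static long pack(int a, int b) { return ((long) a << 32) | (b & 0xFFFFFFFFL); }
    static int first(long p)  { return (int) (p >>> 32); }   // high 32 bits
    static int second(long p) { return (int) p; }             // low 32 bits
}
```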

## Problem-solving ladder

When stuck, try this order:
1. Can I sort or batch the work?
2. Can I precompute prefix, suffix, or compressed state?
3. Can a different data structure remove a nested loop?
4. Is the problem actually graph, interval, or DP in disguise?
5. Can the state shrink to primitives or bits?
6. Can I prove greedy, monotonicity, or convexity?

## Red flags

- DP state includes fields that do not affect future transitions.
- Memoization key is a heavyweight object when a few ints suffice.
- Full `O(n^2)` table retained even though only one frontier is used.
- Search explores symmetric states repeatedly.
- A library data structure is used where a flat array plus sort is enough.
