Skip to content

Make Call composition explicit; centralize region semantics (refs #1086)#1097

Merged
elazarg merged 12 commits intomainfrom
split-call
Apr 25, 2026
Merged

Make Call composition explicit; centralize region semantics (refs #1086)#1097
elazarg merged 12 commits intomainfrom
split-call

Conversation

@elazarg
Copy link
Copy Markdown
Collaborator

@elazarg elazarg commented Apr 24, 2026

Summary

Addresses two areas from #1086 in 6 conservative, behavior-preserving commits:

  1. CallContract — Move helper argument requirements, return contract, and side-effects out of Call into Call::contract. Identity stays on Call::target (a new CallTarget).
  2. region_semantics module — New src/crab/region_semantics.{hpp,cpp} owns is_region_access_type, primary_kind_variable_for_type, and region_bounds. Replaces the four per-region check_access_* helpers in EbpfChecker with one driver, collapses the parallel switches in ValidAccess and ValidMapKeyValue, and groups the type cases in the transformer's do_load.

Each commit builds and passes tests independently.

Commits

  • 24bddec7 Move helper signature/effects out of Call into CallContract
  • 7646eaf0 Group call identity/diagnostics into CallTarget
  • 04110856 Centralize region offset and bounds in region_semantics
  • 5a3b9de1 Use is_region_access_type to collapse ValidAccess region cases
  • b2ee5942 Group transformer do_load cases by behavior
  • dcae3d26 Rename region_offset_variable to primary_kind_variable_for_type

Notes for reviewers

  • Behavior is unchanged. Bound-message text is preserved character-for-character; the test-data/atomic.yaml and test-data/loop.yaml strings (r2.shared_region_size, packet_size, etc.) still match.
  • Two design hedges I made on the way, both reversible:
    • Did not reconcile the checker/transformer asymmetry on T_SOCKET/T_BTF_ID (checker rejects, transformer havocs). Kept the defensive havoc and added a comment in do_load. Anyone planning to unify those types in a follow-up should add a regression test pinning current behavior first.
    • Did not flatten region_bounds into a uniform record despite the issue's suggestion. The bounds shapes differ structurally enough (stack uses r10.stack_offset - subprogram_stack_size, packet has an optional override, shared/alloc_mem use per-register variables) that a uniform record would force function-typed fields. Kept it as a switch returning a struct.
  • Drop candidates if too large: commits 2 (CallTarget) and 5 (do_load reorganization) are the most droppable. The load-bearing commits are 1 (CallContract) and 3 (region_semantics).

Test plan

  • cmake --build build --parallel clean at every commit
  • ./bin/tests reports 8814 assertions: 8578 passed | 236 failed as expected at every commit (matches main baseline)
  • CI green
  • Reviewer eyes on region_semantics.{hpp,cpp}

🤖 Generated with Claude Code

elazarg and others added 6 commits April 24, 2026 15:09
Group the argument requirements (singles, pairs), return contract
(return_ptr_type, return_nullable, is_map_lookup), and side-effects
(reallocate_packet, alloc_size_reg) under Call::contract.

This separates per-helper semantics from per-site identity/diagnostics
(name, is_supported) and from per-call-site data (stack_frame_prefix),
so that assertion extraction and the abstract transformer consume one
named contract rather than scattered fields. No behavior change.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Move {func, kind, name, is_supported, unsupported_reason} from Call
into a new CallTarget. Call is now a tight composition:
Call = {target: CallTarget, contract: CallContract, stack_frame_prefix}.

CallTarget owns the equality (matches only func+kind, as before);
Call::operator== delegates to it. The TODO comment in syntax.hpp
("move name and signature information somewhere else") is now done.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
New module src/crab/region_semantics.{hpp,cpp} owns:
  * region_offset_variable(reg, type)   -- moved from type_to_num
  * region_bounds(type, RegPack, ctx)   -- the bounds rule per region

Replaces the four EbpfChecker::check_access_{stack,context,packet,
shared} private methods with a single require_region_bounds driver
that consults region_bounds. The four cases in ValidAccess and the
two cases in ValidMapKeyValue now go through the same driver, so the
per-region rules are auditable in one place. No behavior change --
diagnostic message text and bound expressions are preserved.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Add is_region_access_type predicate alongside region_offset_variable
and region_bounds in region_semantics.hpp. The five region cases in
EbpfChecker::operator()(ValidAccess) (T_PACKET, T_STACK, T_CTX,
T_SHARED, T_ALLOC_MEM) now share one block: look up the offset
variable, compute access lb/ub, call require_region_bounds, then
fall through to a small inner switch only for per-region nuance
(stack initialization check, shared/alloc_mem null check).

ValidMapKeyValue is converted from an if/else chain to a switch with
an explicit default that throws the same "Only stack, packet, or
shared can be used as a parameter" message.

Adding a new region type (e.g., a future writable-context flavor for
issue #639) is now a 1-line edit to is_region_access_type plus the
matching arm in region_bounds. No behavior change.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Reorganize the type switch in EbpfTransformer::do_load into three
behavior buckets instead of eleven cases:

  * specialized: T_CTX, T_STACK
  * loadable but contents untracked: T_PACKET, T_SHARED, T_ALLOC_MEM,
    T_SOCKET, T_BTF_ID -- a single havoc call
  * not dereferenceable: T_UNINIT, T_MAP, T_MAP_PROGRAMS, T_NUM,
    T_FUNC -- explicit no-op

Removes two dead `addr` computations in T_PACKET and T_SHARED branches
(do_load_packet_or_shared does not consume addr).

Document the intentional checker/transformer asymmetry: T_SOCKET and
T_BTF_ID dereferences are rejected by the checker, but the transformer
keeps the defensive havoc as a sound fallback if the checker is ever
weakened. Deliberately preserved -- not silently reconciled -- per the
caveat in the design discussion.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
The function returns more than just region offsets: for T_MAP and
T_MAP_PROGRAMS it returns the file descriptor variable; only for
the bounds-checked region types (T_STACK, T_CTX, T_PACKET, T_SHARED,
T_ALLOC_MEM) does it return an in-region offset. The "region" prefix
clashed with is_region_access_type, which is strictly narrower.

What the function actually returns is the first ("primary") kind
variable from type_to_kinds for the given type -- the in-region
offset for region pointers, the fd for map types. Some types
(T_SHARED, T_STACK, T_ALLOC_MEM) own additional kind variables
(shared_region_size, stack_numeric_size, alloc_mem_size); those are
accessed directly via RegPack and are not returned here.

Also drops the now-redundant get_type_offset_variable trampoline in
type_to_num.cpp and renames TypeToNumDomain::get_type_offset_variable
to primary_kind_variable_for_type for consistency.

No behavior change.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Warning

Rate limit exceeded

@elazarg has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 23 minutes and 1 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 23 minutes and 1 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6bad16cd-81a8-467b-a075-05abf2b5cfd6

📥 Commits

Reviewing files that changed from the base of the PR and between 0d5a724 and e6e4ad8.

📒 Files selected for processing (7)
  • src/crab/ebpf_checker.cpp
  • src/crab/ebpf_transformer.cpp
  • src/crab/region_semantics.cpp
  • src/ir/assertions.cpp
  • src/ir/call_resolver.cpp
  • src/linux/linux_platform.cpp
  • src/printing.cpp
📝 Walkthrough

Walkthrough

Centralizes memory-region bounds checking and defers call-contract resolution: introduces region_semantics with is_region_access_type, primary_kind_variable_for_type, and region_bounds abstractions; refactors call representation into Call (key) and ResolvedCall (resolved metadata) with CallContract for argument/return shaping; moves call resolution from unmarshal-time to on-demand via resolve(call, info).

Changes

Cohort / File(s) Summary
Region Semantics
src/crab/region_semantics.hpp, src/crab/region_semantics.cpp
New module for memory-region abstractions: is_region_access_type predicate, primary_kind_variable_for_type mapping types to variables, region_bounds computing inclusive access bounds with diagnostic strings.
Call Resolution
src/ir/call_resolver.hpp, src/ir/call_resolver.cpp
New module implementing centralized resolve(Call, ProgramInfo) that dispatches helpers/kfuncs/builtins to platform-provided callbacks and returns ResolvedCall with support status, diagnostics, and contract details.
Call Contract & Syntax
src/ir/syntax.hpp
Introduces CallKind::builtin variant, refactors Call to defaulted equality (key-only), adds CallContract (argument/return metadata) and ResolvedCall (resolution result), updates print(InstructionSeq, ...) signature with optional ProgramInfo*.
Call Unmarshal & Parse
src/ir/unmarshal.hpp, src/ir/unmarshal.cpp, src/ir/parse.cpp
Removes make_call helper and full-resolution logic from unmarshal; instruction parsing now directly constructs bare Call{.func, .kind} deferring contract resolution to resolve().
Platform & Kfunc
src/platform.hpp, src/linux/kfunc.hpp, src/linux/kfunc.cpp, src/linux/linux_platform.cpp
Updates kfunc and builtin call handlers to return std::optional<ResolvedCall> instead of std::optional<Call>; make_kfunc_call and builtin resolvers now populate contract fields under res.contract rather than top-level call fields.
Type-to-Num Domain
src/crab/type_to_num.hpp, src/crab/type_to_num.cpp
Replaces get_type_offset_variable with primary_kind_variable_for_type, delegating type-to-variable mapping to prevail::primary_kind_variable_for_type from region_semantics.
eBPF Checker & Transformer
src/crab/ebpf_checker.cpp, src/crab/ebpf_transformer.cpp
Centralizes region bounds enforcement via require_region_bounds(); replaces per-type offset lookups with primary_kind_variable_for_type; refactors call transformation to use resolved contracts; updates pointer load dispatch to use unified havoc behavior.
IR & Verification
src/ir/cfg_builder.cpp, src/ir/assertions.cpp
Updates call support validation and assertion extraction to resolve Call via resolve() and read from ResolvedCall fields; cfg_builder no longer applies stack_frame_prefix to cloned Call instructions.
Printing
src/printing.cpp, src/test/test_print.cpp
Extends print API with optional ProgramInfo* for context-aware call rendering; resolves Call to display helper names and contract-derived argument registers; fallback to cheaper key-only format when info is unavailable.
Tests
src/test/test_marshal.cpp, src/test/test_platform_tables.cpp, src/test/ebpf_yaml.cpp
Updates test construction and validation to resolve Call objects through resolve() and assert resolved properties (ResolvedCall.is_supported, contract.singles/pairs) instead of direct Call fields; test platform stubs updated to return ResolvedCall.

Possibly related issues

Possibly related PRs

Suggested labels

codex, feature

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.38% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the two main objectives: explicit Call composition and centralized region semantics (with issue reference).
Description check ✅ Passed The description provides comprehensive context: specific commits, behavior preservation notes, design decisions, and test results confirming the changeset scope.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch split-call

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Call::stack_frame_prefix was written in cfg_builder::pass_inline_local_calls
(symmetric with Exit::stack_frame_prefix) but never read. The transformer's
call.stack_frame_prefix read at line 973 is on a CallLocal&, not a Call&.
The live readers of stack_frame_prefix are Exit (transformer + assertions)
and CallLocal (transformer only).

While here, document the key-value structure that the equality operators
already imply: (func, kind) is the key, and CallTarget's diagnostic fields
plus CallContract are both snapshots of the platform lookup taken at
unmarshal time. `Callx` and `CallBtf` are the key-only sibling instructions
that resolve the contract on demand rather than caching it.

No behavior change.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/crab/ebpf_checker.cpp`:
- Around line 292-297: The assert dereferencing *offset_var must be replaced
with an explicit verifier failure path: instead of
assert(offset_var.has_value()), check primary_kind_variable_for_type(s.reg,
type) result and if it is empty call the checker/verifier failure routine (or
return an error) with a clear message mentioning the region and type so we fail
closed; only when offset_var.has_value() proceed to call lb_ub_access_pair(s,
*offset_var). Ensure you reference is_region_access_type,
primary_kind_variable_for_type, offset_var, lb_ub_access_pair, s.reg and type
when implementing the early-return failure path.
- Around line 256-260: The T_PACKET branch is calling
require_region_bounds(T_PACKET, access_reg, lb, ub) without supplying the
packet_size() override used by ValidAccess, causing packet-backed map key/value
checks to use the generic region_bounds ceiling; update this branch so
require_region_bounds receives the same packet-size bound used elsewhere (thread
variable_registry.packet_size() or the packet_size() value through the call
path) when computing ub from access_reg.packet_offset + width, or modify
require_region_bounds (and its call-sites) to accept an optional packet_size
override and pass variable_registry.packet_size() here (ensure symbols
mentioned: T_PACKET, access_reg.packet_offset, width, require_region_bounds,
ValidAccess, variable_registry.packet_size()).

In `@src/crab/ebpf_transformer.cpp`:
- Around line 590-615: The no-op branch in do_load (inside
dom.state.join_over_types lambda) lacks an explicit comment explaining why
skipping T_UNINIT, T_MAP, T_MAP_PROGRAMS, T_NUM, and T_FUNC is sound; update the
comment above that case list in ebpf_transformer.cpp to state that
ebpf_checker.cpp::ValidAccess rejects dereferences for each of those types (cite
ValidAccess) so the transformer’s no-op is safe even if the checker is bypassed,
and also add a short note that T_ALLOC_MEM is intentionally handled in the
havoc/packet/shared branch (or mention if this is a precision change vs prior
behavior) so readers understand the design choice and soundness argument.

In `@src/crab/region_semantics.cpp`:
- Around line 68-74: The default branch currently uses assert(false, ...) and
then returns a dummy RegionBounds which disappears in release builds; change
this to an unconditional fail that is always executed (e.g., call
CRAB_ERROR("region_bounds called on non-region type") or throw std::logic_error
with the same message) so the contract is enforced in all builds, and keep or
add a return of a default-initialized RegionBounds only after that error call to
satisfy the compiler; update the default case in the switch inside region_bounds
(and any include for <stdexcept> if you choose throw) to reference RegionBounds
and use the unconditional error instead of assert.

In `@src/ir/syntax.hpp`:
- Around line 195-238: The current Call::operator== only defers to CallTarget's
key-only equality and thus conflates unsupported helper calls with relocated
builtins; update equality so Call objects differ when the target's diagnostics
or the resolved contract differ. Change Call::operator== (not
CallTarget::operator==) to compare the target's identifying key plus diagnostic
fields that affect behavior (at minimum target.is_supported and
target.unsupported_reason or name) and the Call::contract contents (singles,
pairs, return_ptr_type, return_nullable, is_map_lookup, reallocate_packet,
alloc_size_reg) so an unsupported helper and a relocated builtin with a concrete
contract no longer compare equal; locate these symbols in the diff as
Call::operator==, CallTarget, and CallContract and implement the field-wise
comparison (or compare CallContract as a whole) inside Call::operator==.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e1f1adf5-e7d1-4887-9811-8ceba1f599f6

📥 Commits

Reviewing files that changed from the base of the PR and between 6bbb598 and 7f99e3b.

📒 Files selected for processing (16)
  • src/crab/ebpf_checker.cpp
  • src/crab/ebpf_transformer.cpp
  • src/crab/region_semantics.cpp
  • src/crab/region_semantics.hpp
  • src/crab/type_to_num.cpp
  • src/crab/type_to_num.hpp
  • src/ir/assertions.cpp
  • src/ir/cfg_builder.cpp
  • src/ir/marshal.cpp
  • src/ir/syntax.hpp
  • src/ir/unmarshal.cpp
  • src/linux/kfunc.cpp
  • src/linux/linux_platform.cpp
  • src/printing.cpp
  • src/test/test_marshal.cpp
  • src/test/test_platform_tables.cpp

Comment thread src/crab/ebpf_checker.cpp
Comment thread src/crab/ebpf_checker.cpp
Comment thread src/crab/ebpf_transformer.cpp
Comment thread src/crab/region_semantics.cpp
Comment thread src/ir/syntax.hpp Outdated
elazarg and others added 4 commits April 24, 2026 18:26
The builtins in linux_platform.cpp (memset, memcpy, memmove, memcmp,
extern_unspecified) are resolved by platform.get_builtin_call(func)
rather than platform.get_helper_prototype(func, type). That distinction
was previously implicit: builtin Calls defaulted to CallKind::helper
because the kind field wasn't set. This change makes the resolution
path a first-class property of the Call.

Also fixes the fallback branch in Unmarshaller::makeJmp where a PC was
flagged builtin by info.builtin_call_offsets but the platform has no
resolver; that Call now correctly carries CallKind::builtin.

Sets up a key-only Call in a follow-up: once kind distinguishes the
three resolution paths (helper / kfunc / builtin), a pure (func, kind)
key is sufficient to re-resolve the contract on demand.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Adds src/ir/call_resolver.{hpp,cpp} with:

    ResolvedCall resolve(const Call&, const ProgramInfo&);

dispatching by CallKind:
  * helper  -> make_call(func, platform, type)
  * kfunc   -> platform.resolve_kfunc_call(func, type, ...)
  * builtin -> platform.get_builtin_call(func)

ResolvedCall bundles the key (`call`) with the resolved name, support
status, and CallContract. Failure paths (unknown helper, missing kfunc
resolver, unsupported builtin) return a ResolvedCall with
is_supported=false and a populated unsupported_reason.

No consumer uses resolve() yet; that's the next commit. This commit
only introduces the facade. No behavior change.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Consumer migration:
  * assertions.cpp AssertExtractor::operator()(const Call&) now reads
    contract from resolve(call, info).contract instead of call.contract.
  * EbpfTransformer::operator()(const Call&) resolves once at entry
    and reads name/contract/etc. from the ResolvedCall.
  * cfg_builder::check_instruction_feature_support now takes the full
    ProgramInfo (was just const ebpf_platform_t&) and resolves on the
    fly to check is_supported / unsupported_reason.

Bug fix in resolve_helper: the initial call_resolver.cpp called
make_call() directly, which only consults get_helper_prototype() and
does *not* enforce is_helper_usable(func, program_type). On Linux,
get_helper_prototype() returns a default prototype for unavailable
helpers, which classifies as a supported integer-returning call with
no arguments -- letting helpers that the unmarshal path previously
rejected slip through unchecked. resolve_helper now checks
is_helper_usable() first and reports the same "helper function is
unavailable on this platform" diagnostic the unmarshal used to stamp.

Adds a regression test: calling bpf_get_socket_cookie (helper 46)
from an xdp program -- a combination rejected at the platform-type
gate -- must still throw "rejected: helper function is unavailable
on this platform". Caught by Codex review during development.

No other behavior change: writers still populate the snapshot on
Call (Call.target.*, Call.contract.*); that redundancy is removed
in the next commit.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
The IR Call instruction is now just the two fields that form the key:

    struct Call {
        int32_t func{};
        CallKind kind{CallKind::helper};
        constexpr bool operator==(const Call&) const = default;
    };

Everything else -- helper name, is_supported/unsupported_reason, and
the entire CallContract -- lives on ResolvedCall, produced on demand by
resolve(call, program_info). The snapshot the IR used to cache is gone;
there is one source of truth (platform.get_helper_prototype /
resolve_kfunc_call / get_builtin_call) and resolution is a lookup.

Moved pieces:
  * The ~150-line helper-argument parsing that lived in
    Unmarshaller::makeCall is now in call_resolver.cpp resolve_helper,
    where the contract is consumed.
  * Unmarshaller::makeJmp returns Call{.func=imm, .kind=...} only; the
    is_supported/is_helper_usable checks moved to resolve_helper.
  * make_call(imm, platform, type) is removed. Callers construct Call
    directly (FuncConstraint, Callx handling, parse.cpp).
  * platform typedefs: get_builtin_call and resolve_kfunc_call now
    return std::optional<ResolvedCall>. get_builtin_call_linux and
    make_kfunc_call (kfunc.cpp) updated.
  * pass_resolve_kfunc_calls stores Label -> Call (key only) in its map,
    discarding the ResolvedCall; downstream consumers resolve on demand.

Printing:
  * operator<<(ostream, Instruction) for Call is cheap (no Program): it
    prints "r0 = call:<func>" with a kind prefix.
  * CommandPrinterVisitor now takes an optional ProgramInfo*. When set,
    Call prints rich ("r0 = bpf_map_lookup_elem:1(map_fd r1, map_key
    r2)"). DetailedPrinter does the same via Program::info().
  * print(InstructionSeq, ...) gains an optional ProgramInfo* tail
    argument. test_print.cpp passes the ELF program's info so rich
    printing matches the golden .asm files.

Tests updated to construct Call keys directly and call resolve() when
they want to inspect name/contract.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/ir/assertions.cpp (1)

76-147: 🧹 Nitpick | 🔵 Trivial

LGTM — resolves on demand, preserves per-arg assertion logic.

The switch from call.singles/pairs to resolved.contract.singles/pairs is mechanical and soundness-preserving: the same assertions are produced for any supported call. For unsupported calls, contract is empty so no preconditions are asserted here — this is only sound because unsupported calls are already rejected earlier (by cfg_builder::check_instruction_feature_support, analogous to the CallBtf handling at line 159). Worth a short comment on line 77 locking in that invariant for future auditors, since the assumption is no longer visible from the Call type itself.

As per coding guidelines: "Encode the assumptions an analysis relies on — preconditions, lattice properties, monotonicity — directly in code comments, assertions, or type-level checks before trusting experiments."

💡 Optional annotation
     vector<Assertion> operator()(const Call& call) const {
+        // Invariant: cfg_builder has already rejected unsupported calls before this point,
+        // so an empty resolved.contract here corresponds to a supported call with no preconditions.
         const ResolvedCall resolved = resolve(call, info);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ir/assertions.cpp` around lines 76 - 147, Add a short explanatory comment
at the start of operator()(const Call& call) (right after the call to
resolve(call, info)) that documents the relied-upon invariant: unsupported calls
have been rejected earlier (via cfg_builder::check_instruction_feature_support)
so using resolved.contract.singles/pairs here is safe; optionally add a debug
assert that the overall call feature support has been checked (or that
resolved.contract is empty only for unsupported calls) to make the assumption
explicit for future auditors.
src/crab/ebpf_transformer.cpp (2)

819-826: 🛠️ Refactor suggestion | 🟠 Major

Encode the "call is supported" precondition explicitly.

operator()(const Call&) assumes cfg_builder already rejected unsupported calls, so resolved.contract is meaningful. If a Call with is_supported == false ever slips through (e.g., from a new call site), the default-constructed contract quietly yields T_NUM on r0 (line 958) with no diagnostic. Per coding guidelines, preconditions should be encoded as assertions.

🛡️ Proposed fix
     const ResolvedCall resolved = resolve(call, context.program_info());
+    assert(resolved.is_supported && "Call reached transformer but was not marked supported; cfg_builder should have rejected it");
     std::optional<Reg> maybe_fd_reg{};

As per coding guidelines: "Encode the assumptions an analysis relies on—preconditions, lattice properties, monotonicity—directly in code comments, assertions, or type-level checks before trusting experiments."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/crab/ebpf_transformer.cpp` around lines 819 - 826, operator()(const
Call&) assumes the call is supported but doesn't enforce it; add an explicit
precondition check/assertion so we never proceed with a default-constructed
contract. Before using resolved.contract (and before the loop over
resolved.contract.singles), assert or guard that the call is supported (e.g.,
check call.is_supported() or resolved.contract.is_supported) and bail out or
report if not; this ensures EbpfTransformer::operator()(const Call&),
resolve(call, ...), and subsequent uses of resolved.contract/r0 cannot operate
on an unsupported/default contract.

877-905: ⚠️ Potential issue | 🔴 Critical

Move addr/width computation into the T_STACK branch — soundness bug.

primary_kind_variable_for_type(param.mem) calls get_type() once before the join, selecting a single type (e.g., T_SHARED). If param.mem can be both T_STACK and T_SHARED, the addr computed at line 886 corresponds to T_SHARED's primary variable. When join_over_types then iterates and hits T_STACK at line 894, stack.havoc and stack.store_numbers use an address derived from the wrong type's variable, causing incorrect memory state updates for stack-typed pointers.

Move the call to primary_kind_variable_for_type(param.mem, type) (free function with explicit type) inside the T_STACK branch and compute addr/width there, exactly as done at lines 854–859:

Pattern to follow
dom.state = dom.state.join_over_types(param.mem, [&](TypeToNumDomain& state, const TypeEncoding type) {
    if (type == T_STACK) {
        const auto offset = primary_kind_variable_for_type(param.mem, type);
        if (!offset.has_value()) {
            return;
        }
        const Interval addr = state.values.eval_interval(*offset);
        const Interval width = state.values.eval_interval(reg_pack(param.size).svalue);
        // havoc and store_numbers here
    }
});
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/crab/ebpf_transformer.cpp` around lines 877 - 905, The addr/width are
computed before the join using
dom.state.primary_kind_variable_for_type(param.mem) which picks a single type,
causing wrong addresses when param.mem can be T_STACK; move the
primary_kind_variable_for_type(param.mem, type) call and the Interval addr/width
computations into the T_STACK branch inside the dom.state.join_over_types lambda
(i.e., compute const auto offset = primary_kind_variable_for_type(param.mem,
type); check has_value(); then compute const Interval addr =
state.values.eval_interval(*offset) and const Interval width =
state.values.eval_interval(reg_pack(param.size).svalue)) and then call
stack.havoc(state.values, ...) and stack.store_numbers(addr, width) there,
removing the outer addr/width usage and the store_numbers call outside the join.
♻️ Duplicate comments (2)
src/crab/ebpf_checker.cpp (2)

291-296: ⚠️ Potential issue | 🟠 Major

Replace the assert with an explicit verifier failure path.

If is_region_access_type(type) and primary_kind_variable_for_type() ever drift, release builds dereference an empty optional here instead of failing closed with a verifier error.

As per coding guidelines: "When in doubt, favor explicit error handling and early exits to surface problems instead of deferring to implicit behavior in C++ code."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/crab/ebpf_checker.cpp` around lines 291 - 296, The code currently asserts
that primary_kind_variable_for_type(s.reg, type) has a value inside the
is_region_access_type(type) branch, which can lead to undefined behavior in
release builds; replace the assert with an explicit verifier failure path: check
if offset_var.has_value() immediately after calling
primary_kind_variable_for_type, and if not, invoke the verifier error/early-exit
mechanism used in this module (e.g., return verifier_fail(...) or call
report_verifier_error with a clear message including s.reg and type), ensuring
you do not dereference offset_var before returning; then proceed to call
lb_ub_access_pair(s, *offset_var) only when the optional is present.

255-258: ⚠️ Potential issue | 🟠 Major

Pass the packet-size override through this packet branch.

ValidAccess threads variable_registry.packet_size() into packet bounds, but this ValidMapKeyValue path calls require_region_bounds(T_PACKET, ...) without it. Packet-backed key/value checks can therefore use a different ceiling than real packet dereferences.

As per coding guidelines: "Any change that affects analysis results should spell out the argument for soundness: what inputs are assumed, what invariants are maintained, and how the change preserves them."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/crab/ebpf_checker.cpp` around lines 255 - 258, The T_PACKET branch in
ValidMapKeyValue uses lb = access_reg.packet_offset and ub = lb + width but
calls require_region_bounds(T_PACKET, access_reg, lb, ub) without forwarding the
packet-size override, causing mismatched ceilings versus ValidAccess; update
this branch to thread the same packet-size variable used by ValidAccess
(variable_registry.packet_size()) into the bounds check (i.e., obtain the
packet_size override from the VariableRegistry and pass it through to
require_region_bounds), and if require_region_bounds needs an extra parameter
add/propagate a packet_size argument through the call sites and adjust its
signature/implementation to use that override for computing/verifying bounds to
preserve the same invariants and soundness.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/crab/ebpf_transformer.cpp`:
- Around line 1268-1271: The code calls .value() on
primary_kind_variable_for_type(src_reg, src_type) (and
primary_kind_variable_for_type(src_reg, type)) without checking the optional,
which can throw when that function returns nullopt; update both sites (the
ArithBinOp::ADD call involving state.values->apply and the later similar use) to
mirror the existing dst_offset guard: store the src optional in a local (e.g.,
src_offset) and wrap the apply()/use in an if (src_offset) { ...
src_offset.value() ... } block, or otherwise check/handle the optional before
dereferencing; ensure you reference primary_kind_variable_for_type,
join_over_types, state.values->apply, ArithBinOp::ADD, bin.dst and src_reg when
making the change.

In `@src/ir/call_resolver.cpp`:
- Around line 185-188: The platform callback return value should not be trusted
to have ResolvedCall.call set correctly; after calling
info.platform->resolve_kfunc_call(call.func, info.type, &why_not) capture the
returned ResolvedCall into a local (e.g. auto r = *c), set r.call = call, and
then return r instead of returning *c directly; apply the same pattern for
resolve_builtin (capture, set .call = call, then return) so the call-field
invariant is enforced in this central place rather than relying on platform
implementations.
- Around line 154-159: The bound check inside the PTR_TO_*_MEM case in
call_resolver.cpp is unreachable because args has fixed size 7 and the outer
loop uses i < args.size() - 1; remove the redundant if (args.size() - i < 2)
block or replace it with a static_assert or a clarifying comment to indicate it
is intentionally defensive; update the case handling for
EBPF_ARGUMENT_TYPE_CONST_SIZE / EBPF_ARGUMENT_TYPE_CONST_SIZE_OR_ZERO and
mention the EBPF_ARGUMENT_TYPE_DONTCARE sentinel to make the real "missing size"
condition explicit (refer to helper_prototype_name, args, and the PTR_TO_*_MEM
case to locate the code).
- Line 197: The diagnostic message returned by make_unsupported currently says
"helper function is unavailable on this platform" but should say "builtin
function is unavailable on this platform"; update the call to make_unsupported
(the site using make_unsupported(call, std::to_string(call.func), ...)) to use
the corrected string and then update the test expectations in test_marshal.cpp:
change the expected string at the assertion around line 927 to "builtin function
is unavailable on this platform" and adjust the substring matchers used at the
assertions around lines 1419 and 1434 to match the new "builtin" wording.
- Around line 61-69: resolve_helper currently dereferences info.platform
(calling info.platform->is_helper_usable, get_helper_prototype, etc.) without a
null check; add the same defensive guard used in resolve_kfunc/resolve_builtin:
if info.platform is null, return make_unsupported(call,
std::to_string(call.func) or empty name, "platform unavailable") (or otherwise
return the same unsupported result shape those functions use). Specifically, in
resolve_helper check if (!info.platform) before calling
info.platform->is_helper_usable/get_helper_prototype, and handle the early
return by calling make_unsupported (using the helper name resolution logic
around get_helper_prototype only after confirming info.platform is non-null).

In `@src/ir/syntax.hpp`:
- Around line 198-208: Call currently collapses kfunc identity to (func, kind)
losing module info; add a module identifier field to struct Call (e.g., int32_t
module{}) and update its equality (operator==) to include this field, then
update places that lower or resolve calls—specifically the lowering in
src/ir/cfg_builder.cpp that assigns to r->call and the resolution logic in
src/platform.hpp—to populate and use the new module field so BTF id + module
remain distinct (also ensure any constructors/initializers that create Call
values set the module appropriately).

In `@src/printing.cpp`:
- Around line 536-574: The cheap/fallback branch in operator()(Call const& call)
loses resolved helper/kfunc names when program_info_ is null; fix by preferring
any already-attached resolved name on the Call before falling back to the
kind_label: if Call has a stored name field (e.g., call.name or
call.resolved_name or metadata on Call), print "r0 = <that_name>:<call.func>"
and only if that name is empty use the kind_label logic; alternatively, ensure
the caller of operator()(Call) (e.g., operator<<(Instruction) / print_dot /
print_invariants_filtered) passes a ProgramInfo into this printer so
resolve(call, *program_info_) can be used instead of the generic kind_label
fallback.

---

Outside diff comments:
In `@src/crab/ebpf_transformer.cpp`:
- Around line 819-826: operator()(const Call&) assumes the call is supported but
doesn't enforce it; add an explicit precondition check/assertion so we never
proceed with a default-constructed contract. Before using resolved.contract (and
before the loop over resolved.contract.singles), assert or guard that the call
is supported (e.g., check call.is_supported() or resolved.contract.is_supported)
and bail out or report if not; this ensures EbpfTransformer::operator()(const
Call&), resolve(call, ...), and subsequent uses of resolved.contract/r0 cannot
operate on an unsupported/default contract.
- Around line 877-905: The addr/width are computed before the join using
dom.state.primary_kind_variable_for_type(param.mem) which picks a single type,
causing wrong addresses when param.mem can be T_STACK; move the
primary_kind_variable_for_type(param.mem, type) call and the Interval addr/width
computations into the T_STACK branch inside the dom.state.join_over_types lambda
(i.e., compute const auto offset = primary_kind_variable_for_type(param.mem,
type); check has_value(); then compute const Interval addr =
state.values.eval_interval(*offset) and const Interval width =
state.values.eval_interval(reg_pack(param.size).svalue)) and then call
stack.havoc(state.values, ...) and stack.store_numbers(addr, width) there,
removing the outer addr/width usage and the store_numbers call outside the join.

In `@src/ir/assertions.cpp`:
- Around line 76-147: Add a short explanatory comment at the start of
operator()(const Call& call) (right after the call to resolve(call, info)) that
documents the relied-upon invariant: unsupported calls have been rejected
earlier (via cfg_builder::check_instruction_feature_support) so using
resolved.contract.singles/pairs here is safe; optionally add a debug assert that
the overall call feature support has been checked (or that resolved.contract is
empty only for unsupported calls) to make the assumption explicit for future
auditors.

---

Duplicate comments:
In `@src/crab/ebpf_checker.cpp`:
- Around line 291-296: The code currently asserts that
primary_kind_variable_for_type(s.reg, type) has a value inside the
is_region_access_type(type) branch, which can lead to undefined behavior in
release builds; replace the assert with an explicit verifier failure path: check
if offset_var.has_value() immediately after calling
primary_kind_variable_for_type, and if not, invoke the verifier error/early-exit
mechanism used in this module (e.g., return verifier_fail(...) or call
report_verifier_error with a clear message including s.reg and type), ensuring
you do not dereference offset_var before returning; then proceed to call
lb_ub_access_pair(s, *offset_var) only when the optional is present.
- Around line 255-258: The T_PACKET branch in ValidMapKeyValue uses lb =
access_reg.packet_offset and ub = lb + width but calls
require_region_bounds(T_PACKET, access_reg, lb, ub) without forwarding the
packet-size override, causing mismatched ceilings versus ValidAccess; update
this branch to thread the same packet-size variable used by ValidAccess
(variable_registry.packet_size()) into the bounds check (i.e., obtain the
packet_size override from the VariableRegistry and pass it through to
require_region_bounds), and if require_region_bounds needs an extra parameter
add/propagate a packet_size argument through the call sites and adjust its
signature/implementation to use that override for computing/verifying bounds to
preserve the same invariants and soundness.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 99cae832-5365-4276-ae79-f2f6be31a94b

📥 Commits

Reviewing files that changed from the base of the PR and between 7f99e3b and 0d5a724.

📒 Files selected for processing (19)
  • src/crab/ebpf_checker.cpp
  • src/crab/ebpf_transformer.cpp
  • src/ir/assertions.cpp
  • src/ir/call_resolver.cpp
  • src/ir/call_resolver.hpp
  • src/ir/cfg_builder.cpp
  • src/ir/parse.cpp
  • src/ir/syntax.hpp
  • src/ir/unmarshal.cpp
  • src/ir/unmarshal.hpp
  • src/linux/kfunc.cpp
  • src/linux/kfunc.hpp
  • src/linux/linux_platform.cpp
  • src/platform.hpp
  • src/printing.cpp
  • src/test/ebpf_yaml.cpp
  • src/test/test_marshal.cpp
  • src/test/test_platform_tables.cpp
  • src/test/test_print.cpp
💤 Files with no reviewable changes (1)
  • src/ir/unmarshal.hpp

Comment thread src/crab/ebpf_transformer.cpp
Comment thread src/ir/call_resolver.cpp
Comment thread src/ir/call_resolver.cpp Outdated
Comment thread src/ir/call_resolver.cpp
Comment thread src/ir/call_resolver.cpp Outdated
Comment thread src/ir/syntax.hpp
Comment thread src/printing.cpp
Real issues caught during review:

  * ebpf_checker.cpp ValidAccess: replace assert with throw_fail on
    the "region type has no primary kind variable" path so mismatched
    is_region_access_type / primary_kind_variable_for_type fail closed
    in release builds instead of UB-dereferencing.

  * region_semantics.cpp region_bounds: the default branch now throws
    std::logic_error unconditionally instead of relying on assert(false)
    (stripped under NDEBUG) plus a dummy RegionBounds return.

  * call_resolver.cpp:
      - resolve_helper guards info.platform nullptr before dereferencing,
        matching resolve_kfunc / resolve_builtin.
      - resolve_kfunc and resolve_builtin stamp the caller-provided Call
        into the returned ResolvedCall.call rather than trusting every
        platform implementation to set it.
      - Fix "helper function is unavailable on this platform" diagnostic
        on the builtin path -- was misleading; now reads "builtin
        function is unavailable on this platform".
      - Remove the dead args.size() - i < 2 bound check (unreachable
        because args has 7 entries and the loop bound is i < 6) in favor
        of a clarifying comment.

  * ebpf_transformer.cpp:
      - assert that resolved.is_supported on entry to operator()(Call),
        making the cfg_builder reject gate's precondition explicit.
      - Expand the no-op comment in do_load to cite the matching checker
        rejections that make the skip sound.
      - Annotate the Bin::Op::ADD "num += ptr" branch with a note that
        primary_kind_variable_for_type is purely type-indexed, so a
        present dst_offset implies src_offset too (justifies the .value()
        unwrap the reviewer flagged).

  * assertions.cpp operator()(Call): add invariant comment tying the
    empty-contract-is-fine logic to the cfg_builder reject gate.

  * printing.cpp: print_dot now renders Call instructions richly
    (resolving through the program) so DOT output matches what
    print_program / print_invariants already produce. Dedupe the two
    parallel "rich Call print" implementations (DetailedPrinter::print_call
    and CommandPrinterVisitor's info-backed branch) into a single
    print_instruction_rich helper.

Findings intentionally left unchanged (replies will be posted on the PR):

  * ebpf_checker.cpp ValidMapKeyValue T_PACKET not passing packet_size():
    not a regression. Pre-refactor code passed {} (no override), my code
    does the same. The asymmetry with ValidAccess pre-dates this PR.

  * syntax.hpp Call equality "conflates helper and builtin": no longer
    true once CallKind::builtin exists (commit 112d664). Call{func,
    helper} and Call{func, builtin} have distinct kinds and distinct
    resolutions.

  * syntax.hpp "kfunc module is dropped": pre-existing. The platform
    resolve_kfunc_call callback has always taken btf_id alone, never
    module. Not a regression introduced by this PR.

Refs #1086.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Elazar Gershuni <elazarg@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant