Skip to content

test(jsonl): add jsonl_parser schema coverage and harden null usage#22

Merged
wpak-ai merged 3 commits intomasterfrom
test/ccc1-jsonl-parser-schema
May 6, 2026
Merged

test(jsonl): add jsonl_parser schema coverage and harden null usage#22
wpak-ai merged 3 commits intomasterfrom
test/ccc1-jsonl-parser-schema

Conversation

@clean6378-max-it
Copy link
Copy Markdown
Collaborator

@clean6378-max-it clean6378-max-it commented May 5, 2026

Closes #21

Summary

Adds comprehensive pytest coverage for utils/jsonl_parser.py (CCC1) and hardens assistant parsing when message.usage is JSON null or otherwise not a plain object.

Motivation

jsonl_parser.py is the structural core for untrusted, schema-evolving Claude Code JSONL. Direct tests lock in _parse_tool_result dispatch, helper behavior, and integration paths so silent regressions from schema drift are caught early.

Changes

  • New: tests/test_jsonl_parser.py covering:
    • _parse_tool_result for all major result shapes and fallbacks
    • _normalize_content, _extract_text, _extract_images
    • _infer_title, _strip_system_tags
    • _process_user, _process_assistant, _process_system, _track_file_activity
    • parse_session integration (empty files, unknown types, sidechains, snapshots, wall time, malformed lines)
    • quick_session_info for small vs large sessions
  • Fix: _process_assistant treats non-dict usage as {} so usage: null does not crash.

Testing

pytest tests/test_jsonl_parser.py
pytest tests/
Notes
Underscore imports are confined to utils.jsonl_parser, consistent with existing parser tests.
CI automation for pytest is tracked separately (CCC2).

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Tests**
  * Added a comprehensive test suite for JSONL parsing: tool results, content normalization, title inference, message handling, file activity, session parsing, and extensive edge-case coverage (missing keys, nested results, images, sidechains, compaction, malformed/partial entries).

* **Bug Fixes**
  * Improved parsing robustness to safely handle missing or unexpected message/usage data, preventing errors during session extraction.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Add tests/test_jsonl_parser.py covering _parse_tool_result dispatch,
content helpers, user/assistant/system paths, file activity, parse_session
integration, quick_session_info, and malformed entries. Treat non-dict
message.usage in _process_assistant as empty to avoid crashes on null usage.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

Adds a large pytest suite for utils/jsonl_parser and hardens message/usage access by introducing a safe _entry_message(entry) helper and ensuring usage is a dict in assistant processing and quick_session_info first-pass extraction.

Changes

Parser Test Coverage & Defensive Guard

Layer / File(s) Summary
Data Shape / Helper
utils/jsonl_parser.py
Add _entry_message(entry) to safely extract entry["message"] as a dict; replace direct entry.get("message", {}) usages.
Core Guard
utils/jsonl_parser.py
In _process_assistant, ensure usage is a dict (reset to {} when not) before accessing token/cache fields. Update quick_session_info first-pass to use _entry_message.
Test Helpers
tests/test_jsonl_parser.py (_fresh_metadata, _write_jsonl)
Add deterministic metadata fixture and temp-jsonl writer used by tests.
Unit Tests — Parse Tool Result
tests/test_jsonl_parser.py (parse tool/result tests)
Comprehensive tests covering bash, file edits/read/write, glob, grep, web search/fetch, task variants, todo_write, user_input, plan, unknown shapes, non-dict/null inputs, and slug preservation.
Unit Tests — Content Helpers
tests/test_jsonl_parser.py (_normalize_content, _extract_text, _extract_images)
Tests for string/list/dict/mixed inputs, nested tool_result images, base64 image extraction, and skipping non-image items.
Unit Tests — Title & System Tag Helpers
tests/test_jsonl_parser.py (_infer_title, _strip_system_tags)
Tests for title inference (first user text, 100-char truncation, Untitled fallback) and system tag stripping behavior.
Unit Tests — User Processing
tests/test_jsonl_parser.py (_process_user tests)
Tests for first-entry metadata capture, handling missing message, and toolUseResult image extraction.
Unit Tests — Assistant & System Processing
tests/test_jsonl_parser.py (_process_assistant, _process_system tests)
Covers content normalization, synthetic-model exclusion, thinking-block joining, tool_use counting, api_error flagging, stop_reason/service_tier accumulation, ephemeral token sums, compaction boundary handling and missing compactMetadata safety.
Unit Tests — File Activity & Integration
tests/test_jsonl_parser.py (_track_file_activity, parse_session tests)
File read/write/edit tracking, bash/web logging, end-to-end parse_session scenarios (empty skeleton, unknown type ignored, sidechains, file-history snapshots, entry counts, wall-time, invalid JSON lines, missing keys, cleanup).
Unit Tests — Quick Session Info & Malformed Entries
tests/test_jsonl_parser.py (quick_session_info, malformed tests)
Tests for small/large-file tail-read timestamp, no-user-entry Untitled behavior, and partial/invalid entry cases to validate robustness.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble lines of jsonl with delight,
Guards snug the messages through the night.
Tests sprout clover where edge-cases play,
Parsers hum softly at break of day.
Hop on — the session folds up right.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.40% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'test(jsonl): add jsonl_parser schema coverage and harden null usage' accurately describes both main changes: comprehensive test coverage for jsonl_parser and hardening of null/non-dict usage handling.
Linked Issues check ✅ Passed The PR comprehensively addresses all coding requirements from issue #21: tests cover all 14 _parse_tool_result dispatch arms, helper functions, system/user/assistant processing, file activity tracking, integration scenarios, and malformed entries; null usage hardening is implemented in _process_assistant.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #21 objectives: test suite covers specified parser functions and scenarios, and utility changes harden message/usage handling as required. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch test/ccc1-jsonl-parser-schema

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
utils/jsonl_parser.py (1)

140-142: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard message before dereferencing in both user/assistant processors.

If an entry contains "message": null, both processors call .get(...) on None and abort parse_session for a recoverable malformed line. Apply the same dict-guard pattern here too.

💡 Suggested fix
 def _process_user(entry: dict, messages: list, metadata: dict):
@@
-    msg = entry.get("message", {})
+    msg = entry.get("message", {})
+    if not isinstance(msg, dict):
+        msg = {}
     content = msg.get("content", [])
@@
 def _process_assistant(entry: dict, messages: list, metadata: dict):
@@
-    msg = entry.get("message", {})
+    msg = entry.get("message", {})
+    if not isinstance(msg, dict):
+        msg = {}
     model = msg.get("model", "")

Also applies to: 173-174

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/jsonl_parser.py` around lines 140 - 142, The processors in
utils/jsonl_parser.py currently assume entry["message"] is a dict and call
entry.get("message", {}) without guarding against message being None; update
both user and assistant processing blocks to first coerce message to a dict
(e.g., msg = entry.get("message") or {} ) before calling msg.get("content", [])
and passing to _extract_text, and apply the same pattern where similar code
exists (the other processor and the parse_session usage) so a null "message" no
longer raises when dereferenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/test_jsonl_parser.py`:
- Around line 718-730: Add two regression tests in tests/test_jsonl_parser.py
alongside test_missing_usage_dict_no_crash: one that passes a record with
"message": null and another that passes a record with a non-dict usage (e.g.,
"usage": "invalid" or a number) to ensure parse_session handles these malformed
shapes without crashing and still sets metadata.total_input_tokens to 0;
reference the existing test function name test_missing_usage_dict_no_crash and
the parser entrypoint parse_session to locate where to add the new tests and
mirror the try/finally cleanup pattern used there.

---

Outside diff comments:
In `@utils/jsonl_parser.py`:
- Around line 140-142: The processors in utils/jsonl_parser.py currently assume
entry["message"] is a dict and call entry.get("message", {}) without guarding
against message being None; update both user and assistant processing blocks to
first coerce message to a dict (e.g., msg = entry.get("message") or {} ) before
calling msg.get("content", []) and passing to _extract_text, and apply the same
pattern where similar code exists (the other processor and the parse_session
usage) so a null "message" no longer raises when dereferenced.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b102107-1dd1-4277-936f-9231732d0eb4

📥 Commits

Reviewing files that changed from the base of the PR and between 3b72afd and e6df529.

📒 Files selected for processing (2)
  • tests/test_jsonl_parser.py
  • utils/jsonl_parser.py

Comment thread tests/test_jsonl_parser.py
@clean6378-max-it clean6378-max-it requested a review from timon0305 May 6, 2026 15:50
Comment thread tests/test_jsonl_parser.py Outdated
Add tests/test_jsonl_parser.py for tool-result dispatch, helpers, parse_session,
quick_session_info, and malformed entries (null/non-dict message and usage).
Introduce _entry_message() so only dict messages are read; keep usage as a dict
before token accounting. Apply the same normalization in quick_session_info.
Use full-string == assertions in TestStripSystemTags; drop unused pytest import;
add regression for string-shaped assistant message.
@clean6378-max-it clean6378-max-it requested a review from timon0305 May 6, 2026 17:43
@timon0305 timon0305 requested a review from wpak-ai May 6, 2026 17:53
@wpak-ai wpak-ai merged commit 09de881 into master May 6, 2026
2 checks passed
@wpak-ai wpak-ai deleted the test/ccc1-jsonl-parser-schema branch May 6, 2026 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

# CCC1 — Parser test coverage: schema variants

3 participants