[codex] Split LLM repair tasks and optional panels #26

Open
michaelmwu wants to merge 1 commit into main from codex/llm-dom-review-tags-clean

@michaelmwu
Member

@michaelmwu michaelmwu commented May 6, 2026

Summary

Adds task scoping to optional place LLM repair so callers can independently enable:

  • dom_repair for generic Google Maps field repair
  • display_translation for English-readable address/category display fields
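
The task scoping described above could be modelled roughly as follows. This is a minimal sketch, not the PR's implementation: the task names `dom_repair` and `display_translation` come from the summary, while `RepairTask`, `normalize_tasks`, and `planned_llm_calls` are hypothetical names introduced only for illustration.

```python
from enum import Enum
from typing import FrozenSet, Iterable, List


class RepairTask(str, Enum):
    # Task names are taken from the PR summary; the enum itself is hypothetical.
    DOM_REPAIR = "dom_repair"
    DISPLAY_TRANSLATION = "display_translation"


def normalize_tasks(tasks: Iterable[str]) -> FrozenSet[RepairTask]:
    """Validate caller-supplied task names; unknown names raise ValueError."""
    return frozenset(RepairTask(task) for task in tasks)


def planned_llm_calls(tasks: Iterable[str]) -> List[str]:
    """Return the repair steps that would run, in a stable declaration order."""
    enabled = normalize_tasks(tasks)
    return [task.value for task in RepairTask if task in enabled]
```

With this shape, a refresh flow that only wants readable display fields would pass `["display_translation"]` and never trigger the generic DOM repair step.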

Also makes Reviews and About tab collection optional through library flags and CLI switches:

  • collect_reviews=False / --skip-reviews
  • collect_about=False / --skip-about
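
One way the CLI switches might map onto the library flags is sketched below. The flag and keyword names come from the summary; the parser layout, the `prog` name, and the `collection_flags` helper are assumptions, and the real `cli.py` may be organised differently.

```python
import argparse
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; only the two switch names come from the PR summary.
    parser = argparse.ArgumentParser(prog="gmaps-scraper-sketch")
    parser.add_argument("--skip-reviews", action="store_true",
                        help="do not open the Reviews tab")
    parser.add_argument("--skip-about", action="store_true",
                        help="do not open the About tab")
    return parser


def collection_flags(argv: List[str]) -> Dict[str, bool]:
    """Translate negative CLI switches into positive library keyword flags."""
    args = build_parser().parse_args(argv)
    return {
        "collect_reviews": not args.skip_reviews,
        "collect_about": not args.skip_about,
    }
```

Keeping the library flags positive (`collect_*`) and the CLI switches negative (`--skip-*`) means both default to full collection when nothing is specified.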

Updates docs and tests for downstream refresh flows that want to avoid unnecessary LLM calls and extra tab clicks.

Validation

  • ./scripts/lint.sh
  • ./scripts/typecheck.sh
  • uv run python -m unittest discover -s tests

Copilot AI review requested due to automatic review settings May 6, 2026 16:19
@coderabbitai

coderabbitai Bot commented May 6, 2026

Warning

Rate limit exceeded

@michaelmwu has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 53 minutes and 58 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 542786ee-7c8f-43a7-a1d3-1db96a37d9cc

📥 Commits

Reviewing files that changed from the base of the PR and between 1163b37 and d05bd80.

📒 Files selected for processing (15)
  • AGENTS.md
  • README.md
  • docs/ARCHITECTURE.md
  • docs/USAGE.md
  • src/gmaps_scraper/__init__.py
  • src/gmaps_scraper/cli.py
  • src/gmaps_scraper/display_fields.py
  • src/gmaps_scraper/llm.py
  • src/gmaps_scraper/models.py
  • src/gmaps_scraper/place_scraper.py
  • tests/test_cli.py
  • tests/test_display_fields.py
  • tests/test_llm.py
  • tests/test_place_scraper.py
  • tests/test_public_api.py

@michaelmwu michaelmwu force-pushed the codex/llm-dom-review-tags-clean branch from e19db60 to 2c2cc71 Compare May 6, 2026 16:21
@michaelmwu michaelmwu changed the title [codex] Add LLM-assisted place scraping [codex] Split LLM repair tasks and optional panels May 6, 2026
@chatgpt-codex-connector

💡 Codex Review

from_lines = _extract_review_count_from_lines(lines)
if from_lines is not None:
    return from_lines
return _parse_review_count(snapshot.get("review_count"))

P2: Prefer structured review count before text fallback

When the DOM selector already extracted snapshot['review_count'], this now still scans all panel_text/body_text first and returns the first matching count it sees. On Google Maps pages that include other cards or ratings in the same rendered text before the place's own count, a valid structured count can be overwritten with an unrelated review count; the text scan should only be a fallback when the structured field is missing or unparsable.
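
The ordering the review asks for could look roughly like this. It is a sketch under assumptions: the helper names `_parse_review_count` and `_extract_review_count_from_lines` appear in the quoted snippet, but their bodies here are simplified stand-ins, and `resolve_review_count` and `_COUNT_RE` are hypothetical.

```python
import re
from typing import Mapping, Optional, Sequence

# Stand-in pattern; the real text scan is likely more permissive.
_COUNT_RE = re.compile(r"\(?(\d[\d,]*)\)?\s+reviews?\b", re.IGNORECASE)


def _parse_review_count(value: object) -> Optional[int]:
    """Stand-in parser: accept non-negative ints or strings such as '1,234'."""
    if isinstance(value, bool):
        return None
    if isinstance(value, int):
        return value if value >= 0 else None
    if isinstance(value, str):
        digits = value.replace(",", "").strip()
        return int(digits) if digits.isdecimal() else None
    return None


def _extract_review_count_from_lines(lines: Sequence[str]) -> Optional[int]:
    """Stand-in text scan: return the first 'N reviews' match in the lines."""
    for line in lines:
        match = _COUNT_RE.search(line)
        if match:
            return _parse_review_count(match.group(1))
    return None


def resolve_review_count(snapshot: Mapping[str, object],
                         lines: Sequence[str]) -> Optional[int]:
    # Trust the structured selector value first.
    structured = _parse_review_count(snapshot.get("review_count"))
    if structured is not None:
        return structured
    # Only fall back to the text scan when the field is missing or unparsable.
    return _extract_review_count_from_lines(lines)
```

With this order, an unrelated card's count that appears earlier in the rendered text can no longer override a valid structured value.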

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@michaelmwu michaelmwu force-pushed the codex/llm-dom-review-tags-clean branch 2 times, most recently from 2d6ea58 to 88d14df Compare May 6, 2026 16:35
@chatgpt-codex-connector

💡 Codex Review

if collect_reviews and isinstance(dom_snapshot, Mapping):

P2: Gate the review signal wait behind collect_reviews

When collect_reviews=False (or --skip-reviews) this guard skips the Reviews tab click, but _ensure_review_signal(...) still runs unconditionally just above the overview snapshot. For overview-only refreshes, pages without an immediate review signal still pay the review-specific polling/reload path before extraction, defeating the skip option and potentially altering the DOM before the overview scrape; guard that call with collect_reviews as well.
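
A sketch of the suggested gating, with the page interactions replaced by injected callables so the control flow can be checked in isolation. `run_place_steps` and `trace_steps` are hypothetical names, and the real scraper's step order may include more stages than shown here.

```python
from typing import Callable, List


def run_place_steps(
    collect_reviews: bool,
    ensure_review_signal: Callable[[], None],
    snapshot_overview: Callable[[], None],
    open_reviews_tab: Callable[[], None],
) -> None:
    # The review-specific polling/reload only runs when reviews are wanted.
    if collect_reviews:
        ensure_review_signal()
    # The overview snapshot always runs, on an untouched DOM when skipping reviews.
    snapshot_overview()
    if collect_reviews:
        open_reviews_tab()


def trace_steps(collect_reviews: bool) -> List[str]:
    """Record which steps run, using stand-ins for the real page actions."""
    calls: List[str] = []
    run_place_steps(
        collect_reviews,
        ensure_review_signal=lambda: calls.append("ensure_review_signal"),
        snapshot_overview=lambda: calls.append("snapshot_overview"),
        open_reviews_tab=lambda: calls.append("open_reviews_tab"),
    )
    return calls
```

For an overview-only refresh, `trace_steps(False)` records only the overview snapshot, so no review polling is paid for.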

@michaelmwu michaelmwu force-pushed the codex/llm-dom-review-tags-clean branch from 88d14df to 1903e67 Compare May 6, 2026 16:51
@chatgpt-codex-connector

💡 Codex Review

if collect_reviews and screenshot_path is not None:
    _write_place_screenshot(page, screenshot_path)

P2: Keep screenshots when reviews are skipped

When collect_reviews=False (CLI --skip-reviews) and a screenshot_path/--screenshot-output-dir is provided, this guard prevents _write_place_screenshot from running at all, so the scrape succeeds but the requested screenshot artifact is silently missing. The screenshot option is independent of review collection and was written unconditionally before this change, so it should still be emitted for overview-only refreshes that skip the Reviews tab.
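
The fix amounts to decoupling the two conditions. The sketch below assumes a page object exposing a `screenshot(path=...)` method; `FakePage`, `finish_scrape`, and the body of `_write_place_screenshot` are illustrative stand-ins, not the project's code.

```python
from pathlib import Path
from typing import List, Optional


class FakePage:
    """Stand-in for a browser page; writes placeholder bytes instead of a PNG."""

    def screenshot(self, path: str) -> None:
        Path(path).write_bytes(b"fake-png")


def _write_place_screenshot(page: FakePage, screenshot_path: Path) -> None:
    # Assumed behaviour: create the output directory, then capture the page.
    screenshot_path.parent.mkdir(parents=True, exist_ok=True)
    page.screenshot(path=str(screenshot_path))


def finish_scrape(page: FakePage, collect_reviews: bool,
                  screenshot_path: Optional[Path]) -> List[str]:
    """Return the artifacts produced; the screenshot ignores collect_reviews."""
    produced: List[str] = []
    if collect_reviews:
        produced.append("reviews")
    # Screenshot output depends only on whether a path was requested.
    if screenshot_path is not None:
        _write_place_screenshot(page, screenshot_path)
        produced.append("screenshot")
    return produced
```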

@michaelmwu michaelmwu force-pushed the codex/llm-dom-review-tags-clean branch from 1903e67 to c89589a Compare May 6, 2026 16:55
@michaelmwu michaelmwu force-pushed the codex/llm-dom-review-tags-clean branch from c89589a to d05bd80 Compare May 6, 2026 16:58
@chatgpt-codex-connector

💡 Codex Review


P2: Update diagnostics after display repair

When repair_place_display_fields() is used on a PlaceDetails returned by scrape_place, this replace() call preserves the original diagnostics object even after filling *_display_en fields. In the documented refresh flow, the serialized result can still report needs_address_display_en/needs_category_display_en and llm_used: false, which can mislead downstream quality gates or cause repeated repair attempts despite the display fields being fixed.
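
One possible shape for the fix is to rebuild the diagnostics in the same `replace()` step that fills the display fields. The dataclasses below are simplified assumptions: the flag names and `llm_used` come from the review comment, but the real `PlaceDetails` and its diagnostics type likely carry more fields, and `apply_display_repair` and `quality_flags` are hypothetical names.

```python
from dataclasses import dataclass, field, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Diagnostics:
    # Hypothetical layout; only the flag names and llm_used come from the review.
    quality_flags: Tuple[str, ...] = ()
    llm_used: bool = False


@dataclass(frozen=True)
class PlaceDetails:
    address: str
    address_display_en: Optional[str] = None
    category_display_en: Optional[str] = None
    diagnostics: Diagnostics = field(default_factory=Diagnostics)


def apply_display_repair(place: PlaceDetails,
                         address_en: Optional[str],
                         category_en: Optional[str]) -> PlaceDetails:
    """Fill display fields and clear the flags that the repair resolved."""
    resolved = set()
    if address_en:
        resolved.add("needs_address_display_en")
    if category_en:
        resolved.add("needs_category_display_en")
    diagnostics = replace(
        place.diagnostics,
        quality_flags=tuple(
            flag for flag in place.diagnostics.quality_flags if flag not in resolved
        ),
        llm_used=place.diagnostics.llm_used or bool(resolved),
    )
    return replace(
        place,
        address_display_en=address_en or place.address_display_en,
        category_display_en=category_en or place.category_display_en,
        diagnostics=diagnostics,
    )
```

Flags unrelated to display fields are left untouched, so a downstream quality gate sees only what still needs attention.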
