Google Places and Saved Lists Scraper

Extract structured data from Google Maps saved lists and individual place pages.

gmaps-scraper reads Google Maps runtime data and rendered place-page DOM, then returns typed Python objects or JSON. It is designed for cache refreshes, place enrichment, and debugging when Google Maps' page structure changes.

What It Extracts

  • Saved list metadata, owner/collaborators, and place entries
  • Place facts such as name, category, rating, review count, address, website, phone, plus code, coordinates, Google price range, summarized admission or lodging quotes, and photos
  • Review topic chips with Google-provided mention counts
  • Visible review snippets
  • About-tab sections such as Accessibility and Service options
  • Optional English-readable display fields for non-Latin address/category text
  • Diagnostics, evidence hashes, and optional debug artifacts

Requirements

  • Python 3.14
  • uv
  • curl_cffi for the primary saved-list fetch path
  • cloakbrowser for browser-backed place scraping and fallback

Install

This project is intended to be consumed directly from source rather than from PyPI.

uv add git+https://github.com/michaelmwu/gmaps-scraper.git

If you vendor the package, also add the runtime dependencies:

uv add curl-cffi cloakbrowser

Quickstart

Scrape a saved list:

uv run gmaps-scraper "https://maps.app.goo.gl/MG2Vd5pWBkL7hXL18"

Scrape one place:

uv run gmaps-scraper \
  "https://www.google.com/maps/place/Den/@35.6731762,139.7127216,17z" \
  --kind place

Batch scrape places:

uv run gmaps-scraper \
  --kind place \
  --input places.txt \
  --session-dir "$CONDUCTOR_ROOT_PATH/.gmaps-scraper/session" \
  --max-retries 2 \
  --stagger-ms 500 \
  --output place-results.json
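The --stagger-ms flag spaces out successive requests so a batch does not hit Google Maps in a tight burst. The idea can be sketched in plain Python (a conceptual sketch with added jitter, not the CLI's internals):

```python
import random
import time
from collections.abc import Iterable, Iterator


def staggered(urls: Iterable[str], stagger_ms: int = 500) -> Iterator[str]:
    """Yield URLs with a randomized delay between them, roughly what a
    stagger setting of `stagger_ms` milliseconds approximates."""
    for i, url in enumerate(urls):
        if i:  # no delay before the first request
            time.sleep(stagger_ms / 1000 * random.uniform(0.5, 1.5))
        yield url
```

The jitter factor avoids a perfectly regular request cadence; whether the CLI applies jitter is not documented here.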

Build a Maps search URL from downstream context:

from gmaps_scraper import build_maps_search_url

place_url = build_maps_search_url("Analogue, Singapore", gl="sg")

gmaps-scraper does not infer guide region. If your downstream app knows the place should be in Singapore, Taipei, Hanoi, or another region, put that context in the query or pass a Google place ID.

Enable optional LLM repair only when deterministic extraction is thin or needs English-readable display normalization:

OPENAI_API_KEY=...
LLM_MODEL=gpt-5-mini

uv run gmaps-scraper \
  --kind place \
  --input places.txt \
  --llm-repair \
  --llm-cache-dir "$CONDUCTOR_ROOT_PATH/.gmaps-scraper/llm-cache"

Library Usage

from pathlib import Path

from gmaps_scraper import (
    BrowserSessionConfig,
    build_maps_search_url,
    cached_place_repairer,
    llm_cache_namespace_from_env,
    openai_compatible_place_repairer_from_env,
    scrape_place,
    scrape_saved_list,
)

saved_list = scrape_saved_list("https://maps.app.goo.gl/MG2Vd5pWBkL7hXL18")

place_url = build_maps_search_url("Analogue, Singapore")
place = scrape_place(
    place_url,
    browser_session=BrowserSessionConfig(
        profile_dir=Path(".gmaps-scraper/session"),
    ),
)

repaired_place = scrape_place(
    place.source_url,
    llm_fallback=cached_place_repairer(
        openai_compatible_place_repairer_from_env(),
        cache_dir=Path(".gmaps-scraper/llm-cache"),
        cache_namespace=llm_cache_namespace_from_env(),
    ),
    llm_tasks=("dom_repair", "display_translation"),
)

Downstream Refresh Pattern

For low-cost refreshes, downstream consumers should scrape deterministically first, reuse prior English display fields when the raw fields are unchanged, and only then call optional LLM repair if a raw non-Latin address or category still needs normalization.

from pathlib import Path

from gmaps_scraper import (
    cached_place_repairer,
    llm_cache_namespace_from_env,
    needs_display_en,
    openai_compatible_place_repairer_from_env,
    repair_place_display_fields,
    reuse_place_display_fields,
    scrape_place,
)

fresh = scrape_place(place_url, llm_policy="never")
fresh = reuse_place_display_fields(fresh, previous_place)

needs_translation = (
    needs_display_en(fresh.address) and fresh.address_display_en is None
) or (
    needs_display_en(fresh.category) and fresh.category_display_en is None
)

if needs_translation:
    repairer = cached_place_repairer(
        openai_compatible_place_repairer_from_env(),
        cache_dir=Path(".gmaps-scraper/llm-cache"),
        cache_namespace=llm_cache_namespace_from_env(),
    )
    translated = repair_place_display_fields(
        fresh,
        repairer=repairer,
        evidence={"city": "Singapore", "country": "Singapore"},
    )
    fresh = translated
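Conceptually, reuse_place_display_fields carries a cached English display value forward only when the raw field it was derived from is unchanged, so no LLM spend is needed. A dict-based sketch of that rule (field names taken from the example above; the dict form is illustrative, not the library's data model):

```python
# Raw-field -> display-field pairs, matching the attributes used above.
DISPLAY_PAIRS = (
    ("address", "address_display_en"),
    ("category", "category_display_en"),
)


def reuse_display_fields(fresh: dict, previous: dict) -> dict:
    """Copy a prior English display value forward only when the raw
    field it was derived from has not changed."""
    out = dict(fresh)
    for raw, display in DISPLAY_PAIRS:
        if out.get(raw) == previous.get(raw) and out.get(display) is None:
            out[display] = previous.get(display)
    return out
```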

The downstream app owns the spend policy. gmaps-scraper owns the generic Google Maps repair prompt, evidence shape, script detection, translation memory, and cache mechanics.
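The script-detection idea behind needs_display_en can be approximated with the standard library. This is a rough heuristic, not the library's actual implementation:

```python
import unicodedata


def looks_non_latin(text: str) -> bool:
    """Rough script check: True if any letter falls outside the Latin
    script, meaning the string likely needs an English display form."""
    return any(
        ch.isalpha() and "LATIN" not in unicodedata.name(ch, "LATIN")
        for ch in text
    )
```

Digits, punctuation, and accented Latin text pass through untouched; only strings containing letters from other scripts (CJK, Cyrillic, and so on) are flagged.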

repair_place_display_fields does not scrape Google Maps. It only repairs English-readable display fields from an existing PlaceDetails; optional evidence is caller-provided context such as guide city/country.

Reviews and About attributes are collected by default. Use collect_reviews=False or collect_about=False in library code, or --skip-reviews / --skip-about in the CLI, when a refresh only needs overview facts.

Documentation

  • Usage Guide: CLI options, batch scraping, debug artifacts, LLM setup, cache behavior, and downstream refresh examples
  • Architecture: extraction layers, diagnostics, LLM repair, translation memory, and session/concurrency design
  • Contributing: local development, tests, PR expectations, and translation-memory promotion

Development

uv sync --dev
./scripts/lint.sh
./scripts/typecheck.sh
uv run python -m unittest discover -s tests
