---
title: Handling Dynamic Content
description: Learn how to handle JavaScript-heavy websites and dynamic content with html2rss using browser-based extraction strategies.
---

import { Code } from "@astrojs/starlight/components";

Some websites load their content with JavaScript after the initial page load. Static HTTP fetching only sees the HTML the server returns, so it may miss this content or see it inconsistently.

Solution

Use a browser-based extraction strategy when JavaScript-heavy pages do not work with default static fetching.

browserless is the common choice for this workflow. botasaurus is an alternative browser-based strategy if you run a Botasaurus scrape API.

Keep the strategy at the top level and put request-specific options under request:

<Code
  code={`strategy: browserless
request:
  max_redirects: 5
  max_requests: 6
  browserless:
    preload:
      wait_after_ms: 5000
channel:
  url: https://example.com/app
selectors:
  items:
    selector: .article
  title:
    selector: h2
  url:
    selector: a
    extractor: href`}
  lang="yaml"
/>
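If you run a Botasaurus scrape API instead, one approach is to swap only the top-level strategy and keep the channel and selectors unchanged. This is a minimal sketch. It assumes the page needs no Botasaurus-specific request options. Any settings Botasaurus itself requires, such as where your scrape API is reachable, are not shown here and are covered in the Strategy Reference.

```yaml
# Sketch: same feed, alternate browser-based strategy.
# Assumption: no Botasaurus-specific request options are needed for this page.
strategy: botasaurus
channel:
  url: https://example.com/app
selectors:
  items:
    selector: .article
  title:
    selector: h2
  url:
    selector: a
    extractor: href
```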

When to Use Browser-Based Extraction

A browser-based extraction strategy is necessary when:

  • Content loads after page load - JavaScript fetches data from APIs
  • Single Page Applications (SPAs) - React, Vue, Angular apps
  • Infinite scroll - Content loads as you scroll
  • Dynamic forms - Content changes based on user interaction

Preload Actions

For dynamic sites, rendering once is often not enough. Use request.browserless.preload to wait, click, or scroll before the HTML snapshot is taken.

Wait Before Capturing Dynamic Content

<Code
  code={`strategy: browserless
request:
  browserless:
    preload:
      wait_after_ms: 4000`}
  lang="yaml"
/>

Click "Load More" Buttons

<Code
  code={`strategy: browserless
request:
  browserless:
    preload:
      wait_after_ms: 3000
      click_selectors:
        - selector: ".load-more"
          max_clicks: 3
          wait_after_ms: 250`}
  lang="yaml"
/>

Scroll Infinite Lists

<Code
  code={`strategy: browserless
request:
  browserless:
    preload:
      scroll_down:
        iterations: 5
        wait_after_ms: 200
      wait_after_ms: 2500`}
  lang="yaml"
/>

These preload steps can be combined in a single config when a site needs several interactions before all items appear.
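A combined config might look like the following sketch. It merges the click and scroll examples above under one `preload` key and copies their values unchanged, so expect to tune them for each site. The key order in this YAML is not meant to define the order in which the preload steps run. If ordering matters for your page, check the Strategy Reference.

```yaml
# Sketch: a settle wait, "Load More" clicks, and scrolling in one preload.
# Values are copied from the separate examples above; tune them per site.
# Key order here does not imply the order the steps execute in.
strategy: browserless
request:
  browserless:
    preload:
      wait_after_ms: 3000
      click_selectors:
        - selector: ".load-more"
          max_clicks: 3
          wait_after_ms: 250
      scroll_down:
        iterations: 5
        wait_after_ms: 200
```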

Performance Considerations

Browser-based extraction is slower than default static HTTP fetching because it:

  • Launches a headless Chrome browser
  • Renders the full page with JavaScript
  • Takes more memory and CPU resources

Use static HTTP fetching for static content and switch to browser-based extraction when needed. See the Strategy Reference for concrete transports, defaults, and environment requirements.
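For comparison, a page that already serves its items in the initial HTML may not need a `strategy` key at all. This sketch assumes that omitting `strategy` selects the default static HTTP fetching. The URL `https://example.com/blog` is hypothetical, and the selectors are reused from the first example.

```yaml
# Sketch: static page, no browser needed.
# Assumption: omitting `strategy` falls back to default static HTTP fetching.
channel:
  url: https://example.com/blog # hypothetical static page
selectors:
  items:
    selector: .article
  title:
    selector: h2
  url:
    selector: a
    extractor: href
```

If a config like this returns no items while the items are visible in a normal browser, that is often a sign the page needs a browser-based strategy.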

Related Topics