| title | Handling Dynamic Content |
|---|---|
| description | Learn how to handle JavaScript-heavy websites and dynamic content with html2rss using browser-based extraction strategies. |
import { Code } from "@astrojs/starlight/components";
Some websites load their content dynamically using JavaScript. Static fetch paths may not see this content reliably.
Use a browser-based extraction strategy when JavaScript-heavy pages do not work with default static fetching.
`browserless` is the common choice for this workflow. `botasaurus` is an alternative browser-based strategy for setups that run a Botasaurus scrape API.
Keep `strategy` at the top level and put request-specific options under `request`:
<Code
code={`strategy: browserless
request:
  max_redirects: 5
  max_requests: 6
  browserless:
    preload:
      wait_after_ms: 5000
channel:
  url: https://example.com/app
selectors:
  items:
    selector: .article
  title:
    selector: h2
  url:
    selector: a
    extractor: href
`}
lang="yaml"
/>
A browser-based extraction strategy is necessary when:
- Content loads after page load - JavaScript fetches data from APIs
- Single Page Applications (SPAs) - React, Vue, Angular apps
- Infinite scroll - Content loads as you scroll
- Dynamic forms - Content changes based on user interaction
For dynamic sites, rendering once is often not enough. Use `request.browserless.preload` to wait, click, or scroll before the HTML snapshot is taken.
<Code
code={`strategy: browserless
request:
  browserless:
    preload:
      wait_after_ms: 4000
`}
lang="yaml"
/>
<Code
code={`strategy: browserless
request:
  browserless:
    preload:
      wait_after_ms: 3000
      click_selectors:
        - selector: ".load-more"
          max_clicks: 3
          wait_after_ms: 250
`}
lang="yaml"
/>
<Code
code={`strategy: browserless
request:
  browserless:
    preload:
      scroll_down:
        iterations: 5
        wait_after_ms: 200
      wait_after_ms: 2500
`}
lang="yaml"
/>
These preload steps can be combined in a single config when a site needs several interactions before all items appear.
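As a rough illustration of combining them, the sketch below merges the wait, click, and scroll settings from the examples above into one `preload` block. The selector and timing values are placeholders, and the order in which the steps run is not covered here, so check the Strategy Reference before relying on a particular sequence.

```yaml
# Sketch only: combines the three preload steps shown above.
# ".load-more" and all timing values are placeholders for your site.
strategy: browserless
request:
  browserless:
    preload:
      wait_after_ms: 3000        # initial wait for JavaScript to settle
      click_selectors:
        - selector: ".load-more" # click a "load more" control up to 3 times
          max_clicks: 3
          wait_after_ms: 250
      scroll_down:
        iterations: 5            # scroll to trigger lazy-loaded items
        wait_after_ms: 200
```

Start with the smallest set of steps that makes all items appear, since each additional interaction adds to the render time.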
Browser-based extraction is slower than default static HTTP fetching because it:
- Launches a headless Chrome browser
- Renders the full page with JavaScript
- Takes more memory and CPU resources
Use static HTTP fetching for static content and switch to browser-based extraction when needed. See the Strategy Reference for concrete transports, defaults, and environment requirements.
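One way to apply this is to start without a browser-based strategy and add it only when items are missing. The sketch below assumes that omitting the top-level `strategy` key leaves html2rss on its default static fetch path; the selectors are reused from the earlier example and are placeholders.

```yaml
# Assumption: with no top-level `strategy` key, the default static fetch is used.
# Uncomment the next line only if items are missing from the statically fetched HTML.
# strategy: browserless
channel:
  url: https://example.com/app
selectors:
  items:
    selector: .article
  title:
    selector: h2
  url:
    selector: a
    extractor: href
```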
- Strategy Reference - Complete strategy documentation
- Troubleshooting - Common issues with dynamic content
- Advanced Features - Performance optimization tips