llmfood

Generate LLM-friendly Markdown from Docusaurus HTML builds, implementing the llms.txt convention.

Overview

llmfood converts a Docusaurus static HTML build into clean Markdown files optimized for LLM consumption. It:

1. Discovers all pages in a Docusaurus build directory
2. Resolves client-side content that doesn't exist in static HTML (GitHub code references, remote content, mermaid diagrams)
3. Converts each HTML page to Markdown, stripping Docusaurus chrome (breadcrumbs, pagination, TOC, footers)
4. Generates llms.txt: a structured index linking to all converted .md files
5. Generates custom files: aggregated Markdown files matching URL patterns (e.g., llms-full.txt)
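
To make the index step concrete, here is a minimal sketch of how an llms.txt-style index could be assembled from a list of converted pages. It follows the general llms.txt layout (an H1 title, a blockquote summary, then H2 sections containing link lists). The `Page` type, the `buildLlmsTxt` function, the grouping by first URL segment, and the `.md` link layout are assumptions made for illustration, not llmfood's actual implementation.

```typescript
// Hypothetical page record: the URL path of a converted page and its title.
type Page = { urlPath: string; title: string };

type IndexOptions = {
  baseUrl: string;
  siteTitle: string;
  siteDescription?: string;
  sectionOrder?: string[];
  sectionLabels?: Record<string, string>;
};

function buildLlmsTxt(pages: readonly Page[], options: IndexOptions): string {
  const { baseUrl, siteTitle, siteDescription, sectionOrder = [], sectionLabels = {} } = options;

  // Assumption: a "section" is the first segment of the URL path.
  const sections = new Map<string, Page[]>();
  for (const page of pages) {
    const section = page.urlPath.split("/").filter(Boolean)[0] ?? "";
    const group = sections.get(section) ?? [];
    group.push(page);
    sections.set(section, group);
  }

  // Sections named in sectionOrder come first, in that order; the rest follow alphabetically.
  const rank = (name: string): number => {
    const i = sectionOrder.indexOf(name);
    return i === -1 ? sectionOrder.length : i;
  };
  const names = Array.from(sections.keys()).sort(
    (a, b) => rank(a) - rank(b) || a.localeCompare(b),
  );

  const lines: string[] = [`# ${siteTitle}`, ""];
  if (siteDescription) lines.push(`> ${siteDescription}`, "");
  for (const name of names) {
    // Fall back to the raw section name when no label is configured.
    lines.push(`## ${sectionLabels[name] ?? name}`, "");
    for (const page of sections.get(name) ?? []) {
      lines.push(`- [${page.title}](${baseUrl.replace(/\/$/, "")}${page.urlPath}.md)`);
    }
    lines.push("");
  }
  return lines.join("\n");
}
```

In this sketch, `sectionOrder` only affects ordering and `sectionLabels` only affects the heading text, which matches how the two options are described in the configuration reference.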

Installation

```sh
npm install llmfood
# or
bun add llmfood
```

Usage

Docusaurus Plugin (recommended)

Add llmfood as a Docusaurus plugin for zero-config integration. It runs automatically after docusaurus build:

```js
// docusaurus.config.js
module.exports = {
  plugins: [
    [
      "llmfood/docusaurus",
      {
        sectionOrder: ["guides", "api", "concepts"],
        sectionLabels: { guides: "Guides", api: "API Reference" },
        customFiles: [
          {
            filename: "llms-full.txt",
            title: "Full Documentation",
            description: "Complete documentation in a single file",
            includePatterns: [/.*/],
          },
        ],
      },
    ],
  ],
};
```

The plugin automatically derives baseUrl, buildDir, siteTitle, and siteDescription from your Docusaurus config. It also sets docsDir to {siteDir}/docs by default, enabling source file scanning for mermaid diagrams and remote content resolution.

Standalone

```ts
import { generateLlmsMarkdown } from "llmfood";

await generateLlmsMarkdown({
  baseUrl: "https://docs.example.com",
  buildDir: "./build",
  siteTitle: "My Docs",
  siteDescription: "Documentation for my project",
  docsDir: "./docs", // optional: enables source file scanning
  sectionOrder: ["guides", "api", "concepts"],
  sectionLabels: { guides: "Guides", api: "API Reference" },
  ignorePatterns: [/\/blog\//],
  customFiles: [
    {
      filename: "llms-full.txt",
      title: "Full Documentation",
      description: "Complete documentation in a single file",
      includePatterns: [/.*/],
    },
  ],
});
```

Standalone HTML to Markdown

You can also use the converter directly:

```ts
import { htmlToMarkdown } from "llmfood";

const markdown = htmlToMarkdown(docusaurusHtmlString);
```

Content Resolution

Some Docusaurus plugins render content client-side, so the static HTML contains placeholders instead of real content. When docsDir is set, llmfood scans MDX source files and resolves these automatically:

| Pattern | Source detection | Resolution |
| --- | --- | --- |
| GitHub code references | `CodeBlock` JSX, fenced `` ```lang reference `` blocks, and `children`/`src`/`srcUrl`/`source` attributes | Fetches code from `raw.githubusercontent.com` with line ranges |
| Remote content | `url="..."` or `url={expr}` in MDX | Fetches remote markdown (JSX expressions via `resolveRemoteUrl`) |
| Mermaid diagrams | `` ```mermaid `` blocks in MDX | Injects mermaid source into HTML (client-side renders leave none) |
| YouTube embeds | `<iframe>` with YouTube URL in HTML | Converts to `[title](youtube-url)` markdown link |
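
For the remote content case, a `resolveRemoteUrl` callback might look like the sketch below. It assumes the callback receives the raw JSX expression text (for example `getBenchmarkURL("latency")`), which this README does not spell out. The base URL, the `.md` suffix, and the choice to throw on unrecognized expressions are all hypothetical.

```typescript
// Hypothetical base URL; point this at wherever your remote markdown lives.
const BENCHMARK_BASE = "https://example.com/benchmarks";

// Assumption: `expr` is the raw JSX expression text, e.g. getBenchmarkURL("latency").
function resolveRemoteUrl(expr: string): string {
  const match = /^\s*getBenchmarkURL\(\s*["']([^"']+)["']\s*\)\s*$/.exec(expr);
  if (!match) {
    // Failing loudly is a design choice of this sketch, not documented llmfood behavior.
    throw new Error(`Unrecognized remote URL expression: ${expr}`);
  }
  return `${BENCHMARK_BASE}/${encodeURIComponent(match[1])}.md`;
}
```

A function like this would be passed as the `resolveRemoteUrl` property of the config.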

Source scanning also resolves imported MDX snippets (import Foo from "./_snippet.mdx"), substitutes ${props.x} expressions using caller prop values, and matches files by frontmatter id when the slug differs from the filename.
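
The `${props.x}` substitution described above can be pictured with a small sketch. This is an illustration of the idea only; `substituteProps` is a hypothetical helper, and leaving unknown placeholders untouched is an assumption rather than documented behavior.

```typescript
// Replace ${props.name} placeholders in a snippet with the caller's prop values.
// Placeholders without a matching prop are left as they are.
function substituteProps(snippet: string, props: Record<string, string>): string {
  return snippet.replace(/\$\{props\.([A-Za-z_$][\w$]*)\}/g, (placeholder, name: string) =>
    Object.prototype.hasOwnProperty.call(props, name) ? props[name] : placeholder,
  );
}
```

For example, a snippet containing `url="https://example.com/${props.repo}/README.md"` that is imported with `repo="llmfood"` would yield a concrete URL that can then be fetched.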

All external fetches run in parallel with a concurrency limit of 6.
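
One common way to cap parallel work is a small pool of workers that each pull the next item from a shared queue. The sketch below shows that pattern; `mapWithConcurrency` is a hypothetical helper and is not taken from llmfood's source.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results keep the order of the input array.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to claim

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With a limit of 6, a call such as `mapWithConcurrency(urls, 6, (u) => fetch(u).then((r) => r.text()))` would keep at most six requests open at any time. If one call rejects, the returned promise rejects as well, so per-item error handling belongs inside `worker`.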

API

`generateLlmsMarkdown(config)`

Processes an entire Docusaurus build and generates llms.txt plus any custom files.

`LlmfoodConfig`

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| `baseUrl` | `string` | Yes | Base URL for generated links (e.g., `https://docs.example.com`) |
| `buildDir` | `string` | Yes | Path to the Docusaurus build output directory |
| `customFiles` | `CustomLlmFile[]` | No | Custom aggregated output files to generate |
| `docsDir` | `string` | No | Path to docs source directory (enables mermaid and remote content resolution) |
| `ignorePatterns` | `RegExp[]` | No | URL patterns to exclude (root `/` is always excluded) |
| `postProcessHtml` | `(html, context) => string` | No | Hook to transform HTML before markdown conversion |
| `postProcessMarkdown` | `(md, context) => string` | No | Hook to transform markdown after conversion |
| `resolveRemoteUrl` | `(expr) => string` | No | Resolve JSX expressions (e.g., `getBenchmarkURL(...)`) to fetch URLs |
| `rootContent` | `string` | No | Additional content to include at the top of `llms.txt` |
| `sectionLabels` | `Record<string, string>` | No | Custom display labels for URL sections |
| `sectionOrder` | `string[]` | No | Ordering for sections in `llms.txt` |
| `siteDescription` | `string` | No | Site description shown in `llms.txt` |
| `siteTitle` | `string` | No | Site title shown in `llms.txt` |
| `verbose` | `boolean` | No | Log individual skipped pages with reasons |

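Both hooks (postProcessHtml and postProcessMarkdown) receive a ProcessContext of the form { urlPath: string } as their second argument, and each may return a Promise instead of a plain string.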
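
As a rough sketch of what the two hooks could do, the functions below strip marked elements before conversion and append a footer after it. The local `ProcessContext` type mirrors the shape given above; the `data-llm-skip` attribute and the `Source:` footer are made up for this example and are not llmfood conventions.

```typescript
// Local stand-in for the context type, using the shape documented above.
type ProcessContext = { urlPath: string };

// Hypothetical postProcessHtml hook: drop elements carrying a made-up
// `data-llm-skip` attribute. A regex is enough for a sketch, but it does not
// handle nested elements with the same tag name.
function postProcessHtml(html: string, _context: ProcessContext): string {
  return html.replace(/<(\w+)[^>]*\sdata-llm-skip[^>]*>[\s\S]*?<\/\1>/g, "");
}

// Hypothetical postProcessMarkdown hook: append the page path as a footer line.
// It is async only to show that returning a Promise is allowed.
async function postProcessMarkdown(md: string, context: ProcessContext): Promise<string> {
  return `${md.trimEnd()}\n\nSource: ${context.urlPath}\n`;
}
```

Both functions would be passed under the same names in the config object given to `generateLlmsMarkdown` or to the plugin options.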

`CustomLlmFile`

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| `filename` | `string` | Yes | Output filename (e.g., `llms-full.txt`) |
| `includePatterns` | `RegExp[]` | Yes | URL patterns to include in this file |
| `description` | `string` | No | Description shown at the top of the file |
| `title` | `string` | No | Title shown at the top of the file |
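
To illustrate how `includePatterns` and `ignorePatterns` might interact, the sketch below filters a list of URL paths. It assumes patterns are tested against the URL path with `RegExp.prototype.test` and that ignored pages are also left out of custom files; neither detail is confirmed by this README, and `selectUrlPaths` is a hypothetical helper.

```typescript
// Keep URL paths that match at least one include pattern and no ignore pattern.
// The root path "/" is always dropped, mirroring the note on `ignorePatterns`.
function selectUrlPaths(
  urlPaths: readonly string[],
  includePatterns: readonly RegExp[],
  ignorePatterns: readonly RegExp[] = [],
): string[] {
  const matchesAny = (patterns: readonly RegExp[], path: string): boolean =>
    patterns.some((pattern) => {
      pattern.lastIndex = 0; // guard against stateful /g or /y patterns
      return pattern.test(path);
    });

  return urlPaths.filter(
    (path) =>
      path !== "/" && matchesAny(includePatterns, path) && !matchesAny(ignorePatterns, path),
  );
}
```

Under these assumptions, `includePatterns: [/.*/]` selects every page that survives `ignorePatterns`, which is why it suits a catch-all file such as `llms-full.txt`.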

`htmlToMarkdown(html)`

Converts a Docusaurus HTML string to clean Markdown. Expects the content to be wrapped in an <article> tag.

Returns an empty string if no <article> element is found.
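
Because of the `<article>` requirement, an HTML fragment taken from elsewhere needs wrapping before conversion. The helper below is a hypothetical convenience, not part of llmfood, and its check for an existing `<article>` tag is a simple pattern match rather than a real HTML parse.

```typescript
// Wrap a fragment in <article> unless it already contains an <article> element.
function ensureArticle(html: string): string {
  return /<article[\s>]/i.test(html) ? html : `<article>${html}</article>`;
}
```

The result can then be passed on, as in `htmlToMarkdown(ensureArticle(fragment))`.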

Supported Docusaurus Elements

The converter handles these Docusaurus-specific elements:

- Prism code blocks: preserves language and syntax highlighting structure
- Admonitions: converts to :::type [title] syntax (tip, warning, info, caution, danger, note, important)
- Tabs: renders each tab panel with its label as a bold heading
- Details/Summary: preserves as HTML <details> elements
- KaTeX math: converts to $$...$$ (block) and $...$ (inline) syntax
- Images: converts to standard Markdown, skipping data URIs
- Tables: converts to GFM table syntax with alignment support (:---:, ---:)
- Strikethrough: converts <del> and <s> to ~~text~~
- YouTube iframes: converts to markdown links with video title
- Mermaid code blocks: preserves as fenced mermaid code blocks (when source is available)
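
As an example of one of these conversions, the sketch below renders a GFM table with per-column alignment markers. It illustrates the GFM syntax itself (`:---` left, `:---:` center, `---:` right); `toGfmTable` and its input shape are assumptions and do not reflect how llmfood's converter is written.

```typescript
type Align = "left" | "center" | "right" | undefined;

// Render headers and rows as a GFM table, one alignment entry per column.
function toGfmTable(headers: string[], aligns: Align[], rows: string[][]): string {
  const delimiter = (align: Align): string =>
    align === "center" ? ":---:" : align === "right" ? "---:" : align === "left" ? ":---" : "---";
  // Pipes must be escaped inside cells, and cells cannot span lines.
  const escapeCell = (cell: string): string => cell.replace(/\|/g, "\\|").replace(/\n/g, " ");
  const line = (cells: string[]): string => `| ${cells.join(" | ")} |`;

  return [
    line(headers.map(escapeCell)),
    line(headers.map((_, i) => delimiter(aligns[i]))),
    ...rows.map((row) => line(row.map(escapeCell))),
  ].join("\n");
}
```

Columns with no alignment entry get a plain `---` delimiter, which leaves alignment up to the renderer.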

Pages that can't be converted are tracked and summarized. Set verbose: true to see individual skipped pages with reasons (redirects, empty pages, missing files, errors).

License

MIT
