fix(markitdown-ocr): make PyMuPDF an optional dependency to fix AGPL licensing concern #1717

Open
octo-patch wants to merge 1 commit into microsoft:main from octo-patch:fix/issue-1675-pymupdf-optional

@octo-patch

Fixes #1675

Problem

markitdown-ocr declared PyMuPDF>=1.24.0 as a required dependency, but PyMuPDF is licensed under AGPL-3.0. The plugin itself is MIT-licensed, and this mismatch was not disclosed anywhere. Any user who installed markitdown-ocr silently acquired an AGPL transitive dependency, which can affect the licensing requirements of applications that distribute the software.

PyMuPDF is only used in one place: a fallback path in _pdf_converter_with_ocr.py that handles malformed PDFs that pdfplumber cannot open (e.g. truncated EOF). This is an edge case, not the primary conversion path.

Solution

  • Move PyMuPDF from dependencies to an optional [pymupdf] extra in pyproject.toml
  • Add an [all] convenience extra that bundles both [llm] and [pymupdf]
  • Add a clear license notice in README.md explaining the AGPL implications and showing how to install with or without PyMuPDF

The existing import fitz call is already inside a try/except, so the fallback path degrades gracefully when PyMuPDF is not installed. No code changes are needed.
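For readers unfamiliar with the pattern, here is a minimal sketch of a guarded optional import. It is not the code in _pdf_converter_with_ocr.py: the function name `open_with_fallback` and the `primary_open` parameter are hypothetical, and returning `None` on failure is an assumption made for illustration. Only the idea of wrapping `import fitz` in a try/except comes from the description above.

```python
from typing import Callable, Optional


def open_with_fallback(
    path: str, primary_open: Callable[[str], object]
) -> Optional[object]:
    """Try the primary opener; use PyMuPDF only if it happens to be installed.

    Hypothetical helper for illustration, not part of markitdown-ocr.
    """
    try:
        return primary_open(path)
    except Exception:
        # The primary parser (pdfplumber in the plugin) could not open the file.
        pass

    try:
        import fitz  # PyMuPDF; present only when the [pymupdf] extra is installed
    except ImportError:
        # PyMuPDF is absent: degrade gracefully instead of raising.
        return None

    try:
        return fitz.open(path)
    except Exception:
        # The fallback could not open the file either.
        return None
```

With this shape, a default install simply skips the fallback for malformed PDFs, while an install with the extra picks it up without any further configuration.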

Testing

  • pip install markitdown-ocr installs without PyMuPDF; standard PDF/DOCX/PPTX/XLSX conversion works normally
  • pip install 'markitdown-ocr[pymupdf]' enables the malformed-PDF fallback
  • pip install 'markitdown-ocr[all]' pulls in both the [llm] extra (openai) and the [pymupdf] extra (PyMuPDF)
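To confirm which of the install variants above is active in a given environment, a small check along these lines may help. This is a sketch using only the standard library. The helper name `pymupdf_status` is hypothetical, and it assumes PyMuPDF is importable as `fitz` and published under the distribution name `PyMuPDF`.

```python
from importlib import metadata
from importlib.util import find_spec


def pymupdf_status() -> dict:
    """Report whether the optional PyMuPDF dependency is available.

    Hypothetical helper for illustration, not part of markitdown-ocr.
    """
    # find_spec returns None when no top-level module named "fitz" exists.
    installed = find_spec("fitz") is not None
    version = None
    if installed:
        try:
            version = metadata.version("PyMuPDF")
        except metadata.PackageNotFoundError:
            # A module named "fitz" exists but was not installed by PyMuPDF.
            version = None
    return {"installed": installed, "version": version}


if __name__ == "__main__":
    print(pymupdf_status())
```

After a plain `pip install markitdown-ocr` in a clean environment, `installed` would be expected to be `False`. After installing the `[pymupdf]` or `[all]` extra, it should be `True` with a version string.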

Development

Successfully merging this pull request may close these issues.

Use of Pymupdf
