fix(markitdown-ocr): make PyMuPDF an optional dependency to fix AGPL licensing concern #1717
Open
octo-patch wants to merge 1 commit into microsoft:main from
Fixes #1675
Problem
`markitdown-ocr` declared `PyMuPDF>=1.24.0` as a required dependency, but PyMuPDF is licensed under AGPL-3.0. The plugin itself is MIT-licensed, and this mismatch was not disclosed anywhere. Any user who installed `markitdown-ocr` silently acquired an AGPL transitive dependency, which can affect the licensing requirements of applications that distribute the software.

PyMuPDF is only used in one place: a fallback path in `_pdf_converter_with_ocr.py` that handles malformed PDFs that `pdfplumber` cannot open (e.g. truncated EOF). This is an edge case, not the primary conversion path.

Solution
- Move `PyMuPDF` from `dependencies` to an optional `[pymupdf]` extra in `pyproject.toml`
- Add an `[all]` convenience extra that bundles both `[llm]` and `[pymupdf]`
- Update `README.md`, explaining the AGPL implications and showing how to install with or without PyMuPDF

The existing `import fitz` call is already inside a `try/except`, so the fallback path degrades gracefully when PyMuPDF is not installed; no code changes are needed.

Testing
- `pip install markitdown-ocr` installs without PyMuPDF; standard PDF/DOCX/PPTX/XLSX conversion works normally
- `pip install 'markitdown-ocr[pymupdf]'` enables the malformed-PDF fallback
- `pip install 'markitdown-ocr[all]'` pulls in both openai and PyMuPDF extras
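
For reviewers who want to see the graceful-degradation pattern this PR relies on, here is a minimal sketch of a guarded optional import. It is not the code in `_pdf_converter_with_ocr.py`; the helper names `load_optional_module` and `open_with_fallback_backend` are hypothetical and exist only for illustration. The only PyMuPDF call assumed is `fitz.open(path)`.

```python
import importlib


def load_optional_module(name: str):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # The optional extra is absent; the caller decides how to degrade.
        return None


def open_with_fallback_backend(path: str, module_name: str = "fitz"):
    """Hypothetical fallback: open `path` with the optional backend.

    Returns None when the backend (PyMuPDF by default) is not installed,
    so the caller can skip the malformed-PDF path instead of crashing.
    """
    backend = load_optional_module(module_name)
    if backend is None:
        return None
    return backend.open(path)
```

With this shape, an install without the `[pymupdf]` extra simply skips the fallback, while an install with it picks up the backend without any code change.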