---
title: olmOCR Markdown Converter
emoji: 📝
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
python_version: 3.11
license: mit
---

# olmOCR Markdown Converter

This Space uses the `olmOCR` model pipeline to convert PDFs (including scientific papers) into markdown `.txt` files that retain document structure, headers, and basic math formatting, ready for Calibre/Kindle or downstream parsing.

- ✅ Vision + text anchor OCR pipeline (via `olmOCR`)
- ✅ Extracts semantic structure via PDF TOC
- ✅ Outputs clean `.txt` in markdown format
- ✅ Hugging Face **Gradio Space with GPU support**

## Example Use

Upload a scientific paper in PDF and download a markdown `.txt` version with preserved headers and inline structure.

---

Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)
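
As a rough illustration of how a PDF table of contents could be turned into markdown headers, here is a minimal sketch. It is not the code in `app.py`: the function name `toc_to_markdown`, the `(level, title, page)` entry shape (modeled on what PyMuPDF's `Document.get_toc()` returns), and the `page_texts` mapping are all assumptions made for this example.

```python
from typing import Iterable


def toc_to_markdown(
    toc: Iterable[tuple[int, str, int]],
    page_texts: dict[int, str],
) -> str:
    """Render TOC entries as markdown headers, each followed by its page text.

    Hypothetical helper: `toc` holds (level, title, page) tuples and
    `page_texts` maps a 1-based page number to already-extracted text.
    """
    parts: list[str] = []
    emitted_pages: set[int] = set()
    for level, title, page in toc:
        # Markdown supports header levels 1 through 6, so clamp the TOC depth.
        depth = max(1, min(level, 6))
        parts.append(f"{'#' * depth} {title.strip()}")
        # Emit each page's text once, under the first header that points at it.
        body = page_texts.get(page, "").strip()
        if body and page not in emitted_pages:
            parts.append(body)
            emitted_pages.add(page)
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    sample_toc = [(1, "Introduction", 1), (2, "Background", 1), (1, "Methods", 2)]
    sample_pages = {1: "Intro text.", 2: "Method text."}
    print(toc_to_markdown(sample_toc, sample_pages))
```

In this sketch, two TOC entries that point at the same page share one copy of the page text, which keeps the output `.txt` free of duplicated paragraphs.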