Browser-based, private, and built for developers and data professionals who need reliable structured JSON from PDF files.
🔒
100% Private & Secure
Your PDF is parsed entirely in your browser using PDF.js and is never uploaded to a server. Your documents and their contents stay on your device at all times.
📄
Full Document Mode
Extract the complete document structure: metadata, pages, paragraphs, and individual lines — all nested into a clean hierarchical JSON object ready for programmatic processing or API ingestion.
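As a rough illustration of the nesting described above, here is a minimal sketch in TypeScript. The type names, key names (`metadata`, `pages`, `paragraphs`, `lines`), and the rule that a blank line separates paragraphs are all assumptions made for this example; the tool's actual output keys and paragraph detection may differ.

```typescript
// Hypothetical output shape; the key names are assumptions, not the tool's documented schema.
interface ParagraphJson {
  lines: string[];
}

interface PageJson {
  page: number; // 1-based page number
  paragraphs: ParagraphJson[];
}

interface DocumentJson {
  metadata: Record<string, string | number | null>;
  pages: PageJson[];
}

// Assumption: a blank line ends the current paragraph.
function toParagraphs(lines: string[]): ParagraphJson[] {
  const paragraphs: ParagraphJson[] = [];
  let current: string[] = [];
  for (const line of lines) {
    if (line.trim() === "") {
      if (current.length > 0) {
        paragraphs.push({ lines: current });
      }
      current = [];
    } else {
      current.push(line);
    }
  }
  if (current.length > 0) {
    paragraphs.push({ lines: current });
  }
  return paragraphs;
}

// Nest per-page lines under pages, alongside the document metadata.
function buildDocument(
  metadata: Record<string, string | number | null>,
  pagesOfLines: string[][]
): DocumentJson {
  const pages: PageJson[] = [];
  for (let i = 0; i < pagesOfLines.length; i++) {
    pages.push({ page: i + 1, paragraphs: toParagraphs(pagesOfLines[i]) });
  }
  return { metadata, pages };
}
```

Code that consumes the result can then walk `pages`, `paragraphs`, and `lines` in order without re-parsing any text.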
☰
Line-by-Line Extraction
Convert every text line in the PDF into a flat JSON array with page numbers attached. Ideal for processing logs, reports, or structured plain-text documents line by line.
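A flat line array of this kind might be produced along the following lines. This is only a sketch: the `LineEntry` shape and the `flattenLines` helper are hypothetical names chosen for illustration, not the tool's actual field names.

```typescript
// Hypothetical entry shape: one object per text line, tagged with its source page.
interface LineEntry {
  page: number; // 1-based page number
  line: string;
}

// Flatten per-page line lists into a single array, preserving reading order.
// firstPage lets the caller label pages correctly when only a range was extracted.
function flattenLines(pagesOfLines: string[][], firstPage: number = 1): LineEntry[] {
  const entries: LineEntry[] = [];
  for (let i = 0; i < pagesOfLines.length; i++) {
    for (const line of pagesOfLines[i]) {
      entries.push({ page: firstPage + i, line });
    }
  }
  return entries;
}
```

Because every entry carries its page number, a consumer can filter or group lines by page without needing the nested document structure.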
🔤
Word-by-Word Extraction
Output every individual word as a JSON array item with its source page. Useful for NLP pipelines, word frequency analysis, text tokenisation, and custom search index building.
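To make the word-level output and one downstream use concrete, here is a minimal sketch. It assumes words are separated by whitespace and that frequency counting is case-insensitive; `WordEntry`, `extractWords`, and `countWords` are hypothetical names, and the tool's own tokenisation rules may differ.

```typescript
// Hypothetical entry shape: one object per word, tagged with its source page.
interface WordEntry {
  page: number; // 1-based page number
  word: string;
}

// Assumption: any run of whitespace separates words; punctuation stays attached.
function extractWords(pageTexts: string[]): WordEntry[] {
  const entries: WordEntry[] = [];
  for (let i = 0; i < pageTexts.length; i++) {
    for (const word of pageTexts[i].split(/\s+/)) {
      if (word !== "") {
        entries.push({ page: i + 1, word });
      }
    }
  }
  return entries;
}

// Example consumer: a case-insensitive word frequency table.
function countWords(entries: WordEntry[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    const key = entry.word.toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

A real NLP pipeline would usually also strip punctuation or apply a proper tokeniser; that step is left out here to keep the sketch short.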
🗂️
Metadata Extraction
Extract only document metadata — title, author, creator, producer, PDF version, page count, and creation date — without processing content. Perfect for document cataloguing workflows.
{ }
Pretty Print & Minify
Toggle between indented, human-readable JSON and compact, minified output with a single click. Choose an indent size (2 or 4 spaces) to match your team's code style.
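The difference between the two output styles can be shown with the standard `JSON.stringify` function. The `formatJson` wrapper below is a hypothetical helper written for this example, not the tool's own code.

```typescript
type IndentSize = 2 | 4;

// Pretty output passes an indent width to JSON.stringify; minified output omits it.
function formatJson(data: object, pretty: boolean, indent: IndentSize = 2): string {
  return pretty ? JSON.stringify(data, null, indent) : JSON.stringify(data);
}

const sample = { page: 1, line: "Hello" };

console.log(formatJson(sample, false));
// {"page":1,"line":"Hello"}

console.log(formatJson(sample, true, 2));
// {
//   "page": 1,
//   "line": "Hello"
// }
```

Both strings parse back to the same value, so the choice only affects readability and file size.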
📐
Page Range Control
Extract only the pages you need using the Start Page and End Page controls. Process a single page, a chapter, or the full document — without converting unnecessary content.
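One way the Start Page and End Page values could be turned into a list of pages is sketched below. The clamping rules (out-of-range values are pulled into the document, and an end before the start yields a single page) are assumptions made for this example; `resolvePageRange` is a hypothetical helper, and the tool may validate input differently.

```typescript
// Turn start/end inputs into an inclusive list of 1-based page numbers.
// Assumptions: values are clamped to [1, pageCount]; end < start yields just the start page.
function resolvePageRange(start: number, end: number, pageCount: number): number[] {
  if (pageCount < 1) {
    return [];
  }
  const first = Math.min(Math.max(1, Math.floor(start)), pageCount);
  const last = Math.min(Math.max(first, Math.floor(end)), pageCount);
  const pages: number[] = [];
  for (let p = first; p <= last; p++) {
    pages.push(p);
  }
  return pages;
}

console.log(resolvePageRange(3, 5, 10)); // [ 3, 4, 5 ]
console.log(resolvePageRange(0, 99, 4)); // [ 1, 2, 3, 4 ]
```

Only the pages in the returned list would then be parsed, which is what keeps a single-page or single-chapter extraction cheap.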
⎘
Copy & Download
Copy the entire JSON output to your clipboard with one click, or download it as a clean .json file. Use it directly in your code editor, API client, or data pipeline.
🔄
Re-extract Without Reloading
Change the extraction mode, toggle options, or adjust the page range, then click "Re-extract". Your PDF stays loaded in the browser and is re-processed without you having to select the file again.