AI Document Processing
Intelligent document understanding
What is AI Document Processing?
Go beyond simple text extraction with Gemini 3's native vision capabilities for comprehensive document understanding.
Native Vision
Understands text, images, diagrams, charts, and tables together
Large Documents
Process PDFs of up to 1000 pages and up to 50MB in a single request
Structured Output
Extract data into JSON format for downstream applications
Multi-Document
Compare and analyze multiple PDFs simultaneously
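To make the Structured Output idea concrete, here is a minimal sketch of how a downstream application might check extracted JSON before using it. The field names (invoice_number, total, line_items) and the helper parse_extraction are hypothetical and chosen only for illustration; the page does not define a schema.

```python
import json

# Hypothetical fields for an invoice-style extraction (not defined by the source).
REQUIRED_FIELDS = {
    "invoice_number": str,
    "total": (int, float),
    "line_items": list,
}


def parse_extraction(raw: str) -> dict:
    """Parse extracted text as JSON and check required fields and their types."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"wrong type for field: {field}")
    return data
```

Validating at the boundary like this keeps malformed extractions from reaching later processing steps.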
Upload & Analyze Document
Upload a PDF to analyze with AI-powered document understanding.
Technical Specifications
Understanding the capabilities and limits of document processing.
What It Can Do
- File Size: Up to 50MB per PDF
- Page Count: Up to 1000 pages per document
- Resolution: Larger pages are scaled down to at most 3072×3072 pixels (aspect ratio preserved)
- Multi-file: Process multiple PDFs simultaneously
- Formats: PDF (best); TXT, MD, and HTML are also accepted but processed as text only
- Vision: Understands charts, diagrams, tables, images, layouts
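The limits above can be checked before uploading. The following is a rough sketch, not part of any official SDK: the helper names check_document and scaled_page_size are hypothetical, "50MB" is assumed to mean 50 × 1024 × 1024 bytes, and only the upper resolution bound is modeled (small pages are left unchanged).

```python
from pathlib import Path

MAX_BYTES = 50 * 1024 * 1024  # assumption: "50MB" read as 50 MiB
MAX_PAGES = 1000
MAX_SIDE = 3072  # maximum page width/height in pixels
TEXT_ONLY_SUFFIXES = {".txt", ".md", ".html"}


def check_document(name: str, size_bytes: int, page_count: int) -> list[str]:
    """Return a list of problems; an empty list means the file is within limits."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix != ".pdf" and suffix not in TEXT_ONLY_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    if page_count > MAX_PAGES:
        problems.append(f"too many pages: {page_count}")
    return problems


def scaled_page_size(width: int, height: int) -> tuple[int, int]:
    """Fit a page inside MAX_SIDE x MAX_SIDE, preserving aspect ratio (no upscaling)."""
    scale = min(1.0, MAX_SIDE / width, MAX_SIDE / height)
    return round(width * scale), round(height * scale)
```

For example, a 6144×3072 page would be halved to 3072×1536, while a 1000×2000 page is left as is.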
Best Practices
- Rotate pages to correct orientation before uploading
- Avoid blurry or low-quality scans
- Use the Files API for documents larger than 10MB
- Place text prompts after the document in requests
- PDFs work best; other formats lose visual context
- Native text embedded in a PDF is extracted automatically and is not charged separately
- Set media_resolution (low/medium/high) per document
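Several of these practices can be combined into a small request-planning step. The sketch below is illustrative only: the Document class, plan_request, and the dictionary layout are hypothetical and do not mirror a real SDK's request format, and "10MB" is assumed to mean 10 × 1024 × 1024 bytes. It routes large files to the Files API, records a media_resolution per document, and places the text prompt after all documents.

```python
from dataclasses import dataclass

FILES_API_THRESHOLD = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB
RESOLUTIONS = ("low", "medium", "high")


@dataclass
class Document:
    name: str
    size_bytes: int
    media_resolution: str = "medium"


def plan_request(docs: list[Document], prompt: str) -> dict:
    """Build a hypothetical request plan: documents first, text prompt last."""
    parts = []
    for doc in docs:
        if doc.media_resolution not in RESOLUTIONS:
            raise ValueError(f"invalid media_resolution: {doc.media_resolution}")
        # Large files go through the Files API; small ones are sent inline.
        route = "files_api" if doc.size_bytes > FILES_API_THRESHOLD else "inline"
        parts.append({
            "type": "document",
            "name": doc.name,
            "route": route,
            "media_resolution": doc.media_resolution,
        })
    # The prompt is appended after every document part.
    parts.append({"type": "text", "text": prompt})
    return {"parts": parts}
```

A plan built this way could then be translated into whatever request shape the chosen client library expects.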