A unified interface for state-of-the-art multimodal and document AI models. Select a model, upload an image or video, and enter a query to begin.
- Nanonets-OCR-s: Transforms documents into structured markdown with intelligent content recognition.
- SmolDocling-256M: An efficient multimodal model for converting documents to structured formats.
- MonkeyOCR-Recognition: Adopts a Structure-Recognition-Relation paradigm for efficient document processing.
- Typhoon-OCR-7B: A bilingual (Thai/English) document parsing model for real-world documents.
- Thyme-RL: Generates and executes code for image processing and complex reasoning tasks.
⚠️ Note: Video inference is experimental, and performance may vary between models.
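
The workflow described here (select a model, attach an image or video, enter a query) can be sketched as a small validation step. This is only an illustrative sketch, not the app's actual code: the names `MODELS`, `InferenceRequest`, and `build_request`, as well as the file-extension sets, are assumptions introduced for the example. Only the model names and the experimental status of video input come from the text above.

```python
import os
from dataclasses import dataclass

# Model names and one-line summaries taken from the list above.
MODELS = {
    "Nanonets-OCR-s": "Documents to structured markdown",
    "SmolDocling-256M": "Efficient document-to-structured-format conversion",
    "MonkeyOCR-Recognition": "Structure-Recognition-Relation document processing",
    "Typhoon-OCR-7B": "Bilingual (Thai/English) document parsing",
    "Thyme-RL": "Code generation and execution for image reasoning",
}

# Hypothetical extension sets; the real app may accept other formats.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".webm"}


@dataclass
class InferenceRequest:
    """Hypothetical container for one validated request."""
    model: str
    media_path: str
    media_kind: str      # "image" or "video"
    query: str
    experimental: bool   # True for video, which the note marks as experimental


def build_request(model: str, media_path: str, query: str) -> InferenceRequest:
    """Validate the three user inputs and bundle them into a request."""
    if model not in MODELS:
        raise ValueError(f"Unknown model: {model!r}")
    if not query.strip():
        raise ValueError("Query must not be empty")

    # Classify the upload by file extension (case-insensitive).
    ext = os.path.splitext(media_path)[1].lower()
    if ext in IMAGE_EXTS:
        kind = "image"
    elif ext in VIDEO_EXTS:
        kind = "video"
    else:
        raise ValueError(f"Unsupported file type: {ext!r}")

    return InferenceRequest(
        model=model,
        media_path=media_path,
        media_kind=kind,
        query=query.strip(),
        experimental=(kind == "video"),
    )
```

A caller could use the `experimental` flag to show a warning before running a video query, for example `build_request("Thyme-RL", "clip.mp4", "Describe the scene").experimental`.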