A unified interface for state-of-the-art multimodal and document AI models. Select a model, upload an image or video, and enter a query to begin.

  • Nanonets-OCR-s: Transforms documents into structured markdown with intelligent content recognition.
  • SmolDocling-256M: An efficient multimodal model for converting documents to structured formats.
  • MonkeyOCR-Recognition: Adopts a Structure-Recognition-Relation paradigm for efficient document processing.
  • Typhoon-OCR-7B: A bilingual (Thai/English) document parsing model for real-world documents.
  • Thyme-RL: Generates and executes code for image processing and complex reasoning tasks.
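As a rough illustration of the select-model, upload, and query flow described above, the following Python sketch validates a model choice and packages a request. It is not the app's actual code: the `build_request` function, the `MODELS` table, the request fields, and the list of video extensions are all hypothetical names and assumptions introduced here for illustration. Only the model names come from the list above.

```python
from pathlib import Path

# Model names are taken from the list above; the short summaries are paraphrases.
MODELS = {
    "Nanonets-OCR-s": "documents to structured markdown",
    "SmolDocling-256M": "documents to structured formats",
    "MonkeyOCR-Recognition": "Structure-Recognition-Relation document processing",
    "Typhoon-OCR-7B": "Thai/English document parsing",
    "Thyme-RL": "code generation and execution for image reasoning",
}

# Assumption: which file extensions count as video is not stated on the page.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def build_request(model_name: str, query: str, media_path: str) -> dict:
    """Validate the inputs and return a plain dict describing the request.

    Hypothetical helper: the real interface may structure this differently.
    """
    if model_name not in MODELS:
        raise ValueError(f"Unknown model: {model_name!r}")
    if not query.strip():
        raise ValueError("Query must not be empty")

    is_video = Path(media_path).suffix.lower() in VIDEO_EXTENSIONS
    return {
        "model": model_name,
        "query": query.strip(),
        "media_path": media_path,
        "media_type": "video" if is_video else "image",
        # Video inference is described as experimental, so flag it for the caller.
        "experimental": is_video,
    }
```

A caller could, for example, invoke `build_request("Typhoon-OCR-7B", "Extract the text", "invoice.png")` and check the `experimental` flag before deciding whether to warn the user.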

⚠️ Note: Performance on video inference tasks is experimental and may vary between models.
