DoclingTools enable an Agent to convert documents from multiple input formats (PDF, DOCX, PPTX, XLSX, HTML, images, audio, video, etc.) into output formats such as Markdown, JSON, YAML, HTML, and DocTags using the Docling library.
Prerequisites
The following example requires the docling library.
Audio and video inputs also need ffmpeg:
- macOS: brew install ffmpeg
- Ubuntu: sudo apt-get install ffmpeg
- Windows: Download from ffmpeg.org
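Before running an agent, it can help to confirm that both prerequisites are visible to Python. The check below is a small sketch that uses only the standard library; the function name check_prerequisites is hypothetical and is not part of Agno or Docling.

```python
import importlib.util
import shutil
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report whether the docling package and the ffmpeg binary can be found."""
    return {
        # find_spec returns None when a top-level package is not installed.
        "docling": importlib.util.find_spec("docling") is not None,
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


status = check_prerequisites()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A missing ffmpeg entry should only matter when audio or video files are converted.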
Example
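A minimal sketch of wiring DoclingTools into an agent. It assumes the toolkit is importable from agno.tools.docling and that Agent accepts tools and markdown arguments; the helper names build_agent and build_prompt and the file path are placeholders. The agno imports sit inside the function so the snippet loads even when agno is not installed.

```python
from typing import Any


def build_agent() -> Any:
    """Create an agent whose only tool is DoclingTools."""
    # Lazy imports: nothing from agno is loaded until this function is called.
    from agno.agent import Agent
    from agno.tools.docling import DoclingTools  # assumed import path

    # No model is passed here; supply model=... for the provider you use.
    return Agent(tools=[DoclingTools()], markdown=True)


def build_prompt(source: str) -> str:
    """Phrase the conversion request for a file path or URL."""
    return f"Convert the document at {source} to Markdown."


# Placeholder path; replace it with a real PDF before running the agent.
prompt = build_prompt("path/to/document.pdf")
print(prompt)
```

With agno and docling installed, build_agent().print_response(prompt) would send the request to the agent.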
This example uses an agent to convert a PDF to Markdown.
OCR Configuration
Configure OCR settings for scanned PDFs or documents with embedded images.
cookbook/91_tools/docling_tools/ocr_example.py
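The cookbook file is not reproduced here. As an independent sketch, the OCR-related parameters from the Toolkit Params table can be collected into a dictionary and passed to DoclingTools as keyword arguments. The helper names ocr_settings and build_ocr_tools are hypothetical, and the import path agno.tools.docling is an assumption.

```python
from typing import Any, Dict, List, Optional

# Engine names as listed in the Toolkit Params table.
OCR_ENGINES = {"auto", "easyocr", "tesseract", "tesseract_cli", "ocrmac", "rapidocr"}


def ocr_settings(
    engine: str = "auto",
    languages: Optional[List[str]] = None,
    force_full_page: bool = False,
) -> Dict[str, Any]:
    """Build DoclingTools keyword arguments that switch on OCR for PDFs."""
    if engine not in OCR_ENGINES:
        raise ValueError(f"Unknown OCR engine: {engine!r}")
    settings: Dict[str, Any] = {
        "pdf_enable_ocr": True,
        "pdf_ocr_engine": engine,
        "pdf_force_full_page_ocr": force_full_page,
    }
    if languages:
        # Only set the language list when the caller provides one.
        settings["pdf_ocr_lang"] = list(languages)
    return settings


def build_ocr_tools(**settings: Any) -> Any:
    """Instantiate DoclingTools with the given settings (requires agno)."""
    from agno.tools.docling import DoclingTools  # assumed import path

    return DoclingTools(**settings)


settings = ocr_settings(engine="easyocr", languages=["en", "pt"], force_full_page=True)
print(settings)
```

Forcing full-page OCR is mainly useful for scanned PDFs whose text layer is missing or unreliable; for born-digital PDFs it is likely unnecessary work.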
Toolkit Params
| Parameter | Type | Default | Description |
|---|---|---|---|
| converter | DocumentConverter | None | Pre-configured Docling DocumentConverter instance |
| max_chars | int | None | Maximum characters in output |
| allowed_input_formats | List[str] | None | Restrict accepted input formats (e.g. ["pdf", "docx"]) |
| format_options | Dict[Any, Any] | None | Custom format options passed to the converter |
| pdf_pipeline_options | PdfPipelineOptions | None | Full PDF pipeline configuration object |
| pdf_enable_ocr | bool | None | Enable OCR processing for PDFs |
| pdf_ocr_engine | str | None | OCR engine: auto, easyocr, tesseract, tesseract_cli, ocrmac, rapidocr |
| pdf_ocr_lang | List[str] | None | OCR language codes (e.g. ["en", "pt"]) |
| pdf_force_full_page_ocr | bool | None | Force OCR on every page regardless of text layer |
| pdf_enable_table_structure | bool | None | Enable table structure recognition in PDFs |
| pdf_enable_picture_description | bool | None | Enable picture description extraction |
| pdf_enable_picture_classification | bool | None | Enable picture classification |
| pdf_document_timeout | float | None | Timeout in seconds for PDF processing |
| pdf_enable_remote_services | bool | None | Enable remote services for PDF processing |
| enable_convert_to_markdown | bool | True | Register the convert_to_markdown function |
| enable_convert_to_text | bool | True | Register the convert_to_text function |
| enable_convert_to_html | bool | True | Register the convert_to_html function |
| enable_convert_to_html_split_page | bool | True | Register the convert_to_html_split_page function |
| enable_convert_to_json | bool | True | Register the convert_to_json function |
| enable_convert_to_yaml | bool | True | Register the convert_to_yaml function |
| enable_convert_to_doctags | bool | True | Register the convert_to_doctags function |
| enable_convert_to_vtt | bool | True | Register the convert_to_vtt function |
| enable_convert_string_content | bool | True | Register the convert_string_content function |
| enable_list_supported_parsers | bool | True | Register the list_supported_parsers function |
| all | bool | False | Enable all conversion functions when set to True |
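Because every enable_* flag defaults to True, exposing only a few functions to the agent means switching the others off. The sketch below builds such a parameter set from the names in the table; the helper only_functions and the chosen limits (two input formats, 20000 characters) are illustrative assumptions, not recommended values.

```python
from typing import Any, Dict

# Function names taken from the enable_* rows of the table above.
TOOLKIT_FUNCTIONS = [
    "convert_to_markdown",
    "convert_to_text",
    "convert_to_html",
    "convert_to_html_split_page",
    "convert_to_json",
    "convert_to_yaml",
    "convert_to_doctags",
    "convert_to_vtt",
    "convert_string_content",
    "list_supported_parsers",
]


def only_functions(*wanted: str) -> Dict[str, bool]:
    """Return enable_* flags that register only the named functions."""
    unknown = set(wanted) - set(TOOLKIT_FUNCTIONS)
    if unknown:
        raise ValueError(f"Unknown toolkit functions: {sorted(unknown)}")
    return {f"enable_{name}": name in wanted for name in TOOLKIT_FUNCTIONS}


params: Dict[str, Any] = {
    "allowed_input_formats": ["pdf", "docx"],  # reject every other input type
    "max_chars": 20000,  # cap the size of the converted output
    **only_functions("convert_to_markdown", "convert_to_text"),
}
print(params)
```

The resulting dictionary is meant to be passed as DoclingTools(**params), which leaves the agent with two conversion functions and a bounded output size.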
Toolkit Functions
| Function | Description |
|---|---|
| convert_to_markdown | Converts a document (file path or URL) to Markdown. Accepts source, optional headers for URL requests, raises_on_error, max_num_pages, and max_file_size. |
| convert_to_text | Converts a document to plain text. Same parameters as convert_to_markdown. |
| convert_to_html | Converts a document to HTML. Same parameters as convert_to_markdown. |
| convert_to_html_split_page | Converts a document to HTML with page-level splitting. Same parameters as convert_to_markdown. |
| convert_to_json | Converts a document to JSON. Same parameters as convert_to_markdown. |
| convert_to_yaml | Converts a document to YAML. Same parameters as convert_to_markdown. |
| convert_to_doctags | Converts a document to DocTags format. Same parameters as convert_to_markdown. |
| convert_to_vtt | Converts a document (including audio/video) to VTT subtitle format. Same parameters as convert_to_markdown. |
| convert_string_content | Converts raw string content (Markdown or HTML) to another format. Accepts content, source_format (default "markdown"), output_format (default "markdown"), and optional name. |
| list_supported_parsers | Lists all supported Docling input parsers and any active format restrictions. |
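The parameter shape of convert_string_content can be sketched with a thin wrapper that mirrors the defaults listed above. The wrapper convert_snippet and the stand-in class EchoTools are hypothetical; EchoTools replaces a real DoclingTools instance so the snippet runs without Docling, and the value "html" for source_format is an assumption based on the table's mention of HTML input.

```python
from typing import Any, Dict, Optional


def convert_snippet(
    tools: Any,
    content: str,
    source_format: str = "markdown",
    output_format: str = "markdown",
    name: Optional[str] = None,
) -> Any:
    """Forward a string conversion to tools.convert_string_content."""
    kwargs: Dict[str, Any] = {
        "content": content,
        "source_format": source_format,
        "output_format": output_format,
    }
    if name is not None:
        # name is optional, so it is only forwarded when provided.
        kwargs["name"] = name
    return tools.convert_string_content(**kwargs)


class EchoTools:
    """Stand-in for DoclingTools that returns the arguments it receives."""

    def convert_string_content(self, **kwargs: Any) -> Dict[str, Any]:
        return kwargs


result = convert_snippet(EchoTools(), "<h1>Title</h1>", source_format="html")
print(result)
```

Passing a real DoclingTools instance in place of EchoTools would perform the actual conversion; what it returns is not specified in the table above.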