
#23739: feat: Docling RAG extension — native document processing for OpenClaw

by ihsanmokhlisse · open · 2026-02-22 16:56
size: XL
## Summary

- **Problem:** OpenClaw cannot read PDFs, Word documents, spreadsheets, or presentations. Users who send documents via email, WhatsApp, or Telegram get no response because the agent has no document understanding.
- **Why it matters:** Document processing is the #1 capability gap vs. ChatGPT/Claude. Every enterprise use case (legal, finance, HR, support) starts with "can it read my documents?"
- **What changed:** A new `docling-rag` extension that uses IBM's Docling library for document conversion, plus keyword search over the converted chunks. 8 files, 42 tests. It registers 4 agent tools and automatically manages the docling-serve lifecycle.
- **What did NOT change:** No existing code was modified. The extension lives entirely under `extensions/docling-rag/`.

## Change Type (select all)

- [x] Feature

## Scope (select all touched areas)

- [x] Skills / tool execution
- [x] Memory / storage
- [x] Integrations

## Linked Issue/PR

- Related #23200 (Docling-powered document processing feature request)

## What the extension provides

### 4 Agent Tools

| Tool | What it does |
|---|---|
| `ingest_document` | Convert a document in any supported format (see below) into structured chunks and store them |
| `search_knowledge` | Search across all ingested documents by keyword; returns ranked results with citations |
| `list_documents` | List all documents in the knowledge base, with metadata |
| `remove_document` | Remove a document from the knowledge base |

### Supported Formats

PDF, DOCX, PPTX, XLSX, HTML, HTM, CSV, MD, TXT, LaTeX, PNG, JPG, JPEG, TIFF, BMP, WebP

### Architecture

```
extensions/docling-rag/
├── package.json            Plugin package
├── openclaw.plugin.json    Plugin manifest with config schema
├── index.ts                Plugin entry: registers tools + service
├── index.test.ts           42 tests
└── src/
    ├── types.ts            Types + constants
    ├── docling-client.ts   HTTP client for the docling-serve REST API
    ├── server-manager.ts   Manages the docling-serve subprocess lifecycle
    └── store.ts            Document + chunk storage with keyword search
```

### How it works

1. The user provides a document path via the `ingest_document` tool.
2. The extension sends the file to docling-serve's `/v1/chunk/hybrid/file` endpoint.
3. Docling converts the document to structured markdown, preserving tables and layout.
4. HybridChunker splits it into semantic chunks that respect section boundaries.
5. Chunks are stored locally with metadata (page, section, document name).
6. The agent can search with `search_knowledge`, which returns ranked results with citations.

### Docling-serve management

The extension automatically manages docling-serve as a subprocess:

- **Lazy start:** docling-serve starts only when the first document is ingested (zero resources if never used).
- **Fallback chain:** tries the `docling-serve` CLI first, then a Docker container.
- **Clean shutdown:** stops when the gateway stops.
- **External mode:** can connect to an externally managed docling-serve via the `doclingServeUrl` config.

### Configuration

```json5
{
  "plugins": {
    "docling-rag": {
      "enabled": true,
      "config": {
        "doclingServeUrl": "http://127.0.0.1:5001",
        "autoManage": true,
        "storePath": "~/.openclaw/data/docling-rag"
      }
    }
  }
}
```

## User-visible / Behavior Changes

4 new agent tools are available when the extension is enabled. No changes to existing behavior.

## Security Impact (required)

- New permissions/capabilities? `Yes`: reads files from disk via paths provided by the agent
- Secrets/tokens handling changed? `No`
- New/changed network calls? `Yes`: HTTP calls to a local docling-serve instance (loopback only by default)
- Command/tool execution surface changed? `Yes`: spawns a docling-serve subprocess (when auto-manage is enabled)
- Data access scope changed? `Yes`: reads document files, stores chunks locally
- Risk + mitigation:
  - File access: the agent provides paths; the extension validates existence and supported format
  - Network: docling-serve binds to 127.0.0.1 only, so there is no external exposure
  - Storage: chunks are stored with 0600/0700 permissions, the same pattern as auth-profiles
  - Subprocess: docling-serve is an IBM open-source tool (MIT license), spawned via `spawn` (no shell)

## Evidence

- [x] 42 tests, all passing
- [x] 0 lint errors (oxlint)
- [x] 0 format issues (oxfmt)
- [x] Test breakdown: DoclingClient (9), ServerManager (4), DocumentStore (20), SUPPORTED_EXTENSIONS (9)
- [x] Tests use real temp directories to verify storage persistence

## Human Verification (required)

- Verified: all 42 tests pass locally
- Edge cases: missing files, unsupported formats, empty queries, empty store, server errors, duplicate ingestion, document removal
- What I did **not** verify: live docling-serve integration (requires `pip install docling-serve`). Tests mock the HTTP responses.

## Compatibility / Migration

- Backward compatible? `Yes`: new extension, no existing behavior changed
- Config/env changes? `Yes`: new `plugins.docling-rag` config section (only used if enabled)
- Migration needed? `No`

## Failure Recovery (if this breaks)

- Disable: set `plugins.docling-rag.enabled: false` in config
- The extension is fully isolated, with no state shared with other components

## Risks and Mitigations

- Risk: docling-serve not installed
  - Mitigation: clear error message with install instructions for pip and Docker
- Risk: large documents consume memory during processing
  - Mitigation: docling-serve handles memory management; the extension only receives the chunked output
- Risk: docling-serve REST API changes
  - Mitigation: the client wraps API calls with error handling; response parsing is defensive

## AI-Assisted

- [x] This PR was AI-assisted (Claude)
- [x] Fully tested (42 tests)
- [x] I understand what the code does
- [x] Verified against Docling documentation and source code (docling v2.74.0, docling-serve v1.13.0, docling-mcp v1.3.4)

Made with [Cursor](https://cursor.com)

### Greptile Summary

This PR adds a new `docling-rag` extension that provides native document processing to OpenClaw. The extension integrates IBM's Docling library to convert documents (PDF, Word, Excel, PowerPoint, HTML, images) into searchable chunks stored locally with keyword-based retrieval.
**Key changes:**

- New extension under `extensions/docling-rag/` with 4 agent tools (`ingest_document`, `search_knowledge`, `list_documents`, `remove_document`)
- Auto-managed `docling-serve` subprocess lifecycle with a lazy-start pattern (CLI/Docker fallback)
- Local JSON-based storage for documents and chunks with keyword search
- Comprehensive test suite with 42 tests covering all components
- No modifications to existing OpenClaw code; entirely additive

**Architecture:**

- `DoclingClient` handles HTTP communication with the docling-serve REST API
- `DoclingServerManager` manages the subprocess lifecycle (lazy start, graceful shutdown)
- `DocumentStore` provides JSON file-based storage with keyword search
- The plugin registers 4 tools and 1 service via the standard OpenClaw plugin SDK

**Security considerations:**

- File access limited to agent-provided paths with format validation
- Subprocess spawned without a shell (`spawn` with an args array)
- Network binding to 127.0.0.1 (localhost only)
- File permissions set to 0600/0700 for stored data

### Confidence Score: 4/5

- This PR is safe to merge with minor considerations: a well-tested, isolated extension with no impact on existing functionality.
- The implementation follows OpenClaw extension patterns correctly, includes comprehensive tests (42 tests covering all components), uses secure subprocess spawning without shell injection, and is completely isolated from existing code. The score reflects that the extension adds new capabilities without risk to core functionality. It is not a 5 because there is no live docling-serve integration testing (tests mock HTTP responses) and the subprocess management introduces a new external dependency.
- No files require special attention; all components follow established patterns.

<sub>Last reviewed commit: fc916e0</sub>
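The PR says `search_knowledge` returns ranked keyword results with citations from the JSON-backed `DocumentStore`, but it does not say how results are ranked. As an illustration only, the TypeScript sketch below assumes plain term-frequency scoring over lowercase ASCII tokens; `Chunk`, `SearchHit`, `searchChunks`, and the citation format are hypothetical names, not the extension's actual code.

```typescript
// Hypothetical chunk shape, mirroring the metadata the PR says is stored
// with each chunk (document name, page, section).
interface Chunk {
  docName: string;
  page?: number;
  section?: string;
  text: string;
}

interface SearchHit {
  chunk: Chunk;
  score: number;
  citation: string;
}

// Lowercase, then split on any run of characters that is not a-z or 0-9.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Build a citation string such as "contract.pdf, p. 2, Termination".
function formatCitation(chunk: Chunk): string {
  const parts: string[] = [chunk.docName];
  if (chunk.page !== undefined) parts.push(`p. ${chunk.page}`);
  if (chunk.section) parts.push(chunk.section);
  return parts.join(", ");
}

// Score each chunk by the number of query-term occurrences it contains
// (assumed term-frequency scoring), drop chunks with no match, and return
// at most `limit` hits, highest score first.
function searchChunks(chunks: Chunk[], query: string, limit = 5): SearchHit[] {
  const terms = new Set(tokenize(query));
  if (terms.size === 0) return []; // empty or punctuation-only query
  const hits: SearchHit[] = [];
  for (const chunk of chunks) {
    let score = 0;
    for (const token of tokenize(chunk.text)) {
      if (terms.has(token)) score += 1;
    }
    if (score > 0) {
      hits.push({ chunk, score, citation: formatCitation(chunk) });
    }
  }
  // Array.prototype.sort is stable, so equal scores keep store order.
  hits.sort((a, b) => b.score - a.score);
  return hits.slice(0, limit);
}

// Example: two of the three chunks mention the query terms.
const store: Chunk[] = [
  {
    docName: "contract.pdf",
    page: 2,
    section: "Termination",
    text: "Either party may terminate with notice. Termination requires written notice.",
  },
  { docName: "handbook.docx", text: "Vacation policy and notice periods." },
  { docName: "report.xlsx", text: "Quarterly revenue figures." },
];
const results = searchChunks(store, "termination notice");
for (const hit of results) console.log(hit.score, hit.citation);
// 3 contract.pdf, p. 2, Termination
// 1 handbook.docx
```

Raw term frequency favors long chunks that repeat a word; a production store might normalize by chunk length or use a scheme such as BM25, which this sketch does not attempt.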

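The lazy start and the CLI-then-Docker fallback chain described in the PR can be outlined as below. This is a hypothetical TypeScript sketch, not `server-manager.ts`: the `Launcher` interface and `LazyServer` class are assumptions, and process spawning is hidden behind injectable launchers. In practice each launcher would wrap Node's `child_process.spawn` with an args array (no shell), as the PR describes, and wait for the HTTP endpoint to respond before resolving.

```typescript
// Hypothetical launcher abstraction: one way of starting docling-serve
// (e.g. the CLI or a Docker container). start() resolves to a stop function
// once the server is up, or rejects if this method is unavailable.
interface Launcher {
  name: string;
  start(): Promise<() => Promise<void>>;
}

class LazyServer {
  private readonly launchers: Launcher[];
  private starting: Promise<string> | null = null;
  private stopFn: (() => Promise<void>) | null = null;

  constructor(launchers: Launcher[]) {
    this.launchers = launchers;
  }

  // Nothing runs until the first call; concurrent callers share one attempt.
  ensureStarted(): Promise<string> {
    if (this.starting === null) {
      this.starting = this.startWithFallback().catch((err: unknown) => {
        this.starting = null; // let a later call retry after a failed start
        throw err;
      });
    }
    return this.starting;
  }

  // Try each launcher in order; the first that succeeds wins.
  private async startWithFallback(): Promise<string> {
    const failures: string[] = [];
    for (const launcher of this.launchers) {
      try {
        this.stopFn = await launcher.start();
        return launcher.name;
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        failures.push(`${launcher.name}: ${message}`);
      }
    }
    throw new Error(`could not start docling-serve (${failures.join("; ")})`);
  }

  // Stop the running server, if any (e.g. when the gateway shuts down).
  async stop(): Promise<void> {
    const stopFn = this.stopFn;
    this.stopFn = null;
    this.starting = null;
    if (stopFn) await stopFn();
  }
}
```

Sharing one in-flight promise means that two tool calls arriving at the same moment trigger a single start attempt, and clearing it on failure lets a later call retry, for example after the user installs docling-serve.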