3 months ago 3 months ago

Tutorial: RAG-Anything + LightRAG Multi-Modal Ingestion

This tutorial walks you through layering RAG-Anything on top of an existing LightRAG instance using Claude Code, extending your knowledge graph to handle scanned PDFs, embedded images, LaTeX equations, and bar charts. You will learn how MinerU parses complex documents into content-type buckets, how text and image segments route through separate model paths, and how the resulting outputs merge into a single unified index. The verified Act 2 also flags a breaking architecture change in MinerU v3.0.0 and expands the content-type map beyond what the video covers.

by marketingagent.io 3 months ago3 months ago

343views

Most RAG systems fail the moment you hand them a scanned PDF, a chart, or an image-heavy report — they simply cannot process non-text content. This tutorial shows you how to layer RAG-Anything on top of an existing LightRAG instance using Claude Code, extending your knowledge graph to handle scanned PDFs, embedded images, LaTeX equations, and bar charts. By the end, you will query a unified knowledge graph through the same Claude Code workflow you already know, with zero changes to how you ask questions.

LightRAG: the graph-based RAG foundation that RAG-Anything builds on — 32K stars and counting

Confirm you have a running Docker LightRAG instance from the prior setup. RAG-Anything is a wrapper around LightRAG, not a replacement — the existing knowledge graph, vector database, and LightRAG web UI all carry over intact.
Open your terminal and navigate into the existing LightRAG project directory.
Run the one-shot Claude Code prompt provided in the Chase AI community. Claude Code handles the full RAG-Anything installation on top of your existing LightRAG stack without requiring manual dependency management.

The 4-stage RAG-Anything pipeline: Document Parsing → Content Analysis → Knowledge Graph → Intelligent Retrieval

After installation, Claude Code updates the storage path in the RAG-Anything config to point at your existing Docker LightRAG data directory, keeping both systems reading from the same source.
Claude Code updates the LLM model config — switching from GPT-4o mini to GPT-4.1 nano — and sets text-embedding-3-large as the embedding model.

Warning: this step may differ from current official documentation — see the verified version below.

Claude Code patches the embedding double-wrap bug present in the RAG-Anything GitHub example scripts. The upstream examples wrap the embedding call inside a redundant outer layer that causes failures at ingest time; the fix removes it.

Warning: this step may differ from current official documentation — see the verified version below.

MinerU, the document parsing engine RAG-Anything depends on, downloads and installs locally. MinerU runs entirely on your machine at no cost, using specialized sub-models — including PaddleOCR — to classify and extract content from complex documents.

The test document: a financial report mixing text, tables, LaTeX equations, and bar charts — exactly the multimodal challenge RAG-Anything solves

To ingest a non-text document, tell Claude Code to use the RAG-Anything skill and pass it the file path. That single instruction triggers the full pipeline.
MinerU parses the document and segments it by content type — headers, body text, tables, LaTeX equations, and charts each receive their own classification. MinerU does not read or interpret the content; it identifies structure only.

RAG-Anything segments the document by content type: each section gets its own extraction treatment

Every segment that can be converted to text — including OCR’d scan content and LaTeX equations — routes into the text bucket and is sent to the configured LLM, which extracts embeddings, entities, and relationships from each chunk.
Segments that cannot be converted to text (charts, diagrams, embedded images) are captured as screenshots and sent to the LLM as vision input. The vision model extracts the same three outputs: embeddings, entities, and relationships.

Content-type routing revealed: text goes to the language model, LaTeX gets specialized parsing, charts become screenshots for vision models

RAG-Anything merges the four resulting outputs — two vector databases and two knowledge graphs, one pair from the text path and one from the image path — into a single vector database and a single knowledge graph.
That combined RAG-Anything index merges with the existing LightRAG knowledge graph and vector database, producing one unified store containing both text-native and multi-modal documents.

Extracted entities and relations flow into LightRAG's dual store: vector database plus knowledge graph — Extracted entities and relations flow into LightRAG’s dual store: vector database plus knowledge graph

Query the unified system exactly as before — the same LightRAG API, the same Claude Code workflow, and the same question format all work without modification.

How does this compare to the official docs?

The video gets the integration running fast, but the official RAG-Anything documentation defines canonical configuration options, supported model backends, and MinerU version requirements that determine whether this setup holds in production.

Here’s What the Official Docs Show

The video walks a solid path through this integration, and the core steps hold up well against the repositories. What the docs add is a critical architecture update for MinerU, a broader content-type map that expands what your pipeline actually handles, and a licensing flag you’ll want before shipping to production.

Step 1 — Confirm a running Docker LightRAG instance

The video’s approach here matches the current docs exactly. One addition worth flagging: LightRAG ships two Compose files — docker-compose.yml and docker-compose-full.yml. Which one your prior setup used determines how storage paths mount in step 4. Check before you proceed.

Step 2 — Navigate to the project directory

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Step 3 — Run the Claude Code installation prompt

The video’s approach here matches the current docs exactly. RAG-Anything v1.2.10 is pip-installable, requires Python 3.10+, and is architecturally built on LightRAG — the README’s own BASED ON LIGHTRAG badge confirms the relationship the video describes.

One documentation note: the Claude Code screenshots captured for this step show the claude.ai consumer web application, not the CLI tool. The Claude Code terminal agent is documented separately at docs.anthropic.com/en/docs/claude-code.

Step 4 — Update storage path config

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Step 5 — Switch LLM and embedding models

No official documentation was found for this step — proceed using the video’s approach and verify independently.

The OpenAI Platform screenshots captured show only the authentication page. GPT-4.1 nano and text-embedding-3-large from this step remain unverified against official model documentation.

Step 6 — Patch the embedding double-wrap bug

The video’s approach here matches the current docs exactly. The RAG-Anything examples/ commit history shows fix(examples): use openai_embed.func to prevent ... — this is a documented, known issue and the fix is already applied to the repository’s example scripts.

Step 7 — Install MinerU

MinerU is confirmed as the correct document parsing engine for this pipeline — the video’s tool selection matches the docs. As of April 2, 2026, however, the correct invocation model is service-based: MinerU v3.0.0 (released March 29, 2026) replaced direct binary execution with mineru-api orchestration. When --api-url is not provided, MinerU automatically starts a local temporary service. The video’s description of MinerU as a standalone locally installed tool reflects pre-3.0 behavior.

Additional flag: MinerU is licensed under AGPL-3.0. Any modifications to MinerU itself must be open-sourced under the same terms. The video does not mention this — it is a material consideration for commercial deployments.

Step 8 — Trigger ingestion via Claude Code

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Step 9 — MinerU parses and segments the document

The video’s approach here matches the current docs exactly. MinerU’s repository tags (pdf-extractor-rag, layout-analysis, document-analysis) confirm the structural segmentation role the video describes. One addition: PaddleOCR is installed as part of MinerU’s dependency chain and handles OCR on scanned content. Its C++ and Java components may add platform-specific build complexity beyond a straightforward pip install.

Step 10 — Text segments route to the LLM

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Step 11 — Image segments route to the vision model

The video’s approach here matches the current docs exactly. The RAG-Anything README confirms: “When documents include images, the system seamlessly integrates them into VLM for advanced multimodal analysis.” One important extension: the video describes the pipeline as processing “text and image buckets,” but the RAG-Anything README System Overview explicitly lists six distinct content types — text, images, tables, equations, charts, and multimedia. Tables and equations are separate processing categories with their own routing, not subcategories of the text path.

Steps 12–14 — Merge outputs and query

No official documentation was found for these steps — proceed using the video’s approach and verify independently.

Useful Links

GitHub – HKUDS/RAG-Anything — Official repository with v1.2.10 release, pip install instructions, and the examples/ embedding-fix commit
RAG-Anything README — System Overview section listing all six multimodal content types and VLM query mode details
GitHub – HKUDS/LightRAG — LightRAG source including both Docker Compose configurations and v1.4.13 release
GitHub – opendatalab/MinerU — MinerU source with v3.0.0 release notes documenting the mineru-api service architecture change and AGPL-3.0 license
MinerU README — v3.0.0 changelog detailing the shift from standalone invocation to service-based orchestration
GitHub – PaddlePaddle/PaddleOCR — PaddleOCR v3.4.0, the OCR engine bundled within MinerU for scanned document processing
PaddleOCR Documentation — Official docs describing PaddleOCR as the “bedrock for building intelligent RAG and Agentic applications”
OpenAI Platform — API reference for verifying model names used in step 5 (authentication required)
Docker Docs — Official Docker documentation for Compose configuration reference relevant to LightRAG deployment