# Chunking

> Split documents into smaller pieces for effective vector search.

Chunking divides content into smaller pieces before embedding and storing in a vector database. The strategy you choose affects search quality and retrieval accuracy.

```python
from agno.knowledge.chunking.semantic_chunking import SemanticChunking
from agno.knowledge.reader.pdf_reader import PDFReader

reader = PDFReader(
    chunking_strategy=SemanticChunking(),
)
```

## Why Chunking Matters

Consider processing a recipe book with different strategies:

| Strategy                | Result                                           |
| ----------------------- | ------------------------------------------------ |
| Fixed Size (5000 chars) | May split recipes mid-instruction                |
| Semantic                | Keeps complete recipes together based on meaning |
| Document                | Each page becomes a chunk                        |

The right strategy returns complete, relevant results. The wrong one returns fragments.
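
To make the contrast concrete, here is a minimal, library-independent sketch in plain Python. It is not how Agno implements its strategies; the function names `fixed_size_chunks` and `paragraph_chunks` and the sample text are hypothetical, and splitting on blank lines is only a rough stand-in for a boundary-aware strategy.

```python
def fixed_size_chunks(text: str, chunk_size: int) -> list[str]:
    """Cut the text every `chunk_size` characters, ignoring content boundaries."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def paragraph_chunks(text: str) -> list[str]:
    """Split on blank lines so that each paragraph stays in one piece."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


# Two tiny "recipes" separated by a blank line (illustrative data only).
recipes = (
    "Pancakes: mix flour, milk, and eggs. Fry until golden.\n\n"
    "Omelette: whisk eggs with salt. Cook on low heat."
)

fixed = fixed_size_chunks(recipes, chunk_size=40)
by_paragraph = paragraph_chunks(recipes)

print("Fixed size:")
for chunk in fixed:
    print(repr(chunk))

print("By paragraph:")
for chunk in by_paragraph:
    print(repr(chunk))
```

The fixed-size version cuts the first recipe in the middle of an instruction, while the paragraph version returns each recipe whole. Real strategies add more logic on top of this idea, but the failure mode is the same.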

## Available Strategies

| Strategy   | Description                                               |
| ---------- | --------------------------------------------------------- |
| Fixed Size | Split into uniform chunks by character count              |
| Semantic   | Split at natural breakpoints based on meaning             |
| Recursive  | Split using multiple separators hierarchically            |
| Document   | Preserve document structure (sections, pages)             |
| Markdown   | Split by heading structure                                |
| CSV Row    | Each row becomes a chunk                                  |
| Agentic    | AI determines optimal boundaries                          |
| Code       | Split at function and class boundaries using AST analysis |
| Custom     | Build your own strategy                                   |

## Using with Readers

Pass a chunking strategy to any reader:

```python
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.chunking.fixed_size_chunking import FixedSizeChunking
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.vectordb.pgvector import PgVector

# Example connection string (an assumption); replace with your own database URL
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Reader that splits PDFs into 3000-character chunks
reader = PDFReader(
    chunking_strategy=FixedSizeChunking(chunk_size=3000),
)

knowledge = Knowledge(
    vector_db=PgVector(table_name="docs", db_url=db_url),
)

# Read, chunk, embed, and store everything under documents/
knowledge.insert(path="documents/", reader=reader)
```

## Choosing a Strategy

| Content Type     | Recommended Strategy | Why                                     |
| ---------------- | -------------------- | --------------------------------------- |
| General text     | Semantic             | Maintains meaning and context           |
| Structured docs  | Document             | Preserves sections and hierarchy        |
| Markdown files   | Markdown             | Respects heading structure              |
| CSV/tabular data | CSV Row              | Each row is a logical unit              |
| Source code      | Code                 | Splits at function and class boundaries |
| Mixed content    | Recursive            | Handles multiple separator types        |
| Need consistency | Fixed Size           | Predictable chunk dimensions            |

Each reader has a sensible default, but you can override it based on your content and retrieval needs.

## Configuration

Most strategies accept configuration options:

```python
from agno.knowledge.chunking.fixed_size_chunking import FixedSizeChunking
from agno.knowledge.chunking.semantic_chunking import SemanticChunking
# RecursiveChunking must be imported as well; its module path is not shown on this page.

# Fixed size with overlap
FixedSizeChunking(
    chunk_size=5000,  # Characters per chunk
    overlap=200,      # Overlap between chunks
)

# Semantic with threshold
SemanticChunking(
    similarity_threshold=0.7,  # Lower = more splits
)

# Recursive with custom separators
RecursiveChunking(
    separators=["\n\n", "\n", ". ", " "],
    chunk_size=4000,
)
```

## Chunk Size Guidelines

| Chunk Size              | Trade-off                                |
| ----------------------- | ---------------------------------------- |
| Small (1000-3000 chars) | More precise retrieval, may lose context |
| Default (5000 chars)    | Balanced precision and context           |
| Large (8000+ chars)     | More context, less targeted results      |

Smaller chunks work better for specific questions. Larger chunks work better when the answer depends on surrounding context.

## Next Steps

| Topic               | Description                     |
| ------------------- | ------------------------------- |
| Semantic chunking   | Split content by meaning        |
| Fixed size chunking | Uniform chunk sizes             |
| Readers             | Configure readers with chunking |
| Search              | How chunking affects search     |