π§Core Library
The Kallia core library provides the fundamental building blocks for semantic document processing. This section covers the main components, their functionality, and how to use them programmatically.
Overview
The core library consists of several key modules:
Documents: Document conversion and processing
Chunker: Semantic chunking and content segmentation
Memories: Conversation memory management
Models: Data structures and validation
Utils: Utility functions and helpers
Installation
Using pip
pip install kallia
From Source
git clone https://github.com/kallia-project/kallia.git
cd kallia
pip install -e .
Requirements
Python 3.11 or higher
FastAPI 0.115.14
Docling 2.41.0
Pydantic for data validation
Core Components
Documents Module
The Documents
class handles document conversion from various formats to structured markdown.
Key Features
PDF to markdown conversion using Docling
Configurable page selection
Image captioning support
OCR capabilities
Scalable image processing
Usage
from kallia_core.documents import Documents
# Convert a PDF page to markdown
markdown_content = Documents.to_markdown(
source="document.pdf",
page_number=1,
temperature=0.7,
max_tokens=4000,
include_image_captioning=True
)
Parameters
source
(str): Path or URL to the documentpage_number
(int): Specific page to process (default: 1)temperature
(float): AI model temperature for processing (default: 0.0)max_tokens
(int): Maximum tokens for AI processing (default: 8192)include_image_captioning
(bool): Enable image description generation (default: False)
Chunker Module
The Chunker
class performs semantic segmentation of text content into meaningful chunks.
Key Features
Semantic understanding of content structure
Automatic summary generation
Question-answer pair creation
Context-aware segmentation
Optimized for retrieval tasks
Usage
from kallia_core.chunker import Chunker
# Create semantic chunks from text
chunks = Chunker.create(
text=markdown_content,
temperature=0.7,
max_tokens=4000
)
# Each chunk contains:
for chunk in chunks:
print(f"Original: {chunk.original_text}")
print(f"Summary: {chunk.concise_summary}")
print(f"Question: {chunk.question}")
print(f"Answer: {chunk.answer}")
Chunk Structure
Each chunk is a Chunk
object with the following properties:
original_text
: The original text segmentconcise_summary
: AI-generated summaryquestion
: Generated question about the contentanswer
: Answer to the generated question
Memories Module
The Memories
class extracts and manages conversational context and insights.
Key Features
Short-term conversation context
Long-term insight extraction
Pattern recognition
Contextual relationship mapping
Memory persistence
Usage
from kallia_core.memories import Memories
from kallia_core.models import Message
# Create conversation messages
messages = [
Message(role="user", content="What is this document about?"),
Message(role="assistant", content="This document discusses semantic processing..."),
Message(role="user", content="How does the chunking work?"),
Message(role="assistant", content="The chunking algorithm analyzes...")
]
# Generate memories
memories = Memories.create(
messages=messages,
temperature=0.7,
max_tokens=4000
)
Next Steps
Explore the REST API for web service integration
Learn about Docker deployment for production use
Check out use cases for practical examples
Support
For technical issues or questions about the core library:
GitHub Issues: https://github.com/kallia-project/kallia/issues
Email: ck@kallia.net
Last updated