🌐 REST API
Kallia provides a comprehensive RESTful API built with FastAPI for document processing, semantic chunking, and memory management. This guide covers all available endpoints, request/response formats, and usage examples.
Base URL
When running locally:
http://localhost:8000

API Overview
The Kallia API provides four main endpoints:
POST /documents - Complete document processing pipeline
POST /markdownify - Document to markdown conversion
POST /chunks - Text to semantic chunks conversion
POST /memories - Conversation memory generation
Authentication
Currently, the API does not require authentication. For production deployments, consider implementing authentication middleware.
Content Type
All requests should use Content-Type: application/json.
Endpoints
1. Process Documents
Endpoint: POST /documents
Complete document processing pipeline that converts a document to markdown and creates semantic chunks.
Request
Parameters
url (string, required): Path or URL to the document
page_number (integer, optional): Page number to process (default: 1)
temperature (float, optional): AI model temperature 0.0-1.0 (default: 0.0)
max_tokens (integer, optional): Maximum tokens for processing (default: 8192)
include_image_captioning (boolean, optional): Enable image descriptions (default: false)
Response
Example
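The request and response bodies are not reproduced here; a minimal client sketch using only the standard library is shown below (the PDF URL is a placeholder, and the parameter names come from the list above):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def build_documents_payload(url, page_number=1, temperature=0.0,
                            max_tokens=8192, include_image_captioning=False):
    """Request body for POST /documents, using the documented parameter names."""
    return {
        "url": url,
        "page_number": page_number,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "include_image_captioning": include_image_captioning,
    }

def post(endpoint, payload):
    """Send a JSON request and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server running locally:
# result = post("/documents", build_documents_payload("https://example.com/sample.pdf"))
```

Check the interactive documentation at /docs for the exact response schema.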
2. Convert to Markdown
Endpoint: POST /markdownify
Converts a document to structured markdown format.
Request
Parameters
url (string, required): Path or URL to the document
page_number (integer, optional): Page number to process (default: 1)
temperature (float, optional): AI model temperature 0.0-1.0 (default: 0.0)
max_tokens (integer, optional): Maximum tokens for processing (default: 8192)
include_image_captioning (boolean, optional): Enable image descriptions (default: false)
Response
Example
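A request sketch, assuming a placeholder PDF URL and the parameter names listed above:

```python
import json
import urllib.request

# Request body for POST /markdownify; image captioning enabled as an example.
payload = {
    "url": "https://example.com/sample.pdf",
    "page_number": 1,
    "include_image_captioning": True,
}
req = urllib.request.Request(
    "http://localhost:8000/markdownify",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With the server running:
# with urllib.request.urlopen(req) as resp:
#     markdown = json.loads(resp.read())
```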
3. Create Semantic Chunks
Endpoint: POST /chunks
Converts text content into semantic chunks with summaries and Q&A pairs.
Request
Parameters
text (string, required): Text content to chunk
temperature (float, optional): AI model temperature (default: 0.0)
max_tokens (integer, optional): Maximum tokens for processing (default: 8192)
Response
Example
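A request sketch; the defaults are spelled out explicitly for clarity:

```python
import json
import urllib.request

# Request body for POST /chunks.
payload = {
    "text": "Kallia converts documents to markdown and splits the result into semantic chunks.",
    "temperature": 0.0,
    "max_tokens": 8192,
}
req = urllib.request.Request(
    "http://localhost:8000/chunks",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With the server running:
# with urllib.request.urlopen(req) as resp:
#     chunks = json.loads(resp.read())
```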
4. Generate Memories
Endpoint: POST /memories
Creates contextual memories from conversation history.
Request
Parameters
messages (array, required): Array of conversation messages; each message has:
  role (string): Either "user" or "assistant"
  content (string): Message content
temperature (float, optional): AI model temperature (default: 0.0)
max_tokens (integer, optional): Maximum tokens for processing (default: 8192)
Response
Example
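A request sketch with a two-message conversation (the message contents are placeholders):

```python
import json
import urllib.request

# Request body for POST /memories: each message needs a role and content.
payload = {
    "messages": [
        {"role": "user", "content": "I prefer summaries under 100 words."},
        {"role": "assistant", "content": "Noted, I will keep summaries short."},
    ],
    "temperature": 0.0,
}
# Roles must be "user" or "assistant", per the parameter list above.
assert all(m["role"] in ("user", "assistant") for m in payload["messages"])

req = urllib.request.Request(
    "http://localhost:8000/memories",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# With the server running:
# with urllib.request.urlopen(req) as resp:
#     memories = json.loads(resp.read())
```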
Error Handling
The API uses standard HTTP status codes and returns detailed error messages.
Status Codes
200 - Success
400 - Bad Request (invalid parameters)
500 - Internal Server Error
503 - Service Unavailable
Error Response Format
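The exact error schema is not reproduced here; the sketch below assumes FastAPI's default error body, a JSON object with a `detail` field. A minimal client-side handler:

```python
import json
import urllib.error
import urllib.request

# Status codes documented for this API.
STATUS_NAMES = {
    200: "Success",
    400: "Bad Request",
    500: "Internal Server Error",
    503: "Service Unavailable",
}

def safe_post(url, payload):
    """Return (status_code, decoded_body), decoding error bodies too.

    Assumes FastAPI's default error shape, e.g. {"detail": "..."}; check
    the live /docs page for the exact schema.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())
```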
Common Errors
400 Bad Request
Occurs when:
An unsupported file format is provided
Required parameters are missing
Parameter values are invalid
500 Internal Server Error
Occurs when:
Document processing fails
AI model errors
Unexpected system errors
503 Service Unavailable
Occurs when:
External services are unreachable
Network connectivity issues
Resource limitations
Supported File Formats
Currently supported formats:
PDF documents
The architecture is designed to be extensible for additional formats.
Rate Limiting
No rate limiting is currently implemented. For production use, consider implementing rate limiting middleware.
Request Size Limits
Maximum file size: Depends on server configuration
Maximum text length: No explicit limit (limited by max_tokens)
API Documentation
When the server is running, interactive API documentation is available at:
Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc
These interfaces provide:
Interactive endpoint testing
Request/response schema validation
Example requests and responses
Parameter documentation
Performance Considerations
Optimization Tips
Batch Processing: Process multiple pages separately for large documents
Caching: Implement client-side caching for repeated requests
Async Requests: Use async clients for concurrent processing
Parameter Tuning: Adjust temperature and max_tokens based on use case
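The batch and concurrency tips above can be sketched as follows; `send_one` is a hypothetical HTTP helper that POSTs a single payload to /documents and returns the decoded response:

```python
from concurrent.futures import ThreadPoolExecutor

def page_payloads(url, num_pages, **options):
    """Build one /documents request body per page, since each call processes a single page."""
    return [{"url": url, "page_number": n, **options} for n in range(1, num_pages + 1)]

# Concurrent processing with a thread pool (send_one is a hypothetical helper):
# with ThreadPoolExecutor(max_workers=4) as pool:
#     results = list(pool.map(send_one, page_payloads("https://example.com/big.pdf", 10)))
```

A modest `max_workers` keeps the document-processing backend from being overwhelmed, since each request can take several seconds.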
Response Times
Typical response times (varies by document size and complexity):
/markdownify: 2-10 seconds
/chunks: 5-15 seconds
/documents: 7-25 seconds
/memories: 1-5 seconds
Next Steps
Learn about Docker deployment for production setup
Explore use cases for practical implementations
Check the core library for direct Python integration
Review configuration options for customization
Support
For API-related questions or issues:
GitHub Issues: https://github.com/kallia-project/kallia/issues
Email: ck@kallia.net
API Documentation:
http://localhost:8000/docs