🌐 REST API

Kallia provides a comprehensive RESTful API built with FastAPI for document processing, semantic chunking, and memory management. This guide covers all available endpoints, request/response formats, and usage examples.

Base URL

When running locally:

http://localhost:8000

API Overview

The Kallia API provides four main endpoints:

  • POST /documents - Complete document processing pipeline

  • POST /markdownify - Document to markdown conversion

  • POST /chunks - Text to semantic chunks conversion

  • POST /memories - Conversation memory generation

Authentication

Currently, the API does not require authentication. For production deployments, consider implementing authentication middleware.

Content Type

All requests should use Content-Type: application/json.

Endpoints

1. Process Documents

Endpoint: POST /documents

Complete document processing pipeline that converts a document to markdown and creates semantic chunks.

Request

Parameters

  • url (string, required): Path or URL to the document

  • page_number (integer, optional): Page number to process (default: 1)

  • temperature (float, optional): AI model temperature 0.0-1.0 (default: 0.0)

  • max_tokens (integer, optional): Maximum tokens for processing (default: 8192)

  • include_image_captioning (boolean, optional): Enable image descriptions (default: false)

Response

Example
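A minimal sketch of a /documents call using only the Python standard library. The base URL matches the local deployment above; the payload fields and defaults come from the parameter list, but the response schema is whatever the server returns and is not assumed here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local deployment

def documents_payload(url, page_number=1, temperature=0.0,
                      max_tokens=8192, include_image_captioning=False):
    """Build the JSON body for POST /documents from the documented parameters."""
    return {
        "url": url,
        "page_number": page_number,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "include_image_captioning": include_image_captioning,
    }

def process_document(**kwargs):
    """POST the payload with Content-Type: application/json and decode the reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/documents",
        data=json.dumps(documents_payload(**kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, process_document(url="https://example.com/report.pdf", include_image_captioning=True) processes page 1 of the given PDF with image descriptions enabled.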

2. Convert to Markdown

Endpoint: POST /markdownify

Converts a document to structured markdown format.

Request

Parameters

  • url (string, required): Path or URL to the document

  • page_number (integer, optional): Page number to process (default: 1)

  • temperature (float, optional): AI model temperature 0.0-1.0 (default: 0.0)

  • max_tokens (integer, optional): Maximum tokens for processing (default: 8192)

  • include_image_captioning (boolean, optional): Enable image descriptions (default: false)

Response

Example
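The request body is the same shape as for /documents; only the endpoint path changes. A sketch of the payload, with the documented defaults filled in:

```python
def markdownify_payload(url, page_number=1, temperature=0.0,
                        max_tokens=8192, include_image_captioning=False):
    """JSON body for POST /markdownify (defaults from the parameter list above)."""
    return {
        "url": url,
        "page_number": page_number,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "include_image_captioning": include_image_captioning,
    }
```

POST this dict as JSON to http://localhost:8000/markdownify with Content-Type: application/json.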

3. Create Semantic Chunks

Endpoint: POST /chunks

Converts text content into semantic chunks with summaries and Q&A pairs.

Request

Parameters

  • text (string, required): Text content to chunk

  • temperature (float, optional): AI model temperature (default: 0.0)

  • max_tokens (integer, optional): Maximum tokens for processing (default: 8192)

Response

Example
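A sketch of a /chunks request body. The field names follow the parameter list; the shape of the returned chunks (summaries, Q&A pairs) is not assumed here.

```python
def chunks_payload(text, temperature=0.0, max_tokens=8192):
    """JSON body for POST /chunks (defaults from the parameter list above)."""
    return {
        "text": text,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```

POST this dict as JSON to http://localhost:8000/chunks with Content-Type: application/json.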

4. Generate Memories

Endpoint: POST /memories

Creates contextual memories from conversation history.

Request

Parameters

  • messages (array, required): Array of conversation messages

    • role (string): Either "user" or "assistant"

    • content (string): Message content

  • temperature (float, optional): AI model temperature (default: 0.0)

  • max_tokens (integer, optional): Maximum tokens for processing (default: 8192)

Response

Example
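A sketch of a /memories request body, including a client-side check that each message uses one of the two documented roles (the validation is illustrative, not part of the API):

```python
def memories_payload(messages, temperature=0.0, max_tokens=8192):
    """JSON body for POST /memories; messages is a list of role/content dicts."""
    for m in messages:
        if m.get("role") not in ("user", "assistant"):
            raise ValueError('role must be "user" or "assistant"')
    return {
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```

POST this dict as JSON to http://localhost:8000/memories with Content-Type: application/json.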

Error Handling

The API uses standard HTTP status codes and returns detailed error messages.

Status Codes

  • 200 - Success

  • 400 - Bad Request (invalid parameters)

  • 500 - Internal Server Error

  • 503 - Service Unavailable

Error Response Format
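Assuming Kallia keeps FastAPI's default error shape (a JSON object with a detail field, which is a string for raised errors and a list of per-field entries for validation errors; the service may customize this), error bodies can be parsed as:

```python
import json

def error_message(body: str) -> str:
    """Extract a readable message from a FastAPI-style error response body.

    Handles both {"detail": "..."} (HTTPException) and
    {"detail": [{"msg": ...}, ...]} (request-validation errors).
    """
    detail = json.loads(body).get("detail", "unknown error")
    if isinstance(detail, list):  # one entry per invalid field
        return "; ".join(item.get("msg", str(item)) for item in detail)
    return str(detail)
```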

Common Errors

400 Bad Request

Occurs when:

  • An unsupported file format is provided

  • Required parameters are missing

  • Parameter values are invalid

500 Internal Server Error

Occurs when:

  • Document processing fails

  • The AI model returns an error

  • An unexpected system error occurs

503 Service Unavailable

Occurs when:

  • External services are unreachable

  • Network connectivity fails

  • Resources are exhausted

Supported File Formats

Currently supported formats:

  • PDF documents

The architecture is designed to be extensible for additional formats.

Rate Limiting

No rate limiting is currently implemented. For production use, consider implementing rate limiting middleware.

Request Size Limits

  • Maximum file size: Depends on server configuration

  • Maximum text length: No explicit limit (limited by max_tokens)

API Documentation

When the server is running, interactive API documentation is available at:

  • Swagger UI: http://localhost:8000/docs

This interface provides:

  • Interactive endpoint testing

  • Request/response schema validation

  • Example requests and responses

  • Parameter documentation

Performance Considerations

Optimization Tips

  1. Batch Processing: Split large documents into separate per-page requests instead of one oversized request

  2. Caching: Implement client-side caching for repeated requests

  3. Async Requests: Use async clients for concurrent processing

  4. Parameter Tuning: Adjust temperature and max_tokens based on use case
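Tips 1 and 3 combine naturally: fan per-page /documents requests out over a small worker pool. A sketch, assuming the documented payload fields; post_document stands in for whatever HTTP call your client uses and is not defined here:

```python
from concurrent.futures import ThreadPoolExecutor

def page_payloads(url, num_pages, **overrides):
    """One /documents payload per page, for fan-out over a worker pool."""
    base = {"url": url, "temperature": 0.0, "max_tokens": 8192}
    base.update(overrides)
    return [dict(base, page_number=p) for p in range(1, num_pages + 1)]

def process_all_pages(post_document, url, num_pages, workers=4):
    """Send the per-page requests concurrently and return results in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(post_document, page_payloads(url, num_pages)))
```

Keep the worker count modest: each request occupies the AI model backend, so more concurrency than the backend can serve just queues up and risks 503 responses.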

Response Times

Typical response times (varies by document size and complexity):

  • /markdownify: 2-10 seconds

  • /chunks: 5-15 seconds

  • /documents: 7-25 seconds

  • /memories: 1-5 seconds


Support

For API-related questions or issues, see the project repository.
