REST API
Kallia provides a comprehensive RESTful API built with FastAPI for document processing, semantic chunking, and memory management. This guide covers all available endpoints, request/response formats, and usage examples.
Base URL
When running locally:
http://localhost:8000
API Overview
The Kallia API provides four main endpoints:
- POST /documents - Complete document processing pipeline
- POST /markdownify - Document to markdown conversion
- POST /chunks - Text to semantic chunks conversion
- POST /memories - Conversation memory generation
Authentication
Currently, the API does not require authentication. For production deployments, consider implementing authentication middleware.
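A minimal sketch of what such middleware could look like, using a FastAPI API-key dependency. The header name, environment variable, and route wiring here are illustrative assumptions, not part of Kallia:

import os

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
API_KEY = os.environ.get("KALLIA_API_KEY", "change-me")  # illustrative key source

async def require_api_key(x_api_key: str = Header(default="")) -> None:
    # Reject any request whose X-Api-Key header does not match the configured key.
    if x_api_key != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API Key")

@app.post("/documents", dependencies=[Depends(require_api_key)])
async def process_documents() -> dict:
    ...  # existing handler body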
Content Type
All requests should use Content-Type: application/json.
Endpoints
1. Process Documents
Endpoint: POST /documents
Complete document processing pipeline that converts a document to markdown and creates semantic chunks.
Request
{
"url": "string",
"page_number": 1,
"temperature": 0.7,
"max_tokens": 4000,
"include_image_captioning": false
}
Parameters
- url (string, required): Path or URL to the document
- page_number (integer, optional): Page number to process (default: 1)
- temperature (float, optional): AI model temperature 0.0-1.0 (default: 0.0)
- max_tokens (integer, optional): Maximum tokens for processing (default: 8192)
- include_image_captioning (boolean, optional): Enable image descriptions (default: false)
Response
{
"documents": [
{
"page_number": 1,
"chunks": [
{
"original_text": "Original text content...",
"concise_summary": "Brief summary of the content",
"question": "What does this section discuss?",
"answer": "This section discusses..."
}
]
}
]
}
Example
curl -X POST "http://localhost:8000/documents" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/document.pdf",
"page_number": 1,
"temperature": 0.7,
"max_tokens": 4000,
"include_image_captioning": true
}'
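The same request from Python, using the requests library; a sketch that assumes the server is running locally and the response shape documented above:

import requests

response = requests.post(
    "http://localhost:8000/documents",
    json={
        "url": "https://example.com/document.pdf",
        "page_number": 1,
        "temperature": 0.7,
        "max_tokens": 4000,
        "include_image_captioning": True,
    },
)
response.raise_for_status()
# Walk the documented response shape: documents -> chunks.
for document in response.json()["documents"]:
    for chunk in document["chunks"]:
        print(chunk["concise_summary"])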
2. Convert to Markdown
Endpoint: POST /markdownify
Converts a document to structured markdown format.
Request
{
"url": "string",
"page_number": 1,
"temperature": 0.7,
"max_tokens": 4000,
"include_image_captioning": false
}
Parameters
- url (string, required): Path or URL to the document
- page_number (integer, optional): Page number to process (default: 1)
- temperature (float, optional): AI model temperature 0.0-1.0 (default: 0.0)
- max_tokens (integer, optional): Maximum tokens for processing (default: 8192)
- include_image_captioning (boolean, optional): Enable image descriptions (default: false)
Response
{
"markdown": "# Document Title\n\nDocument content in markdown format..."
}
Example
curl -X POST "http://localhost:8000/markdownify" \
-H "Content-Type: application/json" \
-d '{
"url": "document.pdf",
"page_number": 1,
"temperature": 0.5,
"max_tokens": 6000
}'
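From Python, the returned markdown can be written straight to disk; a sketch with the requests library (the output file name is illustrative):

import requests

response = requests.post(
    "http://localhost:8000/markdownify",
    json={"url": "document.pdf", "page_number": 1, "temperature": 0.5, "max_tokens": 6000},
)
response.raise_for_status()
# Persist the converted page; the output path is arbitrary.
with open("document.md", "w", encoding="utf-8") as f:
    f.write(response.json()["markdown"])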
3. Create Semantic Chunks
Endpoint: POST /chunks
Converts text content into semantic chunks with summaries and Q&A pairs.
Request
{
"text": "string",
"temperature": 0.7,
"max_tokens": 4000
}
Parameters
- text (string, required): Text content to chunk
- temperature (float, optional): AI model temperature (default: 0.0)
- max_tokens (integer, optional): Maximum tokens for processing (default: 8192)
Response
{
"chunks": [
{
"original_text": "Original text segment...",
"concise_summary": "Summary of this segment",
"question": "Generated question about content",
"answer": "Answer to the generated question"
}
]
}
Example
curl -X POST "http://localhost:8000/chunks" \
-H "Content-Type: application/json" \
-d '{
"text": "This is a long document that needs to be chunked into semantic segments...",
"temperature": 0.8,
"max_tokens": 5000
}'
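A Python sketch of the same call that collects the generated Q&A pairs, assuming the documented response shape:

import requests

response = requests.post(
    "http://localhost:8000/chunks",
    json={
        "text": "This is a long document that needs to be chunked into semantic segments...",
        "temperature": 0.8,
        "max_tokens": 5000,
    },
)
response.raise_for_status()
# Each chunk carries a summary plus a generated question/answer pair.
for chunk in response.json()["chunks"]:
    print(f"Q: {chunk['question']}\nA: {chunk['answer']}\n")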
4. Generate Memories
Endpoint: POST /memories
Creates contextual memories from conversation history.
Request
{
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
},
{
"role": "assistant",
"content": "I'm doing well, thank you!"
}
],
"temperature": 0.7,
"max_tokens": 4000
}
Parameters
- messages (array, required): Array of conversation messages
  - role (string): Either "user" or "assistant"
  - content (string): Message content
- temperature (float, optional): AI model temperature (default: 0.0)
- max_tokens (integer, optional): Maximum tokens for processing (default: 8192)
Response
{
"memories": {
"short_term": {
"recent_context": "Summary of recent conversation",
"topics": ["topic1", "topic2"]
},
"long_term": {
"insights": "Key insights extracted",
"patterns": ["pattern1", "pattern2"]
},
"relationships": {
"connections": "Relationship mappings"
}
}
}
Example
curl -X POST "http://localhost:8000/memories" \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": "What is semantic chunking?"},
{"role": "assistant", "content": "Semantic chunking is a process..."},
{"role": "user", "content": "How does it work?"},
{"role": "assistant", "content": "It works by analyzing..."}
],
"temperature": 0.6,
"max_tokens": 3000
}'
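And from Python, a sketch that reads back the short-term memory fields documented above:

import requests

messages = [
    {"role": "user", "content": "What is semantic chunking?"},
    {"role": "assistant", "content": "Semantic chunking is a process..."},
]
response = requests.post(
    "http://localhost:8000/memories",
    json={"messages": messages, "temperature": 0.6, "max_tokens": 3000},
)
response.raise_for_status()
memories = response.json()["memories"]
print(memories["short_term"]["recent_context"])
print(memories["short_term"]["topics"])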
Error Handling
The API uses standard HTTP status codes and returns detailed error messages.
Status Codes
- 200 - Success
- 400 - Bad Request (invalid parameters)
- 500 - Internal Server Error
- 503 - Service Unavailable
Error Response Format
{
"detail": "Error description"
}
Common Errors
400 Bad Request
{
"detail": "Invalid File Format"
}
Occurs when:
- An unsupported file format is provided
- Required parameters are missing
- Parameter values are invalid
500 Internal Server Error
{
"detail": "Internal Server Error"
}
Occurs when:
- Document processing fails
- The AI model returns an error
- An unexpected system error occurs
503 Service Unavailable
{
"detail": "Service Unavailable"
}
Occurs when:
- External services are unreachable
- Network connectivity fails
- Resources are exhausted
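A client can branch on these codes and surface the detail field; a minimal sketch with the requests library:

import requests

try:
    response = requests.post(
        "http://localhost:8000/documents",
        json={"url": "document.pdf"},
        timeout=60,
    )
    response.raise_for_status()
except requests.HTTPError as err:
    status = err.response.status_code
    detail = err.response.json().get("detail", "unknown error")
    if status == 400:
        print(f"Bad request: {detail}")  # e.g. "Invalid File Format"
    elif status == 503:
        print(f"Service unavailable, retry later: {detail}")
    else:
        print(f"Server error ({status}): {detail}")
except requests.ConnectionError:
    print("Could not reach the API server")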
Supported File Formats
Currently supported formats:
- PDF documents
The architecture is designed to be extensible for additional formats.
Rate Limiting
No rate limiting is currently implemented. For production use, consider implementing rate limiting middleware.
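If you do add one in-process, here is a sketch of a naive per-IP limiter as FastAPI middleware; the window and request budget are illustrative, and a reverse proxy or API gateway is usually the better home for this:

import time
from collections import defaultdict

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
WINDOW_SECONDS = 60
MAX_REQUESTS = 30  # illustrative limit per client per window
hits: dict[str, list[float]] = defaultdict(list)

@app.middleware("http")
async def rate_limit(request: Request, call_next):
    now = time.time()
    client = request.client.host if request.client else "unknown"
    # Keep only timestamps inside the sliding window, then check the budget.
    recent = [t for t in hits[client] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_REQUESTS:
        return JSONResponse(status_code=429, content={"detail": "Too Many Requests"})
    recent.append(now)
    hits[client] = recent
    return await call_next(request)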
Request Size Limits
- Maximum file size: Depends on server configuration
- Maximum text length: No explicit limit (limited by max_tokens)
API Documentation
When the server is running, interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
This interface provides:
- Interactive endpoint testing
- Request/response schema validation
- Example requests and responses
- Parameter documentation
Performance Considerations
Optimization Tips
- Batch Processing: Process multiple pages separately for large documents
- Caching: Implement client-side caching for repeated requests
- Async Requests: Use async clients for concurrent processing (see the sketch after this list)
- Parameter Tuning: Adjust temperature and max_tokens based on use case
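For example, pages of a large document can be processed concurrently with an async client; a sketch using httpx and asyncio (httpx is not bundled with Kallia, and any async HTTP client works the same way):

import asyncio

import httpx

async def process_pages(url: str, pages: list[int]) -> list[dict]:
    async with httpx.AsyncClient(base_url="http://localhost:8000", timeout=60.0) as client:

        async def one_page(page: int) -> dict:
            response = await client.post("/documents", json={"url": url, "page_number": page})
            response.raise_for_status()
            return response.json()

        # Fire all page requests concurrently; results come back in page order.
        return await asyncio.gather(*(one_page(p) for p in pages))

results = asyncio.run(process_pages("https://example.com/document.pdf", [1, 2, 3]))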
Response Times
Typical response times (varies by document size and complexity):
- /markdownify: 2-10 seconds
- /chunks: 5-15 seconds
- /documents: 7-25 seconds
- /memories: 1-5 seconds
Next Steps
- Learn about Docker deployment for production setup
- Explore use cases for practical implementations
- Check the core library for direct Python integration
- Review configuration options for customization
Support
For API-related questions or issues:
- GitHub Issues: https://github.com/kallia-project/kallia/issues
- Email: ck@kallia.net
- API Documentation: http://localhost:8000/docs