API Reference

Base URL: https://inference.provocative.earth/v1

All requests require a Bearer token: Authorization: Bearer pk-prov-YOUR-KEY

The full OpenAPI spec is available at /openapi.json.
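As a minimal sketch of the authentication scheme above, the helper below attaches the Bearer token to a request object without sending anything. The helper name build_request and the Content-Type: application/json header on POST bodies are assumptions for illustration, not something this reference states.

```python
import json
import urllib.request

BASE_URL = "https://inference.provocative.earth/v1"


def build_request(path, api_key, body=None):
    """Build (but do not send) a request carrying the Bearer token.

    Hypothetical helper: a JSON body makes it a POST, otherwise a GET.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        # Assumption: JSON bodies are sent as application/json.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers=headers,
        method="POST" if data is not None else "GET",
    )


req = build_request("/models", "pk-prov-YOUR-KEY")
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would perform the call; that step is left out so the sketch runs without network access.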

Endpoints

Chat Completions

POST /v1/chat/completions

Create a chat completion. Supports streaming via SSE.

Request body:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID (e.g., llama-3.1-70b-instruct). Append :adapter-name for LoRA. |
| messages | array | Yes | Array of {role, content} objects. Roles: system, user, assistant, tool. |
| stream | boolean | No | Default false. Set true for Server-Sent Events streaming. |
| max_tokens | integer | No | Max output tokens. Shared tier cap: 8192. |
| temperature | float | No | Sampling temperature, 0-2. Default 1.0. |
| top_p | float | No | Nucleus sampling. Default 1.0. |
| tools | array | No | Tool/function definitions for function calling. |
| response_format | object | No | {"type": "json_object"} for JSON-only output. |
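The request fields can be assembled as in the following sketch. The function chat_payload and its client-side checks are hypothetical conveniences; the server remains the authority on validation, and the adapter name my-adapter is a placeholder.

```python
import json

SHARED_TIER_MAX_TOKENS = 8192  # cap listed for the shared tier
VALID_ROLES = {"system", "user", "assistant", "tool"}


def chat_payload(model, messages, adapter=None, **options):
    """Build a chat completions request body (hypothetical helper)."""
    for message in messages:
        if message["role"] not in VALID_ROLES:
            raise ValueError(f"unknown role: {message['role']!r}")
    temperature = options.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    max_tokens = options.get("max_tokens")
    if max_tokens is not None and max_tokens > SHARED_TIER_MAX_TOKENS:
        raise ValueError("max_tokens exceeds the shared tier cap")
    # A LoRA adapter is selected by appending :adapter-name to the model ID.
    model_id = f"{model}:{adapter}" if adapter else model
    return {"model": model_id, "messages": messages, **options}


payload = chat_payload(
    "llama-3.1-70b-instruct",
    [{"role": "user", "content": "Hello"}],
    adapter="my-adapter",  # placeholder adapter name
    max_tokens=256,
    temperature=0.7,
)
print(json.dumps(payload, indent=2))
```

Checking the ranges before sending is optional; it only surfaces obvious mistakes earlier than a 400 response would.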

Response (non-streaming):

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama-3.1-70b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 5,
    "total_tokens": 15
  }
}

Response (streaming): Server-Sent Events where each data: line is a chunk:

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":"stop"}],"usage":{...}}

data: [DONE]
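A streamed response like the one above can be reassembled by reading each data: line, stopping at [DONE], and concatenating the delta.content pieces. The sketch below works on any iterable of text lines; the function name iter_content is hypothetical, and the sample omits the elided usage object so that every chunk is valid JSON.

```python
import json


def iter_content(lines):
    """Yield the text fragments carried by SSE data: lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything else
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"!"},"index":0,"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_content(sample)))  # Hello!
```

With a live connection, the same generator could be fed the decoded lines of the HTTP response body.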

Completions

POST /v1/completions

Legacy text completion. Accepts the same parameters as chat completions, except that prompt replaces messages.

Embeddings

POST /v1/embeddings

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Embedding model ID (e.g., bge-m3). |
| input | string or array | Yes | Text(s) to embed. |

Response:

{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0, "embedding": [0.123, -0.456, ...]}
  ],
  "model": "bge-m3",
  "usage": {"prompt_tokens": 5, "total_tokens": 5}
}
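When input is an array, each entry in data carries an index. The sketch below sorts by that field before extracting the vectors. It assumes, as in OpenAI-compatible APIs, that index refers to the position of the text in input; this reference does not state that explicitly. The function name and the shortened vectors are illustrative only.

```python
def vectors_in_input_order(response):
    """Return the embedding vectors sorted by their index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Illustrative response with made-up, shortened vectors.
example = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.5, 0.25]},
        {"object": "embedding", "index": 0, "embedding": [0.125, -0.5]},
    ],
    "model": "bge-m3",
    "usage": {"prompt_tokens": 5, "total_tokens": 5},
}
print(vectors_in_input_order(example))  # [[0.125, -0.5], [0.5, 0.25]]
```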

Models

GET /v1/models

List all models available to your tenant.

Usage

GET /v1/usage?start_date=2025-01-01&end_date=2025-01-31

Returns token counts and latency percentiles per model and per day. Requires a ClickHouse backend.
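The query string for a usage request can be built from date objects, as sketched here. The helper usage_url is hypothetical, and the check that the range is not reversed is a client-side assumption rather than documented behavior.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "https://inference.provocative.earth/v1"


def usage_url(start, end):
    """Build the usage URL for an ISO-formatted date range."""
    if end < start:
        raise ValueError("end_date must not precede start_date")
    query = urlencode(
        {"start_date": start.isoformat(), "end_date": end.isoformat()}
    )
    return f"{BASE_URL}/usage?{query}"


print(usage_url(date(2025, 1, 1), date(2025, 1, 31)))
# https://inference.provocative.earth/v1/usage?start_date=2025-01-01&end_date=2025-01-31
```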

Adapters

POST   /v1/adapters          — create from HF repo
POST   /v1/adapters/upload   — upload weights directly
GET    /v1/adapters           — list
GET    /v1/adapters/{id}      — get details
DELETE /v1/adapters/{id}      — delete

See the LoRA Adapters guide.

Batch

POST   /v1/batch              — submit JSONL file
GET    /v1/batch               — list jobs
GET    /v1/batch/{id}          — get status
GET    /v1/batch/{id}/output   — download results
POST   /v1/batch/{id}/cancel   — cancel job

See the Batch Inference guide.

Error format

All errors follow the OpenAI error shape:

{
  "error": {
    "message": "Human-readable description",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

| HTTP Status | Meaning |
| --- | --- |
| 400 | Invalid request (bad JSON, missing field) |
| 401 | Invalid or missing API key |
| 404 | Model not found |
| 413 | Request too large (batch file, adapter) |
| 429 | Rate limit exceeded. Check Retry-After header. |
| 502 | Upstream worker error |
| 503 | No healthy workers for the requested model |
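One way to consume the error shape and status codes is to wrap them in an exception, as in this sketch. The class ApiError, the function parse_error, and the choice to treat 429, 502, and 503 as retryable are assumptions for illustration; the reference only documents what each status means.

```python
import json

# Assumption: these statuses describe transient conditions worth retrying.
RETRYABLE_STATUSES = {429, 502, 503}


class ApiError(Exception):
    """Hypothetical exception carrying the fields of the error object."""

    def __init__(self, status, message, error_type=None, code=None):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        self.error_type = error_type
        self.code = code

    @property
    def retryable(self):
        return self.status in RETRYABLE_STATUSES


def parse_error(status, body):
    """Turn a status code and response body text into an ApiError."""
    try:
        error = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        error = None
    if not isinstance(error, dict):
        # Body did not follow the documented shape; keep the raw text.
        return ApiError(status, body)
    return ApiError(
        status, error.get("message", body), error.get("type"), error.get("code")
    )


body = json.dumps({"error": {
    "message": "Human-readable description",
    "type": "invalid_request_error",
    "code": "invalid_api_key",
}})
err = parse_error(401, body)
print(err, err.code, err.retryable)
```

Falling back to the raw body keeps the helper usable when an intermediate proxy returns a non-JSON error page.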

Response headers

| Header | Description |
| --- | --- |
| X-Request-Id | Unique request ID |
| X-Provocapi-Worker | Worker that served the request |
| X-Provocapi-Queue-Ms | Queue wait time in milliseconds |
| X-Provocapi-Model | Resolved model ID |
| Retry-After | Seconds until rate limit resets (on 429) |
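A client might use Retry-After to decide how long to wait after a 429, as in the sketch below. The function retry_delay and its exponential fallback (used when the header is missing or not a number) are assumptions, not documented behavior, and the headers argument is taken to be a plain dict keyed by the exact header names above.

```python
def retry_delay(headers, attempt, base=1.0, cap=30.0):
    """Return the number of seconds to wait before the next attempt.

    Prefers the Retry-After header; otherwise falls back to capped
    exponential backoff (an assumption, not part of the API).
    """
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # header present but not a number of seconds
    return min(cap, base * 2 ** attempt)


print(retry_delay({"Retry-After": "7"}, attempt=0))  # 7.0
print(retry_delay({}, attempt=3))                    # 8.0
```

Logging X-Request-Id alongside each retry makes it easier to correlate client-side failures with a specific request.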