Migrating from OpenAI
provocapi is a drop-in replacement for the OpenAI API endpoints listed below. If your code uses the official OpenAI Python or JavaScript SDK, you can switch by changing the base URL and API key, then swapping in a provocapi model name.
What changes
| | OpenAI | provocapi |
|---|---|---|
| Base URL | `https://api.openai.com/v1` | `https://inference.provocative.earth/v1` |
| API key prefix | `sk-` | `pk-prov-` |
| Models | `gpt-4o`, `text-embedding-3-large` | `llama-3.1-70b-instruct`, `bge-m3` |
What doesn't change
- Request/response format: identical JSON shapes for chat completions, completions, embeddings, and model listing.
- Streaming: same SSE format, same `[DONE]` sentinel.
- Tool/function calling: supported on chat models that have it (Llama 3.1, Qwen 2.5).
- JSON mode: pass `response_format: {"type": "json_object"}` and it works via grammar-constrained decoding.
- Error format: same `{"error": {"message": ..., "type": ..., "code": ...}}` shape.
- SDK methods: `client.chat.completions.create()`, `client.embeddings.create()`, `client.models.list()` all work unchanged.
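Because the streaming format is unchanged, code that reads the raw SSE stream should not need to be touched either. As a rough illustration, the sketch below parses OpenAI-style `data:` lines and stops at the `[DONE]` sentinel. The function name `iter_stream_text` and the sample events are hypothetical and hand-written for this example; they are not captured provocapi output.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank event separators and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, identical on both services
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample in the chat-completions chunk shape (illustrative only).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_stream_text(sample)))  # prints Hello
```

If you use the SDK's `stream=True` option you normally never see these lines, since the SDK does this parsing for you.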
Python migration
     from openai import OpenAI

     client = OpenAI(
    -    # defaults to OPENAI_API_KEY env var
    +    base_url="https://inference.provocative.earth/v1",
    +    api_key="pk-prov-YOUR-KEY",  # or set OPENAI_API_KEY
     )

     response = client.chat.completions.create(
    -    model="gpt-4o",
    +    model="llama-3.1-70b-instruct",
         messages=[{"role": "user", "content": "hello"}],
     )
Alternatively, set environment variables and leave the client construction unchanged (model names in your calls still need updating):

    export OPENAI_BASE_URL=https://inference.provocative.earth/v1
    export OPENAI_API_KEY=pk-prov-YOUR-KEY
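A half-finished switch is easy to miss, for example when the base URL points at provocapi but the key is still an `sk-` key. The sketch below is one way to sanity-check the two variables at startup. The helper name `check_migration_env` is hypothetical. It only compares against the base URL and key prefix given in the table above and makes no network call.

```python
import os
from typing import Mapping

PROVOCAPI_BASE_URL = "https://inference.provocative.earth/v1"
PROVOCAPI_KEY_PREFIX = "pk-prov-"


def check_migration_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems; an empty list means both variables look right."""
    problems = []
    base_url = env.get("OPENAI_BASE_URL", "")
    api_key = env.get("OPENAI_API_KEY", "")
    # Tolerate a trailing slash on the base URL.
    if base_url.rstrip("/") != PROVOCAPI_BASE_URL:
        problems.append(
            f"OPENAI_BASE_URL is {base_url!r}, expected {PROVOCAPI_BASE_URL!r}"
        )
    if not api_key.startswith(PROVOCAPI_KEY_PREFIX):
        # Report only the prefix mismatch; never echo the key itself.
        problems.append(f"OPENAI_API_KEY does not start with {PROVOCAPI_KEY_PREFIX!r}")
    return problems


if __name__ == "__main__":
    for problem in check_migration_env(os.environ):
        print("warning:", problem)
```

Passing the environment in as a mapping keeps the check easy to unit-test without touching the real process environment.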
JavaScript migration
     import OpenAI from "openai";

     const client = new OpenAI({
    +  baseURL: "https://inference.provocative.earth/v1",
    +  apiKey: "pk-prov-YOUR-KEY",
     });
Model mapping
Use this table to find the provocapi equivalent of the OpenAI model you're currently using:
| OpenAI model | provocapi equivalent | Notes |
|---|---|---|
| `gpt-4o` | `llama-3.1-70b-instruct` | Comparable on most benchmarks, 30-50% cheaper |
| `gpt-4o-mini` | `llama-3.1-8b-instruct` | Fast, cheap, good for routing/classification |
| `gpt-4-turbo` | `qwen-2.5-72b-instruct` | Strong multilingual and coding |
| `gpt-3.5-turbo` | `mistral-small-3-24b` | Mid-tier, good general purpose |
| `text-embedding-3-large` | `bge-m3` | 1024-dim, multilingual |
| `text-embedding-3-small` | `nomic-embed-v1.5` | 768-dim, fast, cheap |
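If model names are scattered across a codebase, it may be simpler to translate them in one place during the transition. The sketch below encodes the table above as a dictionary. `MODEL_MAP` and `to_provocapi_model` are hypothetical names for this example and are not part of any SDK.

```python
# OpenAI model name -> provocapi equivalent, copied from the mapping table.
MODEL_MAP = {
    "gpt-4o": "llama-3.1-70b-instruct",
    "gpt-4o-mini": "llama-3.1-8b-instruct",
    "gpt-4-turbo": "qwen-2.5-72b-instruct",
    "gpt-3.5-turbo": "mistral-small-3-24b",
    "text-embedding-3-large": "bge-m3",
    "text-embedding-3-small": "nomic-embed-v1.5",
}


def to_provocapi_model(name: str) -> str:
    """Translate an OpenAI model name; pass provocapi names through unchanged."""
    if name in MODEL_MAP:
        return MODEL_MAP[name]
    if name in MODEL_MAP.values():
        return name  # already migrated
    # Fail loudly rather than guess at an equivalent for an unlisted model.
    raise KeyError(f"no provocapi equivalent listed for model {name!r}")


print(to_provocapi_model("gpt-4o-mini"))  # prints llama-3.1-8b-instruct
```

Vectors produced by different embedding models are not interchangeable, so an index built with an OpenAI embedding model needs to be re-embedded after switching, not just renamed.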
Features not yet supported
These OpenAI features are not available on provocapi v1. Plan accordingly:
- Assistants API (threads, runs, file search) — we're inference-only, not a RAG product
- Image generation (DALL-E) — not on our roadmap
- Audio (Whisper, TTS) — coming in v1.5
- Vision (image inputs) — coming in v1.5 with Llama 3.2 Vision
- Fine-tuning API — bring your own LoRA weights instead (see LoRA Adapters)
- Moderation endpoint — not provided
- Realtime API (WebRTC) — not provided
Response headers
provocapi adds the following observability headers to every response. The `X-Provocapi-*` headers have no OpenAI counterpart:
| Header | Description |
|---|---|
| `X-Request-Id` | Unique request ID for tracing |
| `X-Provocapi-Worker` | Which backend worker served the request |
| `X-Provocapi-Queue-Ms` | Time spent in the routing queue |
| `X-Provocapi-Model` | Resolved model ID |
Your existing code will ignore these (they're custom headers), but they're useful for debugging latency issues.
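To use these headers for latency debugging, you could gather them into one record per request. The following is a minimal sketch, assuming `X-Provocapi-Queue-Ms` carries a plain numeric string (the exact value format is not specified here). The helper name `provocapi_trace` and the sample header values are hypothetical.

```python
from typing import Mapping, Optional


def provocapi_trace(headers: Mapping[str, str]) -> dict:
    """Collect the provocapi observability headers into a plain dict."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}

    queue_raw = lowered.get("x-provocapi-queue-ms")
    queue_ms: Optional[float]
    try:
        queue_ms = float(queue_raw) if queue_raw is not None else None
    except ValueError:
        # Assumption: the value is numeric. If it is not, record nothing.
        queue_ms = None

    return {
        "request_id": lowered.get("x-request-id"),
        "worker": lowered.get("x-provocapi-worker"),
        "queue_ms": queue_ms,
        "model": lowered.get("x-provocapi-model"),
    }


# Made-up header values, for illustration only.
example_headers = {
    "X-Request-Id": "req-123",
    "X-Provocapi-Worker": "worker-a",
    "X-Provocapi-Queue-Ms": "12.5",
    "X-Provocapi-Model": "llama-3.1-70b-instruct",
}
print(provocapi_trace(example_headers)["queue_ms"])  # prints 12.5
```

With the official Python SDK, one way to reach the headers is `client.chat.completions.with_raw_response.create(...)`: the returned object exposes `.headers`, and `.parse()` gives back the usual completion object.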