Migrating from OpenAI

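provocapi is a drop-in replacement for the OpenAI API's chat completions, completions, embeddings, and model-listing endpoints. If your code uses the official OpenAI Python or JavaScript SDK, you can switch by changing the client's base URL and API key, then swapping in a provocapi model name.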

What changes

Setting	OpenAI	provocapi
Base URL	`https://api.openai.com/v1`	`https://inference.provocative.earth/v1`
API key prefix	`sk-`	`pk-prov-`
Models	`gpt-4o`, `text-embedding-3-large`	`llama-3.1-70b-instruct`, `bge-m3`

What doesn't change

Request/response format — identical JSON shapes for chat completions, completions, embeddings, and model listing.
Streaming — same SSE format, same [DONE] sentinel.
Tool/function calling — supported on chat models that have it (Llama 3.1, Qwen 2.5).
JSON mode — pass response_format: {"type": "json_object"} and it works via grammar-constrained decoding.
Error format — same {"error": {"message": ..., "type": ..., "code": ...}} shape.
SDK methods — client.chat.completions.create(), client.embeddings.create(), client.models.list() all work unchanged.
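Because the streaming and error formats are unchanged, any hand-rolled parsing you already have should keep working. The sketch below is a minimal illustration, not provocapi's own client code: it reads SSE `data:` lines until the `[DONE]` sentinel and formats an error body of the shape shown above. The function names, the sample chunks, and the error values are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream; later lines are never read
        yield json.loads(payload)


def describe_error(body: str) -> str:
    """Format a body shaped like {"error": {"message", "type", "code"}}."""
    err = json.loads(body)["error"]
    return f'{err["type"]} ({err["code"]}): {err["message"]}'


# Hypothetical stream, shaped like chat-completion chunks.
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"delta": {"content": "ignored"}}]}',
]
text = "".join(
    event["choices"][0]["delta"].get("content", "")
    for event in iter_sse_events(sample)
)
print(text)
```

If you use the official SDK with `stream=True`, it does this parsing for you; the sketch only matters for code that reads the HTTP stream directly.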

Python migration

  from openai import OpenAI

  client = OpenAI(
-     # defaults to OPENAI_API_KEY env var
+     base_url="https://inference.provocative.earth/v1",
+     api_key="pk-prov-YOUR-KEY",  # or set OPENAI_API_KEY
  )

  response = client.chat.completions.create(
-     model="gpt-4o",
+     model="llama-3.1-70b-instruct",
      messages=[{"role": "user", "content": "hello"}],
  )

Alternatively, leave the client construction unchanged and set these environment variables (model names in your requests still need updating):

export OPENAI_BASE_URL=https://inference.provocative.earth/v1
export OPENAI_API_KEY=pk-prov-YOUR-KEY
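When relying on environment variables, a misconfigured shell can silently send requests to the wrong endpoint. A small preflight check, sketched here with only the standard library, can catch that before the first request. The function name `check_env` is hypothetical; the URL and key prefix are the ones given in the table above.

```python
import os
from typing import Mapping

PROVOCAPI_BASE_URL = "https://inference.provocative.earth/v1"
PROVOCAPI_KEY_PREFIX = "pk-prov-"


def check_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of problems with the migration environment variables."""
    problems = []
    base_url = env.get("OPENAI_BASE_URL", "")
    api_key = env.get("OPENAI_API_KEY", "")
    # Tolerate a trailing slash, but nothing else.
    if base_url.rstrip("/") != PROVOCAPI_BASE_URL:
        problems.append(f"OPENAI_BASE_URL should be {PROVOCAPI_BASE_URL}")
    # Report only the expected prefix so the key itself is never printed.
    if not api_key.startswith(PROVOCAPI_KEY_PREFIX):
        problems.append(f"OPENAI_API_KEY should start with {PROVOCAPI_KEY_PREFIX}")
    return problems
```

Calling `check_env()` at startup and failing fast on a non-empty result is one way to use it; an empty list means both variables look right.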

JavaScript migration

  import OpenAI from "openai";

  const client = new OpenAI({
+   baseURL: "https://inference.provocative.earth/v1",
+   apiKey: "pk-prov-YOUR-KEY",
  });

Model mapping

Use this table to find the provocapi equivalent of the OpenAI model you're currently using:

OpenAI model	provocapi equivalent	Notes
`gpt-4o`	`llama-3.1-70b-instruct`	Comparable on most benchmarks, 30-50% cheaper
`gpt-4o-mini`	`llama-3.1-8b-instruct`	Fast, cheap, good for routing/classification
`gpt-4-turbo`	`qwen-2.5-72b-instruct`	Strong multilingual and coding
`gpt-3.5-turbo`	`mistral-small-3-24b`	Mid-tier, good general purpose
`text-embedding-3-large`	`bge-m3`	1024-dim, multilingual
`text-embedding-3-small`	`nomic-embed-v1.5`	768-dim, fast, cheap
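If model names are scattered across a codebase, one option is to centralize the mapping table in a lookup. This is a sketch under that assumption; `MODEL_MAP`, `EMBEDDING_DIMS`, and `to_provocapi_model` are hypothetical names, and the values are copied from the table above.

```python
# OpenAI model name -> provocapi equivalent, as listed in the mapping table.
MODEL_MAP = {
    "gpt-4o": "llama-3.1-70b-instruct",
    "gpt-4o-mini": "llama-3.1-8b-instruct",
    "gpt-4-turbo": "qwen-2.5-72b-instruct",
    "gpt-3.5-turbo": "mistral-small-3-24b",
    "text-embedding-3-large": "bge-m3",
    "text-embedding-3-small": "nomic-embed-v1.5",
}

# Output dimensions of the provocapi embedding models, from the Notes column.
EMBEDDING_DIMS = {"bge-m3": 1024, "nomic-embed-v1.5": 768}


def to_provocapi_model(name: str) -> str:
    """Translate an OpenAI model name; pass provocapi names through unchanged."""
    if name in MODEL_MAP.values():
        return name
    try:
        return MODEL_MAP[name]
    except KeyError:
        raise ValueError(f"no provocapi equivalent listed for {name!r}") from None


print(to_provocapi_model("gpt-4o-mini"))
```

One caveat for embeddings: OpenAI's defaults for `text-embedding-3-large` and `text-embedding-3-small` are 3072 and 1536 dimensions, so vectors stored from those models are not comparable with the 1024- and 768-dimension outputs here. An existing vector index would generally need to be re-embedded.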

Features not yet supported

These OpenAI features are not available on provocapi v1. Plan accordingly:

Assistants API (threads, runs, file search) — we're inference-only, not a RAG product
Image generation (DALL-E) — not on our roadmap
Audio (Whisper, TTS) — coming in v1.5
Vision (image inputs) — coming in v1.5 with Llama 3.2 Vision
Fine-tuning API — bring your own LoRA weights instead (see LoRA Adapters)
Moderation endpoint — not provided
Realtime API (WebRTC) — not provided
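Before migrating, it can help to audit which of these features your application actually calls. The sketch below is one rough way to do that, assuming you can list the request paths and chat payloads your code sends. The path prefixes follow OpenAI's public REST layout and are an assumption about your traffic, not something provocapi defines; both function names are hypothetical.

```python
from typing import Optional

# OpenAI REST path prefix -> the unsupported feature it belongs to.
UNSUPPORTED_PATH_PREFIXES = {
    "/v1/assistants": "Assistants API",
    "/v1/threads": "Assistants API",
    "/v1/images": "Image generation",
    "/v1/audio": "Audio (coming in v1.5)",
    "/v1/fine_tuning": "Fine-tuning API (bring your own LoRA weights)",
    "/v1/moderations": "Moderation endpoint",
    "/v1/realtime": "Realtime API",
}


def unsupported_feature(path: str) -> Optional[str]:
    """Return the unsupported feature a request path belongs to, or None."""
    for prefix, feature in UNSUPPORTED_PATH_PREFIXES.items():
        if path.startswith(prefix):
            return feature
    return None


def uses_image_input(messages: list) -> bool:
    """Detect vision requests: chat messages with an image_url content part."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list) and any(
            isinstance(part, dict) and part.get("type") == "image_url"
            for part in content
        ):
            return True
    return False
```

Running both checks over logged requests gives a quick list of call sites that need a fallback or have to wait for v1.5.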

Response headers

provocapi adds the following observability headers to every response. The `X-Provocapi-*` headers have no OpenAI equivalent:

Header	Description
`X-Request-Id`	Unique request ID for tracing
`X-Provocapi-Worker`	Which backend worker served the request
`X-Provocapi-Queue-Ms`	Time spent in the routing queue
`X-Provocapi-Model`	Resolved model ID

Your existing code will ignore these (they're custom headers), but they're useful for debugging latency issues.
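To use these headers for latency debugging, you need to pull them out of the raw HTTP response. The helper below is a minimal sketch that works on any header mapping; the name `extract_observability` is hypothetical, and treating `X-Provocapi-Queue-Ms` as a plain number is an assumption about its format.

```python
from typing import Mapping

HEADER_NAMES = (
    "X-Request-Id",
    "X-Provocapi-Worker",
    "X-Provocapi-Queue-Ms",
    "X-Provocapi-Model",
)


def extract_observability(headers: Mapping[str, str]) -> dict:
    """Collect the observability headers, matching names case-insensitively."""
    lowered = {key.lower(): value for key, value in headers.items()}
    info = {name: lowered.get(name.lower()) for name in HEADER_NAMES}
    queue = info["X-Provocapi-Queue-Ms"]
    if queue is not None:
        try:
            # Assumed to be a number of milliseconds; keep the raw string if not.
            info["X-Provocapi-Queue-Ms"] = float(queue)
        except ValueError:
            pass
    return info


# Hypothetical header values for illustration.
sample_headers = {
    "content-type": "application/json",
    "x-request-id": "req-123",
    "x-provocapi-worker": "worker-7",
    "x-provocapi-queue-ms": "12",
    "x-provocapi-model": "llama-3.1-70b-instruct",
}
print(extract_observability(sample_headers))
```

With the official Python SDK, the headers should be reachable through the raw-response wrapper, for example `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` and a `.parse()` method that returns the usual completion object.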