Quickstart
Get your first API response in 60 seconds. The examples below assume an API key prefixed with pk-prov-. Replace pk-prov-YOUR-KEY with your own key.
1. Make a request
curl
curl https://inference.provocative.earth/v1/chat/completions \
  -H "Authorization: Bearer pk-prov-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-70b-instruct",
    "messages": [{"role": "user", "content": "Explain inference-as-a-service in one sentence."}],
    "max_tokens": 100
  }'
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.provocative.earth/v1",
    api_key="pk-prov-YOUR-KEY",
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Explain inference-as-a-service in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
JavaScript (OpenAI SDK)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://inference.provocative.earth/v1",
  apiKey: "pk-prov-YOUR-KEY",
});

const response = await client.chat.completions.create({
  model: "llama-3.1-70b-instruct",
  messages: [{ role: "user", content: "Explain inference-as-a-service in one sentence." }],
  max_tokens: 100,
});
console.log(response.choices[0].message.content);
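To make clear what the SDK calls above send over the wire, here is a minimal sketch that builds the same request with Python's standard library only. The helper names build_chat_request and send_chat_request are hypothetical. The URL, headers, and JSON fields are taken from the curl example, and the response shape is assumed to match what the SDK examples read (choices[0].message.content).

```python
import json
import urllib.request

# Base URL taken from the examples above.
BASE_URL = "https://inference.provocative.earth/v1"


def build_chat_request(api_key: str, prompt: str, max_tokens: int = 100) -> urllib.request.Request:
    """Build (but do not send) the chat completions request from the curl example."""
    payload = {
        "model": "llama-3.1-70b-instruct",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request) -> str:
    """Send the request and return the first choice's text.

    Assumes the response body has the shape the SDK examples read:
    choices[0].message.content.
    """
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling send_chat_request(build_chat_request("pk-prov-YOUR-KEY", "Explain inference-as-a-service in one sentence.")) should behave like the curl command. Keeping construction separate from sending makes the request easy to inspect without network access.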
2. Stream tokens
Set stream to true (stream=True in the Python SDK) to receive tokens as they are generated, delivered via Server-Sent Events:
# Reuses the client created in step 1.
stream = client.chat.completions.create(
    model="llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    stream=True,
)
for chunk in stream:
    # Skip chunks that carry no choices or no text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
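If you also need the complete text once streaming finishes, one option is to accumulate the deltas as they arrive. The sketch below uses a hypothetical helper, collect_stream, and assumes each chunk has the shape used in the loop above (choices[0].delta.content). It skips chunks with no choices or no text.

```python
from typing import Iterable


def collect_stream(stream: Iterable) -> str:
    """Print each streamed delta as it arrives and return the full text."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices at all; ignore them.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
            parts.append(delta)
    print()  # finish the line once the stream ends
    return "".join(parts)
```

With the stream object from the example above, full_text = collect_stream(stream) prints tokens live and leaves the assembled response in full_text.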
3. Generate embeddings
# Reuses the client created in step 1; each input string yields one embedding vector.
embeddings = client.embeddings.create(
    model="bge-m3",
    input=["search query", "document to compare"],
)
print(f"Dimensions: {len(embeddings.data[0].embedding)}")
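The two inputs above suggest comparing a query against a document. A common way to do that is cosine similarity between the two vectors. The sketch below assumes each embedding is a plain list of floats, as the len() call above implies. cosine_similarity is a hypothetical helper, not part of the SDK.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For the response above, cosine_similarity(embeddings.data[0].embedding, embeddings.data[1].embedding) gives a score between -1 and 1, where higher values indicate closer vectors.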
Next steps
- Migrating from OpenAI — what changes, what doesn't
- Model Catalog — all available models with specs
- LoRA Adapters — serve your fine-tuned models
- Batch Inference — async bulk processing at 50% cost