Carbon-negative inference. Built in Massachusetts.
Provocative is an OpenAI-compatible inference API for open-weight LLMs and embedding models, running on GPUs we own and operate. Our datacenter co-locates compute with on-site direct air capture — inference workloads run alongside hardware that pulls CO₂ out of the atmosphere.
Why Provocative
Owned hardware
We run our own racks of NVIDIA H100, Blackwell-generation, and RTX 5090 GPUs in our Massachusetts facility. Our marginal cost is electricity and hardware amortization rather than a cloud provider's markup. That translates to per-token prices 30–50% below comparable cloud inference, plus contractual reserved capacity for teams that need it.
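The discount is simple arithmetic against whatever cloud price you compare with. The sketch below turns a reference price into the implied price band and a monthly bill. The helper names are hypothetical, and the $1.00-per-million-token reference price in the usage lines is made up for illustration. It is not a published rate from us or any cloud provider.

```python
def discounted_price_range(cloud_price_per_mtok: float,
                           min_discount: float = 0.30,
                           max_discount: float = 0.50) -> tuple[float, float]:
    """Return (lowest, highest) price per million tokens after a discount band.

    Hypothetical helper: the 0.30-0.50 defaults mirror the "30-50% below
    cloud inference" claim; the cloud reference price is supplied by you.
    """
    if not 0 <= min_discount <= max_discount < 1:
        raise ValueError("discounts must satisfy 0 <= min <= max < 1")
    lowest = cloud_price_per_mtok * (1 - max_discount)   # deepest discount
    highest = cloud_price_per_mtok * (1 - min_discount)  # shallowest discount
    return lowest, highest


def monthly_cost(tokens_per_month: int, price_per_mtok: float) -> float:
    """Dollar cost of a monthly token volume at a per-million-token price."""
    return tokens_per_month / 1_000_000 * price_per_mtok


# Usage with a made-up reference price of $1.00 per million tokens
# and a hypothetical volume of 2 billion tokens per month.
low, high = discounted_price_range(1.00)
print(f"${low:.2f}-${high:.2f} per million tokens")
print(f"${monthly_cost(2_000_000_000, low):,.2f} to "
      f"${monthly_cost(2_000_000_000, high):,.2f} per month")
```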
Carbon-negative datacenter
Compute is co-located with direct air capture at the same facility. The datacenter operates net-negative on emissions: it removes more CO₂ from the atmosphere than its operations emit, without relying on retroactive offset credits.
Northeast US POP
Inference traffic is served from Massachusetts, not from us-east-1 in Virginia. Customers in the Northeast therefore see lower round-trip latency than they would from a major-cloud region. That matters for voice agents, IDE autocomplete, and other real-time applications.
Contractual data residency
Because we own the racks, we can guarantee in writing where your prompts and completions are processed. This matters for fintech, healthtech, legal, and EU teams with regulatory exposure.
Drop-in OpenAI SDK
Keep your existing OpenAI client and change two arguments, base_url and api_key:
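from openai import OpenAI

# The only changes from a stock OpenAI setup: base_url and api_key.
client = OpenAI(
    base_url="https://inference.provocative.earth/v1",
    api_key="pk-prov-YOUR-KEY",
)

# Request any model from the catalog by its ID.
response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)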
The full migration guide, Migrating from OpenAI, is listed under Next steps.
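For latency-sensitive uses like voice agents and autocomplete, time to first token usually matters more than total response time. The sketch below measures it over a streamed completion. It assumes the endpoint honors stream=True the way the OpenAI API does, with chunks shaped like chunk.choices[0].delta.content. time_to_first_token is a hypothetical helper, not part of any SDK.

```python
import time
from typing import Callable, Iterable, Tuple


def time_to_first_token(start_stream: Callable[[], Iterable]) -> Tuple[float, str]:
    """Open a stream, then return (seconds to first content token, full text).

    start_stream is called after the clock starts, so request setup is
    included in the measurement. Chunks are expected to follow the OpenAI
    SDK's streaming shape: chunk.choices[0].delta.content.
    """
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for chunk in start_stream():
        if not chunk.choices:  # e.g. a trailing usage-only chunk
            continue
        content = chunk.choices[0].delta.content
        if content:  # skip role-only or empty deltas
            if first_token_at is None:
                first_token_at = time.perf_counter() - start
            parts.append(content)
    if first_token_at is None:
        raise RuntimeError("stream ended without any content")
    return first_token_at, "".join(parts)


# Usage against the endpoint (needs the openai package and a real key):
# from openai import OpenAI
# client = OpenAI(base_url="https://inference.provocative.earth/v1",
#                 api_key="pk-prov-YOUR-KEY")
# ttft, text = time_to_first_token(lambda: client.chat.completions.create(
#     model="llama-3.1-70b-instruct",
#     messages=[{"role": "user", "content": "Hello!"}],
#     stream=True,
# ))
# print(f"first token after {ttft * 1000:.0f} ms")
```

Passing a callable rather than an already-open stream matters. The SDK sends the request when create() is called, so timing from the first loop iteration would understate the real wait.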
Models
We serve a curated catalog of open-weight chat and embedding models, including Llama 3.1 (8B / 70B / 405B), Qwen 2.5 Coder 32B, Qwen 2.5 72B, Mistral Small 3, DeepSeek V3, and the embedding models BGE-M3, E5-Mistral, and Nomic Embed. The full catalog lists each model's context window, GPU class, and per-token rate.
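The embedding models can be called through the same OpenAI client. The sketch below shows semantic ranking and assumes the endpoint returns OpenAI-shaped embedding responses (response.data[i].embedding, with an index field). rank_documents is a hypothetical helper, and "bge-m3" in the usage lines is a guessed model ID, so check the catalog for the real one.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(client, model: str, query: str, docs: List[str]) -> List[Tuple[float, str]]:
    """Embed the query and docs in one request; return (score, doc) pairs, best first."""
    response = client.embeddings.create(model=model, input=[query, *docs])
    # Order by index so vectors line up with the input list.
    items = sorted(response.data, key=lambda item: item.index)
    vectors = [item.embedding for item in items]
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(doc_vecs, docs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical usage, reusing the client from the Drop-in OpenAI SDK snippet;
# "bge-m3" is a guessed model ID:
# for score, doc in rank_documents(client, "bge-m3", "carbon removal",
#                                  ["Direct air capture pulls CO2 from the air.",
#                                   "GPUs accelerate matrix multiplication."]):
#     print(f"{score:.3f}  {doc}")
```

Embedding the query and documents in a single request saves round trips. For large corpora you would embed documents once and store the vectors instead.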
Next steps
- Quickstart — first API call in under a minute
- Migrating from OpenAI — two-line swap, full SDK compatibility
- Pricing — per-token, batch, and reserved capacity
- Privacy — what we log, what we don't, and where the data lives