
Carbon-negative inference. Built in Massachusetts.

Provocative is an OpenAI-compatible inference API for open-weight LLMs and embedding models, running on GPUs we own and operate. Our datacenter co-locates compute with on-site direct air capture — inference workloads run alongside hardware that pulls CO₂ out of the atmosphere.


Why Provocative

  • Owned hardware


    We run our own racks of H100, Blackwell, and RTX 5090 GPUs in our Massachusetts facility. Our costs are electricity and hardware amortization, not an AWS markup. That translates to per-token prices 30–50% below cloud inference, plus contractual reserved capacity for teams that need it.

  • Carbon-negative datacenter


    Compute is co-located with direct air capture at the same facility. The datacenter operates net-negative on emissions, removing more CO₂ from the atmosphere than its operations emit, without relying on retroactive offset credits.

  • Northeast US POP


    Inference traffic terminates in Massachusetts, not us-east-1. Customers in the Northeast see lower round-trip latency than they would from any major-cloud region — meaningful for voice agents, IDE autocomplete, and real-time applications.

  • Contractual data residency


    Because we own the racks, we can guarantee in writing where your prompts and completions are processed. Useful for fintech, healthtech, legal, and EU teams with regulatory exposure.
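To check the Northeast latency claim from your own location, time a handful of small requests and compare the median against your current provider. The helper below is a minimal sketch using only the standard library. `median_latency_ms` is a hypothetical name, and it measures total wall-clock time per call (network round trip plus generation), not time-to-first-token.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(call: Callable[[], object], runs: int = 10, warmup: int = 1) -> float:
    """Run `call` repeatedly and return the median wall-clock latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    # Warm-up calls let connection setup (DNS, TLS handshake) happen before timing starts.
    for _ in range(warmup):
        call()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive than the mean to one slow outlier request.
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with a real API call (see the note below).
    print(f"{median_latency_ms(lambda: time.sleep(0.01), runs=5):.1f} ms")
```

With an OpenAI client configured as in the Drop-in OpenAI SDK section, pass something like `lambda: client.chat.completions.create(model="llama-3.1-70b-instruct", messages=[{"role": "user", "content": "ping"}], max_tokens=1)`. Capping output at one token keeps generation short, so the median mostly reflects network and queueing time. Run it against each endpoint from the same machine for a fair comparison.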

Drop-in OpenAI SDK

Change two lines and keep your existing OpenAI client:

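from openai import OpenAI

# Point the standard OpenAI client at Provocative: only base_url and api_key change.
client = OpenAI(
    base_url="https://inference.provocative.earth/v1",
    api_key="pk-prov-YOUR-KEY",  # replace with your Provocative API key
)

response = client.chat.completions.create(
    model="llama-3.1-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)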

For details, see the full migration guide.
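Because the API is OpenAI-compatible, clients without an SDK can presumably speak the same HTTP protocol the OpenAI SDK uses: a JSON POST to `{base_url}/chat/completions` with the API key sent as a bearer token. The sketch below builds such a request with only the Python standard library, assuming Provocative follows the standard OpenAI wire format. `build_chat_request` is a hypothetical helper, and it does not send anything.

```python
import json
import urllib.request

BASE_URL = "https://inference.provocative.earth/v1"


def build_chat_request(api_key: str, model: str, messages: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request in the OpenAI wire format."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # same auth scheme the OpenAI SDK uses
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "pk-prov-YOUR-KEY",
    "llama-3.1-70b-instruct",
    [{"role": "user", "content": "Hello!"}],
)
print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req)` and decoding the JSON body should yield the same structure the SDK exposes as `response.choices[0].message.content`.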

Models

We serve a curated catalog of open-weight chat and embedding models, including Llama 3.1 (8B / 70B / 405B), Qwen 2.5 Coder 32B, Qwen 2.5 72B, Mistral Small 3, DeepSeek V3, and the embedding models BGE-M3, E5-Mistral, and Nomic Embed. The full catalog lists the context window, GPU class, and per-token rates for each model.
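Turning the catalog's per-token rates into a monthly budget is simple arithmetic, and the same arithmetic lets you check the 30–50% savings figure against your own bill. This is a minimal sketch, assuming rates are quoted in USD per million tokens with separate input and output prices (a common convention; the pricing page is authoritative). The function names, workload, and rates below are hypothetical and are not published prices.

```python
def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     input_usd_per_m: float, output_usd_per_m: float) -> float:
    """Cost in USD, assuming rates are quoted in USD per million tokens."""
    return (input_tokens * input_usd_per_m + output_tokens * output_usd_per_m) / 1_000_000


def savings_fraction(new_cost: float, baseline_cost: float) -> float:
    """Fraction saved relative to a baseline; 0.4 means 40% cheaper."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return 1.0 - new_cost / baseline_cost


# Hypothetical workload and rates, for illustration only.
IN_TOK, OUT_TOK = 200_000_000, 50_000_000
baseline = monthly_cost_usd(IN_TOK, OUT_TOK, 0.60, 1.20)
candidate = monthly_cost_usd(IN_TOK, OUT_TOK, 0.36, 0.72)
print(f"baseline ${baseline:,.2f}  candidate ${candidate:,.2f}  "
      f"savings {savings_fraction(candidate, baseline):.0%}")
```

When input and output discounts differ, the overall saving depends on your input-to-output token mix, so plug in measured volumes rather than assuming the headline percentage.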

Next steps

  • Quickstart — first API call in under a minute
  • Migrating from OpenAI — two-line swap, full SDK compatibility
  • Pricing — per-token, batch, and reserved capacity
  • Privacy — what we log, what we don't, and where the data lives