Model Catalog

All models are available at https://inference.provocative.earth/v1 and listed via GET /v1/models.

Chat & Completion Models

Llama 3.1 8B Instruct


Model ID	`llama-3.1-8b-instruct`
Parameters	8B
Context window	128k (limited to 8k default, request increase)
Strengths	Fast, cheap. Classification, routing, simple chat, structured extraction.
GPU	1x RTX 5090 (32GB)
LoRA	Supported
Pricing	$0.10 / $0.15 per 1M tokens (input/output)

Llama 3.1 70B Instruct


Model ID	`llama-3.1-70b-instruct`
Parameters	70B
Context window	128k (limited to 8k default)
Strengths	Flagship general-purpose. Comparable to GPT-4o on most benchmarks. Tool calling, reasoning, long-form generation.
GPU	1x H100 80GB (FP8) or 2x RTX PRO 6000 96GB (FP16, TP=2)
LoRA	Supported
Pricing	$0.55 / $0.75 per 1M tokens

Qwen 2.5 72B Instruct


Model ID	`qwen-2.5-72b-instruct`
Parameters	72B
Context window	32k
Strengths	Strong multilingual (Chinese, Japanese, Korean, European languages). Excellent at coding.
GPU	1x H100 80GB (FP8)
LoRA	Supported
Pricing	$0.55 / $0.75 per 1M tokens

Qwen 2.5 Coder 32B


Model ID	`qwen-2.5-coder-32b`
Parameters	32B
Context window	32k
Strengths	Code generation, completion, and review. Optimized for IDE integration.
GPU	1x H100 80GB or 1x RTX PRO 6000 96GB
LoRA	Supported
Pricing	$0.30 / $0.45 per 1M tokens

Mistral Small 3 24B


Model ID	`mistral-small-3-24b`
Parameters	24B
Context window	32k
Strengths	Mid-tier general purpose. Good balance of quality and cost.
GPU	1x RTX PRO 6000 96GB or 1x A100 80GB
LoRA	Supported
Pricing	$0.20 / $0.30 per 1M tokens

DeepSeek V3


Model ID	`deepseek-v3`
Parameters	671B (MoE, ~37B active)
Context window	64k
Strengths	Top-tier reasoning. Mixture-of-experts architecture — frontier quality at non-frontier cost.
GPU	8x H100 80GB
LoRA	Not supported (MoE architecture)
Pricing	$0.50 / $1.20 per 1M tokens
Availability	Reserved tier only

Llama 3.1 405B Instruct


Model ID	`llama-3.1-405b-instruct`
Parameters	405B
Context window	128k (limited to 8k default)
Strengths	Largest open-weight model. Premium quality for complex reasoning, long documents.
GPU	8x H100 80GB (FP8, TP=8)
LoRA	Not supported at this scale
Pricing	$2.50 / $3.50 per 1M tokens
Availability	Reserved tier only

Embedding Models

BGE-M3


Model ID	`bge-m3`
Parameters	560M
Dimensions	1024
Strengths	Best multilingual embedding model. Supports 100+ languages. Dense, sparse, and ColBERT retrieval in one model.
GPU	1x RTX 5090
Pricing	$0.02 per 1M tokens

E5-Mistral 7B Instruct


Model ID	`e5-mistral-7b-instruct`
Parameters	7B
Dimensions	4096
Strengths	Instruction-tuned embeddings. Higher quality than BGE-M3 for English.
GPU	1x RTX 5090
Pricing	$0.05 per 1M tokens

Nomic Embed v1.5


Model ID	`nomic-embed-v1.5`
Parameters	137M
Dimensions	768
Strengths	Fastest, cheapest. Good for high-volume, latency-sensitive retrieval.
GPU	1x RTX 5090
Pricing	$0.02 per 1M tokens

Requesting models not in the catalog

Shared tier: use what's listed above.

Reserved/dedicated tier: request any model in a supported architecture (Llama, Mistral, Qwen, Gemma families). We evaluate and deploy on your capacity within 1 business day. Contact your account manager or open a request in the dashboard.