Model Catalog

All models are available at https://inference.provocative.earth/v1 and listed via GET /v1/models.
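
A minimal sketch of listing the catalog programmatically. The base URL and the GET /v1/models route come from the line above. The bearer-token header, the api_key argument, and the response shape (a "data" array of objects with an "id" field, as in OpenAI-style APIs) are assumptions that this page does not confirm.

```python
import json
import urllib.request

BASE_URL = "https://inference.provocative.earth/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET /v1/models request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )


def list_model_ids(api_key: str) -> list[str]:
    """Send the request and return model IDs (assumes an OpenAI-style body)."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=30) as resp:
        payload = json.load(resp)
    # Assumed response shape: {"data": [{"id": "llama-3.1-8b-instruct"}, ...]}
    return [entry["id"] for entry in payload.get("data", [])]
```

If the assumptions hold, list_model_ids("YOUR_KEY") should return the Model ID values shown in the entries below.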

Chat & Completion Models

Llama 3.1 8B Instruct

Model ID llama-3.1-8b-instruct
Parameters 8B
Context window 128k (limited to 8k by default; request an increase)
Strengths Fast, cheap. Classification, routing, simple chat, structured extraction.
GPU 1x RTX 5090 (32GB)
LoRA Supported
Pricing $0.10 / $0.15 per 1M tokens (input/output)

Llama 3.1 70B Instruct

Model ID llama-3.1-70b-instruct
Parameters 70B
Context window 128k (limited to 8k by default)
Strengths Flagship general-purpose. Comparable to GPT-4o on most benchmarks. Tool calling, reasoning, long-form generation.
GPU 1x H100 80GB (FP8) or 2x RTX PRO 6000 96GB (FP16, TP=2)
LoRA Supported
Pricing $0.55 / $0.75 per 1M tokens
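
The two GPU options for Llama 3.1 70B Instruct follow from weight-memory arithmetic: parameter count times bytes per parameter. The sketch below is a rough estimate that counts weights only; KV cache, activations, and runtime overhead are ignored, so real headroom is smaller. The function name and the 1 GB = 1e9 bytes convention are illustrative assumptions.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    # (params_billions * 1e9 params) * bytes / 1e9 bytes-per-GB simplifies to this.
    return params_billions * BYTES_PER_PARAM[precision]


# Llama 3.1 70B: roughly 70 GB in FP8, which fits one 80 GB H100;
# roughly 140 GB in FP16, which needs two 96 GB cards (192 GB total, TP=2).
fp8_gb = weight_memory_gb(70, "fp8")
fp16_gb = weight_memory_gb(70, "fp16")
print(fp8_gb, fp16_gb)
```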

Qwen 2.5 72B Instruct

Model ID qwen-2.5-72b-instruct
Parameters 72B
Context window 32k
Strengths Strong multilingual (Chinese, Japanese, Korean, European languages). Excellent at coding.
GPU 1x H100 80GB (FP8)
LoRA Supported
Pricing $0.55 / $0.75 per 1M tokens

Qwen 2.5 Coder 32B

Model ID qwen-2.5-coder-32b
Parameters 32B
Context window 32k
Strengths Code generation, completion, and review. Optimized for IDE integration.
GPU 1x H100 80GB or 1x RTX PRO 6000 96GB
LoRA Supported
Pricing $0.30 / $0.45 per 1M tokens

Mistral Small 3 24B

Model ID mistral-small-3-24b
Parameters 24B
Context window 32k
Strengths Mid-tier general purpose. Good balance of quality and cost.
GPU 1x RTX PRO 6000 96GB or 1x A100 80GB
LoRA Supported
Pricing $0.20 / $0.30 per 1M tokens

DeepSeek V3

Model ID deepseek-v3
Parameters 671B (MoE, ~37B active)
Context window 64k
Strengths Top-tier reasoning. Mixture-of-experts architecture: frontier quality at non-frontier cost.
GPU 8x H100 80GB
LoRA Not supported (MoE architecture)
Pricing $0.50 / $1.20 per 1M tokens
Availability Reserved tier only

Llama 3.1 405B Instruct

Model ID llama-3.1-405b-instruct
Parameters 405B
Context window 128k (limited to 8k by default)
Strengths Largest dense model in the catalog. Premium quality for complex reasoning and long documents.
GPU 8x H100 80GB (FP8, TP=8)
LoRA Not supported at this scale
Pricing $2.50 / $3.50 per 1M tokens
Availability Reserved tier only
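
Per-token pricing can be turned into a request or batch cost with simple arithmetic. The sketch below copies the (input, output) prices from the entries above; the function name is illustrative, and token counts are assumed to come from your own usage data.

```python
from decimal import Decimal

# (input, output) USD per 1M tokens, copied from the catalog entries above.
PRICING = {
    "llama-3.1-8b-instruct": (Decimal("0.10"), Decimal("0.15")),
    "llama-3.1-70b-instruct": (Decimal("0.55"), Decimal("0.75")),
    "qwen-2.5-72b-instruct": (Decimal("0.55"), Decimal("0.75")),
    "qwen-2.5-coder-32b": (Decimal("0.30"), Decimal("0.45")),
    "mistral-small-3-24b": (Decimal("0.20"), Decimal("0.30")),
    "deepseek-v3": (Decimal("0.50"), Decimal("1.20")),
    "llama-3.1-405b-instruct": (Decimal("2.50"), Decimal("3.50")),
}


def estimate_cost_usd(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts."""
    price_in, price_out = PRICING[model_id]
    return (price_in * input_tokens + price_out * output_tokens) / 1_000_000


# 2M input tokens and 500k output tokens on the 70B model.
cost = estimate_cost_usd("llama-3.1-70b-instruct", 2_000_000, 500_000)
print(cost)
```

For that example the total is 2 x $0.55 + 0.5 x $0.75 = $1.475. Decimal is used instead of float so that prices such as 0.10 are represented exactly.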

Embedding Models

BGE-M3

Model ID bge-m3
Parameters 560M
Dimensions 1024
Strengths Best multilingual embedding model. Supports 100+ languages. Dense, sparse, and ColBERT retrieval in one model.
GPU 1x RTX 5090
Pricing $0.02 per 1M tokens

E5-Mistral 7B Instruct

Model ID e5-mistral-7b-instruct
Parameters 7B
Dimensions 4096
Strengths Instruction-tuned embeddings. Higher quality than BGE-M3 for English.
GPU 1x RTX 5090
Pricing $0.05 per 1M tokens

Nomic Embed v1.5

Model ID nomic-embed-v1.5
Parameters 137M
Dimensions 768
Strengths Fastest, cheapest. Good for high-volume, latency-sensitive retrieval.
GPU 1x RTX 5090
Pricing $0.02 per 1M tokens
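
The Dimensions value also determines how much space stored vectors take: dimensions times bytes per component times number of vectors. A back-of-the-envelope sketch, assuming float32 storage (4 bytes per component) and no index overhead; both are assumptions about your vector store, not properties of this service, and the function name is illustrative.

```python
# Embedding dimensions, copied from the catalog entries above.
DIMENSIONS = {
    "bge-m3": 1024,
    "e5-mistral-7b-instruct": 4096,
    "nomic-embed-v1.5": 768,
}


def index_size_gib(model_id: str, num_vectors: int, bytes_per_component: int = 4) -> float:
    """Raw storage for num_vectors embeddings, in GiB (2**30 bytes)."""
    total_bytes = DIMENSIONS[model_id] * bytes_per_component * num_vectors
    return total_bytes / 2**30


# Compare raw storage for one million vectors per model.
for model_id in DIMENSIONS:
    print(model_id, round(index_size_gib(model_id, 1_000_000), 2), "GiB")
```

At one million vectors, E5-Mistral 7B Instruct needs four times the raw storage of BGE-M3, which is worth weighing against its quality advantage for English.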

Requesting models not in the catalog

Shared tier: use what's listed above.

Reserved/dedicated tier: request any model in a supported architecture (Llama, Mistral, Qwen, Gemma families). We evaluate and deploy on your capacity within 1 business day. Contact your account manager or open a request in the dashboard.