Model Catalog
All models are available at https://inference.provocative.earth/v1 and listed via GET /v1/models.
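As a minimal sketch of listing the catalog programmatically, the snippet below builds the `GET /v1/models` request and extracts the model IDs. Only the base URL and the endpoint path come from this page. The Bearer-token header and the OpenAI-style `{"data": [{"id": ...}]}` response shape are assumptions, and the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://inference.provocative.earth/v1"


def build_models_request(api_key: str) -> urllib.request.Request:
    """Build the GET /v1/models request. Bearer auth is an assumption."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def parse_model_ids(body: str) -> list[str]:
    """Extract model IDs, assuming an OpenAI-style {"data": [{"id": ...}]} payload."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(api_key: str) -> list[str]:
    """Fetch the catalog over the network and return the model IDs."""
    with urllib.request.urlopen(build_models_request(api_key), timeout=10) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

If the response shape matches the assumption, `list_models("YOUR_API_KEY")` should return the Model ID strings shown in the tables on this page.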
Chat & Completion Models
Llama 3.1 8B Instruct
| Field | Value |
| --- | --- |
| Model ID | llama-3.1-8b-instruct |
| Parameters | 8B |
| Context window | 128k (8k by default; request an increase for longer contexts) |
| Strengths | Fast, cheap. Classification, routing, simple chat, structured extraction. |
| GPU | 1x RTX 5090 (32GB) |
| LoRA | Supported |
| Pricing | $0.10 / $0.15 per 1M tokens (input/output) |
Llama 3.1 70B Instruct
| Field | Value |
| --- | --- |
| Model ID | llama-3.1-70b-instruct |
| Parameters | 70B |
| Context window | 128k (8k by default) |
| Strengths | Flagship general-purpose. Comparable to GPT-4o on most benchmarks. Tool calling, reasoning, long-form generation. |
| GPU | 1x H100 80GB (FP8) or 2x RTX PRO 6000 96GB (FP16, TP=2) |
| LoRA | Supported |
| Pricing | $0.55 / $0.75 per 1M tokens |
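The GPU options for the 70B model follow from simple weight-size arithmetic, sketched roughly below. The sketch assumes 2 bytes per parameter for FP16 and 1 byte for FP8, and it counts weights only. KV cache, activations, and runtime overhead are ignored, so treat the result as a lower bound, not a sizing guarantee. The function names are hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes). Weights only."""
    return params_billions * BYTES_PER_PARAM[precision]


def weights_fit(params_billions: float, precision: str,
                gpu_count: int, gpu_memory_gb: float) -> bool:
    """True if the weights alone fit in the combined GPU memory."""
    return weight_memory_gb(params_billions, precision) < gpu_count * gpu_memory_gb


if __name__ == "__main__":
    # 70B in FP8 on one 80GB card, then FP16 on two 96GB cards (TP=2).
    print(weights_fit(70, "fp8", gpu_count=1, gpu_memory_gb=80))
    print(weights_fit(70, "fp16", gpu_count=2, gpu_memory_gb=96))
```

Under these assumptions the FP8 weights take about 70 GB, which leaves little headroom on a single 80GB card, while the FP16 weights take about 140 GB and need the 192 GB of the two-GPU configuration.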
Qwen 2.5 72B Instruct
| Field | Value |
| --- | --- |
| Model ID | qwen-2.5-72b-instruct |
| Parameters | 72B |
| Context window | 32k |
| Strengths | Strong multilingual (Chinese, Japanese, Korean, European languages). Excellent at coding. |
| GPU | 1x H100 80GB (FP8) |
| LoRA | Supported |
| Pricing | $0.55 / $0.75 per 1M tokens |
Qwen 2.5 Coder 32B
| Field | Value |
| --- | --- |
| Model ID | qwen-2.5-coder-32b |
| Parameters | 32B |
| Context window | 32k |
| Strengths | Code generation, completion, and review. Optimized for IDE integration. |
| GPU | 1x H100 80GB or 1x RTX PRO 6000 96GB |
| LoRA | Supported |
| Pricing | $0.30 / $0.45 per 1M tokens |
Mistral Small 3 24B
| Field | Value |
| --- | --- |
| Model ID | mistral-small-3-24b |
| Parameters | 24B |
| Context window | 32k |
| Strengths | Mid-tier general purpose. Good balance of quality and cost. |
| GPU | 1x RTX PRO 6000 96GB or 1x A100 80GB |
| LoRA | Supported |
| Pricing | $0.20 / $0.30 per 1M tokens |
DeepSeek V3
| Field | Value |
| --- | --- |
| Model ID | deepseek-v3 |
| Parameters | 671B (MoE, ~37B active) |
| Context window | 64k |
| Strengths | Top-tier reasoning. Mixture-of-experts architecture: frontier quality at non-frontier cost. |
| GPU | 8x H100 80GB |
| LoRA | Not supported (MoE architecture) |
| Pricing | $0.50 / $1.20 per 1M tokens |
| Availability | Reserved tier only |
Llama 3.1 405B Instruct
| Field | Value |
| --- | --- |
| Model ID | llama-3.1-405b-instruct |
| Parameters | 405B |
| Context window | 128k (8k by default) |
| Strengths | Largest dense model in the catalog. Premium quality for complex reasoning, long documents. |
| GPU | 8x H100 80GB (FP8, TP=8) |
| LoRA | Not supported at this scale |
| Pricing | $2.50 / $3.50 per 1M tokens |
| Availability | Reserved tier only |
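To compare the chat models on price, the per-token rates above can be turned into a rough cost estimate. This is a sketch only: it assumes every price pair is ordered input/output, which the page states explicitly only for Llama 3.1 8B, and it ignores any tier discounts or minimums. The function name is hypothetical.

```python
# USD per 1M tokens as (input, output), copied from the catalog tables.
# Assumption: every pair is ordered input/output.
PRICES_PER_MILLION = {
    "llama-3.1-8b-instruct": (0.10, 0.15),
    "llama-3.1-70b-instruct": (0.55, 0.75),
    "qwen-2.5-72b-instruct": (0.55, 0.75),
    "qwen-2.5-coder-32b": (0.30, 0.45),
    "mistral-small-3-24b": (0.20, 0.30),
    "deepseek-v3": (0.50, 1.20),
    "llama-3.1-405b-instruct": (2.50, 3.50),
}


def estimate_cost_usd(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload from its input and output token counts."""
    in_price, out_price = PRICES_PER_MILLION[model_id]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 10M input tokens and 2M output tokens on each model.
    for model_id in PRICES_PER_MILLION:
        cost = estimate_cost_usd(model_id, 10_000_000, 2_000_000)
        print(f"{model_id}: ${cost:.2f}")
```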
Embedding Models
BGE-M3
| Field | Value |
| --- | --- |
| Model ID | bge-m3 |
| Parameters | 560M |
| Dimensions | 1024 |
| Strengths | Best multilingual embedding model. Supports 100+ languages. Dense, sparse, and ColBERT retrieval in one model. |
| GPU | 1x RTX 5090 |
| Pricing | $0.02 per 1M tokens |
E5-Mistral 7B Instruct
| Field | Value |
| --- | --- |
| Model ID | e5-mistral-7b-instruct |
| Parameters | 7B |
| Dimensions | 4096 |
| Strengths | Instruction-tuned embeddings. Higher quality than BGE-M3 for English. |
| GPU | 1x RTX 5090 |
| Pricing | $0.05 per 1M tokens |
Nomic Embed v1.5
| Field | Value |
| --- | --- |
| Model ID | nomic-embed-v1.5 |
| Parameters | 137M |
| Dimensions | 768 |
| Strengths | Fastest, cheapest. Good for high-volume, latency-sensitive retrieval. |
| GPU | 1x RTX 5090 |
| Pricing | $0.02 per 1M tokens |
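Embedding dimension also drives how much storage a vector index needs, which can matter as much as the per-token price. The sketch below estimates raw vector storage from the dimensions listed above, assuming uncompressed float32 vectors (4 bytes per value) and no index overhead. Real vector stores add metadata and may quantize, so this is only a rough guide. The function name is hypothetical.

```python
# Output dimensions, copied from the embedding model tables.
EMBEDDING_DIMS = {
    "bge-m3": 1024,
    "e5-mistral-7b-instruct": 4096,
    "nomic-embed-v1.5": 768,
}

BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as uncompressed float32


def index_size_gb(model_id: str, num_vectors: int) -> float:
    """Raw storage for num_vectors dense embeddings, in GB (1 GB = 1e9 bytes)."""
    return EMBEDDING_DIMS[model_id] * BYTES_PER_FLOAT32 * num_vectors / 1e9


if __name__ == "__main__":
    # Compare storage for one million chunks across the three models.
    for model_id in EMBEDDING_DIMS:
        print(f"{model_id}: {index_size_gb(model_id, 1_000_000):.3f} GB")
```

Under these assumptions, e5-mistral-7b-instruct needs four times the storage of bge-m3 for the same number of vectors, since its vectors are four times as wide.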
Requesting models not in the catalog
Shared tier: only the models listed above are available.
Reserved/dedicated tier: request any model in a supported architecture (Llama, Mistral, Qwen, Gemma families). We evaluate and deploy on your capacity within 1 business day. Contact your account manager or open a request in the dashboard.