How much does it cost to train an LLM in 2026?

By GPUPriceBook editorial · 2026-06-05

In short: Training cost = GPU-hours x price-per-GPU-hour, and GPU-hours come from compute (6 x params x tokens FLOPs) divided by realized throughput. A small fine-tune costs from about $10 to a few hundred dollars; a 7B-from-scratch run is tens of thousands; a frontier model is tens of millions or more. At ~$2.30/hr per H100, an 8xH100 node costs ~$18/hr. The lever that matters most is GPU-hours, not which provider you pick.

“How much to train an LLM” has no single answer — it ranges from pocket change to hundreds of millions. But the math is simple, and you can estimate your own number.

The formula

Cost = GPU-hours x price per GPU-hour.

GPU-hours come from compute:

Total training FLOPs for a dense transformer ≈ 6 x parameters x training tokens (the “6ND” rule).
GPU-hours = total FLOPs / (per-GPU FLOP/s x MFU x 3600), where MFU (model FLOPs utilization) is realistically 30-55%.
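
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the site's calculator: the function names are made up for this example, and the 3B-parameter, 100B-token job is a hypothetical input chosen for round numbers. It assumes a peak of 1e15 FLOP/s per GPU, 40% MFU and $2.30 per GPU-hour.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training FLOPs for a dense transformer (the 6ND rule)."""
    return 6.0 * params * tokens


def gpu_hours(total_flops: float, peak_flops_per_s: float, mfu: float) -> float:
    """GPU-hours = total FLOPs / (per-GPU FLOP/s x MFU x 3600)."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    return total_flops / (peak_flops_per_s * mfu * 3600.0)


def training_cost(params, tokens, peak_flops_per_s, mfu, price_per_gpu_hour):
    """Return (GPU-hours, cost in dollars) for one training run."""
    hours = gpu_hours(training_flops(params, tokens), peak_flops_per_s, mfu)
    return hours, hours * price_per_gpu_hour


# Hypothetical job: 3B parameters, 100B tokens (not one of the article's examples).
hours, cost = training_cost(
    params=3e9,
    tokens=100e9,
    peak_flops_per_s=1e15,    # assumed per-GPU peak, roughly an H100
    mfu=0.40,                 # assumed utilization
    price_per_gpu_hour=2.30,  # assumed hourly rate
)
print(f"{hours:,.0f} GPU-hours, ${cost:,.0f}")  # 1,250 GPU-hours, $2,875
```

Swapping in your own parameter count, token budget, MFU and hourly rate gives a first-order estimate; it covers only the compute term, not data prep, failed runs, storage or egress.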

An H100 (SXM) delivers roughly 1,000 TFLOP/s of peak dense BF16/FP16 compute; MFU scales that peak down to what a training run actually achieves. At ~$2.30/hr per H100 (cheapest on Vultr), an 8xH100 node is ~$18/hr; see the cheapest 8xH100 node ranking.

Worked examples (neocloud H100 prices)

Job	Rough GPU-hours	Est. cost @ ~$2.30/H100-hr
LoRA fine-tune, 7B model, 1 GPU, 4 h	4	~$10
Full fine-tune, 13B, 8 GPUs, 6 h	48	~$110
Pretrain 1B model on 25B tokens	~100	~$240
Pretrain 7B model on 1T tokens	~29,000	~$67,000
Frontier model (proxy)	tens of millions	tens of millions+

These assume ~40% MFU and exclude data prep, failed runs, storage and egress. Snapshot June 2026 — verify current prices before budgeting. Plug your own numbers into the cost calculator.

What actually moves the bill

GPU-hours dominate. Halving them (better data quality, higher MFU, a smaller model that’s “good enough”) saves far more than a cheaper provider.
MFU is free money. Going from 30% to 50% utilization cuts cost by ~40% with no extra hardware.
Spot for restartable phases. Checkpointed pretraining on spot or community-cloud instances can cut the bill by 30-65%.
Right-size the GPU. Fine-tunes rarely need an H100 — an A100 or even an inference card is often plenty.
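
The first two levers can be checked with a little arithmetic. The sketch below assumes cost is exactly proportional to GPU-hours x hourly price, and that GPU-hours scale as 1/MFU for a fixed amount of compute. The function names are illustrative only, and the 20% rate discount is just a comparison point.

```python
def saving_from_mfu(old_mfu: float, new_mfu: float) -> float:
    """Fractional cost saving when MFU rises, holding total FLOPs fixed.

    GPU-hours scale as 1/MFU, so new_cost / old_cost = old_mfu / new_mfu.
    """
    return 1.0 - old_mfu / new_mfu


def relative_cost(gpu_hours_factor: float, price_factor: float) -> float:
    """Cost relative to baseline after scaling GPU-hours and hourly price."""
    return gpu_hours_factor * price_factor


mfu_saving = saving_from_mfu(0.30, 0.50)      # 30% -> 50% MFU
halved_hours = 1.0 - relative_cost(0.5, 1.0)  # half the GPU-hours, same price
cheaper_rate = 1.0 - relative_cost(1.0, 0.8)  # same GPU-hours, 20% cheaper rate

print(f"MFU 30% -> 50%: saves {mfu_saving:.0%}")
print(f"Halve GPU-hours: saves {halved_hours:.0%}")
print(f"20% cheaper rate: saves {cheaper_rate:.0%}")
```

Under these assumptions the MFU improvement saves 40% and halving GPU-hours saves 50%, against 20% from the cheaper hourly rate.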

The takeaway

Pick a cheap provider (the neoclouds are 2-4x cheaper than hyperscalers), then spend your energy on efficiency and GPU-hours — that’s where the real money is.

Frequently asked questions

What's the formula for LLM training cost?

Cost = GPU-hours x price per GPU-hour. GPU-hours = total FLOPs / (per-GPU FLOP/s x utilization x 3600). Total training FLOPs for a dense transformer is roughly 6 x parameters x training tokens (the '6ND' rule). Utilization (MFU) is typically 30-55% in practice.

How much to fine-tune an existing model?

Usually under $500. A LoRA/QLoRA fine-tune of a 7-13B model on a single A100 or H100 for a few hours is often under $50 at neocloud prices. Full fine-tunes cost more but are still typically in the low hundreds for small models.

Why does provider choice matter less than I'd think?

Because GPU-hours dominate. Halving your GPU-hours (better data, higher MFU, smaller model) saves far more than shaving 20% off the hourly rate. Pick a cheap provider, but spend your effort on efficiency.

Last updated: 2026-06-05