GPUPriceBook

How much does it cost to train an LLM in 2026?

By GPUPriceBook editorial · 2026-06-05

In short: Training cost = GPU-hours x price per GPU-hour, and GPU-hours come from compute (roughly 6 x params x tokens FLOPs) divided by realized throughput. A small fine-tune is $50-500; a 7B from-scratch run is tens to hundreds of thousands of dollars; a frontier model is millions or more. At ~$2.30/hr per H100, an 8xH100 node costs ~$18/hr, so the lever that matters most is GPU-hours, not which provider you pick.

“How much to train an LLM” has no single answer — it ranges from pocket change to hundreds of millions. But the math is simple, and you can estimate your own number.

The formula

Cost = GPU-hours x price per GPU-hour.

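GPU-hours come from compute:

Total FLOPs = 6 x parameters x training tokens (the '6ND' rule for a dense transformer)

GPU-hours = total FLOPs / (per-GPU FLOP/s x MFU x 3600)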

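An H100 delivers roughly 1,000 TFLOP/s of peak dense BF16/FP16 compute (about 989 TFLOP/s on the SXM part); realized throughput is that peak multiplied by model FLOPs utilization (MFU). At ~$2.30/hr per H100 (cheapest on Vultr), an 8xH100 node is ~$18/hr; see the cheapest 8xH100 node ranking.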
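As a rough illustration, the formula can be written as a few lines of Python. This is a minimal sketch, not a budgeting tool: the function names are made up for this example, and the defaults (1e15 FLOP/s peak, 40% MFU, $2.30 per GPU-hour) are simply the article's round numbers, so swap in your own.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training FLOPs for a dense transformer (the 6ND rule)."""
    return 6.0 * params * tokens


def gpu_hours(total_flops: float, peak_flops_per_s: float, mfu: float) -> float:
    """GPU-hours = total FLOPs / (per-GPU FLOP/s x MFU x 3600)."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    return total_flops / (peak_flops_per_s * mfu * 3600.0)


def training_cost(
    params: float,
    tokens: float,
    peak_flops_per_s: float = 1e15,   # assumed: ~1,000 TFLOP/s peak per H100
    mfu: float = 0.40,                # assumed: 40% utilization
    price_per_gpu_hour: float = 2.30, # assumed: article's H100 price
) -> tuple[float, float]:
    """Return (GPU-hours, dollars) for one training run."""
    hours = gpu_hours(training_flops(params, tokens), peak_flops_per_s, mfu)
    return hours, hours * price_per_gpu_hour


if __name__ == "__main__":
    hours, cost = training_cost(params=7e9, tokens=1e12)
    print(f"{hours:,.0f} GPU-hours, ${cost:,.0f}")
```

With these exact inputs a 7B model on 1T tokens comes out near 29,000 GPU-hours (about $67,000). Any realized throughput below peak x MFU, plus restarts and other overhead, pushes the number up, so treat the result as a floor rather than a quote.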

Worked examples (neocloud H100 prices)

| Job | Rough GPU-hours | Est. cost @ ~$2.30/H100-hr |
|---|---|---|
| LoRA fine-tune, 7B model, 1 GPU, 4 h | 4 | ~$10 |
| Full fine-tune, 13B, 8 GPUs, 6 h | 48 | ~$110 |
| Pretrain 1B model on 25B tokens | ~1,500 | ~$3,500 |
| Pretrain 7B model on 1T tokens | ~150,000 | ~$345,000 |
| Frontier model (proxy) | tens of millions | millions+ |

These assume ~40% MFU and exclude data prep, failed runs, storage and egress. Snapshot June 2026 — verify current prices before budgeting. Plug your own numbers into the cost calculator.
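The cost column is just GPU-hours multiplied by the hourly price, which a short sketch makes easy to redo at a different rate. The `overhead_fraction` parameter is a hypothetical knob added for this example to stand in for the excluded items (failed runs, storage, egress); it defaults to zero so the output matches the table.

```python
PRICE_PER_H100_HOUR = 2.30  # article's June 2026 snapshot; verify before budgeting

# Rough GPU-hours taken from the worked examples above.
JOBS = {
    "LoRA fine-tune, 7B, 1 GPU x 4 h": 1 * 4,
    "Full fine-tune, 13B, 8 GPUs x 6 h": 8 * 6,
    "Pretrain 1B on 25B tokens": 1_500,
    "Pretrain 7B on 1T tokens": 150_000,
}


def job_cost(
    gpu_hours: float,
    price_per_gpu_hour: float = PRICE_PER_H100_HOUR,
    overhead_fraction: float = 0.0,  # hypothetical: extra spend as a share of compute
) -> float:
    """Dollar cost of a job: GPU-hours x price, optionally padded for overhead."""
    return gpu_hours * price_per_gpu_hour * (1.0 + overhead_fraction)


if __name__ == "__main__":
    for name, hours in JOBS.items():
        print(f"{name}: {hours:,} GPU-hours -> ${job_cost(hours):,.0f}")
```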

What actually moves the bill

  1. GPU-hours dominate. Halving them (better data quality, higher MFU, a smaller model that’s “good enough”) saves far more than a cheaper provider.
  2. MFU is free money. Going from 30% to 50% utilization cuts cost by ~40% with no extra hardware.
  3. Spot for restartable phases. Checkpointed pretraining on spot or community-cloud instances can cut the bill by 30-65%.
  4. Right-size the GPU. Fine-tunes rarely need an H100 — an A100 or even an inference card is often plenty.
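The arithmetic behind these levers is short enough to check directly. The sketch below assumes that GPU-hours scale as 1/MFU and that independent savings multiply; the helper names are illustrative only.

```python
def saving_from_mfu(old_mfu: float, new_mfu: float) -> float:
    """Fractional cost cut when MFU rises, since GPU-hours scale as 1/MFU."""
    return 1.0 - old_mfu / new_mfu


def combined_saving(*savings: float) -> float:
    """Stack independent fractional savings multiplicatively."""
    remaining = 1.0
    for s in savings:
        remaining *= 1.0 - s
    return 1.0 - remaining


if __name__ == "__main__":
    mfu_cut = saving_from_mfu(0.30, 0.50)   # the 30% -> 50% example above
    print(f"MFU 30% -> 50%: {mfu_cut:.0%} cheaper")
    # Same MFU gain plus a 30% spot discount (low end of the quoted range).
    print(f"Plus 30% spot discount: {combined_saving(mfu_cut, 0.30):.0%} cheaper")
    # Halving GPU-hours versus shaving 20% off the hourly rate.
    print(f"Half the GPU-hours: {combined_saving(0.50):.0%}; 20% cheaper rate: {combined_saving(0.20):.0%}")
```

Because the savings multiply rather than add, a 40% cut from MFU and a 30% spot discount leave 0.6 x 0.7 = 42% of the original bill, not 30%.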

The takeaway

Pick a cheap provider (the neoclouds are 2-4x cheaper than hyperscalers), then spend your energy on efficiency and GPU-hours — that’s where the real money is.

Frequently asked questions

What's the formula for LLM training cost?

Cost = GPU-hours x price per GPU-hour. GPU-hours = total FLOPs / (per-GPU FLOP/s x utilization x 3600). Total training FLOPs for a dense transformer are roughly 6 x parameters x training tokens (the '6ND' rule). Utilization (MFU) is typically 30-55% in practice.

How much does it cost to fine-tune an existing model?

Usually $50-500. A LoRA/QLoRA fine-tune of a 7-13B model on a single A100 or H100 for a few hours is often under $50 at neocloud prices. Full fine-tunes cost more but are still typically in the low hundreds for small models.

Why does provider choice matter less than I'd think?

Because GPU-hours dominate. Halving your GPU-hours (better data, higher MFU, smaller model) saves far more than shaving 20% off the hourly rate. Pick a cheap provider, but spend your effort on efficiency.


Last updated: 2026-06-05