How much cheaper are spot GPUs?

Typically 30-65% below the on-demand rate, though the discount varies by provider, GPU model, and supply. Spot H100s on AWS, GCP, and Azure can run ~50-80% off; Vast.ai interruptible instances are often ~50% off; RunPod’s Community Cloud is its lower-priced tier. The exact discount changes with demand.

Will my spot GPU really get reclaimed?

It can be, at any time, usually with a short warning (about 2 minutes on AWS, about 30 seconds on GCP and Azure). How often that happens depends on the GPU type, region, and demand. The right defense is frequent checkpointing, so that a reclaim costs you minutes, not hours.

Can I serve a production API on spot GPUs?

Risky for latency-critical serving, because a reclaim drops your endpoint. A common pattern is a small on-demand baseline for reliability plus spot for burst capacity, behind a load balancer that drains reclaimed nodes gracefully.
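One way the "drain gracefully" part could look on the serving node is sketched below in Python. It is only an illustration: the `DrainState` class and `install_handler` function are hypothetical names, and it assumes the platform delivers SIGTERM to your process when the reclaim notice arrives, which you would need to confirm for your provider.

```python
import signal
import threading


class DrainState:
    """Tracks whether this node should still receive traffic."""

    def __init__(self) -> None:
        self._draining = False
        self._in_flight = 0
        self._lock = threading.Lock()

    def begin_drain(self, signum=None, frame=None) -> None:
        # Signature matches a signal handler so it can be registered directly.
        self._draining = True

    def health_status(self) -> int:
        # The load balancer polls this; 503 tells it to stop routing here.
        return 503 if self._draining else 200

    def request_started(self) -> bool:
        """Return False if the node is draining and must refuse new work."""
        with self._lock:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def request_finished(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def safe_to_exit(self) -> bool:
        # True once draining has begun and every accepted request is done.
        with self._lock:
            return self._draining and self._in_flight == 0


def install_handler(state: DrainState) -> None:
    # Assumption: the reclaim notice reaches this process as SIGTERM.
    signal.signal(signal.SIGTERM, state.begin_drain)
```

In a real service, `health_status` would back whatever health-check endpoint the load balancer polls, and the process would exit once `safe_to_exit` returns True or the warning window runs out.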

On-demand vs spot GPUs: when to use which (and how much you save)

The single biggest lever on your GPU bill isn’t the provider — it’s whether you use on-demand or spot/community capacity. Here’s how to choose.

The savings are large

Discounted, reclaimable capacity goes by different names — AWS/GCP/Azure call it Spot, Vast.ai calls it Interruptible, RunPod’s cheaper tier is Community Cloud, and Nebius/Verda publish reserved/preemptible rates. All trade a lower price for the risk of being reclaimed.

GPU	On-demand, cheapest ($/GPU-hr)	Spot/community, cheapest ($/GPU-hr)	Saving
H100 SXM	~$2.30	~$1.99 (RunPod Community) / ~$0.95 (Vast interruptible)	up to ~60%
A100 80GB	~$1.35	~$0.35 (Vast interruptible)	up to ~75%
RTX 4090	~$0.34 (RunPod Community)	~$0.20 (Vast interruptible)	~40%
L40S	~$0.86	~$0.74 (Nebius reserved)	~15%

Snapshot June 2026 — prices change weekly; verify on each provider’s pricing page. See the live best spot prices ranking.
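The Saving column is the discount relative to the on-demand price: (on-demand − spot) / on-demand. A minimal Python sketch of that arithmetic, using the cheapest figure in each row of the table above (the function name `saving_pct` is just illustrative):

```python
def saving_pct(on_demand: float, spot: float) -> float:
    """Percentage saved by paying `spot` instead of `on_demand` ($/GPU-hr)."""
    if on_demand <= 0:
        raise ValueError("on_demand must be positive")
    return 100.0 * (on_demand - spot) / on_demand


# Cheapest prices per row of the table above, in $/GPU-hr (June 2026 snapshot).
prices = {
    "H100 SXM": (2.30, 0.95),
    "A100 80GB": (1.35, 0.35),
    "RTX 4090": (0.34, 0.20),
    "L40S": (0.86, 0.74),
}

for gpu, (on_demand, spot) in prices.items():
    print(f"{gpu}: {saving_pct(on_demand, spot):.0f}% cheaper")
```

Swap in current prices from each provider’s pricing page to recompute the discount for your own shortlist.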

Use spot for these workloads

Checkpointed training. Save state every N steps; a reclaim costs minutes.
Batch / offline inference. Throughput matters, latency doesn’t.
Hyperparameter sweeps. Many independent short jobs; losing one is cheap.
Data preprocessing / embeddings. Embarrassingly parallel and restartable.
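For the checkpointed-training case, the save-and-resume loop might look roughly like the Python sketch below. It uses only the standard library and a placeholder in place of a real training step; `CHECKPOINT_PATH`, `CHECKPOINT_EVERY`, and the function names are hypothetical, and a real job would save model and optimizer state through its framework’s own serialization.

```python
import json
import os
import tempfile

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical; use storage that outlives the instance
CHECKPOINT_EVERY = 100               # steps between saves (the "N" above)


def save_checkpoint(state: dict, path: str = CHECKPOINT_PATH) -> None:
    """Write atomically so a reclaim mid-write cannot corrupt the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename within one filesystem


def load_checkpoint(path: str = CHECKPOINT_PATH) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(total_steps: int, path: str = CHECKPOINT_PATH) -> dict:
    state = load_checkpoint(path)  # resume where the previous instance stopped
    for step in range(state["step"], total_steps):
        state["loss"] = 1.0 / (step + 1)  # placeholder for a real training step
        state["step"] = step + 1
        if state["step"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # final save once the run completes
    return state
```

Calling `train` again on a replacement instance picks up from the last saved step, so the most a reclaim can cost is `CHECKPOINT_EVERY` steps of work plus restart time.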

Use on-demand (or reserved) for these

Latency-critical online serving. A reclaim drops your endpoint.
Long unattended runs without checkpointing. A late reclaim wastes hours.
Tight deadlines. Waiting for spot capacity to free up isn’t acceptable.
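To judge whether reclaims erase the discount for a given job, a first-order estimate can help. The Python sketch below assumes reclaims land at a uniformly random point between checkpoints (so each one loses half a checkpoint interval on average) and that restart time is billed; the function name and the reclaim rate in the example are hypothetical, not measured values.

```python
def effective_spot_cost(
    compute_hours: float,
    spot_rate: float,
    reclaims_per_hour: float,
    checkpoint_interval_hours: float,
    restart_overhead_hours: float = 0.0,
) -> float:
    """Estimated spot bill in dollars, including work redone after reclaims.

    First-order model: reclaims that hit during redone work are ignored.
    """
    lost_per_reclaim = checkpoint_interval_hours / 2 + restart_overhead_hours
    expected_reclaims = reclaims_per_hour * compute_hours
    billed_hours = compute_hours + expected_reclaims * lost_per_reclaim
    return billed_hours * spot_rate


# Example: 100 GPU-hours on an A100 80GB at the table's ~$0.35 spot rate,
# with an assumed one reclaim per 10 hours, 30-minute checkpoints,
# and 15 minutes to restart.
spot_bill = effective_spot_cost(100, 0.35, 0.1, 0.5, 0.25)
on_demand_bill = 100 * 1.35
print(f"spot ~${spot_bill:.2f} vs on-demand ~${on_demand_bill:.2f}")
```

With these made-up reclaim figures the spot run comes to about $36.75 against $135 on demand. Stretching the checkpoint interval toward the full length of the job is what makes the rework term large, which is the case the list above warns about.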

The hybrid pattern

For production inference, many teams run a small on-demand baseline for reliability plus spot for burst, behind a load balancer that drains reclaimed nodes. For training, reserved capacity (committed 1-12 months) can beat both if your utilization is high and steady — several providers (Nebius, Verda, Together) publish reserved rates.
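The "high and steady utilization" condition for reserved capacity can be made concrete with a break-even ratio. The Python sketch below assumes a reservation bills every hour of the commitment while on-demand bills only the hours you run; the function name is illustrative, and the example uses the L40S figures from the table above.

```python
def break_even_utilization(reserved_rate: float, on_demand_rate: float) -> float:
    """Fraction of committed hours you must actually use for reserved to win.

    Reserved cost per committed hour is fixed; on-demand cost scales with
    utilization, so reserved is cheaper once utilization exceeds this ratio.
    """
    if on_demand_rate <= 0:
        raise ValueError("on_demand_rate must be positive")
    return reserved_rate / on_demand_rate


# L40S figures from the table above: ~$0.74 reserved vs ~$0.86 on-demand.
threshold = break_even_utilization(0.74, 0.86)
print(f"Reserved wins above {threshold:.0%} utilization")
```

The same ratio works against spot if you substitute an effective spot rate that includes rework from reclaims.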

Try it

Toggle on-demand vs spot in the cost calculator to see the difference on your exact job, and read how much it costs to train an LLM for a worked training example.