
On-demand vs spot GPUs: when to use which (and how much you save)

By GPUPriceBook editorial · 2026-04-15

In short: Spot, community and interruptible GPUs typically cost 30-65% less than on-demand (sometimes more), but they can be reclaimed with little warning. Use them for checkpointed training, batch inference, sweeps and data processing. Use on-demand (or reserved) capacity for latency-critical online serving and for long unattended runs where a mid-run reclaim is expensive.

The single biggest lever on your GPU bill isn’t the provider — it’s whether you use on-demand or spot/community capacity. Here’s how to choose.

The savings are large

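Discounted, reclaimable capacity goes by different names: AWS, GCP and Azure call it Spot, Vast.ai calls it Interruptible, RunPod’s cheaper tier is Community Cloud, and Nebius and Verda publish preemptible rates. All of these trade a lower price for the risk of being reclaimed. Nebius and Verda also publish reserved rates, which are discounted in exchange for a time commitment rather than reclaim risk.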

GPU | On-demand, cheapest ($/GPU-hour) | Spot/community/reserved, cheapest ($/GPU-hour) | Saving
H100 SXM | ~$2.30 | ~$1.99 (RunPod Community) / ~$0.95 (Vast interruptible) | up to ~60%
A100 80GB | ~$1.35 | ~$0.35 (Vast interruptible) | up to ~74%
RTX 4090 | ~$0.34 (RunPod Community) | ~$0.20 (Vast interruptible) | ~40%
L40S | ~$0.86 | ~$0.74 (Nebius reserved) | ~15%

Snapshot June 2026 — prices change weekly; verify on each provider’s pricing page. See the live best spot prices ranking.
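
The Saving column is simply one minus the ratio of the discounted price to the on-demand price. A minimal Python sketch that recomputes it from the snapshot prices above (the dictionary holds those table values; for the H100 it uses the cheapest Vast interruptible rate):

```python
# Snapshot prices from the table above: (on-demand, cheapest discounted), $/GPU-hour.
SNAPSHOT = {
    "H100 SXM": (2.30, 0.95),
    "A100 80GB": (1.35, 0.35),
    "RTX 4090": (0.34, 0.20),
    "L40S": (0.86, 0.74),
}


def saving_pct(on_demand: float, discounted: float) -> float:
    """Percent saved by moving from the on-demand rate to the discounted rate."""
    if on_demand <= 0:
        raise ValueError("on-demand price must be positive")
    return 100.0 * (1.0 - discounted / on_demand)


for gpu, (od, disc) in SNAPSHOT.items():
    print(f"{gpu:10s} {saving_pct(od, disc):5.1f}%")
```

For the snapshot prices this prints about 58.7%, 74.1%, 41.2% and 14.0%. Rerun it with current prices, since the table goes stale quickly.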

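Use spot for these workloads

Checkpointed training, batch inference, sweeps and data processing. These jobs can resume from their last checkpoint or simply retry, so a reclaim costs only the work done since the last save.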
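
Spot training is only safe if the job can pick up where it left off. A minimal Python sketch of the checkpoint-and-resume pattern, assuming the training state fits in a small JSON file; a real job would also save model and optimizer state, to storage that outlives the instance. `train`, `save_checkpoint`, `load_checkpoint` and `CKPT_PATH` are hypothetical names for illustration:

```python
import json
import os

# Hypothetical path; in practice point this at a volume or object store that survives a reclaim.
CKPT_PATH = "train_state.json"


def save_checkpoint(state: dict, path: str = CKPT_PATH) -> None:
    """Write to a temp file, then rename, so a reclaim mid-write keeps the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on POSIX filesystems


def load_checkpoint(path: str = CKPT_PATH) -> dict:
    """Resume from the last checkpoint, or start fresh if there is none."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(total_steps: int, ckpt_every: int, path: str = CKPT_PATH) -> dict:
    state = load_checkpoint(path)
    for step in range(state["step"], total_steps):
        state["loss"] = 1.0 / (step + 1)  # placeholder for a real training step
        state["step"] = step + 1
        # Save every ckpt_every steps and once more at the end.
        if state["step"] % ckpt_every == 0 or state["step"] == total_steps:
            save_checkpoint(state, path)
    return state
```

Rerunning the same command after a reclaim continues from the last saved step, so the checkpoint interval bounds how much work a reclaim can cost.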

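Use on-demand (or reserved) for these

Latency-critical online serving, where a reclaim drops live traffic, and long unattended runs, where a mid-run reclaim is expensive.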

The hybrid pattern

For production inference, many teams run a small on-demand baseline for reliability plus spot for burst, behind a load balancer that drains reclaimed nodes. For training, reserved capacity (committed 1-12 months) can beat both if your utilization is high and steady — several providers (Nebius, Verda, Together) publish reserved rates.
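
Whether reserved capacity pays off comes down to utilization: a reservation is billed for every hour of the term, while on-demand is billed only for the hours you use, so reserved wins once utilization exceeds the ratio of the two rates. A small Python sketch of that arithmetic, plus the monthly cost of the hybrid serving pattern. The $1.80 reserved rate in the example is hypothetical, and 730 hours per month is an average:

```python
HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def break_even_utilization(reserved_rate: float, on_demand_rate: float) -> float:
    """Fraction of hours you must use for a reservation to beat on-demand."""
    if on_demand_rate <= 0:
        raise ValueError("on-demand rate must be positive")
    return reserved_rate / on_demand_rate


def monthly_cost_on_demand(rate: float, utilization: float, gpus: int = 1) -> float:
    # On-demand: pay only for the hours actually used.
    return rate * HOURS_PER_MONTH * utilization * gpus


def monthly_cost_reserved(rate: float, gpus: int = 1) -> float:
    # Reserved: pay for every hour of the term, used or not.
    return rate * HOURS_PER_MONTH * gpus


def hybrid_serving_cost(baseline_gpus: int, od_rate: float,
                        burst_gpu_hours: float, spot_rate: float) -> float:
    """Always-on on-demand baseline plus spot GPU-hours for burst traffic, per month."""
    return baseline_gpus * od_rate * HOURS_PER_MONTH + burst_gpu_hours * spot_rate


if __name__ == "__main__":
    # Hypothetical $1.80/GPU-hour reservation vs the ~$2.30 on-demand H100 rate above.
    print(f"{break_even_utilization(1.80, 2.30):.0%}")  # 78%
```

With those example rates, a reservation only pays off if the GPU is busy more than about 78% of the time; below that, on-demand (or spot) is cheaper.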

Try it

Toggle on-demand vs spot in the cost calculator to see the difference on your exact job, and read how much it costs to train an LLM for a worked training example.

Frequently asked questions

How much cheaper are spot GPUs?

Typically 30-65% below the on-demand rate, varying by provider and supply. AWS/GCP/Azure spot H100s can be ~50-80% off, Vast.ai interruptible is often ~50% off, and RunPod’s Community Cloud is its cheaper tier. The exact discount changes with demand.

Will my spot GPU really get reclaimed?

It can be, at any time, usually with a short warning: two minutes on AWS, and about 30 seconds on GCP and Azure. How often it happens depends on the GPU type, region and demand. The right defense is frequent checkpointing, so a reclaim costs you minutes of work, not hours.
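
Checkpoint frequency directly sets the real price of spot. A back-of-envelope Python sketch, assuming a first-order model (not a provider formula): each reclaim loses on average half a checkpoint interval of progress plus a fixed restart overhead, and the time spent writing checkpoints is ignored. The reclaim rate in the example is hypothetical:

```python
def effective_spot_rate(spot_rate: float, reclaims_per_hour: float,
                        ckpt_interval_h: float, restart_overhead_h: float) -> float:
    """Approximate $ per *useful* GPU-hour on spot, counting work lost to reclaims.

    Assumed model: wasted fraction = reclaims/hour * (interval/2 + restart overhead).
    Only meaningful when that fraction is well below 1.
    """
    wasted_fraction = reclaims_per_hour * (ckpt_interval_h / 2 + restart_overhead_h)
    if wasted_fraction >= 1:
        raise ValueError("reclaims too frequent for this checkpoint interval")
    return spot_rate / (1 - wasted_fraction)


if __name__ == "__main__":
    # Hypothetical: one reclaim per ~10 hours, 10-minute restart, $0.95/hour spot H100.
    print(f"${effective_spot_rate(0.95, 0.1, 0.5, 1 / 6):.2f}")  # checkpoint every 30 min: $0.99
    print(f"${effective_spot_rate(0.95, 0.1, 6.0, 1 / 6):.2f}")  # checkpoint every 6 h: $1.39
```

Both figures stay below the ~$2.30 on-demand rate in the table, but the gap narrows quickly as checkpoints get rarer or reclaims get more frequent.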

Can I serve a production API on spot GPUs?

Risky for latency-critical serving, because a reclaim drops your endpoint. A common pattern is a small on-demand baseline for reliability plus spot for burst capacity, behind a load balancer that drains reclaimed nodes gracefully.
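
The draining behaviour can be sketched as a toy round-robin router in Python. Everything here is hypothetical (`Node`, `Pool`, `on_reclaim_notice`); in practice this logic lives in your load balancer or orchestrator, and `on_reclaim_notice` would be triggered by the provider’s interruption notice:

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    on_demand: bool          # True for the always-on baseline, False for spot burst
    draining: bool = False   # set when the provider sends a reclaim notice
    in_flight: int = 0       # requests currently being served


class Pool:
    """Toy round-robin router over an on-demand baseline plus spot burst nodes."""

    def __init__(self, nodes: list[Node]) -> None:
        self.nodes = list(nodes)
        self._counter = 0

    def on_reclaim_notice(self, name: str) -> None:
        # Stop routing new requests to this node; in-flight requests may still finish.
        for node in self.nodes:
            if node.name == name:
                node.draining = True

    def remove_drained(self) -> None:
        # Drop draining nodes once they have no requests left.
        self.nodes = [n for n in self.nodes if not (n.draining and n.in_flight == 0)]

    def pick(self) -> Node:
        # Round-robin over nodes that are not draining.
        live = [n for n in self.nodes if not n.draining]
        if not live:
            raise RuntimeError("no live nodes")
        node = live[self._counter % len(live)]
        self._counter += 1
        return node
```

Because the on-demand baseline is never reclaimed, the pool keeps serving even when every spot node is draining, just with less capacity.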


Last updated: 2026-04-15