GPUPriceBook

Cheapest GPUs for inference

For AI inference you rarely need a flagship H100. The cheapest inference-class GPU we track is the NVIDIA L4 at $0.33/hr on Vast.ai. The L4 (24GB, 72W) usually gives the best cost per token for steady serving of small and medium models; the L40S (48GB) adds throughput and room for larger models; the A6000 and RTX 4090 are budget VRAM workhorses on marketplace clouds. Every offer we track is ranked below by on-demand hourly price, cheapest first.

Source: Provider pricing pages. Data as of June 2026.

Inference GPUs ranked cheapest first

| Rank | GPU | Provider | VRAM | On-demand /hr |
|---|---|---|---|---|
| #1 | L4 | Vast.ai | 24GB GDDR6 | $0.33/hr |
| #2 | L4 | RunPod | 24GB GDDR6 | $0.39/hr |
| #3 | A6000 | RunPod | 48GB GDDR6 | $0.49/hr |
| #4 | A6000 | Hyperstack | 48GB GDDR6 | $0.50/hr |
| #5 | RTX 6000 Ada | Lambda | 48GB GDDR6 | $0.50/hr |
| #6 | RTX 4090 | Vast.ai | 24GB GDDR6X | $0.52/hr |
| #7 | A6000 | Verda (formerly DataCrunch) | 48GB GDDR6 | $0.61/hr |
| #8 | RTX 4090 | RunPod | 24GB GDDR6X | $0.69/hr |
| #9 | L4 | Google Cloud | 24GB GDDR6 | $0.71/hr |
| #10 | RTX 6000 Ada | RunPod | 48GB GDDR6 | $0.77/hr |
| #11 | A6000 | Lambda | 48GB GDDR6 | $0.80/hr |
| #12 | L4 | AWS EC2 | 24GB GDDR6 | $0.81/hr |
| #13 | L40S | RunPod | 48GB GDDR6 | $0.86/hr |
| #14 | RTX 6000 Ada | Verda (formerly DataCrunch) | 48GB GDDR6 | $1.04/hr |
| #15 | L40S | Verda (formerly DataCrunch) | 48GB GDDR6 | $1.37/hr |
| #16 | L40S | Crusoe | 48GB GDDR6 | $1.50/hr |
| #17 | L40S | Nebius | 48GB GDDR6 | $1.55/hr |
| #18 | L40S | Paperspace (DigitalOcean) | 48GB GDDR6 | $1.57/hr |
| #19 | RTX 6000 Ada | Paperspace (DigitalOcean) | 48GB GDDR6 | $1.57/hr |
| #20 | L40S | Vultr Cloud GPU | 48GB GDDR6 | $1.67/hr |
| #21 | RTX 6000 Ada | Hyperstack | 48GB GDDR6 | $1.80/hr |
| #22 | RTX 6000 Ada | Nebius | 48GB GDDR6 | $1.80/hr |
| #23 | L40S | AWS EC2 | 48GB GDDR6 | $1.86/hr |
| #24 | A6000 | Paperspace (DigitalOcean) | 48GB GDDR6 | $1.89/hr |
| #25 | L40S | CoreWeave | 48GB GDDR6 | $2.25/hr |
| #26 | RTX 6000 Ada | CoreWeave | 48GB GDDR6 | $2.50/hr |

Snapshot June 2026 — cloud GPU prices change weekly; verify on the provider's pricing page before relying on a figure.
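
Hourly rates are easier to compare once they are turned into a monthly bill and a price per GB of VRAM. The sketch below does that arithmetic for four rows copied from the ranking. The 730-hour month and the helper names `monthly_cost` and `usd_per_gb_hour` are illustrative assumptions, not part of any provider's billing.

```python
# A few rows copied from the ranking: (gpu, provider, vram_gb, usd_per_hour).
OFFERS = [
    ("L4", "Vast.ai", 24, 0.33),
    ("A6000", "RunPod", 48, 0.49),
    ("RTX 4090", "Vast.ai", 24, 0.52),
    ("L40S", "RunPod", 48, 0.86),
]

# Assumption: the instance runs nonstop; 24 h * 365 d / 12 months = 730 h.
HOURS_PER_MONTH = 730


def monthly_cost(usd_per_hour, hours=HOURS_PER_MONTH):
    """On-demand bill in USD for one GPU running `hours` hours."""
    return round(usd_per_hour * hours, 2)


def usd_per_gb_hour(vram_gb, usd_per_hour):
    """Hourly price divided by VRAM, a rough value-for-memory figure."""
    return round(usd_per_hour / vram_gb, 4)


for gpu, provider, vram_gb, rate in OFFERS:
    print(
        f"{gpu:9} {provider:8} "
        f"${monthly_cost(rate):8.2f}/month  "
        f"${usd_per_gb_hour(vram_gb, rate):.4f}/GB-hr"
    )
```

On these rows the A6000 on RunPod costs less per GB of VRAM than the L4 on Vast.ai, even though its hourly rate is higher, which is why the ranking alone does not settle the choice.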

Frequently asked questions

What is the cheapest GPU for AI inference?

The cheapest inference-class GPU we track is the NVIDIA L4 at $0.33/hr on Vast.ai. For steady, high-volume serving the L4 (24GB, 72W) usually has the best cost per token; the L40S adds throughput and 48GB of VRAM for bigger models.
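
Cost per token depends on throughput as well as hourly price, and this page does not publish throughput figures. The sketch below shows the arithmetic only: the 1,000 tokens/s value is a placeholder rather than a measurement of any GPU, and both function names are hypothetical. The break-even ratio uses only the two cheapest L4 and L40S rates from the ranking.

```python
def usd_per_million_tokens(usd_per_hour: float, tokens_per_second: float) -> float:
    """Serving cost in USD per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


def break_even_speedup(cheap_rate: float, pricey_rate: float) -> float:
    """Throughput multiple the pricier GPU needs to match the cheaper one per token."""
    return pricey_rate / cheap_rate


L4_RATE = 0.33    # L4 on Vast.ai, from the ranking
L40S_RATE = 0.86  # cheapest L40S (RunPod), from the ranking

# Placeholder throughput, NOT a benchmark: substitute your own measurement.
PLACEHOLDER_TOKENS_PER_SECOND = 1000.0

cost = usd_per_million_tokens(L4_RATE, PLACEHOLDER_TOKENS_PER_SECOND)
ratio = break_even_speedup(L4_RATE, L40S_RATE)
print(f"${cost:.4f} per million tokens at the placeholder throughput")
print(f"L40S needs about {ratio:.2f}x the L4's throughput to break even")
```

At these two rates the L40S has to deliver roughly 2.6 times the L4's measured throughput on your workload before it becomes cheaper per token.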

L4 vs L40S vs A6000 for inference?

The L4 (24GB) is the cheapest and most power-efficient option for steady serving of small and medium models. The L40S (48GB) is faster and fits bigger models, at a higher hourly rate. The A6000 and RTX 6000 Ada (both 48GB) are good value when you mainly need VRAM rather than peak throughput. Match VRAM to your model size first, then optimise for cost.
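
The "VRAM first, then cost" rule can be sketched as a filter followed by a minimum. The estimate below is a rough rule of thumb: it assumes 2 bytes per parameter (FP16 weights) and a 20% allowance for KV cache and activations, both of which are assumptions that vary with context length, batch size, and quantisation. `Offer`, `estimate_vram_gb`, and `cheapest_fit` are hypothetical names, and the offers are a subset of the ranking.

```python
from typing import NamedTuple, Optional


class Offer(NamedTuple):
    gpu: str
    provider: str
    vram_gb: int
    usd_per_hour: float


# Cheapest offer per GPU model, copied from the ranking.
OFFERS = [
    Offer("L4", "Vast.ai", 24, 0.33),
    Offer("A6000", "RunPod", 48, 0.49),
    Offer("RTX 6000 Ada", "Lambda", 48, 0.50),
    Offer("RTX 4090", "Vast.ai", 24, 0.52),
    Offer("L40S", "RunPod", 48, 0.86),
]


def estimate_vram_gb(params_billion: float,
                     bytes_per_param: float = 2.0,
                     overhead: float = 0.2) -> float:
    """Rule-of-thumb VRAM need: weight size plus an assumed overhead fraction."""
    return params_billion * bytes_per_param * (1 + overhead)


def cheapest_fit(offers, needed_gb: float) -> Optional[Offer]:
    """Cheapest single-GPU offer with enough VRAM, or None if nothing fits."""
    fitting = [o for o in offers if o.vram_gb >= needed_gb]
    return min(fitting, key=lambda o: o.usd_per_hour) if fitting else None


for size in (7, 13, 70):
    need = estimate_vram_gb(size)
    pick = cheapest_fit(OFFERS, need)
    label = f"{pick.gpu} on {pick.provider}" if pick else "no single GPU listed here"
    print(f"{size}B params, ~{need:.1f} GB needed: {label}")
```

Under these assumptions a 7B model lands on the L4, a 13B model needs a 48GB card, and a 70B model does not fit on any single GPU in this list without quantisation or multi-GPU serving.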

Can I use an RTX 4090 for inference?

Yes — the RTX 4090 (24GB) is excellent value for inference and Stable Diffusion on community/marketplace clouds (RunPod Community, Vast.ai). It is a consumer card, so it is rarely offered by enterprise providers and has no ECC, but per dollar it is hard to beat for single-GPU serving.

Should inference run on spot GPUs?

For batch or offline inference, yes: spot instances cut cost by roughly 50%. For latency-sensitive online serving, prefer on-demand or reserved capacity so your endpoint is not reclaimed mid-request, or run a small on-demand baseline and add spot capacity for bursts.
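
The baseline-plus-burst option can be costed with simple arithmetic. The sketch below assumes the roughly 50% spot discount mentioned above and a hypothetical fleet of two on-demand and four spot L40S GPUs at the $0.86/hr RunPod rate; actual spot prices vary by provider and over time, and `blended_hourly_cost` is an illustrative name.

```python
def blended_hourly_cost(on_demand_rate: float,
                        baseline_gpus: int,
                        burst_gpus: int,
                        spot_discount: float = 0.5) -> float:
    """Hourly fleet cost: on-demand baseline plus discounted spot burst capacity."""
    spot_rate = on_demand_rate * (1 - spot_discount)
    return baseline_gpus * on_demand_rate + burst_gpus * spot_rate


RATE = 0.86  # L40S on RunPod, from the ranking

mixed = blended_hourly_cost(RATE, baseline_gpus=2, burst_gpus=4)
all_on_demand = blended_hourly_cost(RATE, baseline_gpus=6, burst_gpus=0)
saving = 1 - mixed / all_on_demand

print(f"mixed fleet:   ${mixed:.2f}/hr")
print(f"all on-demand: ${all_on_demand:.2f}/hr")
print(f"saving:        {saving:.0%}")
```

The saving scales with the share of the fleet that runs on spot, so a larger on-demand baseline buys reliability at the price of a smaller discount.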

Last updated: 2026-06-21