Some GPUs on the network are passed through to VMs in indivisible groups — paired consumer GPUs, for example, come as 2-GPU units. Previously, requesting a GPU count that didn't match that granularity (like a single GPU on a paired-GPU fleet) could never be placed: the request sat in the queue silently and only surfaced as a failure at the deploy timeout, 45 minutes later.
We've fixed this at every layer:
- Availability API: `/v1/gpu-availability` now reports a per-model `groupSize`, the smallest GPU increment that model can be allocated in.
- Fail-fast validation: creating a VM or cluster with a GPU count that isn't a multiple of the model's `groupSize` is now rejected immediately with a clear 400 error listing the valid counts, instead of being accepted and timing out later.
- Dashboard wizard: the create-VM flow now shows a GPU count dropdown containing only counts that can actually be placed, and defaults to the first GPU model with free capacity.
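The fail-fast rule above comes down to a divisibility check against `groupSize`. Here is a minimal Python sketch of that rule. It is not the service's actual code. The names `valid_gpu_counts`, `validate_gpu_count`, and `ValidationError` are hypothetical, as is the `max_gpus` bound. That bound exists only so the sketch has a finite list of valid counts to put in the error message.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for an HTTP 400 response from the create endpoint."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status
        self.message = message


def valid_gpu_counts(group_size: int, max_gpus: int) -> list[int]:
    """Placeable counts: positive multiples of group_size up to a (hypothetical) max_gpus."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return list(range(group_size, max_gpus + 1, group_size))


def validate_gpu_count(count: int, group_size: int, max_gpus: int) -> int:
    """Reject a count that is not a positive multiple of group_size, listing valid counts."""
    if count <= 0 or count % group_size != 0:
        valid = valid_gpu_counts(group_size, max_gpus)
        raise ValidationError(
            400,
            f"gpuCount {count} cannot be placed; valid counts: {valid}",
        )
    return count
```

With a `groupSize` of 2, `validate_gpu_count(4, 2, 8)` returns 4. `validate_gpu_count(1, 2, 8)` fails immediately with a 400 that lists `[2, 4, 6, 8]`, rather than queueing a request that can never be placed.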
If you use the API directly, check the new `groupSize` field on `/v1/gpu-availability` when choosing a GPU count. In the dashboard, there's nothing to do: invalid options simply no longer appear.
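One way for an API client to use `groupSize` is to round a desired GPU count up to the next placeable value before creating a VM. The sketch below assumes the `/v1/gpu-availability` response is a JSON array of objects with `model` and `groupSize` keys. This announcement confirms only the `groupSize` field, so treat the overall shape as an assumption. The function names and model names are also hypothetical.

```python
import json

# Assumed response shape: only the per-model `groupSize` field is documented;
# the array layout and the `model` key are hypothetical.
SAMPLE_RESPONSE = """
[
  {"model": "example-paired-gpu", "groupSize": 2},
  {"model": "example-single-gpu", "groupSize": 1}
]
"""


def round_up_to_group(wanted: int, group_size: int) -> int:
    """Smallest multiple of group_size that is >= wanted (ceiling division)."""
    if wanted < 1 or group_size < 1:
        raise ValueError("wanted and group_size must both be >= 1")
    return -(-wanted // group_size) * group_size


def choose_gpu_count(availability: list[dict], model: str, wanted: int) -> int:
    """Pick a placeable GPU count for `model` by rounding `wanted` up to its groupSize."""
    for entry in availability:
        if entry["model"] == model:
            return round_up_to_group(wanted, entry["groupSize"])
    raise KeyError(f"model {model!r} not in availability response")


availability = json.loads(SAMPLE_RESPONSE)
print(choose_gpu_count(availability, "example-paired-gpu", 1))  # 2
print(choose_gpu_count(availability, "example-single-gpu", 1))  # 1
```

Rounding up is only one policy. A client that must not silently allocate extra GPUs could instead show the user the valid multiples and let them choose.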
Try it on OpenRelay
Rent GPUs by the hour and deploy fault-tolerant inference in minutes.