Backed by Y Combinator

Hosted inference and GPU compute.

Call hosted models through OpenAI or Anthropic request formats, or launch dedicated GPU VMs.

Start free

Built for teams that outgrew the hyperscaler bill.

The open network for inference at scale.

User request to nearest healthy node: 18 ms

Node status: Healthy, Busy, Down

Open network
A global mesh of GPUs, all in one network.
Automatic failover
Traffic reroutes in milliseconds.
Instant load balancing
Requests spread across every available node.
Uptime SLA available
Committed uptime, latency, and throughput on dedicated capacity.

Deploy with one API call

OpenAI-compatible. No code changes.

from openai import OpenAI

# Point the standard OpenAI client at the OpenRelay gateway
client = OpenAI(
  base_url="https://inference.openrelay.inc/v1",
  api_key="sk-••••••••••••"
)

Two ways to ship. Both live in under two minutes.

Rent dedicated GPUs or call any model through our OpenAI-compatible API.

GPU compute

Rent a GPU in seconds

1. Select capacity
Pick a GPU type, VRAM, region, and hourly price from available cards across the network.
2. Boot a machine
Choose an image (Ubuntu 24.04), attach a persistent volume, and launch the runtime.
3. Own the loop
SSH in with full control, install your stack, then monitor and scale as you go.

Ready in ~47s

Deploy now

Inference endpoint

Or just call a model

1. Select a model
Use a hosted open model: no node setup, no cold-start wrangling.
2. Point your SDK
Swap the base URL and keep your existing OpenAI-compatible client and tools.
3. Serve traffic
Send requests; routing, autoscaling, and failover are handled for you.

base_url="https://inference.openrelay.inc/v1"
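
The steps above come down to sending an HTTP request to that base URL. Here is a minimal sketch using only the Python standard library. It assumes the gateway accepts the `/chat/completions` path and Bearer header used in the curl example on this page. `build_chat_request` is a hypothetical helper, not part of any OpenRelay SDK, and the request is built but never sent.

```python
import json
import urllib.request

BASE_URL = "https://inference.openrelay.inc/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions request. Hypothetical helper."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "YOUR_API_KEY" is a placeholder; substitute your organization API key.
req = build_chat_request("YOUR_API_KEY", "openrelay/glm-5.2", "Explain this stack trace.")
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` with a real key. An existing OpenAI-compatible client should work the same way once its base URL is swapped.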

View docs

Pay by the hour.
Nothing else.

Dedicated NVIDIA GPUs with SSH access and a persistent volume. Deposit $5 to get $10 and start deploying.

Best value

RTX 4090

$0.29/hr

24 GB VRAM

Deploy RTX 4090

RTX 5090

$0.44/hr

32 GB VRAM

Deploy RTX 5090

A100

from $0.80/hr

40 GB VRAM: $0.80/hr
80 GB VRAM: $1.10/hr

Deploy A100

H100

$2.60/hr

80 GB VRAM

Deploy H100

Up to 90% cheaper than AWS, Azure, and GCP.

API & SDK

Call hosted models from your own stack.

Send chat requests to the inference gateway. Use the model ID from the catalog and your organization API key.

Two request formats

Use OpenAI Chat Completions or Anthropic Messages with supported models.

Metered by tokens

The gateway authenticates, routes, streams, and records usage for each request.
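
To show how the two request formats differ, here is a rough sketch that builds the same conversation as an OpenAI Chat Completions body and as an Anthropic Messages body. The field layouts follow the public OpenAI and Anthropic API shapes; that OpenRelay accepts them unchanged is an assumption. The helper names are hypothetical. The endpoint path and headers for the Anthropic format are not stated here, so the sketch leaves them out.

```python
from typing import Any, Dict


def openai_chat_body(model: str, system: str, user: str) -> Dict[str, Any]:
    """Chat Completions: the system prompt is the first entry in `messages`."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }


def anthropic_messages_body(
    model: str, system: str, user: str, max_tokens: int = 1024
) -> Dict[str, Any]:
    """Messages: `system` is a top-level field and `max_tokens` is required."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,
        "messages": [{"role": "user", "content": user}],
    }


openai_body = openai_chat_body("openrelay/glm-5.2", "Be concise.", "Explain this stack trace.")
anthropic_body = anthropic_messages_body("openrelay/glm-5.2", "Be concise.", "Explain this stack trace.")
print(sorted(openai_body), sorted(anthropic_body))
```

Check the catalog for which models support which format before relying on either body.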

openrelay ~ api

# Call GLM 5.2 with Chat Completions
$ curl https://inference.openrelay.inc/v1/chat/completions \
  -H "Authorization: Bearer $OPENRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"openrelay/glm-5.2","messages":[{"role":"user","content":"Explain this stack trace."}]}'
# Response
{
  "object": "chat.completion",
  "model": "openrelay/glm-5.2",
  "choices": [...]
}
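
The response above elides `choices`. Assuming the gateway returns the standard OpenAI Chat Completions shape, the reply text can be read as sketched below. The sample payload is invented for illustration and is not real gateway output; `extract_reply` is a hypothetical helper.

```python
import json


def extract_reply(raw: str) -> str:
    """Return the first choice's text from a chat.completion JSON string."""
    data = json.loads(raw)
    if data.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {data.get('object')!r}")
    return data["choices"][0]["message"]["content"]


# Hypothetical sample in the OpenAI Chat Completions shape.
sample = json.dumps({
    "object": "chat.completion",
    "model": "openrelay/glm-5.2",
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "The trace shows a null dereference."}}
    ],
})
print(extract_reply(sample))
```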

View full API docs

Ready to build?

Get your API key and start deploying in minutes.

$ export OPENRELAY_API_KEY=vl_••••••••••••••

Create account

FAQ

Questions we hear often.

How long does deployment take?

Provisioning time depends on the hardware and image you choose. The dashboard reports each stage until your VM is ready. Hosted models need no deployment.

Do I need to change my code?

VM workloads run as standard containers. For hosted inference, OpenRelay supports OpenAI Chat Completions and Anthropic Messages request formats.

How does billing work?

GPU VMs accrue usage at the displayed hourly rate while they run. Hosted inference is metered by input and output tokens.
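
As a rough sketch of that arithmetic: the hourly rates below are the ones listed on this page, while the token prices are hypothetical parameters, since no per-token rates are given here. The sketch also assumes VM usage accrues linearly with runtime. The billing increment is not stated, so treat the results as estimates.

```python
from decimal import Decimal

# Hourly rates as listed on this page (USD per hour).
HOURLY_RATES = {
    "RTX 4090": Decimal("0.29"),
    "RTX 5090": Decimal("0.44"),
    "A100 40GB": Decimal("0.80"),
    "A100 80GB": Decimal("1.10"),
    "H100": Decimal("2.60"),
}


def vm_cost(gpu: str, hours: Decimal) -> Decimal:
    """Estimated VM cost, assuming usage accrues linearly with runtime."""
    return HOURLY_RATES[gpu] * hours


def inference_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: Decimal,   # hypothetical: not published on this page
    output_price_per_million: Decimal,  # hypothetical: not published on this page
) -> Decimal:
    """Estimated token cost with separate input and output prices."""
    million = Decimal(1_000_000)
    return (
        Decimal(input_tokens) / million * input_price_per_million
        + Decimal(output_tokens) / million * output_price_per_million
    )


print(vm_cost("H100", Decimal("10")))
```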

What happens when a node fails?

Hosted inference routes new requests across healthy backends. VMs are not automatically migrated; failed capacity must be restarted or replaced.
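
To make the idea of routing across healthy backends concrete, here is an illustrative sketch that picks the lowest-latency node whose status is healthy. It is not OpenRelay's actual routing algorithm. The `Node` type, the node names, and the latency figures are all hypothetical; only the three status labels come from this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Status(Enum):
    HEALTHY = "healthy"
    BUSY = "busy"
    DOWN = "down"


@dataclass(frozen=True)
class Node:
    name: str
    status: Status
    latency_ms: float


def pick_node(nodes: Iterable[Node]) -> Node:
    """Return the lowest-latency healthy node; busy and down nodes are skipped."""
    healthy = [n for n in nodes if n.status is Status.HEALTHY]
    if not healthy:
        raise RuntimeError("no healthy node available")
    return min(healthy, key=lambda n: n.latency_ms)


# Hypothetical fleet: the closest node is down, so traffic goes to the next healthy one.
fleet = [
    Node("node-a", Status.DOWN, 9.0),
    Node("node-b", Status.HEALTHY, 18.0),
    Node("node-c", Status.BUSY, 12.0),
    Node("node-d", Status.HEALTHY, 41.0),
]
print(pick_node(fleet).name)
```

Because selection happens per request, a node that goes down only affects requests routed after its status changes. That matches the note above that new requests are what get rerouted.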

Do I need a credit card to start?

New users receive $1 in welcome credit. Continued usage requires a card and deposit. A first deposit of $5 or more receives a one-time $5 match.
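
A small worked example of the deposit match, and of how long a balance lasts at a given hourly rate. It assumes the $5 match applies once to a first deposit of at least $5, as stated above. Whether the $1 welcome credit stacks with the match is not stated, so the sketch leaves it out. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

MATCH_AMOUNT = Decimal("5")       # one-time match, per the FAQ answer above
MATCH_THRESHOLD = Decimal("5")    # minimum first deposit that qualifies


def balance_after_first_deposit(deposit: Decimal) -> Decimal:
    """First deposit plus the one-time match when the deposit qualifies."""
    bonus = MATCH_AMOUNT if deposit >= MATCH_THRESHOLD else Decimal("0")
    return deposit + bonus


def runtime_hours(balance: Decimal, hourly_rate: Decimal) -> Decimal:
    """Hours of VM runtime the balance covers, rounded down to 0.01 h."""
    return (balance / hourly_rate).quantize(Decimal("0.01"), rounding=ROUND_DOWN)


balance = balance_after_first_deposit(Decimal("5"))
print(balance, runtime_hours(balance, Decimal("0.29")))
```

At the listed RTX 4090 rate of $0.29/hr, a $5 first deposit (a $10 balance after the match) covers roughly 34 hours of runtime.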

Can I contribute my own GPUs?

You can apply to become a provider. Approved providers enroll supported GPU or CPU nodes and can track usage and earnings in the dashboard.

Start relaying compute today.

An OpenAI-compatible endpoint and GPUs from $0.18/hr. No contract, no minimums — deposit $5 to get $10 and start deploying.

Get started
