Backed by Y Combinator

Inference from anywhere.Lower costs.

Production GPU inference across a distributed network — routed to the nearest healthy node, up to 90% cheaper than the hyperscalers.

Built for teams that outgrew the hyperscaler bill.

The open network for inference at scale.

User request18msNearest healthy node
HealthyBusyDown

Deploy with one API call

OpenAI-compatible. No code changes.

1client = OpenAI(
2 base_url="https://api.openrelay.ai/v1",
3 api_key="sk-••••••••••••"
4)

Inference in production for

YC StartupsResearch LabsAI StudiosEnterprise

Two ways to ship. Both live in under two minutes.

Rent dedicated GPUs or call any model through our OpenAI-compatible API.

GPU compute

Rent a GPU in seconds

  1. 1

    Select capacity

    Pick a GPU type, VRAM, region, and hourly price from available cards across the network.

  2. 2

    Boot a machine

    Choose an image (Ubuntu 24.04), attach a persistent volume, and launch the runtime.

  3. 3

    Own the loop

    SSH in with full control, install your stack, then monitor and scale as you go.


Ready in ~47sDeploy now

Inference endpoint

Or just call a model

  1. 1

    Select a model

    Use a hosted open model — no node setup, no cold-start wrangling.

  2. 2

    Point your SDK

    Swap the base URL and keep your existing OpenAI-compatible client and tools.

  3. 3

    Serve traffic

    Send requests; routing, autoscaling, and failover are handled for you.


>base_url="https://api.openrelay.ai/v1"
View docs

Pay by the hour.
Nothing else.

Dedicated NVIDIA GPUs with SSH access and a persistent volume. Free credits for new accounts — no credit card required.

Up to 90% cheaper than AWS, Azure, and GCP.

API & SDK

Control OpenRelay from your own stack.

OpenAI-compatible API. GPU-native. Developer and agent friendly.

OpenAI-compatible

Drop-in compatible with the OpenAI API.

Simple & powerful

Create clusters, scale on demand, and manage everything via API.

openrelay ~ api
# Create a GPU cluster
$ curl -X POST https://api.openrelay.ai/v1/clusters \
-H "Authorization: Bearer $OPENRELAY_API_KEY" \
-d '{"gpu":"h100","image":"vllm/vllm-openai","count":1}'
# Response
{
"status": "running",
"cluster_id": "clu_01H7X8ZJ7P2Q9KJY4V8B3YN9T",
"endpoint": "https://clu_01H7X8ZJ7P2Q9KJY4V8B3YN9T.api.openrelay.ai/v1",
"created_at": "2024-05-20T14:32:18Z"
}

Ready to build?

Get your API key and start deploying in minutes.

$export OPENRELAY_API_KEY=or_••••••••••••••
Create account

FAQ

Answers before you ask.

How fast can I deploy?

Under two minutes — pick capacity, boot the machine, and SSH in, or point your SDK at a hosted model.

Do I need to change my code?

No. Workloads run as standard Docker containers, and the inference API is OpenAI-compatible — swap the base URL and keep your client.

How does billing work?

Pay per hour. No contracts, no minimums. Volume discounts and reserved pricing are available on request.

What happens when a node fails?

Traffic reroutes to a healthy machine within milliseconds, and requests keep balancing across available nodes.

Do I need a credit card to start?

No. New accounts get free credits with no card required — claim them and start deploying.

Can I contribute my own GPUs?

Yes. OpenRelay is an open network — anyone can join as a provider and earn on idle compute.

Start relaying compute today.

Free credits, an OpenAI-compatible endpoint, and GPUs from $0.19/hr. No contract, no minimums, no credit card required.