Inference catalog

Hosted models with clear pricing

Select a model ID and send a request to the OpenRelay inference gateway. Models list their context window and input and output token rates below.

Get an API key API reference

Chat Completions
Anthropic Messages for GLM 5.2
Pay per token
No cluster setup

GPT-OSS 120B

OpenAI

openrelay/gpt-oss-120b

GPT-OSS 20B

OpenAI

openrelay/gpt-oss-20b

Best value

Context: 128K
Parameters: 20B
Input / 1M: $0.05
Output / 1M: $0.20

The smaller GPT-OSS variant, with tool calling and structured output support at lower token rates.

TextCode

Gemma 4 31B

Google

openrelay/gemma-4-31b

Vision

Context: 32K
Parameters: 31B
Input / 1M: $0.99
Output / 1M: $1.49

Google's dense 31B model with text and image input, reasoning, tool calling, and a 32K-token context window.

TextVisionReasoning

Gemma 4 31B NVFP4 32K

Google

openrelay/gemma-4-31b-nvfp4-32k

Best value

Context: 32K
Parameters: 31B
Input / 1M: $0.12
Output / 1M: $0.35

An NVFP4-quantized Gemma 4 31B deployment with a 32K-token context window and lower token rates.

TextVisionReasoning

GLM 5.2

Zhipu

openrelay/glm-5.2

Reasoning

Context: 1M
Parameters: 744B MoE
Input / 1M: $1.82
Output / 1M: $5.72

Zhipu's GLM model for reasoning, coding, and tool use in English and Chinese, with a 1M-token context window.

TextReasoningCode

DeepSeek-OCR 2

DeepSeek

openrelay/deepseek-ocr-2

OCR

Context: 8K
Parameters: -
Input / 1M: $0.039
Output / 1M: $0.039

DeepSeek's second-generation OCR model. Reads document images (scans, receipts, screenshots, tables) and returns structured markdown that preserves headings, tables, and layout.

Vision

Need implementation details for GLM 5.2? Read the model guide. Need another model? Request a model.

Two supported request formats

Use POST /v1/chat/completions for supported models. GLM 5.2 also accepts Anthropic Messages requests at POST /v1/messages. Streaming is supported on both routes.

Model ID	Context	Input / 1M	Output / 1M
GPT-OSS 120Bopenrelay/gpt-oss-120b	128K	$0.15	$0.60
GPT-OSS 20Bopenrelay/gpt-oss-20b	128K	$0.05	$0.20
Gemma 4 31Bopenrelay/gemma-4-31b	32K	$0.99	$1.49
Gemma 4 31B NVFP4 32Kopenrelay/gemma-4-31b-nvfp4-32k	32K	$0.12	$0.35
GLM 5.2openrelay/glm-5.2	1M	$1.82	$5.72
DeepSeek-OCR 2openrelay/deepseek-ocr-2	8K	$0.039	$0.039

Prices are per 1M tokens. You pay for metered input and output tokens. There is no per-hour GPU rental for hosted inference. Running a large offline job on an eligible model? Batch them at 50% off. Need dedicated GPU capacity instead? Run your own GPU cluster.

Service-level agreements

Need a guaranteed SLA for a model?

The pay-per-token catalog uses shared capacity. Contact us to discuss dedicated capacity and contractual availability or performance targets. Model and region availability depends on the requested deployment.

Availability target

Uptime terms and service credits are agreed in the contract for dedicated capacity.

Performance target

Time-to-first-token and throughput targets are sized to the selected model and traffic profile.

Support coverage

Support channels and response times are defined as part of the deployment agreement.

Request a model SLA

Or email sales@openrelay.inc.

Start calling models in minutes

Grab an API key, pick a model, and send your first request.

Get started free Read the API docs