Overview

/v1/models/pricing is a public, unauthenticated aggregation endpoint that returns every model currently available on the platform along with its published price. The response strictly follows the OpenRouter Providers specification, so it can be ingested by OpenRouter, third-party price aggregators, or any custom client.
This endpoint is the “published price” entry point and carries no user identity. Prices returned represent the final cost for the default group in an anonymous context. To see the price an authenticated user actually pays (group, tier discount, agent markup, etc.), use /api/pricing instead.

When to use it

| Use case | Description |
| --- | --- |
| Third-party price aggregation | OpenRouter and other aggregators scraping the model catalog and published prices |
| Client display | Apps, bots, SDKs that need to render the model list |
| Monitoring | Track changes to the model catalog or pricing over time |

Endpoint

| Item | Value |
| --- | --- |
| URL | https://api.ephone.ai/v1/models/pricing |
| Method | GET |
| Auth | None (public) |
| Rate limit | Strict per-IP critical rate limit |
| Caching | 60s in-process cache; responses include Cache-Control: public, max-age=60 |

Request examples

curl https://api.ephone.ai/v1/models/pricing
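
For clients that are not shell-based, the same request can be sketched in Python with only the standard library. The helper names `fetch_pricing` and `parse_models` are illustrative and not part of the platform; the sketch assumes the response shape documented on this page.

```python
import json
import urllib.request

PRICING_URL = "https://api.ephone.ai/v1/models/pricing"


def parse_models(body: str) -> list:
    """Decode a response body and return the list under the top-level "data" key."""
    payload = json.loads(body)
    models = payload.get("data") if isinstance(payload, dict) else None
    if not isinstance(models, list):
        raise ValueError('expected a top-level "data" array')
    return models


def fetch_pricing(url: str = PRICING_URL, timeout: float = 10.0) -> list:
    """GET the public pricing endpoint; no auth header is sent."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_models(resp.read().decode("utf-8"))
```

Calling `fetch_pricing()` returns the list of model entries, so `for m in fetch_pricing(): print(m["id"], m["pricing"]["prompt"])` would print one line per published model.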

Response structure

The top level is always wrapped in a data array:
{
  "data": [
    {
      "id": "gpt-4o",
      "name": "GPT-4o",
      "created": 1715558400,
      "input_modalities": ["text", "image"],
      "output_modalities": ["text"],
      "quantization": "unknown",
      "context_length": 128000,
      "max_output_length": 16384,
      "pricing": {
        "prompt": "0.0000025",
        "completion": "0.00001",
        "request": "0",
        "image": "0",
        "input_cache_read": "0.00000125"
      },
      "supported_sampling_parameters": [
        "temperature", "top_p", "frequency_penalty", "presence_penalty",
        "stop", "seed", "max_tokens", "logit_bias", "response_format",
        "tools", "tool_choice", "parallel_tool_calls", "logprobs", "top_logprobs"
      ],
      "supported_features": ["tools", "json_mode", "structured_outputs", "logprobs"]
    }
  ]
}

Top-level field

| Field | Type | Description |
| --- | --- | --- |
| data | Array<Model> | Array of model entries; empty array means no models are currently published |

Model fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | | Unique model identifier; use this value when calling /v1/chat/completions |
| name | string | | Display name (e.g. GPT-4o); falls back to id when not configured |
| created | int64 | | Model release time (Unix seconds) |
| input_modalities | string[] | | Supported input modalities; includes at least text |
| output_modalities | string[] | | Supported output modalities; includes at least text |
| quantization | string | | Quantization (fp16 / fp8 / bf16 / int8 / unknown) |
| context_length | int | | Maximum context window in tokens |
| max_output_length | int | | Maximum output tokens per response |
| pricing | Pricing | | Pricing object; see below |
| pricing_tiers | Pricing[] | | Tiered pricing array (at most one element; populated when an admin explicitly tags a Condition with a threshold) |
| supported_sampling_parameters | string[] | | Supported sampling parameters (e.g. temperature, top_p) |
| supported_features | string[] | | Supported feature flags (see table below) |
| deprecation_date | string | | Planned deprecation date (ISO 8601, YYYY-MM-DD) |
| hugging_face_id | string | | Hugging Face repository ID for open-weight models |

Pricing fields

All prices are USD strings. Token-class fields are priced per token; flat-class fields use the unit listed below.
| Field | Type | Required | Unit | Description |
| --- | --- | --- | --- | --- |
| prompt | string | Yes | USD / input token | Input token unit price |
| completion | string | Yes | USD / output token | Output token unit price |
| request | string | Yes | USD / request | Flat per-request fee (used by call-billed models) |
| image | string | Yes | USD / image | Per-image price (image generation models) |
| audio | string | No | USD / audio | Per-audio-segment price |
| video | string | No | USD / video | Per-video-segment price |
| web_search | string | No | USD / query | Built-in web search tool fee |
| internal_reasoning | string | No | USD / reasoning token | Reasoning token price (o1 / Claude thinking) |
| input_cache_read | string | No | USD / token | Cache-hit read price |
| input_cache_write | string | No | USD / token | Cache write price (most expensive window is published) |
| min_context | int | | tokens | Only valid inside pricing_tiers; minimum input tokens that trigger this tier |
  • The four required fields (prompt / completion / request / image) are always present, possibly as "0". Optional fields are omitted when not configured.
  • A value of "0" may mean either “free” or “not applicable to this model”. Combine with supported_features and supported_sampling_parameters to infer capability instead of relying on pricing alone.
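
Because prices arrive as decimal strings per token, parsing them with `float` can introduce rounding noise at this scale. The following is a minimal sketch, assuming a `pricing` object shaped as in the table above, that uses Python's `decimal` module to convert a price to USD per million tokens and to estimate the token cost of one call. The function names are illustrative only.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def per_million(price_per_token: str) -> Decimal:
    """Convert a per-token USD string into USD per one million tokens."""
    return Decimal(price_per_token) * MILLION


def token_cost(pricing: dict, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate one call: input tokens + output tokens + flat per-request fee."""
    return (
        Decimal(pricing["prompt"]) * prompt_tokens
        + Decimal(pricing["completion"]) * completion_tokens
        + Decimal(pricing["request"])
    )
```

With the gpt-4o sample from the response structure, `per_million("0.0000025")` equals 2.5 USD per million input tokens. The estimate ignores cache, reasoning, and multimodal fields, which a fuller client would add when those fields are present.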

supported_features enum

| Value | Meaning |
| --- | --- |
| tools | Function calling / tool use |
| reasoning | Returns reasoning tokens (reasoning_content / thinking) |
| json_mode | Supports JSON mode output |
| structured_outputs | Supports schema-constrained structured outputs |
| logprobs | Returns token log probabilities |
| web_search | Built-in web search tool |
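
Since the endpoint returns the full catalog, capability filtering happens on the client. A minimal sketch, assuming `models` is the already-decoded `data` array; the function name and its parameters are hypothetical:

```python
def models_with_features(models: list, required: set, input_modality: str = "") -> list:
    """Return ids of models that support every required feature flag.

    If input_modality is given, the model must also list it in input_modalities.
    """
    matches = []
    for model in models:
        features = set(model.get("supported_features", []))
        if not required <= features:
            continue
        if input_modality and input_modality not in model.get("input_modalities", []):
            continue
        matches.append(model["id"])
    return matches
```

For example, `models_with_features(models, {"tools", "structured_outputs"}, input_modality="image")` would list models that accept image input and support both tool use and structured outputs.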

How the three billing modes map

The platform supports three internal billing modes (see Pricing). All three are losslessly mapped to OpenRouter fields:
| Internal mode | Output field(s) | Notes |
| --- | --- | --- |
| Per token | prompt / completion / internal_reasoning / input_cache_read / input_cache_write / web_search | Text and embedding models; per-million prices are converted to per-token strings |
| Per call | image / video / audio / request | Routed by the call unit: image generation → image, video → video, music/speech → audio, otherwise → request |
| Per second | request / video / audio | OpenRouter has no per_second field; the platform publishes per_second × typical_duration (default 30s) as the per-request price. Actual billing still uses real seconds, so reconciliation is exact. |
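
The three mappings can be illustrated with a short sketch. It is an approximation of the conversions described in the table, not the platform's code: the function names and the call-unit labels (`"image"`, `"video"`, `"music"`, `"speech"`) are assumptions made for the example.

```python
from decimal import Decimal


def per_token_string(usd_per_million: str) -> str:
    """Per token: turn a per-million-token price into a per-token string."""
    return format(Decimal(usd_per_million) / Decimal(1_000_000), "f")


def per_call_field(call_unit: str) -> str:
    """Per call: choose the output field by call unit, defaulting to request."""
    routes = {"image": "image", "video": "video", "music": "audio", "speech": "audio"}
    return routes.get(call_unit, "request")


def per_second_published(usd_per_second: str, typical_duration_s: int = 30) -> str:
    """Per second: publish per_second x typical_duration as a flat price."""
    return format(Decimal(usd_per_second) * typical_duration_s, "f")
```

A per-million price of 2.5 USD becomes the string `"0.0000025"`, matching the `prompt` value in the gpt-4o sample. A model billed at 0.01 USD per second would be published at 0.30 USD under the default 30-second duration, while the actual charge still follows real seconds.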

How the price is computed

The published price is:
final_usd = base_price × group_ratio × currency_factor
  • base_price: the official price configured by the admin
  • group_ratio: the ratio of the default group
  • currency_factor: if the base is configured in CNY it is divided by the FX rate; if already USD it is left unchanged
  • Not applied: agent markups, user tier discounts, per-user model multipliers, time-of-day rules, runtime dynamic factors
Authenticated users may pay a different price. Pricing returned here always reflects the anonymous “default group” view. Custom group, tier discount, or agent-domain markups are layered on top per request and are intentionally hidden from this public endpoint.
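
As a worked illustration of the formula above, the sketch below computes the published price with `decimal` arithmetic. It assumes `fx_rate` is expressed as CNY per 1 USD and that only CNY and USD bases exist; the function and parameter names are hypothetical.

```python
from decimal import Decimal


def published_price(base_price: str, group_ratio: str,
                    currency: str = "USD", fx_rate: str = "1") -> Decimal:
    """final_usd = base_price x group_ratio x currency_factor.

    currency_factor is 1 for a USD base and 1 / fx_rate for a CNY base,
    applied here as a division to keep the arithmetic exact.
    """
    usd = Decimal(base_price) * Decimal(group_ratio)
    if currency == "CNY":
        return usd / Decimal(fx_rate)
    if currency == "USD":
        return usd
    raise ValueError(f"unsupported base currency: {currency}")
```

For a base price of 18 CNY, a default group ratio of 1, and an assumed FX rate of 7.2, the published price comes out at 2.5 USD. Agent markups, tier discounts, and the other factors listed under "Not applied" are deliberately absent from the function.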

Tiered pricing

Some models charge a different unit price above a context-length threshold (e.g. Gemini 1.5 Pro doubles its per-token prices at ≥128K tokens). The upper tier is exposed via pricing_tiers:
{
  "id": "gemini-1.5-pro",
  "pricing": {
    "prompt": "0.00000125",
    "completion": "0.000005",
    "request": "0",
    "image": "0"
  },
  "pricing_tiers": [
    {
      "min_context": 128000,
      "prompt": "0.0000025",
      "completion": "0.00001",
      "request": "0",
      "image": "0"
    }
  ]
}
  • pricing is the base tier (the Prompt Tier with the lowest threshold)
  • pricing_tiers[0] is the upper tier; it applies when total input tokens ≥ min_context
  • At most one upper tier is returned (matching OpenRouter’s 2-tier maximum)
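
A client that estimates costs from this response needs to pick the right tier first. A minimal sketch, assuming a model entry shaped like the example above; `effective_pricing` and `estimate_input_cost` are illustrative names:

```python
from decimal import Decimal


def effective_pricing(model: dict, total_input_tokens: int) -> dict:
    """Return the upper tier once total input tokens reach its min_context,
    otherwise the base pricing object."""
    for tier in model.get("pricing_tiers", []):
        if total_input_tokens >= tier["min_context"]:
            return tier
    return model["pricing"]


def estimate_input_cost(model: dict, total_input_tokens: int) -> Decimal:
    """Price all input tokens at the matched tier (no progressive stepping)."""
    pricing = effective_pricing(model, total_input_tokens)
    return Decimal(pricing["prompt"]) * total_input_tokens
```

With the gemini-1.5-pro entry above, 100,000 input tokens are priced at the base rate (0.125 USD), while 200,000 input tokens are priced entirely at the upper-tier rate (0.5 USD). For simplicity the estimate treats every input token as a plain prompt token.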

Data source: structured Prompt Tiers

The admin panel configures Prompt Tiers for each model. Each tier has two fields:
| Field | Meaning |
| --- | --- |
| min_prompt_tokens | Trigger threshold (incl. cache and multimodal tokens) |
| price | Full PriceData for that tier (input/output/cache/multimodal unit prices) |
Runtime billing and public exposure share the same data, ensuring published price ≡ billed price so OpenRouter monthly reconciliation never drifts.

Matching dimension

The threshold is matched against total input tokens, including cache and multimodal:
TotalPromptTokensForTier =
    plain text input
  + text cache (cache_read / cache_create / 5m / 1h)
  + image input + image cache_read
  + audio input + audio cache_read
  + video input + video cache_read
This matches OpenRouter's min_context semantics and the official Gemini and Claude tiered-pricing docs: total context length defines the tier. If cached tokens were excluded, users could bypass tiered pricing by serving most of their input from cache.

Matching rule

Largest-threshold-wins: among all tiers satisfying total >= min, the one with the highest threshold is selected. The full price of the matched tier is applied (no progressive stepping, matching how official providers price). If no tier is matched, the base tier (lowest threshold) is used.
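
The matching dimension and the largest-threshold-wins rule can be sketched together as follows. This is an illustration under assumptions, not the platform's implementation: the token-bucket keys and the inner shape of `price` are hypothetical, while `min_prompt_tokens` and `price` follow the Prompt Tier fields described above.

```python
def total_prompt_tokens_for_tier(buckets: dict) -> int:
    """Sum every input-side bucket: text, cache, image, audio, and video tokens."""
    return sum(buckets.values())


def match_tier(tiers: list, total_tokens: int) -> dict:
    """Largest-threshold-wins: among tiers with total >= min_prompt_tokens,
    pick the highest threshold; fall back to the lowest-threshold (base) tier."""
    if not tiers:
        raise ValueError("at least one tier is required")
    eligible = [t for t in tiers if total_tokens >= t["min_prompt_tokens"]]
    if eligible:
        return max(eligible, key=lambda t: t["min_prompt_tokens"])
    return min(tiers, key=lambda t: t["min_prompt_tokens"])
```

Given tiers with thresholds 0 and 128000, a request with 100,000 plain text tokens and 50,000 cache-read tokens totals 150,000 and therefore matches the 128000 tier, even though the uncached portion alone would stay below it.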

Conditions that cannot be tiered

OpenRouter only models the “input token count” dimension. The following scenarios are still billed by the general rule engine (Conditions) but are not published via pricing_tiers — OpenRouter only sees the base price:
  • Image quality (e.g. request.body.quality == '1080p')
  • Video duration (e.g. request.body.duration > 30)
  • Response-driven conditions (e.g. response.body.usage.reasoning_tokens > 100)
  • Compound conditions (&& / || combinations)
For these models, configure an unconditional baseline price under the default group as the publishable rate.

Caching and update cadence

| Aspect | Behavior |
| --- | --- |
| Server cache | 60-second in-process cache; admin updates take effect within 1 minute |
| Client cache | Cache-Control: public, max-age=60; CDN and browser caches are safe to use |
| Recommended polling | At most once every 5 minutes; aggressive polling is rate-limited |
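
One way to stay within the recommended polling cadence is a small time-to-live cache in front of the fetch call. The sketch below is a hypothetical client-side helper; the class name, the injectable `clock`, and the 300-second default (taken from the 5-minute recommendation) are choices made for the example.

```python
import time
from typing import Callable, Optional


class PricingCache:
    """Serve a cached copy until it is older than ttl_seconds, then refetch."""

    def __init__(self, fetch: Callable[[], list], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[list] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> list:
        now = self._clock()
        # Refetch on first use or once the cached copy has expired.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

Any zero-argument callable that returns the decoded `data` array can be passed as `fetch`. Repeated calls to `get()` inside the TTL window return the cached list without touching the network.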

Comparison with other endpoints

| Endpoint | Purpose | Auth |
| --- | --- | --- |
| /v1/models/pricing | This endpoint: public pricing in OpenRouter format | Public |
| /v1/models | OpenAI-SDK-compatible model id list | Token required |
| /api/pricing | Front-end pricing page (user-aware) | Optional token |

FAQ

Why is the price I pay different from the price returned here?
The published price reflects the default group in an anonymous context. If you belong to a different group (e.g. pro / premium), have tier discounts, or access through an agent domain with markup applied, your real price will differ. Authenticated users should call /api/pricing to fetch their own effective price.

What does a price of "0" mean?
"0" can mean either “free” or “this dimension does not apply to the model”. Always combine supported_features and supported_sampling_parameters to infer model capability; don’t rely on pricing alone.

How are per-second models published?
OpenRouter has no per_second field, so the platform publishes per_second × typical_duration (admin-configured, default 30 seconds) as the request / video / audio price. Actual billing still uses real seconds, so OpenRouter’s monthly reconciliation always matches what we charged.

Is the endpoint rate limited?
Yes, by a strict per-IP critical rate limit. Best practices:
  • Poll no more than once every 5 minutes
  • Honor Cache-Control: max-age=60 and cache client-side
  • For long-term monitoring, subscribe to platform announcements rather than polling aggressively

Can I filter the results with query parameters?
No query parameters are supported right now; the endpoint always returns all healthy models. Filter client-side using supported_features / input_modalities after fetching.