Models & Pricing API - ePhone AI Docs

Overview

/v1/models/pricing is a public, unauthenticated aggregation endpoint that returns every model currently available on the platform along with its published price. The response strictly follows the OpenRouter Providers specification, so it can be ingested by OpenRouter, third-party price aggregators, or any custom client.

This endpoint is the “published price” entry point and carries no user identity. Prices returned represent the final cost for the default group in an anonymous context. To see the price an authenticated user actually pays (group, tier discount, agent markup, etc.), use /api/pricing instead.

When to use it

Use case	Description
Third-party price aggregation	OpenRouter and other aggregators scraping the model catalog and published prices
Client display	Apps, bots, SDKs that need to render the model list
Monitoring	Track changes to the model catalog or pricing over time

Endpoint

Item	Value
URL	`https://api.ephone.ai/v1/models/pricing`
Method	`GET`
Auth	None (public)
Rate limit	Strict per-IP critical rate limit
Caching	60s in-process cache; responses include `Cache-Control: public, max-age=60`

Request examples

curl https://api.ephone.ai/v1/models/pricing

Response structure

The top level is always wrapped in a data array:

{
  "data": [
    {
      "id": "gpt-4o",
      "name": "GPT-4o",
      "created": 1715558400,
      "input_modalities": ["text", "image"],
      "output_modalities": ["text"],
      "quantization": "unknown",
      "context_length": 128000,
      "max_output_length": 16384,
      "pricing": {
        "prompt": "0.0000025",
        "completion": "0.00001",
        "request": "0",
        "image": "0",
        "input_cache_read": "0.00000125"
      },
      "supported_sampling_parameters": [
        "temperature", "top_p", "frequency_penalty", "presence_penalty",
        "stop", "seed", "max_tokens", "logit_bias", "response_format",
        "tools", "tool_choice", "parallel_tool_calls", "logprobs", "top_logprobs"
      ],
      "supported_features": ["tools", "json_mode", "structured_outputs", "logprobs"]
    }
  ]
}

Top-level field

Field	Type	Description
`data`	`Array<Model>`	Array of model entries; empty array means no models are currently published

Model fields

Field	Type	Required	Description
`id`	`string`	✅	Unique model identifier; use this value when calling `/v1/chat/completions`
`name`	`string`	✅	Display name (e.g. `GPT-4o`); falls back to `id` when not configured
`created`	`int64`	✅	Model release time (Unix seconds)
`input_modalities`	`string[]`	✅	Supported input modalities; includes at least `text`
`output_modalities`	`string[]`	✅	Supported output modalities; includes at least `text`
`quantization`	`string`	✅	Quantization (`fp16` / `fp8` / `bf16` / `int8` / `unknown`)
`context_length`	`int`	✅	Maximum context window in tokens
`max_output_length`	`int`	✅	Maximum output tokens per response
`pricing`	`Pricing`	✅	Pricing object; see below
`pricing_tiers`	`Pricing[]`		Tiered pricing array (at most one element; populated when an admin explicitly tags a Condition with a threshold)
`supported_sampling_parameters`	`string[]`	✅	Supported sampling parameters (e.g. `temperature`, `top_p`)
`supported_features`	`string[]`	✅	Supported feature flags (see table below)
`deprecation_date`	`string`		Planned deprecation date (ISO 8601, `YYYY-MM-DD`)
`hugging_face_id`	`string`		Hugging Face repository ID for open-weight models

Pricing fields

All prices are USD strings. Token-class fields are priced per token; flat-class fields use the unit listed below.

Field	Type	Required	Unit	Description
`prompt`	`string`	✅	USD / input token	Input token unit price
`completion`	`string`	✅	USD / output token	Output token unit price
`request`	`string`	✅	USD / request	Flat per-request fee (used by call-billed models)
`image`	`string`	✅	USD / image	Per-image price (image generation models)
`audio`	`string`		USD / audio	Per-audio-segment price
`video`	`string`		USD / video	Per-video-segment price
`web_search`	`string`		USD / query	Built-in web search tool fee
`internal_reasoning`	`string`		USD / reasoning token	Reasoning token price (o1 / Claude thinking)
`input_cache_read`	`string`		USD / token	Cache-hit read price
`input_cache_write`	`string`		USD / token	Cache write price (most expensive window is published)
`min_context`	`int`		tokens	Only valid inside `pricing_tiers`; minimum input tokens that trigger this tier

The four required fields (prompt / completion / request / image) are always present, possibly as "0". Optional fields are omitted when not configured.
A value of "0" may mean either “free” or “not applicable to this model”. Combine with supported_features and supported_sampling_parameters to infer capability instead of relying on pricing alone.

`supported_features` enum

Value	Meaning
`tools`	Function calling / tool use
`reasoning`	Returns reasoning tokens (`reasoning_content` / `thinking`)
`json_mode`	Supports JSON mode output
`structured_outputs`	Supports schema-constrained structured outputs
`logprobs`	Returns token log probabilities
`web_search`	Built-in web search tool

How the three billing modes map

The platform supports three internal billing modes (see Pricing). All three are losslessly mapped to OpenRouter fields:

Internal mode	Output field(s)	Notes
Per token	`prompt` / `completion` / `internal_reasoning` / `input_cache_read` / `input_cache_write` / `web_search`	Text and embedding models; per-million prices are converted to per-token strings
Per call	`image` / `video` / `audio` / `request`	Routed by the call unit: image generation → `image`, video → `video`, music/speech → `audio`, otherwise → `request`
Per second	`request` / `video` / `audio`	OpenRouter has no `per_second` field; the platform publishes `per_second × typical_duration` (default 30s) as the per-request price. Actual billing still uses real seconds, so reconciliation is exact.

How the price is computed

The published price is:

final_usd = base_price × group_ratio × currency_factor

base_price: the official price configured by the admin
group_ratio: the ratio of the default group
currency_factor: if the base is configured in CNY it is divided by the FX rate; if already USD it is left unchanged
Not applied: agent markups, user tier discounts, per-user model multipliers, time-of-day rules, runtime dynamic factors

Authenticated users may pay a different price. Pricing returned here always reflects the anonymous “default group” view. Custom group, tier discount, or agent-domain markups are layered on top per request and are intentionally hidden from this public endpoint.

Tiered pricing

Some models charge a different unit price above a context-length threshold (e.g. Gemini 1.5 / 2.5 Pro doubles at ≥128K tokens). The upper tier is exposed via pricing_tiers:

{
  "id": "gemini-1.5-pro",
  "pricing": {
    "prompt": "0.00000125",
    "completion": "0.000005",
    "request": "0",
    "image": "0"
  },
  "pricing_tiers": [
    {
      "min_context": 128000,
      "prompt": "0.0000025",
      "completion": "0.00001",
      "request": "0",
      "image": "0"
    }
  ]
}

pricing is the base tier (the Prompt Tier with the lowest threshold)
pricing_tiers[i] is the upper tier; applied when total input tokens ≥ min_context
At most one upper tier is returned (matching OpenRouter’s 2-tier maximum)

Data source: structured Prompt Tiers

The admin panel configures Prompt Tiers for each model. Each tier has two fields:

Field	Meaning
`min_prompt_tokens`	Trigger threshold (incl. cache and multimodal tokens)
`price`	Full `PriceData` for that tier (input/output/cache/multimodal unit prices)

Runtime billing and public exposure share the same data, ensuring published price ≡ billed price so OpenRouter monthly reconciliation never drifts.

Matching dimension

The threshold is matched against total input tokens, including cache and multimodal:

TotalPromptTokensForTier =
    plain text input
  + text cache (cache_read / cache_create / 5m / 1h)
  + image input + image cache_read
  + audio input + audio cache_read
  + video input + video cache_read

This aligns with OpenRouter min_context and Gemini/Claude official tiered docs — total context length defines the tier; otherwise users could bypass tiered pricing by going fully through cache.

Matching rule

Largest-threshold-wins: among all tiers satisfying total >= min, the one with the highest threshold is selected. The full price of the matched tier is applied (no progressive stepping, matching how official providers price). If no tier is matched, the base tier (lowest threshold) is used.

Conditions that cannot be tiered

OpenRouter only models the “input token count” dimension. The following scenarios are still billed by the general rule engine (Conditions) but are not published via pricing_tiers — OpenRouter only sees the base price:

Image quality (e.g. request.body.quality == '1080p')
Video duration (e.g. request.body.duration > 30)
Response-driven conditions (e.g. response.body.usage.reasoning_tokens > 100)
Compound conditions (&& / || combinations)

For these models, configure an unconditional baseline price under the default group as the publishable rate.

Caching and update cadence

Aspect	Behavior
Server cache	60-second in-process cache; admin updates take effect within 1 minute
Client cache	`Cache-Control: public, max-age=60`; CDN and browser caches are safe to use
Recommended polling	At most once every 5 minutes; aggressive polling is rate-limited

Comparison with other endpoints

Endpoint	Purpose	Auth
`/v1/models/pricing`	This endpoint: public pricing in OpenRouter format	Public
`/v1/models`	OpenAI-SDK-compatible model id list	Token required
`/api/pricing`	Front-end pricing page (user-aware)	Optional token

FAQ

Why does the returned price differ from what I actually pay?

The published price reflects the default group in an anonymous context. If you belong to a different group (e.g. pro / premium), have tier discounts, or access through an agent domain with markup applied, your real price will differ. Authenticated users should call /api/pricing to fetch their own effective price.

Why do some fields return '0'?

"0" can mean either “free” or “this dimension does not apply to the model”. Always combine supported_features and supported_sampling_parameters to infer model capability — don’t rely on pricing alone.

How are per-second models (TTS, Realtime, video) priced here?

OpenRouter has no per_second field, so the platform publishes per_second × typical_duration (admin-configured, default 30 seconds) as the request / video / audio price. Actual billing still uses real seconds, so OpenRouter’s monthly reconciliation always matches what we charged.

Is there a rate limit?

Yes — a strict per-IP critical rate limit. Best practices:

Poll no more than once every 5 minutes
Honor Cache-Control: max-age=60 and cache client-side
For long-term monitoring, subscribe to platform announcements rather than polling aggressively

Can I filter by vendor or modality?

No query parameters are supported right now — the endpoint always returns all healthy models. Filter client-side using supported_features / input_modalities after fetching.

Documentation Index

​Overview

​When to use it

​Endpoint

​Request examples

​Response structure

​Top-level field

​Model fields

​Pricing fields

​supported_features enum

​How the three billing modes map

​How the price is computed

​Tiered pricing

​Data source: structured Prompt Tiers

​Matching dimension

​Matching rule

​Conditions that cannot be tiered

​Caching and update cadence

​Comparison with other endpoints

​FAQ

Overview

When to use it

Endpoint

Request examples

Response structure

Top-level field

Model fields

Pricing fields

`supported_features` enum

How the three billing modes map

How the price is computed

Tiered pricing

Data source: structured Prompt Tiers

Matching dimension

Matching rule

Conditions that cannot be tiered

Caching and update cadence

Comparison with other endpoints

FAQ