Docs

Methodology and API.

Use this page to verify the benchmark claim and copy the request shape.

Reference

Average reduction: 30%
Observed average token reduction in current production traffic.

Best benchmark: 49%
Highest reduction observed in the benchmark set.

Max single call: 90%
Highest reduction observed on an individual call.

Methodology

What was measured.

The latest Rust compression report covers 6,216,052 user prompts across eight datasets.
Report-wide token reduction is 22.04%.
Current production traffic averages about 30% reduction, and individual calls can reach 90%.
The best benchmark result is 49%; dataset averages are summarized below.
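All percentages above are token-count reductions. As a sketch of how such a figure is computed (the exact counting used by the report is an assumption here; this is the standard "1 minus compressed over original" metric):

```python
def token_reduction_pct(original_tokens: int, compressed_tokens: int) -> float:
    """Percentage of input tokens removed by compression."""
    return (1 - compressed_tokens / original_tokens) * 100

# A call compressed from 1,000 tokens down to 100 tokens is a 90% reduction,
# matching the highest single-call figure above.
print(round(token_reduction_pct(1000, 100), 2))  # → 90.0
```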

API

Example request.

Use your API key with a standard OpenAI-compatible client. Most integrations only need a new base URL.

Base URL

https://api-infer.agentsey.ai/v1

POST https://api-infer.agentsey.ai/v1/chat/completions
Authorization: Bearer zr_your_api_key
Content-Type: application/json

{
  "model": "<configured-model>",
  "messages": [{ "role": "user", "content": "Hello" }]
}
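The same request can be assembled from any language. A minimal Python sketch that builds the URL, headers, and JSON body shown above (the key and model name are placeholders, exactly as in the example; the helper function name is illustrative, not part of any SDK):

```python
import json

# Placeholder values, as in the example above; substitute your own.
BASE_URL = "https://api-infer.agentsey.ai/v1"
API_KEY = "zr_your_api_key"

def build_chat_request(model: str, user_content: str):
    """Assemble the URL, headers, and JSON body for a chat completion call."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }).encode("utf-8")
    return url, headers, body

url, headers, body = build_chat_request("<configured-model>", "Hello")
# Send with any HTTP client, e.g. urllib.request:
# urllib.request.urlopen(urllib.request.Request(url, data=body, headers=headers))
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries should also work once pointed at the base URL above.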

Benchmarks

30% average reduction. Up to 90% on single calls.

The latest eight-dataset Rust report comes in at 22.04% overall; the best benchmark result is 49%.

Dataset                               Prompts      Reduction   Routed token load
Personal Claude Code Data             8,724        25.84%      74.16%
Dataclaw                              2,372        39.80%      60.20%
Agentic Code Dataset 22               17,611       5.93%       94.07%
SWE-bench / SWE-smith trajectories    2,849,278    28.45%      71.55%
Nebius SWE-agent trajectories         2,115,624    11.93%      88.07%
OpenHands SFT trajectories            9,630        22.99%      77.01%
CodeChat V2.0                         1,116,303    20.06%      79.94%
Claude Multiround Chat 30k            96,510       0.97%       99.03%
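Routed token load is simply the complement of the reduction percentage (100 minus reduction). A minimal check against the dataset figures above (the helper name is illustrative):

```python
# Dataset averages from the benchmark list above: name -> reduction %.
DATASETS = {
    "Personal Claude Code Data": 25.84,
    "Dataclaw": 39.80,
    "Agentic Code Dataset 22": 5.93,
    "SWE-bench / SWE-smith trajectories": 28.45,
    "Nebius SWE-agent trajectories": 11.93,
    "OpenHands SFT trajectories": 22.99,
    "CodeChat V2.0": 20.06,
    "Claude Multiround Chat 30k": 0.97,
}

def routed_load(reduction_pct: float) -> float:
    """Share of the original tokens still routed after compression."""
    return round(100 - reduction_pct, 2)

for name, reduction in DATASETS.items():
    print(f"{name}: {routed_load(reduction)}% routed")
```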