Why TOON Outperforms Other Formats


If you are building LLM applications, specifically Retrieval-Augmented Generation (RAG) systems or agents that consume large datasets, you are likely fighting a constant war on two fronts: token cost and context window limits.

For years, JSON has been the default lingua franca of data interchange. It’s human-readable (mostly) and ubiquitous. But when you paste a 500-row JSON array into a prompt, you are burning thousands of tokens on repeated field names ("id":, "name":, "email":) that carry zero semantic value for the specific row.

Enter TOON. It’s a format designed specifically to solve the signal-to-noise ratio problem in LLM inputs. I’ve been diving into the latest benchmarks, and the results are startling: TOON isn't just saving space; it’s actually helping models like GPT-5-nano and Gemini-2.5-flash understand data better.

Let’s break down why TOON is beating the heavyweights (JSON, CSV, YAML, XML) and look at the raw numbers.

The Verbosity Trap: JSON vs. TOON

The biggest enemy of token efficiency is structure repetition. Let’s look at a standard Time-Series Analytics dataset. In JSON, every single data point carries the baggage of its schema.

JSON (standard): 22,250 tokens used in the benchmark.
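The benchmark's exact payload isn't reproduced here, so the field names below are illustrative, but the shape is typical of a uniform time-series record set:

```json
{
  "metrics": [
    { "timestamp": "2024-01-01T00:00:00Z", "cpu": 41.2, "memory": 2048, "requests": 1290, "errors": 3 },
    { "timestamp": "2024-01-01T00:05:00Z", "cpu": 43.8, "memory": 2112, "requests": 1315, "errors": 1 }
  ]
}
```

Every row repeats all five key names, and that repetition scales linearly with the number of rows.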

That is a lot of wasted space. Now, look at the TOON equivalent. TOON defines the schema once in the header and then switches to a dense, CSV-style layout for the values.

TOON: 9,120 tokens used in the benchmark.
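Using the same illustrative fields as above, the TOON equivalent declares the schema once in a header and then emits bare value rows:

```
metrics[2]{timestamp,cpu,memory,requests,errors}:
  2024-01-01T00:00:00Z,41.2,2048,1290,3
  2024-01-01T00:05:00Z,43.8,2112,1315,1
```

The `[2]` is the row count and the `{...}` lists the field names exactly once; each additional row costs only its values.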

The Result: A massive 59.0% reduction in token usage.

By stripping away the repeated keys, TOON allows you to fit more history into the model's context window. But crucially, unlike CSV, it maintains type awareness and explicit structure via the header definition metrics[5]{...}.

Why Not Just Use CSV?

This is the most common counter-argument. "If you want flat data, just use CSV."

The problem is that real-world data is rarely perfectly flat. CSV breaks down completely the moment you have nested structures, lists within objects, or complex descriptions containing commas and quotes.

In the benchmarks, specifically the Mixed-Structure Track (which includes e-commerce orders and event logs), CSV was excluded entirely because it couldn't represent the data without lossy flattening.

TOON handles this gracefully. It allows for nested objects while optimizing the arrays. In a test of 100 GitHub repositories (which contain mixed text descriptions and metadata), the efficiency gap was clear:

  • JSON: 15,145 tokens
  • TOON: 8,745 tokens (42.3% savings)
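To make the "mixed structure" point concrete, here is a sketch (field names hypothetical, not the benchmark's actual schema) of how TOON can keep a nested object while tabularizing the uniform array, which is exactly what CSV cannot express without flattening:

```
owner:
  login: octocat
  type: Organization
repositories[2]{name,stars,description}:
  data-tools,5120,"Utilities for ETL, cleaning, and export"
  llm-bench,843,Benchmark harness for retrieval tests
```

The nested object keeps indentation-based structure, the uniform array collapses into rows, and a description containing commas is simply quoted.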

Even against JSON Compact (minified), TOON still squeezed out nearly 24% more savings. When you are paying per million tokens, that is immediate ROI.

Accuracy: The Surprise Winner

Here is the part that surprised me. Usually, when you compress data, you lose clarity. You would expect the LLM to struggle to parse a denser format. The benchmarks show the opposite.

Across 209 data retrieval questions tested on models like Claude Haiku, Gemini Flash, and GPT-5-nano, TOON achieved a 73.9% retrieval accuracy, compared to standard JSON's 69.7%.

Why? It likely comes down to Cognitive Load (or the LLM equivalent).

  1. Less Noise: The model doesn't have to attend to thousands of repeating key tokens, so the relevant values sit closer together in the attention window.
  2. Explicit Metadata: TOON headers declare the row count ([N]) and the field names up front.
  3. Structure Awareness: In tests asking about dataset structure (e.g., "How many rows are there?"), TOON hit 88% accuracy, while JSON and XML lagged behind. The explicit count in the TOON header (repositories[100]) acts as a hint that spares the model from having to "count" tokens manually, which LLMs are notoriously bad at.
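The count hint is easiest to see in the header itself. A model asked "how many repositories are there?" can read the answer directly from the first line instead of tallying rows (fields here are illustrative):

```
repositories[100]{name,stars,language}:
  ...
```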

The XML and YAML Fatigue

We should briefly mention the other contenders.

XML is the heavy loser here. It is verbose, difficult to read, and expensive to process. In the benchmarks, XML consistently used the most tokens (over 5,000 for a uniform employee record set that TOON represented in ~2,700) and had the lowest accuracy (67.1%).

YAML performs better than XML but still suffers from token bloat compared to TOON. While YAML is great for human configuration files, its whitespace-sensitive nature and key repetition make it suboptimal for high-volume data context. In the "E-commerce orders" test, YAML used ~14% more tokens than TOON.
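For contrast, a two-row order list (fields hypothetical) in YAML still repeats every key on every row:

```yaml
orders:
  - id: 1001
    total: 59.90
    status: shipped
  - id: 1002
    total: 12.50
    status: pending
```

The keys `id`, `total`, and `status` appear once per row, which is precisely the repetition TOON's one-time header eliminates.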

When to Switch?

The data is fairly conclusive. If you are dealing with:

  1. Lists of Objects: Logs, transaction histories, search results, or product catalogs.
  2. RAG Pipelines: Where you retrieve chunks of data from a DB to feed into a prompt.
  3. High-Volume APIs: Where bandwidth and latency matter.

TOON offers a "best of both worlds" scenario. You get the density of CSV with the structural integrity of JSON.
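To show the mechanics rather than rely on a library, here is a minimal sketch of a TOON-style encoder. This is not the official TOON tooling: it assumes flat, uniform rows and does no quoting or escaping, so it only illustrates why the per-row cost drops.

```python
def to_toon(name: str, rows: list[dict]) -> str:
    """Serialize a uniform list of dicts into a TOON-style tabular block.

    Minimal sketch: assumes every row has the same keys, and that no
    value contains a comma or newline (a real encoder would quote these).
    """
    fields = list(rows[0])
    # Header carries the schema once: name[row_count]{field,field,...}:
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    # Each row is just its values, CSV-style, indented under the header.
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "viewer"},
]
print(to_toon("users", users))
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,viewer
```

Compared with `json.dumps(users, indent=2)`, the key names `id`, `name`, and `role` appear once instead of once per row, which is the entire source of the savings.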

In the benchmarks, GPT-5-nano achieved a staggering 90.9% accuracy on TOON formatted data. This suggests that newer, smarter models are becoming increasingly adept at parsing these optimized formats, meaning the "readability penalty" of moving away from JSON is effectively zero for the machine.

If you are still formatting your RAG context as JSON.stringify(data, null, 2), you are effectively paying a "readability tax" on every single API call. It might be time to switch formats.