What is TOON?

TOON
JSON
Optimization

We have all been there. You are engineering a prompt for a Large Language Model (LLM), and you need to pass structured data. You reach for JSON. It’s the industry standard, after all. But as you watch your context window fill up with endless curly braces, repeated keys, and quote marks around simple integers, you start to wonder: Is there a better way?

YAML offers readability but suffers from ambiguity. CSV is dense but lacks hierarchy.

Enter TOON.

TOON is a data serialization format that feels like a breath of fresh air for developers and a native language for AI models. It bridges the gap between human readability and machine efficiency. Today, let's dive deep into the syntax and mechanics of TOON to understand why it’s quickly becoming a favorite for high-efficiency data interchange.

The Philosophy: JSON Semantics, YAML Aesthetics

At its core, TOON shares the exact same data model as JSON. If you can represent it in JSON—primitives (strings, numbers, booleans, null), objects, and arrays—you can represent it in TOON. However, the presentation is radically different.

TOON ditches the braces. It uses indentation to represent hierarchy, much like YAML. A simple object looks clean and approachable:

Unlike YAML, however, TOON is strict about types. There is no guessing if no means false or the string "no". In TOON, strings only require quotes when absolutely necessary—such as when they contain special characters, resemble numbers, or are empty. If you type message: Hello World, you get a string. If you type count: 42, you get a number.

id: 123
name: Ada
active: true

The Power of Arrays: Length and Tables

Where TOON truly separates itself from the pack is its handling of arrays. This is the "killer feature" for token optimization.

Every array in TOON explicitly declares its length in brackets, like items[3]. This might seem redundant to a human, but for an LLM, it is a superpower. It allows the model to validate structure immediately and detect truncation. If the stream cuts off after two items but the header promised three, the parser knows something went wrong.

TOON effectively offers three ways to handle arrays, automatically choosing the most efficient one:

  1. Inline Primitives: For simple lists of numbers or strings, TOON keeps it compact. tags[3]: admin,ops,dev
  1. Standard Lists: For mixed types, it uses a hyphenated list syntax similar to YAML.
  1. Tabular Objects: This is the game-changer.

If you have an array of objects that share the same keys—a very common pattern in database records—TOON pivots to a Tabular Format. instead of repeating keys for every single row, it declares the keys once in the header.

In the example above, users[2]{id,name,role}: tells us we have 2 rows and defines the schema. The data follows in a CSV-like structure. This eliminates the massive token overhead of repeating "id":, "name":, and "role": for every user.

users[2]{id,name,role}: 1,Alice Admin,admin
  2,"Bob Smith",user

Delimiters and Token Efficiency

You might notice the use of commas in the examples above. TOON actually supports three delimiters: commas (default), tabs, and pipes (|).

Why does this matter? Tokenization.

In many LLM tokenizers, a comma followed by a quote might be split into multiple tokens. A tab character, however, often tokenizes very cleanly. TOON allows you to switch delimiters at the array header level. If you use a tab delimiter, you often don't even need to quote strings that contain spaces, further compressing your data.

The format is smart enough to handle "collisions." If your data contains the active delimiter, TOON simply quotes that specific value.

items[2	]{sku	name	qty}: A1	Widget Name	2
  B2	Gadget Name	1

Key Folding: Flattening the Curve

Another feature that highlights TOON’s focus on efficiency is Key Folding. Deeply nested objects usually result in a "staircase" of indentation that eats up horizontal space and tokens.

If you have a deep hierarchy where intermediate objects don't have siblings, TOON can collapse them into a dot-notation path.

Instead of writing:

You can write:

data:
  metadata:
    items[2]: a,b

This feature, available since spec v1.5, significantly reduces line count and indentation tokens. Importantly, this is fully reversible. When you decode the data with path expansion enabled, it reconstructs the deep object hierarchy perfectly.

data.metadata.items[2]: a,b

Strictness and Safety

Despite its concise look, TOON is not loose with data. It adheres to a strict set of rules for quoting and escaping.

Strings generally stay unquoted, which is great for readability. However, TOON enforces quoting for edge cases to ensure data integrity. If a string looks like a number (e.g., "05" or "1e-6"), it gets quoted to prevent it from being parsed as a number. If a string is a reserved word like true or null, it gets quoted.

Furthermore, TOON normalizes numbers. It emits canonical decimal forms—no scientific notation or trailing zeros in the output—ensuring consistency. It even handles BigInt safely; if a number exceeds the safe integer range, it is serialized as a string to prevent precision loss during transport.

Root Forms

While most of us work with Root Objects, TOON is flexible. A document doesn't have to start with a key-value pair. It supports Root Arrays (starting immediately with [N]:) or even a single Root Primitive. This parity with JSON means you can swap TOON into almost any pipeline where JSON is currently used, provided you have the parser on the other end.

Final Thoughts

TOON isn't just "another format." It is a specialized tool for an era where data is consumed by probabilistic models as often as it is by deterministic code. By combining the rigid data model of JSON with the density of CSV and the readability of YAML, it solves the specific problem of context-window optimization without sacrificing type safety.

If you are building agents, fine-tuning models, or just tired of scrolling through endless closing braces, it is time to give TOON a look.