TOON (Token-Oriented Object Notation) is a data serialization format designed specifically for LLM prompts to be highly efficient. It dramatically reduces token usage by 30-60% compared to JSON while remaining structured and human-readable. By using a tabular format for arrays and minimal syntax for objects, TOON makes your data cheaper and faster to process with AI models.

What's the difference between TOON and JSON?

The key difference is token efficiency. JSON is verbose, with brackets, quotes, and commas that consume tokens. TOON is a more compact syntax designed for LLMs, representing arrays as tables with headers and using minimal punctuation. This efficiency directly translates to significant cost savings on your LLM API bills, especially for large or repeated datasets.

How much can I save with TOON?

You can typically expect to save 30-60% on LLM tokens compared to using JSON. For large datasets or frequent API calls, this translates directly into significant cost savings. Data with repeated structures, like API responses or database results, often sees savings at the higher end of this range (40-60%).

Is TOON compatible with all LLMs?

Yes. TOON is a simple text format that works flawlessly with all major large language models, including those from OpenAI (GPT-4), Anthropic (Claude), Google (Gemini), and Meta (LLaMA). Since any LLM can process plain text, they can all be instructed to understand and parse the TOON format with a simple instruction in your prompt.

Can I convert TOON back to JSON?

Absolutely. TOON is fully and losslessly reversible. Our converter tool supports bidirectional conversion, meaning you can convert TOON back to the exact original JSON structure without any data loss. This allows you to use TOON for efficiency and then convert back to JSON for compatibility with other tools.

What types of data work best with TOON?

TOON can represent any valid JSON data, but it delivers the highest token savings (40-60%) on uniform tabular data. This includes database query results, API responses with lists of objects, analytics data, or product catalogs. While TOON fully supports nested objects and arrays, the token reduction is most dramatic with flatter, more repetitive data structures.

Is my data safe when using this converter?

100% safe. All conversion from JSON to TOON (and back) happens locally in your browser. Your data is never sent to any server, never stored, and never seen by us. The converter even works offline once the page has loaded, guaranteeing your information remains private.

Yes, completely free. Both this TOON converter and the underlying TOON format specification are open and free to use without any limits, file size restrictions, or premium features. It's an open-source effort to make working with LLMs more efficient for everyone.

为什么 TOON 优于其他格式

法学硕士

基准测试

抹布

如果您正在构建 LLM 应用程序，特别是使用大型数据集的检索增强生成 (RAG) 系统或代理，您可能会在两个方面进行持续的战争：令牌成本和上下文窗口限制。

多年来，JSON 一直是数据交换的默认通用语言。它（大部分）是人类可读的并且无处不在。但是，当您将 500 行 JSON 数组粘贴到提示中时，您会在重复的字段名称（“id”:、“name”:、“email”:`）上烧毁数千个标记，这些标记对于特定行来说具有零语义值。

输入卡通。它是专门为解决 LLM 输入中的信噪比问题而设计的格式。我一直在深入研究最新的基准测试，结果令人震惊：TOON 不仅节省了空间，而且还节省了空间。它实际上帮助 GPT-5-nano 和 Gemini-2.5-flash 等模型更好地理解数据。

让我们来分析一下为什么 TOON 能够击败重量级数据（JSON、CSV、YAML、XML）并查看原始数据。

冗长陷阱：JSON 与 TOON

代币效率的最大敌人是结构重复。让我们看一下标准的时间序列分析数据集。在 JSON 中，每个数据点都承载着其模式的包袱。

JSON（标准） 基准测试中使用的代币：22,250

这是大量浪费的空间。现在，看看 TOON 的等效项。 TOON 在标头中定义架构一次，然后切换到值的密集、CSV 样式布局。

卡通 基准测试中使用的代币：9,120

结果： 代币使用量大幅 59.0% 减少。

通过去除重复的键，TOON 允许您将更多历史记录放入模型的上下文窗口中。但至关重要的是，与 CSV 不同，它通过标头定义“metrics[5]{...}”保持类型感知和显式结构。

为什么不直接使用 CSV？

这是最常见的反驳观点。 “如果您想要平面数据，只需使用 CSV。”

问题在于现实世界的数据很少是完全平坦的。当您有嵌套结构、对象内的列表或包含逗号和引号的复杂描述时，CSV 就会完全崩溃。

在基准测试中，特别是混合结构轨道（包括电子商务订单和事件日志），CSV 被完全排除在外，因为它无法在没有有损扁平化的情况下表示数据。

TOON 优雅地处理了这个问题。它允许在优化数组时嵌套对象。在对 100 个 GitHub 存储库（包含混合文本描述和元数据）的测试中，效率差距很明显：

JSON： 15,145 个令牌

TOON： 8,745 个代币（节省 42.3%）

即使与 JSON Compact（缩小版）相比，TOON 仍然节省了近 24% 的成本。当您支付每百万代币时，这就是立即的投资回报率。

准确性：意外获胜者

这是令我惊讶的部分。通常，当你压缩数据时，你会失去清晰度。你可能会认为法学硕士很难解析更密集的格式。基准测试显示相反的情况。

在 Claude Haiku、Gemini Flash 和 GPT-5-nano 等模型上测试的 209 个数据检索问题中，TOON 实现了 73.9% 的检索准确率，而标准 JSON 的检索准确率**为 69.7%。

为什么？它可能归结为认知负荷（或法学硕士同等学历）。

更少噪音： 该模型不必处理数千个重复的“key”标记。相关值在注意力机制中更加紧密地结合在一起。

显式元数据： TOON 标头显式包含计数 ([N]) 和字段名称。

结构意识： 在询问数据集结构的测试中（例如，“有多少行？”），TOON 达到 88% 的准确率，而 JSON 和 XML 落后。 TOON 标头 (repositories[100]) 中的显式计数充当提示，防止模型必须手动“计数”令牌，而 LLM 非常不擅长这一点。

XML 和 YAML 疲劳

我们应该简要提及其他竞争者。

XML 是这里的重大输家。它冗长、难以阅读且处理成本昂贵。在基准测试中，XML 始终使用最多的标记（TOON 表示的统一员工记录集超过 5,000 个标记，约为 2,700 个），但准确性最低 (67.1%)。

YAML 的性能比 XML 更好，但与 TOON 相比仍然存在令牌膨胀的问题。虽然 YAML 非常适合人工配置文件，但其对空格敏感的性质和键重复使其对于大容量数据上下文来说不是最佳选择。在“电子商务订单”测试中，YAML 使用的代币比 TOON 多约 14%。

何时切换？

数据是相当有决定性的。如果您正在处理：

对象列表： 日志、交易历史、搜索结果或产品目录。

RAG 管道： 从数据库检索数据块以输入提示的位置。

大容量 API： 带宽和延迟很重要的地方。

TOON 提供了“两全其美”的场景。您可以获得 CSV 的密度和 JSON 的结构完整性。

在基准测试中，GPT-5-nano 在 TOON 格式数据上实现了惊人的 90.9% 准确率。这表明更新、更智能的模型越来越擅长解析这些优化的格式，这意味着放弃 JSON 的“可读性损失”对于机器来说实际上为零。

如果您仍然将 RAG 上下文格式化为“JSON.stringify(data, null, 2)”，那么您实际上是在为每个 API 调用支付“可读性税”。也许是时候转换格式了。