Nodejam Logo

Why We're Building a New File Format

Why .ndjm exists and why legacy formats don't cut it.

March 15, 2026 · 8 min read

Why We're Building a New File Format

Office documents haven't fundamentally changed since the 1990s. .docx, .xlsx, and .pptx are ZIP archives of XML designed for desktop software and local file systems. They were built for a world where documents lived on hard drives, got emailed as attachments, and were edited by one person at a time. That world no longer exists. Nodejam is building .ndjm, a purpose-built format designed for cloud databases, agents, and real-time workflows.

Legacy Formats Were Not Built for This

Every tool that wants to modify a Word document has to decompress a ZIP archive, parse nested XML, mutate the document tree, revalidate the schema, and repackage everything. A simple text replacement becomes a multi-step pipeline with dozens of failure modes. Spreadsheets carry dense grids of empty cells. A 10-row dataset in Excel is stored as if it were a full sheet. Presentations embed binary images directly into the file, inflating document sizes and making them opaque to automated processing.

These formats assume documents live on a desktop. Today they live in databases, get processed by agents, and need to be read and written hundreds of times per session. The impedance mismatch between legacy file formats and modern infrastructure is not a minor inconvenience. It's a fundamental constraint on what's possible.

What .ndjm Is

The .ndjm format is a proprietary document representation stored directly in the database. Three content types are unified under one format: text documents support full rich formatting including tables, images, and nested lists. Spreadsheets support a complete formula engine with multi-sheet capabilities and efficient storage that scales with actual data, not empty cells. Slides support charts, tables, shapes, and images with precise positioning and layout control.

Images are referenced rather than embedded, keeping documents lightweight and eliminating the bloat that plagues legacy formats. The format is optimized for machine readability: purpose-built for programmatic access, instant queries, and automated processing at scale.

Designed for Agents

Most AI tools for documents work as plugins on top of legacy formats, bolted onto systems that were never designed for them. They inherit every limitation of the format underneath. Nodejam took a different approach: build the format first, then build the agent system natively on top of it.

Nodejam's agent system operates with a deep set of specialized tools that read and write .ndjm directly. No format conversion, no intermediate representations, no decompression. When an agent needs to insert a row into a spreadsheet, it issues a precise operation. When it needs to reformat a paragraph, it targets exactly the content that needs to change. The format is compact by design: agents work with only the data that matters, dramatically reducing overhead compared to dense grid formats. Structured enough for deterministic tool operations, flexible enough for natural language generation.

Multi-round agentic loops execute dozens of sequential edits in a single session without ever touching a file system. There is no save step, no export step, no conversion step. The document in the database is the document the user sees, and it's the same document the agent modifies. The format carries no overhead. What the agent sees is the content itself, not layers of packaging around it. This means more of the context window is spent on actual work, not parsing legacy structure. This eliminates an entire class of synchronization bugs that plague agent-augmented editing in legacy formats.

A Project, Not a File

In legacy suites, Word, Excel, and PowerPoint are separate applications with separate formats. They don't share context, don't share structure, and can't reference each other natively. An agent working across them is switching between entirely different systems. In .ndjm, there is no app boundary.

The .ndjm format is not a single document. It's a collection of files within a project. Text documents, spreadsheets, and slides coexist in one workspace with drag-and-drop reorderable tabs. This reflects how real work happens: a financial model lives next to the memo that explains it, which lives next to the presentation that summarizes it. Agents operate across all files in the project, combining and cross-referencing content types that legacy formats keep siloed.

Current Challenges

Building a new format doesn't mean abandoning the old ones. The world still runs on DOCX, XLSX, and PPTX, and we don't expect that to change overnight. Importing from and exporting to legacy formats has to work well. Anything less would make .ndjm an island. The challenge is that these formats encode content in complex, sometimes inconsistent ways, where the same document can be structured differently depending on which version of which app created it. Achieving perfect import fidelity is a solvable engineering problem. It just takes time and resources to get right. The pipeline works today, and accuracy improves with every iteration.

Building a file format from scratch also means rebuilding every feature that legacy suites have had decades to accumulate. We're at roughly 80-90% feature parity with Microsoft Office and Google Workspace today. The remaining gaps are not architectural limitations. They're engineering iterations, the same kind of work that closed the first 80%. The goal isn't just to match what exists. It's to surpass it, because a format built for modern infrastructure can support things that legacy architectures never will.

What Comes Next

Text, spreadsheets, and slides are just the beginning. The .ndjm format is designed to grow. New content types, deeper integrations, and capabilities that legacy suites were never architected to support. When you're not constrained by formats built for a different era, the ceiling disappears.

We're not building a better version of what already exists. We're building what comes after it. The file format is the foundation. Everything else builds on top of it.