OfficeDex Blog

If Coding Can Be Vibe Coding, Why Can't Office Work Be Vibe Officing?

A technical essay on the gap between today's Office + AI tools and Vibe Officing, and why OfficeDex chooses OOXML as the collaboration medium.

OfficeCLI · 8 min read

OfficeDex open source repository: https://github.com/officecli/officedex

People are already familiar with the way Vibe Coding works, but Vibe Officing is rarely discussed. This article looks at the gap between today's AI document tools and Vibe Officing from a senior engineering perspective. It explains why neither HTML nor Markdown can carry this workflow, and why OOXML is a better technical foundation.

I am a founder building for overseas markets, and I have used Vibe Coding to create several products. During promotion and operations, I found that most of my time was no longer spent writing code. Because Vibe Coding is asynchronous, it takes up only about 20 percent of my working time. The remaining 80 percent goes into Reddit posts, X content, investor documents, product materials, and other office work.

After trying many Office + AI products, I still found document writing painfully time-consuming. At first I thought I was using them incorrectly. But after studying how these products work, I now think many Office + AI tools are taking the wrong path. They are not moving toward Vibe Officing.

When I first used Manus and Genspark, I thought they would save a lot of time. I only had to type one sentence, wait a while in the browser, and a finished-looking result appeared with a title, colors, and layout. It looked convincing. But when I downloaded the .pptx file and opened it locally, the details often broke. Title positions shifted, editable charts became images, and complex layouts were flattened. I spent a long time aligning and adjusting things by hand. After that, I wanted AI to revise the copy on pages 6 to 10 in batch, only to discover that the deck could not be sent back to the AI for continued work. These products are useful material generators, but they are still far from Vibe Officing.

Today's Vibe Coding also requires interaction between humans and AI. Human review and manual edits are still necessary. Vibe Coding works because its users are programmers. Programmers can read and write code. When either AI or a human changes the code, the other side can understand and modify it. The loop is complete.

The same pattern does not transfer directly to AI office work. An office file is not plain text. It has pages, images, charts, comments, themes, masters, and many pieces of business information that look like layout details. After a human edits the file, the AI must still understand it. After the AI edits the file, the human must be able to see the result and continue editing it manually. If that is not possible, the time saved by a fast first draft is lost to the rework that follows. That is why many Office + AI demos look smooth but feel awkward once they enter real work.

Three Obstacles on the Way to Vibe Officing

Vibe Coding works because code is naturally suitable for shared maintenance between humans and machines. Source code is readable, editable, executable, and testable. Most Office + AI workflows fail for three reasons.

Human-AI collaboration cannot form a closed loop

There is both an execution gap and an evaluation gap between Office + AI software and user needs. The user wants the AI to "revise the body copy on slides 6 to 10, but keep the layout and colors unchanged." The AI often responds by generating a new deck that looks similar to the requested result. That is the execution gap.

The generated artifact may preview correctly in the browser, but once it is downloaded, styles can shift and object properties can change. That creates the evaluation gap. These two gaps make it impossible for human-AI collaboration to form a reliable closed loop.

There is no sustainable editability

Because users are still clarifying their own requirements, and because prompt writing is limited, AI-generated artifacts almost never work as final versions on the first attempt. In every AI generation domain, local editing is a capability users care about deeply.

Image generation is a useful comparison. If an image cannot be edited locally and reliably after generation, users can only keep re-rolling and hope for the right result. Once local refinement became more stable, AI image generation moved toward AI video generation. Documents have the same requirement. An AI-generated document must be able to return to the AI for further modification before it becomes useful in real work.

The collaboration medium is not authoritative enough

The collaboration medium is the format that humans and AI operate on together across multiple rounds. Humans judge work by looking at the result of that medium, so the medium must be authoritative. AI edits, human edits, preview, and final export all need to refer to the same object.

When developing a static frontend page, HTML is authoritative. In office document production, the collaboration medium must preview exactly like the final deliverable.

Markdown and HTML Are Both the Wrong Fit

The Claude team previously published "Using Claude Code: The unreasonable effectiveness of HTML," which sparked discussion in the Vibe Coding community. I agree with the core point. Humans are just as important as AI during Vibe Coding. Markdown is a compromise that favors AI, but it is not friendly enough for humans. When presenting a design proposal to people, HTML is more effective than Markdown.

Markdown is excellent for README files, notes, and simple technical explanations. It is lightweight and readable as source. But it is fundamentally a linear text format. An image inside Markdown is usually just a ![]() reference, with no anchor, cropping, or layout information attached.

Office documents in non-developer environments need far more than that. Images need anchors. They need cropping. They need relationships with surrounding text. Slides have placeholders, masters, themes, chart objects, and many other elements that Markdown cannot express well.

HTML is much more expressive than Markdown. That is why the Claude team strongly recommends it: AI can output a browsable page that helps humans make decisions.

But HTML is still the wrong fit for office documents. First, for most office users it is read-only in practice; only programmers know how to edit it directly. Second, export fidelity is a real problem. The Manus and Genspark examples above show this clearly. An HTML-based preview can only be treated as a reference, not as the final object.

Why OOXML Is a Better Fit

I am more optimistic about native OOXML. ECMA-376 standardized Office Open XML, including its vocabulary, document representation, and packaging model. Microsoft's Open XML documentation also explains that Open XML files are made of packages, parts, and relationships. WordprocessingML, PresentationML, and SpreadsheetML correspond to Word, PowerPoint, and Excel document structures.

In essence, .docx, .pptx, and .xlsx files are ZIP packages. After extraction, each contains a set of parts, most of them XML, alongside binary media such as images. These parts describe body content, styles, themes, images, icons, comments, and file relationships. Each part carries one category of information, and relationships connect the parts. A native Office file is a small document project.
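The package structure described above can be inspected with nothing more than a ZIP reader. The following is a minimal sketch using Python's standard library. The sample package is built in memory for illustration only (it is not a complete document that Word would necessarily open), and the helper names build_sample_package and list_parts are hypothetical, not part of OfficeDex.

```python
import io
import zipfile

# Illustrative parts only: enough to show the package layout,
# not a complete, valid Word document.
SAMPLE_PARTS = {
    "[Content_Types].xml": (
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" '
        'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        "</Types>"
    ),
    "_rels/.rels": (
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" Target="word/document.xml" '
        'Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>'
        "</Relationships>"
    ),
    "word/document.xml": (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body></w:document>"
    ),
}


def build_sample_package() -> bytes:
    """Write the sample parts into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in SAMPLE_PARTS.items():
            zf.writestr(name, xml)
    return buffer.getvalue()


def list_parts(package: bytes) -> list[tuple[str, int]]:
    """Return (part name, uncompressed size) for every entry in the package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


if __name__ == "__main__":
    for name, size in list_parts(build_sample_package()):
        print(f"{size:6d}  {name}")
```

To run the same listing against a real file, read its bytes first, for example list_parts(open("deck.pptx", "rb").read()), where deck.pptx stands for any local Office file.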

AI can treat that project like code. When a change is needed, it can read and modify the key files selectively. From the AI's point of view, it is writing code.

LLMs are already familiar with OOXML. Office Open XML, the Open Packaging Conventions, Office automation, format conversion, python-docx, and python-pptx have existed for a long time in public documentation and code repositories. For a model, unzipping a package, traversing XML trees, locating nodes by namespace, and following image or chart references through relationships are close to code-understanding and code-editing tasks.
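As a rough illustration of those steps, the sketch below follows the package-level relationship to the main document part and then collects paragraph text by namespace. It uses only the Python standard library. The namespace URIs are the standard OOXML ones; the sample package and the helper names (main_part_name, paragraph_texts, read_paragraphs) are assumptions made for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC_REL = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_sample_package() -> bytes:
    """Build a tiny illustrative package in memory (not a complete document)."""
    parts = {
        "_rels/.rels": (
            f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOC_REL}" Target="word/document.xml"/>'
            "</Relationships>"
        ),
        "word/document.xml": (
            f'<w:document xmlns:w="{W_NS}"><w:body>'
            "<w:p><w:r><w:t>Hello</w:t></w:r>"
            '<w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>'
            "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
            "</w:body></w:document>"
        ),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name, xml in parts.items():
            zf.writestr(name, xml)
    return buffer.getvalue()


def main_part_name(zf: zipfile.ZipFile) -> str:
    """Follow the package-level relationship that points at the main document part."""
    rels = ET.fromstring(zf.read("_rels/.rels"))
    for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
        if rel.get("Type") == OFFICE_DOC_REL:
            return rel.get("Target").lstrip("/")
    raise ValueError("no main document relationship found")


def paragraph_texts(zf: zipfile.ZipFile, part_name: str) -> list[str]:
    """Join the w:t text runs of each w:p paragraph in the given part."""
    root = ET.fromstring(zf.read(part_name))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


def read_paragraphs(package: bytes) -> list[str]:
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return paragraph_texts(zf, main_part_name(zf))


if __name__ == "__main__":
    print(read_paragraphs(build_sample_package()))
```

Images and charts are reached the same way in principle: a part-level .rels file maps a relationship Id used in the XML to the target part inside the package.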

OOXML maps directly to the three obstacles above.

First, it makes the collaboration loop possible. AI edits the native file structure. Humans view and continue editing the same file. There is no need to convert back and forth between an HTML preview and an Office file. The execution object and the evaluation object are aligned, so both the execution gap and the evaluation gap become much smaller.

Second, OOXML supports sustainable editability. Because the package behaves like a small code project, AI can make local changes while preserving unrelated content.
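A local edit of this kind might look like the sketch below: one part is parsed and changed, and every other part is copied through byte for byte. This is an illustration under simplifying assumptions, not how OfficeDex works internally. The helper name replace_text is hypothetical, the sample package is not a complete document, and the code ignores text that is split across several runs. ElementTree also rewrites namespace declarations when it serializes, so a production tool would likely need a more careful XML writer.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_sample_package() -> bytes:
    """Build a tiny illustrative package in memory (not a complete document)."""
    parts = {
        "word/document.xml": (
            f'<w:document xmlns:w="{W_NS}"><w:body>'
            "<w:p><w:r><w:t>This is the draft proposal.</w:t></w:r></w:p>"
            "</w:body></w:document>"
        ),
        "word/styles.xml": (
            f'<w:styles xmlns:w="{W_NS}">'
            '<w:style w:type="paragraph" w:styleId="Normal"/>'
            "</w:styles>"
        ),
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in parts.items():
            zf.writestr(name, xml)
    return buffer.getvalue()


def replace_text(package: bytes, part_name: str, old: str, new: str) -> bytes:
    """Replace text inside w:t elements of one part; copy all other parts unchanged."""
    # Keep the conventional "w" prefix when the edited part is serialized again.
    ET.register_namespace("w", W_NS)
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, zipfile.ZipFile(
        out, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part_name:
                root = ET.fromstring(data)
                for t in root.iter(f"{{{W_NS}}}t"):
                    if t.text and old in t.text:
                        t.text = t.text.replace(old, new)
                data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
            # Untouched parts are written back with their original bytes.
            dst.writestr(info.filename, data)
    return out.getvalue()


if __name__ == "__main__":
    edited = replace_text(build_sample_package(), "word/document.xml", "draft", "final")
    with zipfile.ZipFile(io.BytesIO(edited)) as zf:
        print(zf.read("word/document.xml").decode("utf-8"))
```

The design point is the loop: only the named part goes through the XML parser, so styles, themes, and media that the request did not mention cannot be altered by accident.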

Third, OOXML can become the authoritative collaboration medium. .docx, .pptx, and .xlsx files are the objects AI operates on, the objects users edit locally, and the objects delivered at the end. The collaboration medium, editing medium, and delivery medium are the same object, so multi-round human-AI work does not break during format conversion.

That is why I consider OOXML the best foundation for Vibe Officing.

My Attempt at Vibe Officing

Document adjustment takes up a large part of my daily work, and I could not find a product that really matched this workflow. So I built a tool based on the ideas above. It is called OfficeDex. The tool is shaped by my own daily work, and I keep improving it as I use it.

OfficeDex treats native .docx, .pptx, and .xlsx files as its target files. That reflects the ideas above: human-AI collaboration, native formats, mixed text and visual layout, and OOXML.

This is what I mean by Vibe Officing. It is not just a name copied from Vibe Coding. At the technical level, it is still writing code, but the code is OOXML. Vibe Coding produces applications and services. Vibe Officing produces office documents. OOXML provides structure, chart objects provide data visualization, style systems and layout rules provide pages, and data binding connects content back to business information.

When a user says, "Help me make a proposal I can show to a client," a Vibe Officing product should not only output a document. More importantly, the user and the AI should be able to keep working around the same file object. OfficeDex is my desktop-client attempt to put that idea into practice.