Working group formed to develop standard for AI-native docs

Computerworld NZ

LF AI & Data Foundation, a division of the Linux Foundation, launched a working group on Tuesday that will focus on the development of DocLang, a specification intended to support interoperable document processing across AI and agentic workflows.

The working group, founded by premier members IBM, Nvidia and Red Hat, is tasked with the creation of an open, universal, AI-native document format designed to improve how enterprises prepare, exchange, and govern document data for AI systems. Contributors ABBYY and Human Signal will also be involved in its development.

The announcement stated, “enterprises today work across a fragmented landscape of document formats, including PDFs, JPEGs, and other file types built primarily for human consumption rather than AI interpretation.” As organizations increasingly rely on generative AI and agentic systems, it said, “this disconnect can introduce complexity, raise costs, and reduce reliability when extracting meaning from business documents.”

Mark Collier, executive director of LF AI & Data, said the goal of the DocLang Specification Working Group is to “develop a vendor-neutral, interoperable standard that helps organizations prepare document data for AI more reliably, transparently, and at scale.”

To that end, an information document released by the group stated, “PDF was built for print, DOCX was built for editors. DocLang is built for what comes next, a machine-readable document standard your models can actually trust.”

DocLang, it said, “defines a structured, machine-readable format for documents of any type. Not a converter. Not an API. A standard, like JSON for data, like HTML for the web, that any tool can implement and any pipeline can consume.”

Standards must evolve for AI

Something like DocLang is needed, said independent technology analyst Carmi Levy.
“Existing document standards have done an admirable job allowing global stakeholders to confidently collaborate for decades, but it’s becoming increasingly clear that they are in desperate need of an update as AI reshapes the rules around how work gets done,” he explained.

Largely static document types, he said, “can be somewhat limiting when AI is redefining the very word, ‘document.’ In many ways, AI-age documents are far more iterative and dynamic than what they once were, and the definitions need to evolve with the times. The documents we currently live with simply weren’t designed for the AI age.”

Within that context, Levy said, “DocLang represents an early, best hope of achieving some kind of foundational baseline for document standards, one that will hopefully allow more intelligent, more efficient, lower-risk workflows than is currently the case.”

Taking an open-source, vendor-agnostic approach to the process ensures the collective will take precedence over the needs of specific vendors, he said, adding, “earlier standards-setting efforts around networking, documentation, the web, and the cloud powered the free-flowing digital landscape that defines modern life.”

An AI-centric documentation standard will carry that reality into the next generation of technology, said Levy.

A question of governance

Asked what a DocLang standard will mean for human workers and in particular for governance and accountability, Jason Andersen, principal analyst at Moor Insights & Strategy, said, “at a high level, I like and understand the idea of standards, but the question raises an important point.”

The entire concept of LLMs, said Andersen, “involves using natural human languages. The computer is supposed to understand us without us changing our syntax or language.
Forcing a syntax on users is exactly what we have today with SEO and more advanced programming languages.”

With something like DocLang, where the standard can be applied to content ingestion, he said, “I would be OK with that being automated, which seems to be the intent. The use case I envision is that when I upload a document to an agent, a skill can be run to preprocess the document into the DocLang standard format, saving tokens.”

That makes sense, he said, adding that he thinks it’s good “if it can help generate outputs, like a visualization, that can be shared outside an AI tool. On that front, that is also why I am liking Web MCP, since you are just adding some code to the page, like CSS or JavaScript, and the consumer, in this case, an AI browser or skill, is better equipped to handle the site.”

The point, he said, is, “these standards need to preserve the fact that humans can still do what they want, and do not need to know any coding to be proficient. In terms of governance, I am not sure if it matters.”

Again, Andersen pointed out, “if there is some sort of preprocessing that appends metadata or code to the document, as long as it’s maintained, there should be no issue; in fact, it could make governance easier, since there is some standardization of the context.
But that’s not coming across yet in the specs, and I’d encourage the team to consider it.”

Yaz Palanichamy, senior research analyst at Info-Tech Research Group, said, “in theory, the concept of AI-native documents, at least from a user productivity standpoint, can certainly help organizations better prepare their organization’s documentation data for AI-embedded systems.”

However, he added, “organizational compliance controls and an overarching governance model would be absolutely necessary to employ if and when an organization does decide to proceed with such a use case.”

Moreover, in addition to model training permissions and fine-tuning extraction scope, Palanichamy said, “the hypothetical organization ‘X’ that wants to employ AI-embedded document management workflows needs to also understand whether their company, from a technology readiness standpoint, is able to appropriately standardize their internal document management practices across both AI and agentic workflows.”

That being said, he added, “without doing any internal feasibility studies or prepping their organization in advance, change management from a document lifecycle management standpoint will not be enforced appropriately, and, therefore this would deter the organization from maturing and/or scaling their AI-embedded document processing capabilities further.”

Palanichamy pointed out, “in essence, while in theory DocLang as a universal AI-native documentation format is not an ineffective idea as such, there will still be several organizational controls that will need to be reviewed appropriately from a governance standpoint to ensure that the organization scales this new collaborative standard and toolkit in an accountable and secure manner.”

This article originally appeared on CIO.com.
