
Google Cloud published the Open Knowledge Format on June 12, 2026. Three days later, engineering and data teams building agentic AI systems have something they have lacked since agents became a realistic enterprise tool: a single, vendor-neutral specification that packages what an organization knows into a form any AI agent can read, without custom integrations, proprietary SDKs, or platform lock-in.
That specification — Open Knowledge Format v0.1, or OKF — is available on GitHub under the Apache 2.0 license and was released by Google Cloud tech leads Sam McVeety and Amir Hormati alongside three sample knowledge bundles and two working reference implementations. The practical stakes are immediate: any team that has been building its own bespoke "AGENTS.md," "CLAUDE.md," or Obsidian vault pipeline can now adopt a shared structure instead, and any tool that adds OKF support becomes readable by every other tool that understands the spec.
The problem OKF addresses has a name inside Google Cloud: the context-assembly problem. Before an AI agent can answer a question like "How do we compute weekly active users from our event stream?" it has to retrieve fragments of context from wherever that knowledge lives — a metadata catalog API, a shared drive, code comments in a repository, a Confluence wiki, or the head of a senior data engineer who has been at the company for five years.
Every organization that has deployed agentic AI workflows has hit this wall. The information exists. It is just scattered across incompatible surfaces, and no two surfaces speak the same format. The result, as Google Cloud puts it, is that every agent builder solves the same assembly problem from scratch, every catalog vendor reinvents the same data models, and the knowledge stays locked behind whichever surface produced it. OKF's answer is not a new service, database, or runtime. It is a format.
An OKF bundle is a directory tree of plain Markdown files. Each file represents one concept — a single unit of knowledge that might describe a database table, an API endpoint, a business metric, a runbook, or an incident playbook. The concept's identity is its file path within the bundle: a file at tables/orders.md carries the concept ID tables/orders.
Every concept document has two parts. At the top of the file sits a YAML frontmatter block, delimited by --- markers, that carries a small set of structured fields agents can query without parsing the full document. Below that is a standard Markdown body for free-form content — schema descriptions, business context, caveats, SQL examples, whatever the author wants to write.
Only one frontmatter field is required: type, a short string that identifies what kind of concept the document describes. Google's examples include BigQuery Table, API Endpoint, Metric, and Playbook. The optional recommended fields — title, description, resource, tags, and timestamp — provide richer structure but are never mandatory. Consumers must tolerate unknown fields and unknown type values without rejecting the document, a design choice that makes the format genuinely tolerant of early-stage and partial implementations.
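The rules above (path as identity, a frontmatter block between --- markers, type as the only required field, tolerance for unknown fields) can be sketched in a few lines of Python. This is a minimal illustration and not part of Google's reference implementations: the function names are hypothetical, the `owner_team` field is invented for the example, and the parser only handles flat `key: value` lines where a real consumer would use a full YAML parser.

```python
from pathlib import PurePosixPath


def concept_id(relative_path: str) -> str:
    """Derive the concept ID from a bundle-relative path (drops the .md suffix)."""
    return str(PurePosixPath(relative_path).with_suffix(""))


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a concept document into (frontmatter fields, Markdown body).

    Simplification: only flat `key: value` lines are understood.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening --- marker")
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("missing closing --- marker")
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return fields, body


def load_concept(relative_path: str, text: str) -> dict:
    """Parse one concept document, enforcing only the required `type` field."""
    fields, body = split_frontmatter(text)
    if not fields.get("type"):
        raise ValueError("the required `type` field is missing")
    # Unknown fields and unknown type values are kept as-is, never rejected.
    return {"id": concept_id(relative_path), "fields": fields, "body": body}


if __name__ == "__main__":
    doc = (
        "---\n"
        "type: BigQuery Table\n"
        "title: Orders\n"
        "owner_team: commerce-data\n"  # hypothetical field outside the recommended set
        "---\n"
        "One row per customer order.\n"
    )
    print(load_concept("tables/orders.md", doc))
```

The only validation is the presence of `type`; everything else passes through untouched, which is what lets partial or early-stage bundles load without errors.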
Concepts link to each other using standard Markdown hyperlinks. Those links turn the bundle's flat directory into a graph of relationships whose meaning is conveyed by the surrounding prose rather than by a typed edge schema — a table concept might link to a metric concept that uses it, and the sentence explaining that link is what tells an agent the nature of the relationship. The full OKF specification is available on GitHub.
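To make the link-as-graph idea concrete, here is a hedged sketch that turns the Markdown links in one concept body into directed edges between concept IDs. It assumes links are written relative to the linking file and point at .md files; the spec summary above does not say how links resolve, so treat that as an assumption. The function name and the regular expression are illustrative only.

```python
import posixpath
import re

# Inline Markdown links [text](target); the lookbehind skips images ![alt](src).
LINK_PATTERN = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")


def outgoing_edges(source_id: str, body: str) -> list[tuple[str, str]]:
    """Return (source, target) concept-ID pairs for links found in `body`.

    Assumption: link targets are relative to the directory of the linking file.
    """
    source_dir = posixpath.dirname(source_id)
    edges = []
    for target in LINK_PATTERN.findall(body):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URL or in-page anchor, not a concept link
        path = target.split("#", 1)[0]
        if not path.endswith(".md"):
            continue
        resolved = posixpath.normpath(posixpath.join(source_dir, path))
        edges.append((source_id, resolved[: -len(".md")]))
    return edges


if __name__ == "__main__":
    body = (
        "Orders feed [weekly revenue](../metrics/weekly_revenue.md) and join to "
        "[customers](customers.md). See also [the docs](https://example.com/guide.md)."
    )
    print(outgoing_edges("tables/orders", body))
```

The edges carry no type. As the article notes, the meaning of each relationship stays in the sentence around the link, so a consumer that needs it has to keep the surrounding prose alongside the edge.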
Two reserved filenames carry defined meaning anywhere in the directory hierarchy. An index.md file enumerates what is available in its directory, enabling progressive disclosure so an agent can survey a bundle before reading individual documents. A log.md file records changes to that scope in date-grouped entries, newest first — a lightweight changelog that travels with the knowledge rather than living in a separate ticketing system.
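As an illustration of the "date-grouped, newest first" convention for log.md, the sketch below inserts a new entry into an existing log. The exact layout is an assumption made for this example (a `## YYYY-MM-DD` heading per day with `- ` bullet entries beneath it); the spec may prescribe something different, and the function name is hypothetical.

```python
def add_log_entry(log_text: str, day: str, message: str) -> str:
    """Insert `message` under the heading for `day`, keeping newest dates first.

    Assumed layout (not taken from the spec): `## YYYY-MM-DD` headings
    followed by `- ` bullet entries. ISO dates sort correctly as plain strings.
    """
    heading = f"## {day}"
    entry = f"- {message}"
    lines = log_text.splitlines()
    if heading in lines:
        # The date group already exists: place the new entry right below it.
        lines.insert(lines.index(heading) + 1, entry)
    else:
        # Find the first date group older than `day` and insert before it.
        at = len(lines)
        for i, line in enumerate(lines):
            if line.startswith("## ") and line[3:] < day:
                at = i
                break
        lines[at:at] = [heading, entry, ""]
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    log = "# Change log\n\n## 2026-06-10\n- Added tables/orders\n"
    log = add_log_entry(log, "2026-06-15", "Documented metrics/wau")
    print(log)
```

Because the log is a plain file in the same directory as the concepts it describes, the entry is versioned and reviewed in the same commit as the change itself.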
OKF's minimal design is a deliberate inversion of the approach taken by earlier formal knowledge representation standards. The W3C's Resource Description Framework (RDF) and Web Ontology Language (OWL), first standardized in 1999 and 2004 respectively, offered far more expressive power: typed relationships, formal axioms, automated reasoning support. They also required schema registries, specialized tooling, and significant expertise to implement. Enterprise adoption remained limited to organizations with dedicated ontology engineering resources.
OKF reaches the opposite conclusion. Its specification states that knowledge should be "readable by humans without tooling, parseable by agents without bespoke SDKs, and diffable in version control." There is no central schema registry. There is no required tooling. The specification fits on a single page. If you can open a Markdown file in a text editor, you can read OKF. If you can run git clone, you can ship it.
This is not a limitation — it is the architecture. A format that requires no infrastructure to read has no adoption barrier. A format that lives in git inherits all of git's versioning, access control, and collaboration properties for free. A format that is plain text can be consumed by any agent framework without a purpose-built connector.
Google shipped two reference implementations alongside the specification to demonstrate how OKF bundles are produced and consumed in practice.
The producer is a reference enrichment agent that turns a BigQuery dataset into a conformant OKF bundle through a two-pass process. In the first pass, the agent walks every table and view in the dataset and drafts a concept document for each one, using BigQuery's own metadata — column names, data types, descriptions — as the raw material. The result is a structurally valid but content-sparse bundle.
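The first pass can be pictured as a pure rendering step from table metadata to a concept document. The sketch below is not the reference agent's code: it takes metadata as plain Python values rather than calling the BigQuery API, and the function name, the column dictionary keys, and the Markdown table layout are all assumptions made for illustration.

```python
import json


def draft_concept(table_name: str, description: str, columns: list[dict]) -> str:
    """Render a sparse but structurally valid concept document for one table.

    `columns` is a list of dicts with `name`, `type`, and optional
    `description` keys (a hypothetical shape chosen for this example).
    """
    # json.dumps produces double-quoted strings, which are also valid YAML
    # scalars, so colons or quotes in the metadata cannot break the block.
    frontmatter = [
        "---",
        "type: BigQuery Table",
        f"title: {json.dumps(table_name)}",
    ]
    if description:
        frontmatter.append(f"description: {json.dumps(description)}")
    frontmatter.append("---")

    rows = ["| Column | Type | Description |", "| --- | --- | --- |"]
    for col in columns:
        rows.append(f"| {col['name']} | {col['type']} | {col.get('description', '')} |")

    return "\n".join(frontmatter + ["", f"# {table_name}", "", *rows]) + "\n"


if __name__ == "__main__":
    print(
        draft_concept(
            "orders",
            "One row per order",
            [
                {"name": "order_id", "type": "STRING", "description": "Primary key"},
                {"name": "created_at", "type": "TIMESTAMP"},
            ],
        )
    )
```

A document produced this way carries no business context yet; it only restates what the catalog already knows, which is why the article calls the first-pass output content-sparse.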
In the second pass, the agent crawls authoritative web documentation related to each concept and uses a language model to enrich each document with citations, join paths between related tables, schema context, and business-layer explanation. The enrichment agent is built on Google's Agent Development Kit (ADK) using Gemini as its model backend. A configurable --web-max-pages cap and a --web-allowed-host filter limit how much of the web each enrichment pass can reach, preventing runaway token consumption.
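How a page cap and a host filter bound a crawl can be shown with a small selection function. This is a sketch of the general idea only; it does not reproduce how the reference agent implements --web-max-pages or --web-allowed-host, and the function and parameter names are hypothetical.

```python
from urllib.parse import urlsplit


def select_pages(candidate_urls: list[str], allowed_hosts: list[str], max_pages: int) -> list[str]:
    """Keep at most `max_pages` distinct URLs whose host is on the allow list."""
    allowed = {host.lower() for host in allowed_hosts}
    selected: list[str] = []
    seen: set[str] = set()
    for url in candidate_urls:
        if len(selected) >= max_pages:
            break  # the cap bounds how much text reaches the model
        host = (urlsplit(url).hostname or "").lower()
        if host not in allowed or url in seen:
            continue
        seen.add(url)
        selected.append(url)
    return selected


if __name__ == "__main__":
    urls = [
        "https://cloud.google.com/a",
        "https://other.example/b",
        "https://cloud.google.com/a",
        "https://cloud.google.com/c",
        "https://cloud.google.com/d",
    ]
    print(select_pages(urls, ["cloud.google.com"], max_pages=2))
```

Applying both limits before any page is fetched keeps the worst-case token cost of an enrichment pass predictable.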
The consumer is a static HTML visualizer that renders any OKF bundle as an interactive graph view — a single self-contained HTML file that requires no backend, no installation, and no data leaving the reader's browser. The visualizer treats every Markdown link between concept documents as a directed edge, producing a navigable relationship map of the organization's knowledge.
Three ready-to-browse sample bundles are available in the GitHub repository: GA4 e-commerce data, the Stack Overflow public dataset, and the Bitcoin public dataset — all produced by the reference enrichment agent.
OKF does not replace the Model Context Protocol (MCP) or retrieval-augmented generation (RAG). Understanding the division is useful before deciding whether to adopt it.
MCP, initially developed by Anthropic and adopted by OpenAI in March 2025, governs how AI agents connect to tools and live data sources — the runtime plumbing. OKF governs what the agent knows about those sources before it touches them — the curated, stable knowledge layer. One way to think about the relationship: MCP is the socket; OKF is the knowledge that flows through it. An MCP server can expose an OKF bundle as a knowledge source.
RAG works differently at a deeper architectural level. A RAG system embeds documents as vectors, indexes them in a vector database, and retrieves relevant chunks at inference time using semantic search. It is well suited to large, frequently changing document corpora where the full body of knowledge cannot fit in a context window. OKF is suited to a different regime: stable, curated, human-maintained knowledge about how an organization's systems work, where the full bundle is small enough to load into context directly. The Karpathy LLM Wiki gist showed that for focused, stable knowledge domains, a well-maintained Markdown library can reduce token consumption by up to 95% compared to naive document loading, because the agent reads compiled, cross-linked knowledge rather than rediscovering relationships from raw documents on every query.
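The "load the whole bundle into context" regime needs no embedding or index step, which a short sketch makes visible. The function below is a hypothetical helper, not part of any OKF tooling: it concatenates concept documents into one string and refuses to proceed when the result exceeds a caller-supplied character budget, the point at which a retrieval approach becomes the better fit. The comment-style separator is an arbitrary choice for the example.

```python
def bundle_as_context(documents: dict[str, str], max_chars: int) -> str:
    """Join all concept documents into one prompt-ready string.

    `documents` maps concept IDs to document text. Raises ValueError when the
    bundle does not fit the budget, signalling that direct loading is not viable.
    """
    parts = [
        f"<!-- concept: {cid} -->\n{text.strip()}"
        for cid, text in sorted(documents.items())  # stable order across runs
    ]
    context = "\n\n".join(parts)
    if len(context) > max_chars:
        raise ValueError(
            f"bundle is {len(context)} characters, over the {max_chars} budget"
        )
    return context


if __name__ == "__main__":
    docs = {
        "tables/orders": "One row per customer order.\n",
        "metrics/wau": "Weekly active users, computed from the event stream.",
    }
    print(bundle_as_context(docs, max_chars=10_000))
```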
OKF's roots trace to April 2026, when Andrej Karpathy — co-founder of OpenAI and former Director of AI at Tesla — published a GitHub gist describing a pattern he had been using to manage his own research. The post accumulated more than 16 million views on X and the gist itself drew over 5,000 stars within days, generating implementations across the developer community under names like AGENTS.md, CLAUDE.md, and Obsidian-to-agent pipelines.
The core insight was straightforward: AI agents do not get bored, do not forget to update a cross-reference, and can touch 15 files in a single pass — exactly the maintenance burden that causes humans to abandon wikis. Giving agents a persistent, curated Markdown library to consult before taking action reduces the stateless, from-scratch-every-session quality of most agent deployments. The problem was that every team implementing this pattern built their own version, incompatible with every other.
OKF is Google Cloud's answer to that fragmentation: a minimal agreement on structure that lets a knowledge bundle produced by one tool be consumed by any other tool that understands the spec.
OKF is deliberately low-barrier for engineering and data teams already working in git and Markdown. For those teams, adopting OKF means replacing an informal internal convention with a shared naming standard — the migration path is largely a renaming exercise plus adding the required type frontmatter field to existing files.
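The "add the required type field" half of that migration can be sketched as a small text transformation. This is an illustrative helper under stated assumptions, not an official migration tool: the function name is hypothetical, the default type is whatever the team chooses, and the check for an existing `type:` line is a plain string test rather than real YAML parsing.

```python
def ensure_type(text: str, default_type: str) -> str:
    """Return `text` with a `type` frontmatter field, adding one if needed."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        # Look for the closing marker of an existing frontmatter block.
        end = None
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                end = i
                break
        if end is not None:
            if any(line.startswith("type:") for line in lines[1:end]):
                return text  # already conformant, leave untouched
            lines.insert(1, f"type: {default_type}")
            return "\n".join(lines) + "\n"
    # No frontmatter at all: prepend a minimal block.
    return f"---\ntype: {default_type}\n---\n{text}"


if __name__ == "__main__":
    print(ensure_type("# Restart the ingest job\n\nSteps go here.\n", "Playbook"))
```

Running a helper like this over an existing docs folder leaves every body unchanged, so the resulting diff in version control shows only the added frontmatter lines.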
For non-technical teams — marketing, human resources, operations — the format's dependence on Markdown and git represents a meaningful onboarding cost. Google Cloud's own positioning acknowledges this indirectly: the producer-and-consumer model anticipates that engineers and data engineers will be the primary producers, with AI agents as the primary consumers. Non-technical contributors would likely continue working in whichever frontend they prefer — Notion, Confluence, SharePoint — with OKF serving as the backend format that engineering teams maintain in parallel.
The larger adoption question is whether tooling vendors outside Google will support OKF as a native import or export format. The format's value as an interoperability standard scales with how many producers and consumers speak it. Google Cloud updated its Knowledge Catalog to ingest OKF bundles natively at launch, establishing an immediate path for enterprises already on Google Cloud. Whether other data catalog vendors follow is the open question, and the v0.1 designation signals that Google Cloud is treating the answer as a community process rather than a finished outcome.
What is the Open Knowledge Format and how does it differ from RAG?
Open Knowledge Format (OKF) is an open specification from Google Cloud for packaging organizational knowledge — database schemas, metric definitions, API descriptions, runbooks — as a directory of Markdown files with YAML frontmatter. Unlike RAG, which retrieves relevant chunks from a large unstructured corpus at inference time using vector search, OKF is a curated, stable, pre-compiled knowledge layer that agents can load into context directly. RAG handles large, dynamic document archives. OKF handles the stable, human-maintained knowledge about how an organization's systems work. The two are complementary: OKF gives RAG systems cleaner, structured source material and adds relationship context that raw document chunks do not carry.
Does OKF replace the Model Context Protocol?
No. MCP governs how AI agents access tools and live data sources at runtime — the connection layer. OKF governs what agents know about those sources — the curated knowledge layer. They solve adjacent problems and are designed to work together: an MCP server can expose an OKF bundle as a knowledge source, and an agent can pull from it to ground its decisions before taking action.
How does OKF differ from earlier knowledge representation standards like OWL or RDF?
OWL and RDF are far more expressive than OKF, with support for typed relationships, formal axioms, and automated reasoning. They also require schema registries, specialized tooling, and significant expertise to implement, which limited their enterprise adoption to organizations with dedicated ontology engineering resources. OKF reaches the opposite conclusion: the only required field is type, there is no central authority, and the format is deliberately minimal so that any team with a text editor and a git repository can implement it without specialized infrastructure. The design bet is that broad, low-friction adoption of a simpler format produces more organizational value than narrow, high-friction adoption of a richer one.
Is OKF tied to Google Cloud or BigQuery?
No. The specification, sample bundles, and reference implementations are published on GitHub under the Apache 2.0 license and are not tied to any cloud provider, database, model provider, or agent framework. The reference enrichment agent uses BigQuery as its data source and Gemini as its model backend, but those choices demonstrate one implementation path — the format itself works with any source. Google Cloud updated its Knowledge Catalog to ingest OKF at launch, but the spec is explicitly vendor-neutral and can be implemented on any infrastructure.
