Google Cloud Open Knowledge Format Turns Scattered Org Knowledge Into Agent-Ready Bundles - AI


Source: TechTimes

The Google Cloud logo is pictured at the MWC (Mobile World Congress), the world's biggest mobile fair, in Barcelona on March 4, 2025. Surrounded by investment and innovation projects, the Mobile World Congress (MWC) kicks off today in Barcelona amid a context of euphoria but also tensions over artificial intelligence (AI), whose rapid advancement is shaking up the tech sector. Manaure Quintero/Getty Images

Three days after Google Cloud published the Open Knowledge Format on June 12, 2026, engineering and data teams building agentic AI systems now have something they have lacked since agents became a realistic enterprise tool: a single, vendor-neutral specification that packages what an organization knows into a form any AI agent can read, without custom integrations, proprietary SDKs, or platform lock-in.

That specification, Open Knowledge Format v0.1 (OKF), is available on GitHub under the Apache 2.0 license. Google Cloud tech leads Sam McVeety and Amir Hormati released it alongside three sample knowledge bundles and two working reference implementations. The practical stakes are immediate: any team that has been building its own bespoke "AGENTS.md," "CLAUDE.md," or Obsidian vault pipeline can now adopt a shared structure instead, and any bundle written by a tool that supports OKF becomes readable by every other tool that understands the spec.

Every AI Agent Team Is Solving the Same Problem from Scratch

The problem OKF addresses has a name inside Google Cloud: the context-assembly problem. Before an AI agent can answer a question like "How do we compute weekly active users from our event stream?" it has to retrieve fragments of context from wherever that knowledge lives — a metadata catalog API, a shared drive, code comments in a repository, a Confluence wiki, or the head of a senior data engineer who has been at the company for five years.

Every organization that has deployed agentic AI workflows has hit this wall. The information exists. It is just scattered across incompatible surfaces, and no two surfaces speak the same format. The result, as Google Cloud puts it, is that every agent builder solves the same assembly problem from scratch, every catalog vendor reinvents the same data models, and the knowledge stays locked behind whichever surface produced it. OKF's answer is not a new service, database, or runtime. It is a format.

What OKF Actually Specifies

An OKF bundle is a directory tree of plain Markdown files. Each file represents one concept — a single unit of knowledge that might describe a database table, an API endpoint, a business metric, a runbook, or an incident playbook. The concept's identity is its file path within the bundle: a file at tables/orders.md carries the concept ID tables/orders.
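The path-to-ID rule is simple enough to show directly. The Python sketch below maps a Markdown file to its concept ID and lists every concept in a bundle. It assumes the ID is the bundle-relative path with ".md" dropped (as in the tables/orders example) and, as a labeled assumption, skips the two reserved filenames described later in this article; the function names are hypothetical.

```python
from pathlib import Path

# Assumption: the reserved files index.md and log.md describe their directory
# and are not themselves concept documents.
RESERVED_NAMES = {"index.md", "log.md"}


def concept_id(bundle_root: Path, md_file: Path) -> str:
    """Map a Markdown file to its concept ID: bundle-relative path minus '.md'."""
    rel = md_file.relative_to(bundle_root).as_posix()  # forward slashes on every OS
    if not rel.endswith(".md"):
        raise ValueError(f"not a Markdown file: {md_file}")
    return rel[: -len(".md")]


def list_concept_ids(bundle_root: Path) -> list[str]:
    """Return the sorted concept IDs of every concept document in a bundle."""
    return sorted(
        concept_id(bundle_root, path)
        for path in bundle_root.rglob("*.md")
        if path.is_file() and path.name not in RESERVED_NAMES
    )


print(concept_id(Path("bundle"), Path("bundle/tables/orders.md")))  # tables/orders
```

Because the ID is derived from the path, renaming or moving a file changes its identity, which is one reason git history matters for a bundle.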

Every concept document has two parts. At the top of the file sits a YAML frontmatter block, delimited by --- markers, that carries a small set of structured fields agents can query without parsing the full document. Below that is a standard Markdown body for free-form content — schema descriptions, business context, caveats, SQL examples, whatever the author wants to write.
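To make the two-part layout concrete, here is a minimal Python sketch that splits a concept document into its frontmatter fields and its Markdown body. It is deliberately simplified: it reads only flat `key: value` lines, so a list such as `tags: [a, b]` comes back as a plain string. A real consumer would hand the frontmatter text to a full YAML parser such as PyYAML's `yaml.safe_load`. The function name and sample document are illustrative only.

```python
def split_concept(text: str) -> tuple[dict[str, str], str]:
    """Split an OKF-style document into (frontmatter fields, Markdown body).

    Simplified sketch: only flat, unindented `key: value` lines are parsed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break  # found the closing delimiter
    else:
        raise ValueError("frontmatter block is not closed with ---")
    fields: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line and not line.startswith((" ", "#")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[end + 1 :])
    return fields, body


doc = """---
type: BigQuery Table
title: Orders
description: One row per customer order.
---
# Orders

Joins to [customers](customers.md) on customer_id.
"""
fields, body = split_concept(doc)
print(fields["type"])  # BigQuery Table
```

An agent can read just the frontmatter of many files to decide which bodies are worth loading.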

Only one frontmatter field is required: type, a short string that identifies what kind of concept the document describes. Google's examples include BigQuery Table, API Endpoint, Metric, and Playbook. The recommended optional fields (title, description, resource, tags, and timestamp) add richer structure but are never mandatory. Consumers must accept unknown fields and unknown type values rather than rejecting the document, a design choice that lets early-stage and partial implementations interoperate.
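Those acceptance rules can be expressed as a small validator. In the Python sketch below, only a missing or empty `type` counts as an error; unknown fields and unknown type values pass, and the recommended fields are reported separately as warnings. The split into errors and warnings, and the function names, are assumptions made for illustration, not part of the specification.

```python
RECOMMENDED_FIELDS = ("title", "description", "resource", "tags", "timestamp")


def check_concept(fields: dict) -> list[str]:
    """Return problems that should make a consumer reject the document.

    Only a missing or empty `type` is treated as fatal; unknown fields
    and unknown type values are deliberately accepted.
    """
    value = fields.get("type")
    if not isinstance(value, str) or not value.strip():
        return ["missing required frontmatter field: type"]
    return []


def missing_recommended(fields: dict) -> list[str]:
    """List recommended fields that are absent (warnings, never errors)."""
    return [name for name in RECOMMENDED_FIELDS if name not in fields]


# An unfamiliar type and an unknown field are both accepted.
print(check_concept({"type": "Dashboard", "owner_team": "growth"}))  # []
```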

Concepts link to each other using standard Markdown hyperlinks. Those links turn the bundle's flat directory into a graph of relationships whose meaning is conveyed by the surrounding prose rather than by a typed edge schema — a table concept might link to a metric concept that uses it, and the sentence explaining that link is what tells an agent the nature of the relationship. The full OKF specification is available on GitHub.
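Because links are ordinary Markdown, a consumer can recover the graph edges with a few lines of code. The Python sketch below, which assumes links are written relative to the linking document, extracts inline links from a concept body and resolves those pointing at other .md files into concept IDs. External URLs and same-page anchors are skipped; reference-style links are not handled. The function name is hypothetical.

```python
import posixpath
import re

# Inline Markdown links [text](target), excluding image links ![alt](src).
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)\)")


def linked_concepts(source_id: str, body: str) -> list[str]:
    """Resolve relative .md links in a concept body to concept IDs."""
    base_dir = posixpath.dirname(source_id)  # "tables/orders" -> "tables"
    targets = []
    for _text, href in LINK_RE.findall(body):
        if "://" in href or href.startswith("#"):
            continue  # external URL or anchor within the same page
        path = href.split("#", 1)[0]  # drop any "#section" fragment
        if not path.endswith(".md"):
            continue
        resolved = posixpath.normpath(posixpath.join(base_dir, path))
        targets.append(resolved[: -len(".md")])
    return targets


body = (
    "Feeds [weekly active users](../metrics/wau.md) and joins "
    "[customers](customers.md#keys); see [BigQuery docs](https://cloud.google.com/bigquery/docs)."
)
print(linked_concepts("tables/orders", body))  # ['metrics/wau', 'tables/customers']
```

The code captures only that an edge exists; the sentence around each link still carries what the relationship means.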

Two reserved filenames carry defined meaning anywhere in the directory hierarchy. An index.md file enumerates what is available in its directory, enabling progressive disclosure so an agent can survey a bundle before reading individual documents. A log.md file records changes to that scope in date-grouped entries, newest first — a lightweight changelog that travels with the knowledge rather than living in a separate ticketing system.
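The article describes what index.md and log.md are for but not their exact layout, so the Python sketch below assumes one: an index rendered as a bulleted list of links, and a log grouped under `## YYYY-MM-DD` headings with the newest date first and the newest entry first within a date. Both layouts and both function names are hypothetical.

```python
from datetime import date


def render_index(entries: dict[str, str]) -> str:
    """Render a hypothetical index.md: one link per file in the directory.

    `entries` maps a file name (e.g. "orders.md") to a one-line description.
    """
    lines = ["# Index", ""]
    for name in sorted(entries):
        lines.append(f"- [{name[:-3]}]({name}): {entries[name]}")
    return "\n".join(lines) + "\n"


def add_log_entry(log_text: str, day: date, message: str) -> str:
    """Add `message` to a hypothetical log.md, keeping date groups newest first."""
    heading = f"## {day.isoformat()}"
    lines = log_text.splitlines()
    if heading in lines:
        # Date group exists: put the new entry at the top of that group.
        lines.insert(lines.index(heading) + 1, f"- {message}")
    else:
        # ISO dates sort as strings, so insert before the first older group.
        at = next(
            (i for i, line in enumerate(lines)
             if line.startswith("## ") and line[3:] < day.isoformat()),
            len(lines),
        )
        block = [heading, f"- {message}", ""]
        if at > 0 and lines[at - 1].strip():
            block.insert(0, "")  # keep a blank line between groups
        lines[at:at] = block
    while lines and not lines[-1].strip():
        lines.pop()  # no trailing blank lines
    return "\n".join(lines) + "\n"


log = "# Log\n\n## 2026-06-12\n- Initial bundle published\n"
log = add_log_entry(log, date(2026, 6, 15), "Added metrics/wau")
print(log)
```

Since both files are plain Markdown, an agent can read index.md to survey a directory before opening any individual concept.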

Why Less Schema Beats More: OKF vs. Prior Knowledge Standards

OKF's minimal design deliberately inverts the approach taken by earlier formal knowledge representation standards. The W3C's Resource Description Framework (RDF) and Web Ontology Language (OWL), standardized between 1999 and 2004, offered far more expressive power: typed relationships, formal axioms, and automated reasoning support. They also required schema registries, specialized tooling, and significant expertise to implement. Enterprise adoption remained limited to organizations with dedicated ontology engineering resources.

OKF reaches the opposite conclusion. Its specification states that knowledge should be "readable by humans without tooling, parseable by agents without bespoke SDKs, and diffable in version control." There is no central schema registry. There is no required tooling. The specification fits on a single page. If you can open a Markdown file in a text editor, you can read OKF. If you can run git clone, you can ship it.

This is not a limitation — it is the architecture. A format that requires no infrastructure to read has no adoption barrier. A format that lives in git inherits all of git's versioning, access control, and collaboration properties for free. A format that is plain text can be consumed by any agent framework without a purpose-built connector.

The Two-Pass Enrichment Architecture

Google shipped two reference implementations alongside the specification to demonstrate how OKF bundles are produced and consumed in practice.

The producer is a reference enrichment agent that turns a BigQuery dataset into a conformant OKF bundle through a two-pass process. In the first pass, the agent walks every table and view in the dataset and drafts a concept document for each one, using BigQuery's own metadata — column names, data types, descriptions — as the raw material. The result is a structurally valid but content-sparse bundle.
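A first-pass draft of this kind can be produced from metadata alone. The Python sketch below takes a hypothetical plain-dict snapshot of one table's metadata (which could be filled from the google-cloud-bigquery client's table schema) and emits a structurally valid concept document with the required `type` field and a column table. The dict shape, function names, and output layout are assumptions, not the reference agent's code.

```python
import json


def _cell(value) -> str:
    """Make a value safe for a Markdown table cell."""
    return str(value).replace("|", "\\|").replace("\n", " ")


def draft_table_concept(table: dict) -> str:
    """Draft a first-pass concept document for one table from metadata alone.

    `table` is a hypothetical snapshot: {"project", "dataset", "name",
    "description", "columns": [{"name", "type", "description"}, ...]}.
    """
    frontmatter = {
        "type": "BigQuery Table",
        "title": table["name"],
        # Often empty in raw metadata; the enrichment pass is meant to fill it in.
        "description": table.get("description") or "",
        "resource": f"{table['project']}.{table['dataset']}.{table['name']}",
    }
    lines = ["---"]
    # json.dumps emits double-quoted strings, which YAML also accepts,
    # so colons or quotes inside values cannot break the frontmatter.
    lines += [f"{key}: {json.dumps(value)}" for key, value in frontmatter.items()]
    lines += ["---", f"# {table['name']}", "",
              "| Column | Type | Description |", "| --- | --- | --- |"]
    for col in table.get("columns", []):
        lines.append(f"| {_cell(col['name'])} | {_cell(col['type'])} "
                     f"| {_cell(col.get('description', ''))} |")
    return "\n".join(lines) + "\n"


orders = {
    "project": "my-proj", "dataset": "shop", "name": "orders",
    "description": "One row per order",
    "columns": [{"name": "order_id", "type": "STRING", "description": "Primary key"}],
}
print(draft_table_concept(orders))
```

The output is "structurally valid but content-sparse" in exactly the sense described above: it has the right shape, but no join paths, citations, or business context yet.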

In the second pass, the agent crawls authoritative web documentation related to each concept and uses a language model to enrich each document with citations, join paths between related tables, schema context, and business-layer explanation. The enrichment agent is built on Google's Agent Development Kit (ADK) using Gemini as its model backend. A configurable --web-max-pages cap and a --web-allowed-host filter limit how much of the web each enrichment pass can reach, preventing runaway token consumption.
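The two flags are named in the announcement, but their exact semantics are not described. The Python sketch below shows one plausible reading, labeled as an assumption throughout: `--web-max-pages` caps how many unique pages are fetched, `--web-allowed-host` may be repeated and matches hostnames exactly, and no host filter means any host is allowed. This is not the reference agent's code.

```python
import argparse
from urllib.parse import urlparse


def parse_args(argv=None) -> argparse.Namespace:
    """Hypothetical CLI mirroring the two crawl-limit flags named in the announcement."""
    parser = argparse.ArgumentParser(description="Illustrative enrichment crawl limits")
    parser.add_argument("--web-max-pages", type=int, default=20)
    # Assumption: the host flag may be given several times.
    parser.add_argument("--web-allowed-host", action="append", default=None)
    return parser.parse_args(argv)


def select_pages(candidate_urls, max_pages: int, allowed_hosts) -> list[str]:
    """Keep at most `max_pages` unique URLs whose hostname is allowed.

    Assumptions: exact hostname match (no subdomain wildcarding), and an
    empty allow-list means every host is allowed.
    """
    allowed = {host.lower() for host in allowed_hosts}
    chosen, seen = [], set()
    for url in candidate_urls:
        if len(chosen) >= max_pages:
            break  # budget exhausted: stop before fetching more
        host = (urlparse(url).hostname or "").lower()
        if (allowed and host not in allowed) or url in seen:
            continue
        seen.add(url)
        chosen.append(url)
    return chosen


args = parse_args(["--web-max-pages", "2", "--web-allowed-host", "cloud.google.com"])
urls = [
    "https://cloud.google.com/bigquery/docs/a",
    "https://example.com/b",
    "https://cloud.google.com/bigquery/docs/a",
    "https://cloud.google.com/bigquery/docs/c",
    "https://cloud.google.com/d",
]
print(select_pages(urls, args.web_max_pages, args.web_allowed_host or []))
```

Applying the cap before any fetch or model call is what bounds cost: each page that is never selected is a page that never reaches the language model.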

The consumer is a static HTML visualizer that renders any OKF bundle as an interactive graph view — a single self-contained HTML file that requires no backend, no installation, and no data leaving the reader's browser. The visualizer treats every Markdown link between concept documents as a directed edge, producing a navigable relationship map of the organization's knowledge.
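The visualizer itself is an HTML file. As a hedged illustration of the data such a view needs, the Python sketch below turns a bundle's documents into a nodes-and-edges structure, treating each relative .md link as a directed edge from the linking concept to the linked one. The function name, the JSON shape, and the `dangling` flag for links to missing concepts are assumptions, not the reference visualizer's format.

```python
import json
import posixpath
import re

# Inline links whose target ends in .md, with an optional "#anchor" suffix.
LINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s#]+\.md)(?:#[^)\s]*)?\)")


def bundle_graph(docs: dict[str, str]) -> dict:
    """Build {"nodes", "edges"} from a mapping of concept ID to Markdown body."""
    edges = []
    for source, body in sorted(docs.items()):
        base = posixpath.dirname(source)
        for href in LINK_RE.findall(body):
            if "://" in href:
                continue  # external link, not a concept in this bundle
            target = posixpath.normpath(posixpath.join(base, href))[: -len(".md")]
            edges.append({"source": source, "target": target,
                          "dangling": target not in docs})
    return {"nodes": sorted(docs), "edges": edges}


docs = {
    "tables/orders": "Joins [customers](customers.md) on customer_id.",
    "tables/customers": "Used by [WAU](../metrics/wau.md).",
}
print(json.dumps(bundle_graph(docs), indent=2))
```

A static page can embed this JSON and render it client-side, which keeps the "no backend, no data leaving the browser" property.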

Three ready-to-browse sample bundles are available in the GitHub repository: GA4 e-commerce data, the Stack Overflow public dataset, and the Bitcoin public dataset — all produced by the reference enrichment agent.

OKF, MCP, and RAG: Three Tools for Different Problems

OKF does not replace the Model Context Protocol (MCP) or retrieval-augmented generation (RAG). Understanding the division is useful before deciding whether to adopt it.

MCP, initially developed by Anthropic and adopted by OpenAI in March 2025, governs how AI agents connect to tools and live data sources — the runtime plumbing. OKF governs what the agent knows about those sources before it touches them — the curated, stable knowledge layer. One way to think about the relationship: MCP is the socket; OKF is the knowledge that flows through it. An MCP server can expose an OKF bundle as a knowledge source.

RAG works differently. A RAG system embeds documents as vectors, indexes them in a vector database, and retrieves relevant chunks at inference time using semantic search. It suits large, frequently changing document corpora where the full body of knowledge cannot fit in a context window. OKF targets a different regime: stable, curated, human-maintained knowledge about how an organization's systems work, where the full bundle is small enough to load into context directly. The Karpathy LLM Wiki gist showed that for focused, stable knowledge domains, a well-maintained Markdown library can reduce token consumption by up to 95% compared to naive document loading, because the agent reads compiled, cross-linked knowledge rather than rediscovering relationships from raw documents on every query.
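Whether a bundle is "small enough to load into context directly" is a sizing question. The Python sketch below makes a rough call: it estimates a bundle's token count at about four characters per token (a common rule of thumb for English text, not a real tokenizer) and compares it with an assumed share of the model's context window. The 50% share and both function names are arbitrary, hypothetical choices.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text, not a tokenizer


def estimate_bundle_tokens(bundle_root: Path) -> int:
    """Roughly estimate the tokens needed to load every .md file in a bundle."""
    total_chars = sum(
        len(path.read_text(encoding="utf-8"))
        for path in bundle_root.rglob("*.md")
        if path.is_file()
    )
    return (total_chars + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN  # round up


def choose_strategy(bundle_tokens: int, context_window: int,
                    knowledge_share: float = 0.5) -> str:
    """Load the bundle whole if it fits the share of context reserved for knowledge."""
    if bundle_tokens <= context_window * knowledge_share:
        return "load-directly"
    return "retrieve"  # too big: fall back to retrieval, e.g. a RAG index


print(choose_strategy(40_000, 128_000))  # load-directly
```

For an accurate count, swap the character heuristic for the target model's own tokenizer.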


The Karpathy Pattern That Sparked a Standard

OKF's roots trace to April 2026, when Andrej Karpathy, co-founder of OpenAI and former Director of AI at Tesla, published a GitHub gist describing a pattern he had been using to manage his own research. His post about it on X accumulated more than 16 million views, and the gist itself drew over 5,000 stars within days, spurring implementations across the developer community under names like AGENTS.md, CLAUDE.md, and Obsidian-to-agent pipelines.

The core insight was straightforward: AI agents do not get bored, do not forget to update a cross-reference, and can touch 15 files in a single pass — exactly the maintenance burden that causes humans to abandon wikis. Giving agents a persistent, curated Markdown library to consult before taking action reduces the stateless, from-scratch-every-session quality of most agent deployments. The problem was that every team implementing this pattern built their own version, incompatible with every other.

OKF is Google Cloud's answer to that fragmentation: a minimal agreement on structure that lets a knowledge bundle produced by one tool be consumed by any other tool that understands the spec.

What Adoption Actually Requires

OKF is deliberately low-barrier for engineering and data teams already working in git and Markdown. For those teams, adopting OKF means replacing an informal internal convention with a shared naming standard — the migration path is largely a renaming exercise plus adding the required type frontmatter field to existing files.
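The "add the required type field" step is easy to automate. The Python sketch below adds a `type:` line to every Markdown file in a directory tree that lacks one, leaving files that already declare a type untouched. It assumes a single default type per run (a real migration would likely map directories to types, for example runbooks/ to Playbook), skips the reserved index.md and log.md files by assumption, and inspects only unindented `type:` lines rather than parsing YAML. The function names are hypothetical.

```python
from pathlib import Path


def ensure_type(text: str, default_type: str) -> str:
    """Return `text` with a `type:` field guaranteed in its frontmatter.

    - No frontmatter: prepend a minimal block.
    - Frontmatter without `type`: insert it as the first field.
    - Frontmatter with `type`: return the document unchanged.
    """
    lines = text.splitlines(keepends=True)
    if lines and lines[0].strip() == "---":
        close = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
        if close is not None:
            if any(line.startswith("type:") for line in lines[1:close]):
                return text
            return lines[0] + f"type: {default_type}\n" + "".join(lines[1:])
    return f"---\ntype: {default_type}\n---\n" + text


def migrate(bundle_root: Path, default_type: str) -> list[Path]:
    """Rewrite files missing `type` in place; return the paths that changed."""
    changed = []
    for path in sorted(bundle_root.rglob("*.md")):
        if not path.is_file() or path.name in {"index.md", "log.md"}:
            continue
        old = path.read_text(encoding="utf-8")
        new = ensure_type(old, default_type)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed.append(path)
    return changed
```

Running it on a branch and reviewing the git diff keeps the migration auditable, in line with the format's "diffable in version control" goal.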

For non-technical teams — marketing, human resources, operations — the format's dependence on Markdown and git represents a meaningful onboarding cost. Google Cloud's own positioning acknowledges this indirectly: the producer-and-consumer model anticipates that engineers and data engineers will be the primary producers, with AI agents as the primary consumers. Non-technical contributors would likely continue working in whichever frontend they prefer — Notion, Confluence, SharePoint — with OKF serving as the backend format that engineering teams maintain in parallel.

The larger adoption question is whether tooling vendors outside Google will support OKF as a native import or export format. The format's value as an interoperability standard scales with how many producers and consumers speak it. Google Cloud updated its Knowledge Catalog to ingest OKF bundles natively at launch, establishing an immediate path for enterprises already on Google Cloud. Whether other data catalog vendors follow remains open; the v0.1 designation signals that Google Cloud is treating the format as a community process rather than a finished outcome.

Frequently Asked Questions

What is the Open Knowledge Format and how does it differ from RAG?

Open Knowledge Format (OKF) is an open specification from Google Cloud for packaging organizational knowledge — database schemas, metric definitions, API descriptions, runbooks — as a directory of Markdown files with YAML frontmatter. Unlike RAG, which retrieves relevant chunks from a large unstructured corpus at inference time using vector search, OKF is a curated, stable, pre-compiled knowledge layer that agents can load into context directly. RAG handles large, dynamic document archives. OKF handles the stable, human-maintained knowledge about how an organization's systems work. The two are complementary: OKF gives RAG systems cleaner, structured source material and adds relationship context that raw document chunks do not carry.

Does OKF replace the Model Context Protocol?

No. MCP governs how AI agents access tools and live data sources at runtime — the connection layer. OKF governs what agents know about those sources — the curated knowledge layer. They solve adjacent problems and are designed to work together: an MCP server can expose an OKF bundle as a knowledge source, and an agent can pull from it to ground its decisions before taking action.

How does OKF differ from earlier knowledge representation standards like OWL or RDF?

OWL and RDF are far more expressive than OKF, with support for typed relationships, formal axioms, and automated reasoning. They also require schema registries, specialized tooling, and significant expertise to implement, which limited their enterprise adoption to organizations with dedicated ontology engineering resources. OKF reaches the opposite conclusion: the only required field is type, there is no central authority, and the format is deliberately minimal so that any team with a text editor and a git repository can implement it without specialized infrastructure. The design bet is that broad, low-friction adoption of a simpler format produces more organizational value than narrow, high-friction adoption of a richer one.

Is OKF tied to Google Cloud or BigQuery?

No. The specification, sample bundles, and reference implementations are published on GitHub under the Apache 2.0 license and are not tied to any cloud provider, database, model provider, or agent framework. The reference enrichment agent uses BigQuery as its data source and Gemini as its model backend, but those choices demonstrate one implementation path — the format itself works with any source. Google Cloud updated its Knowledge Catalog to ingest OKF at launch, but the spec is explicitly vendor-neutral and can be implemented on any infrastructure.
