
OpenAI has added a feature to its Codex macOS app that lowers the barrier to AI-powered automation: instead of writing a prompt or configuring a workflow, a user performs a task while Codex watches, and the agent converts that demonstration into a reusable skill it can execute autonomously on demand. The feature, called Record & Replay, shipped on June 18 as part of Codex app version 26.616 and is available now to ChatGPT Plus, Pro, Business, Enterprise, and Edu subscribers outside the European Economic Area (EEA), the United Kingdom, and Switzerland.
The practical implication is wider than it might appear from the feature description. Record & Replay is an AI-native implementation of a research paradigm called programming by demonstration — a technique that computer scientists have studied since the mid-1980s but that has historically failed to reach general use because the generalization step, inferring a reusable program from a concrete demonstration, proved too brittle for real-world workflows. What OpenAI has done is route that generalization through its language model rather than through the rule-based heuristics that defeated prior systems. That is a meaningful architectural distinction, and it is what separates Record & Replay from the macro-recording tools that RPA vendors have offered for years.
The feature works through a sequence that begins with the user, not with a prompt. Opening the Plugins panel in the Codex app and selecting "Record a skill" starts a recording session. Codex asks for permission to observe actions and window content, then waits while the user performs the workflow — navigating to a site, filling fields, uploading files, or whatever the task requires. When the user stops the recording, Codex inspects the captured sequence and drafts a SKILL.md file.
That file is the artifact that makes the system work differently from what came before it. A SKILL.md file is a human-readable, LLM-interpretable instruction document — not a coordinate-level recording of pixel positions or mouse clicks. It specifies when to use the workflow, what variable inputs it accepts on each run, what steps to follow, and how to verify that the task completed successfully. It is inspectable and editable, meaning a user who wants to refine what Codex learned — to surface a hidden preference, fix a decision point, or change a naming convention — can open the file and adjust it directly. The skill can also be shared across a team, so a workflow that one employee records can be deployed as automation for an entire department without requiring anyone else to re-teach it.
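The four parts of a skill named above can be modeled in a few lines. This is only a sketch: the actual layout of a SKILL.md file is not shown here, so the class name `SkillOutline`, its field names, and the expense-report values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SkillOutline:
    """Hypothetical model of the four parts the article attributes to a skill."""

    when_to_use: str    # when the workflow applies
    inputs: list[str]   # variable inputs supplied on each run
    steps: list[str]    # natural-language steps, not click coordinates
    verification: str   # how to confirm the task completed

    def missing_inputs(self, provided: dict[str, str]) -> list[str]:
        """Return declared inputs the caller has not supplied, in declared order."""
        return [name for name in self.inputs if name not in provided]


# Illustrative values only; not taken from a real recorded skill.
expense_skill = SkillOutline(
    when_to_use="Filing a monthly expense report",
    inputs=["receipt_path", "amount", "cost_center"],
    steps=[
        "Open the expense tool and start a new report",
        "Attach the receipt and enter the amount",
        "Select the cost center and submit",
    ],
    verification="A confirmation page shows the submitted report",
)

print(expense_skill.missing_inputs({"amount": "42.00"}))
```

Keeping the inputs separate from the steps is what would let one recording run again with new values, and the verification field gives a reviewer a concrete place to tighten the success check.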
On replay, Codex loads the SKILL.md as reusable context and executes the workflow using whichever tools are available in the current environment: Computer Use for desktop application control, browser actions for web-based tasks, and installed plugins for integrations with services like Slack, Gmail, Notion, or Salesforce.
Robotic Process Automation (RPA) tools like UiPath and Automation Anywhere have offered record-and-replay functionality for years. The fundamental difference in Codex's approach is the layer at which abstraction happens. Traditional RPA records low-level interface details: which pixel to click, which field to tab to, which UI element carries a specific CSS class. When the interface changes, even slightly, the recording breaks.
Codex records intent. The SKILL.md it generates describes what the user is trying to accomplish at each step, not the exact input sequence they used to accomplish it. On replay, Codex's language model interprets those descriptions in the context of the current screen state, giving it a degree of adaptability that coordinate-level recording does not have.
This is the technical resolution to a problem that researchers flagged in the programming-by-demonstration literature as early as 2009: that demonstration-based systems fail because automatic generalization may not capture the user's intent, and because the systems are brittle when the graphical user interface changes. By generating a natural-language skill description rather than a mechanical replay log, Codex shifts the generalization work from a rule-based system to a reasoning model. The practical effect is a system that can execute a workflow against a fresh instance of an application, with new input values, rather than one that replays a fixed sequence of clicks that was valid only on the day the recording was made.
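A toy model makes the brittleness concrete. In the sketch below, a screen is just a mapping from button label to position, and a plain label lookup stands in for the language model's interpretation of the current screen. Both are simplifying assumptions for illustration; the layouts and function names are hypothetical, and nothing here describes how Codex is actually implemented.

```python
# Toy screen model: each label maps to the (x, y) position of its control.
layout_v1 = {"Submit": (100, 200), "Cancel": (100, 260)}
# After a redesign the two buttons have swapped places.
layout_v2 = {"Cancel": (100, 200), "Submit": (100, 260)}


def control_at(layout, position):
    """Return the label of the control at a position, or None if nothing is there."""
    for label, pos in layout.items():
        if pos == position:
            return label
    return None


def replay_by_coordinate(layout, recorded_position):
    """Coordinate-level replay: click wherever the original recording clicked."""
    return control_at(layout, recorded_position)


def replay_by_intent(layout, intended_label):
    """Intent-level replay: look for the control matching the goal on the current screen."""
    return intended_label if intended_label in layout else None


# Position captured on the day the recording was made.
recorded_position = layout_v1["Submit"]

print(replay_by_coordinate(layout_v1, recorded_position))  # Submit
print(replay_by_coordinate(layout_v2, recorded_position))  # Cancel
print(replay_by_intent(layout_v2, "Submit"))               # Submit
```

The coordinate replay still clicks something after the redesign, just the wrong thing, which is the failure mode that makes fixed click sequences risky. The intent replay survives here only because the label still exists; a real agent has to resolve intent against a far messier screen.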
The architectural shift does not eliminate the reliability challenge that affects all computer-use agents. Research published in the Stanford 2026 AI Index Report documents that AI agents achieved an average 66% task success rate on the OSWorld benchmark — the most widely used real-world computer-task evaluation — compared with 12% a year earlier. That improvement is significant. A 34% failure rate in unsupervised production automation is also significant.
The compound failure problem makes this more acute for multi-step workflows. Analysis by engineering infrastructure company Temporal shows that even an agent achieving 85% reliability on each individual step of a 10-step workflow would succeed end-to-end only about 20% of the time, because per-step success rates multiply when steps fail independently (0.85^10 ≈ 0.197). Record & Replay's SKILL.md architecture improves adaptability compared with coordinate-level macros, but it does not guarantee that the reasoning model will complete every step of a complex workflow without error. OpenAI's own documentation notes the feature works best when "the steps are stable and the success criteria are clear," a useful signal about where the system currently holds up and where it does not.
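The 20% figure can be checked with a few lines of arithmetic. The sketch below assumes every step succeeds or fails independently with the same probability, which is a simplification; the function name is illustrative.

```python
def end_to_end_success(per_step_success: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_success ** steps


rate = end_to_end_success(0.85, 10)
print(f"{rate:.1%}")  # 19.7%
```

Under the same assumption, shortening a workflow helps quickly: five steps at 85% each complete end-to-end a little over 44% of the time.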
The implication for anyone evaluating Record & Replay for production use is a choice between two automation profiles. Tasks that are repetitive, well-bounded, and run against stable interfaces — filing expense reports, downloading recurring data exports, publishing videos to a consistent upload platform — are strong candidates. Tasks that involve variable layouts, error-handling logic, or judgment calls are not yet suitable for fully unsupervised execution.
Record & Replay requires Computer Use to be enabled in the Codex app settings — either by the user on a personal account or by an administrator through a requirements.toml configuration file in enterprise deployments. The [features].computer_use flag controls both Computer Use and Record & Replay together; setting it to false disables both.
The feature is currently available on macOS only. It is not available in the European Economic Area, the United Kingdom, or Switzerland — a restriction that applies to the initial release, with no stated timeline for expansion to those regions. This is notable because Computer Use itself became available to EEA, UK, and Swiss users on June 16 — two days before Record & Replay shipped — but that expansion did not extend to Record & Replay. OpenAI has not publicly explained the distinction. The EU AI Act's Article 50 transparency obligations, which explicitly cover agentic AI systems, are scheduled to take effect on August 2, 2026, and a pattern of staged geographic rollouts across Codex's most autonomous features suggests ongoing compliance work in those jurisdictions, though OpenAI has not confirmed this.
The Codex app is free to download on macOS. Record & Replay requires a paid ChatGPT subscription — Plus at $20 per month, Pro at $200 per month, or Business, Enterprise, or Edu tier access.
The launch represents a shift in how AI automation is being distributed and taught. Earlier iterations of agent-based automation — including Codex's own prompt-based skill authoring — required users to describe workflows in text. That authoring step is a significant friction point for non-technical users, who often cannot describe their own processes with the precision a language model requires to execute them reliably.
Record & Replay inverts that requirement. The skill is authored by the model from observation, not by the user from description. A less technical user who knows how to do a task but cannot describe it precisely to an AI now has a path to automation that bypasses the description step entirely. The recording captures tacit knowledge — preferences, sequencing habits, decision points — that survives the translation from demonstration to editable file, even when it would not survive the translation to a written prompt.
The team-sharing capability extends this further. A single demonstrated workflow can propagate as a skill across an entire organization. One recorded skill that captures how to file a correctly configured Jira issue, submit a standard expense report, or run a weekly analytics pull becomes automation available to anyone on the team with access to Codex, without requiring each person to re-teach the workflow individually.
What is Codex Record and Replay and how does it work?
Record & Replay is a feature in the OpenAI Codex macOS app that lets a user perform a workflow once while the AI observes, then converts that demonstration into a reusable automation called a skill. During recording, Codex captures actions and window content. After the user stops recording, Codex's language model inspects the sequence and drafts a SKILL.md file — a human-readable instruction document specifying the workflow's steps, variable inputs, and success criteria. On future runs, Codex uses that file as its instruction context and executes the workflow using Computer Use, browser actions, and plugins.
How is Codex Record and Replay different from traditional RPA recording tools?
Traditional robotic process automation tools such as UiPath and Automation Anywhere capture pixel coordinates and UI element identifiers — recordings that break when an interface changes. Codex generates a natural-language SKILL.md description of what the workflow is trying to accomplish, which its language model then interprets against the current screen state on replay. This makes Codex's approach semantically adaptive rather than coordinate-literal, though it does not eliminate failure risk on complex or variable workflows.
Why is Record and Replay not available in Europe?
Record & Replay is not available in the European Economic Area, the United Kingdom, or Switzerland at launch. OpenAI has not publicly explained the reason or given a timeline for availability in those regions. The EU AI Act's Article 50 obligations, which cover agentic AI systems, are scheduled to take effect August 2, 2026, and a pattern of staged geographic rollouts across Codex's most autonomous features suggests ongoing regulatory compliance work in those jurisdictions.
What types of workflows is Codex Record and Replay best suited for?
OpenAI's own documentation states the feature works best when the steps are stable and the success criteria are clear. Workflows that are repetitive, well-bounded, and run against stable interfaces — such as filing expense reports, publishing videos, downloading recurring reports, or creating consistently configured tickets — are the strongest candidates. Workflows involving dynamic layouts, error-handling judgment, or multi-step sequences with high variability carry a higher risk of incomplete execution given current AI agent reliability benchmarks.
