Credit: Boris Zhitkov via Getty Images
When is an AI system intelligent enough to be called artificial general intelligence (AGI)? According to one definition reportedly agreed upon by Microsoft and OpenAI, the answer lies in economics: When AI generates $100 billion in profits. This arbitrary profit-based benchmark for AGI perfectly captures the definitional chaos plaguing the AI industry.
In fact, it may be impossible to create a universal definition of AGI, but few people with money on the line will admit it.
Over this past year, several high-profile people in the tech industry have been heralding the arrival of "AGI" as imminent, possibly within the next two years. But there's a huge problem: Few people agree on exactly what AGI means. As Google DeepMind wrote in a paper on the topic: If you ask 100 AI experts to define AGI, you'll get "100 related but different definitions."
This isn't just academic navel-gazing. The definition problem has real consequences for how we develop, regulate, and think about AI systems. When companies claim they're on the verge of AGI, what exactly are they claiming?
I tend to define AGI in a traditional way that hearkens back to the "general" part of its name: An AI model that can widely generalize—applying concepts to novel scenarios—and match the versatile human capability to perform unfamiliar tasks across many domains without needing to be specifically trained for them.
However, this definition immediately runs into thorny questions about what exactly constitutes "human-level" performance. Expert-level humans? Average humans? And across which tasks—should an AGI be able to perform surgery, write poetry, fix a car engine, and prove mathematical theorems, all at the level of human specialists? (Which human can do all that?) More fundamentally, the focus on human parity is itself an assumption; it's worth asking why mimicking human intelligence is the necessary yardstick at all.
The latest example of this definitional confusion causing trouble comes from the deteriorating relationship between Microsoft and OpenAI. According to The Wall Street Journal, the two companies are now locked in acrimonious negotiations partly because they can't agree on what AGI even means—despite having baked the term into a contract worth over $13 billion.
The term artificial general intelligence has murky origins. While John McCarthy and colleagues coined the term artificial intelligence at Dartmouth College in 1956, AGI emerged much later. Physicist Mark Gubrud first used the term in 1997, though it was computer scientist Shane Legg and AI researcher Ben Goertzel who independently reintroduced it around 2002, with the modern usage popularized by a 2007 book edited by Goertzel and Cassio Pennachin.
Early AI researchers envisioned systems that could match human capability across all domains. In 1965, AI pioneer Herbert A. Simon predicted that "machines will be capable, within 20 years, of doing any work a man can do." But as robotics lagged behind computing advances, the definition narrowed. The goalposts shifted, partly as a practical response to this uneven progress, from "do everything a human can do" to "do most economically valuable tasks" to today's even fuzzier standards.
"An assistant of inventor Captain Richards works on the robot the Captain has invented, which speaks, answers questions, shakes hands, tells the time, and sits down when it's told to." - September 1928.
Credit: Getty Images
For decades, the Turing Test served as the de facto benchmark for machine intelligence. If a computer could fool a human judge into thinking it was human through text conversation, the reasoning went, then it had achieved something like human intelligence. But the Turing Test has shown its age. Modern language models can pass some limited versions of the test not because they "think" like humans, but because they are exceptionally skilled at producing plausible, human-sounding text.
The current landscape of AGI definitions reveals just how fractured the concept has become. OpenAI's charter defines AGI as "highly autonomous systems that outperform humans at most economically valuable work"—a definition that, like the profit metric, relies on economic progress as a substitute for measuring cognition in a concrete way. Mark Zuckerberg told The Verge that he does not have a "one-sentence, pithy definition" of the concept. OpenAI CEO Sam Altman believes that his company now knows how to build AGI "as we have traditionally understood it." Meanwhile, former OpenAI Chief Scientist Ilya Sutskever reportedly treated AGI as something almost mystical—according to a 2023 Atlantic report, he would lead employees in chants of "Feel the AGI!" during company meetings, treating the concept more like a spiritual quest than a technical milestone.
Dario Amodei, co-founder and chief executive officer of Anthropic, during the Bloomberg Technology Summit in San Francisco on Thursday, May 9, 2024.
Credit: Bloomberg via Getty Images
Dario Amodei, CEO of Anthropic, takes an even more skeptical stance on the terminology itself. In his October 2024 essay "Machines of Loving Grace," Amodei writes that he finds "AGI to be an imprecise term that has gathered a lot of sci-fi baggage and hype." Instead, he prefers terms like "powerful AI" or "Expert-Level Science and Engineering," which he argues better capture the capabilities without the associated hype. When Amodei describes what others might call AGI, he frames it as an AI system "smarter than a Nobel Prize winner across most relevant fields" that can work autonomously on tasks taking hours, days, or weeks to complete—essentially "a country of geniuses in a data center." His resistance to AGI terminology adds another layer to the definitional chaos: Not only do we not agree on what AGI means, but some leading AI developers reject the term entirely.
Perhaps the most systematic attempt to bring order to this chaos comes from Google DeepMind, which in July 2024 proposed a framework with five levels of AGI performance: emerging, competent, expert, virtuoso, and superhuman. DeepMind researchers argued that no level beyond "emerging AGI" existed at that time. Under their system, today's most capable LLMs and simulated reasoning models still qualify as "emerging AGI"—equal to or somewhat better than an unskilled human at various tasks.
But this framework has its critics. Heidy Khlaaf, chief AI scientist at the nonprofit AI Now Institute, told TechCrunch that she thinks the concept of AGI is too ill-defined to be "rigorously evaluated scientifically." In fact, with so many varied definitions at play, one could argue that the term AGI has become technically meaningless.
The Microsoft-OpenAI dispute illustrates what happens when philosophical speculation is turned into legal obligations. When the companies signed their partnership agreement, they included a clause stating that when OpenAI achieves AGI, it can limit Microsoft's access to future technology. According to The Wall Street Journal, OpenAI executives believe they're close to declaring AGI, while Microsoft CEO Satya Nadella has called the idea of using AGI as a self-proclaimed milestone "nonsensical benchmark hacking" on the Dwarkesh Patel podcast in February.
The reported $100 billion profit threshold we mentioned earlier conflates commercial success with cognitive capability, as if a system's ability to generate revenue says anything meaningful about whether it can "think," "reason," or "understand" the world like a human.
Sam Altman speaks onstage during The New York Times Dealbook Summit 2024 at Jazz at Lincoln Center on December 4, 2024, in New York City.
Credit: Eugene Gologursky via Getty Images
Depending on your definition, we may already have AGI, or it may be physically impossible to achieve. If you define AGI as "AI that performs better than most humans at most tasks," then current language models potentially meet that bar for certain types of work (which tasks, which humans, what is "better"?), but agreement on whether that is true is far from universal. This says nothing of the even murkier concept of "superintelligence"—another nebulous term for a hypothetical, god-like intellect so far beyond human cognition that it, like AGI, defies any solid definition or benchmark.
Given this definitional chaos, researchers have tried to create objective benchmarks to measure progress toward AGI, but these attempts have revealed their own set of problems.
The search for better AGI benchmarks has produced some interesting alternatives to the Turing Test. The Abstraction and Reasoning Corpus (ARC-AGI), introduced in 2019 by François Chollet, tests whether AI systems can solve novel visual puzzles that require abstract reasoning rather than pattern recall.
"Almost all current AI benchmarks can be solved purely via memorization," Chollet told Freethink in August 2024. A major problem with AI benchmarks currently stems from data contamination—when test questions end up in training data, models can appear to perform well without truly "understanding" the underlying concepts. Large language models serve as master imitators, mimicking patterns found in training data, but not always originating novel solutions to problems.
But even sophisticated benchmarks like ARC-AGI face a fundamental problem: They're still trying to reduce intelligence to a score. And while improved benchmarks are essential for measuring empirical progress in a scientific framework, intelligence isn't a single thing you can measure like height or weight—it's a complex constellation of abilities that manifest differently in different contexts. Indeed, we don't even have a complete functional definition of human intelligence, so defining artificial intelligence by any single benchmark score is likely to capture only a small part of the complete picture.
There is no doubt that the field of AI has seen rapid, tangible progress in numerous areas, including computer vision, protein folding, and machine translation. Some excitement about this progress is justified, but it's important not to oversell an AI model's capabilities prematurely.
Despite the hype from some in the industry, many AI researchers remain skeptical that AGI is just around the corner. A March 2025 survey of AI researchers conducted by the Association for the Advancement of Artificial Intelligence (AAAI) found that 76 percent of respondents believed that scaling up current approaches is "unlikely" or "very unlikely" to achieve AGI.
However, such expert predictions should be taken with a grain of salt, as researchers have consistently been surprised by the rapid pace of AI capability advancement. A 2024 survey by Grace et al. of 2,778 AI researchers found that experts had dramatically shortened their timelines for AI milestones after being surprised by progress in 2022–2023. The median forecast for when AI could outperform humans in every possible task jumped forward by 13 years, from 2060 in their 2022 survey to 2047 in 2023. This pattern of underestimation was evident across multiple benchmarks, with many researchers' predictions about AI capabilities being proven wrong within months.
And yet, as the tech landscape shifts, the goalposts for AGI continue to move. Recently, as more studies continue to reveal limitations in simulated reasoning models, some experts in the industry have been slowly backing away from claims of imminent AGI. For example, AI podcast host Dwarkesh Patel recently published a blog post arguing that developing AGI still faces major bottlenecks, particularly in continual learning, and predicted we're still seven years away from AI that can learn on the job as seamlessly as humans.
The disconnect we've seen above between researcher consensus, settled definitions, and corporate rhetoric has real consequences. When policymakers act as if AGI is imminent based on hype rather than scientific evidence, they risk making decisions that don't match reality. When companies write contracts around undefined terms, they may create legal time bombs.
The definitional chaos around AGI isn't just philosophical hand-wringing. Companies use promises of impending AGI to attract investment, talent, and customers. Governments craft policy based on AGI timelines. The public forms potentially unrealistic expectations about AI's impact on jobs and society based on these fuzzy concepts.
Without clear definitions, we can't have meaningful conversations about AI misapplications, regulation, or development priorities. We end up talking past each other, with optimists and pessimists using the same words to mean fundamentally different things.
In the face of this kind of challenge, some may be tempted to give up on formal definitions entirely, falling back on an "I'll know it when I see it" approach for AGI—echoing Supreme Court Justice Potter Stewart's famous quote about obscenity. This subjective standard might feel intuitively satisfying, but it's useless for contracts, regulation, or scientific progress.
Perhaps it's time to move beyond the term AGI. Instead of chasing an ill-defined goal that keeps receding into the future, we could focus on specific capabilities: Can this system learn new tasks without extensive retraining? Can it explain its outputs? Can it produce safe outputs that don't harm or mislead people? These questions tell us more about AI progress than any amount of AGI speculation. The most useful way forward may be to think of progress in AI as a multidimensional spectrum without a specific threshold of achievement. But charting that spectrum will demand new benchmarks that don't yet exist, along with a firm, empirical definition of "intelligence" that itself remains elusive.