
Credit: Getty Images
Since last year’s disastrous rollout of Google’s AI Overviews, the world at large has been aware that AI-powered search results can differ wildly from the traditional list of links search engines have generated for decades. Now, new research helps quantify that difference, showing that AI search engines tend to cite less popular websites, including ones that wouldn’t even appear in the top 100 links listed in an “organic” Google search.
In the pre-print paper “Characterizing Web Search in The Age of Generative AI,” researchers from Ruhr University Bochum in Germany and the Max Planck Institute for Software Systems compared traditional link results from Google’s search engine to those from its AI Overviews and Gemini-2.5-Flash. The researchers also looked at GPT-4o’s web search mode and the separate “GPT-4o with Search Tool,” which resorts to searching the web only when the LLM decides it needs information found outside its own pre-trained data.
The researchers drew test queries from a number of sources, including specific questions submitted to ChatGPT in the WildChat dataset, general political topics listed on AllSides, and products included in the 100 most-searched Amazon products list.
Overall, the sources cited in results from the generative search tools tended to come from sites that were less popular than those appearing in the top 10 of a traditional search, as measured by the domain-ranking service Tranco. Sources cited by the AI engines were more likely than those linked in traditional Google searches to fall outside both the top 1,000 and top 1,000,000 domains tracked by Tranco. Gemini search in particular showed a tendency to cite unpopular domains, with the median cited source falling outside Tranco’s top 1,000 across all results.
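The kind of popularity comparison the researchers describe can be sketched in a few lines. The ranking table and cited domains below are invented placeholders, not data from the paper, and the unranked-domain handling is an assumption for illustration:

```python
# Sketch of a Tranco-style popularity check for a list of cited domains.
# The ranking table and domains are hypothetical, not the paper's data.
from statistics import median

# Hypothetical Tranco-style ranking: domain -> rank (1 = most popular).
tranco_rank = {
    "wikipedia.org": 50,
    "example-blog.net": 250_000,
    "niche-forum.io": 2_500_000,
}
UNRANKED = 10_000_000  # assumption: treat unlisted domains as very unpopular

def popularity_stats(cited_domains):
    """Median rank plus the share of citations outside the top 1k / top 1M."""
    ranks = [tranco_rank.get(d, UNRANKED) for d in cited_domains]
    n = len(ranks)
    return {
        "median_rank": median(ranks),
        "share_outside_top_1k": sum(r > 1_000 for r in ranks) / n,
        "share_outside_top_1m": sum(r > 1_000_000 for r in ranks) / n,
    }

stats = popularity_stats(["wikipedia.org", "example-blog.net", "niche-forum.io"])
print(stats)
```

A median rank beyond 1,000, as reported for Gemini, would mean at least half of the cited domains fall outside Tranco’s top 1,000.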
A majority of the cited AI Overview sources don’t appear in the top 10 Google link results for the same query.
Credit: Kirsten et al
The sources cited by the AI-powered search engines also tended to be ones that wouldn’t appear anywhere near the top results for the same organic Google search. Fifty-three percent of the sources cited by Google’s AI Overviews, for instance, didn’t appear in the top 10 Google links for the same query, and 40 percent of those sources didn’t even fall in the top 100 Google links.
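The overlap figures above amount to a simple set comparison: for each query, check how many AI-cited URLs show up in the organic top 10 or top 100. A minimal sketch, using invented placeholder URLs rather than the paper’s data:

```python
# Sketch of the overlap measurement: what share of AI-cited URLs appear
# in the top-10 / top-100 organic results for the same query.
# All URLs here are hypothetical placeholders.

def citation_overlap(ai_cited, organic_ranked):
    """Return (share in organic top 10, share in organic top 100)."""
    top10 = set(organic_ranked[:10])
    top100 = set(organic_ranked[:100])
    n = len(ai_cited)
    in_top10 = sum(url in top10 for url in ai_cited) / n
    in_top100 = sum(url in top100 for url in ai_cited) / n
    return in_top10, in_top100

# Hypothetical example: 4 cited sources vs a 100-link organic result list.
organic = [f"https://site{i}.example/page" for i in range(100)]
cited = [organic[0], organic[5], organic[42], "https://obscure.example/post"]
print(citation_overlap(cited, organic))  # → (0.5, 0.75)
```

In the paper’s terms, AI Overviews’ 53 percent figure corresponds to a top-10 overlap of only 0.47.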
These differences don’t necessarily mean the AI-generated results are “worse,” of course. The researchers found that GPT-based searches were more likely to cite sources like corporate entities and encyclopedias for their information, for instance, while almost never citing social media websites.
An LLM-based analysis tool found that AI-powered search results also tended to cover a similar number of identifiable “concepts” as the traditional top 10 links, suggesting a similar level of detail, diversity, and novelty in the results. At the same time, the researchers found that “generative engines tend to compress information, sometimes omitting secondary or ambiguous aspects that traditional search retains.” That was especially true for more ambiguous search terms (such as names shared by different people), for which “organic search results provide better coverage,” the researchers found.
Google Gemini search in particular was more likely to cite low-popularity domains.
Credit: Kirsten et al
The AI search engines also arguably have an advantage in being able to weave pre-trained “internal knowledge” in with data culled from cited websites. That was especially true for GPT-4o with Search Tool, which often didn’t cite any web sources and simply provided a direct response based on its training.
But this reliance on pre-trained data can become a limitation when searching for timely information. For search terms pulled from Google’s list of Trending Queries for September 15, the researchers found GPT-4o with Search Tool often responded with messages along the lines of “could you please provide more information” rather than actually searching the web for up-to-date information.
While the researchers didn’t determine whether AI-based search engines were overall “better” or “worse” than traditional search engine links, they did urge future research on “new evaluation methods that jointly consider source diversity, conceptual coverage, and synthesis behavior in generative search systems.”
