This study is the result of a joint effort between Rich Sanger and Authoritas.

Understanding how Google AI Overviews source links is necessary for establishing visibility for your content. This year, multiple studies have taken the first step in identifying how links in AI Overviews correlate with existing search results.

After reviewing these studies and analyzing the Google patent underlying AI Overviews, I noticed a glaring omission: the AI Overview system does not rely solely on documents responsive to the entered query. This inspired me to write an article highlighting the oversight and setting out my understanding of how the system underlying Google AI Overviews works.

My article critiquing Google AI Overview ranking studies received a completely unexpected amount of attention across the SEO community and beyond. Admittedly, I was a bit apprehensive about hitting that publish button. It’s not comfortable for me to criticize what other people do, especially in public. I know how much work it takes to complete this type of research, write it up, and share it with the public. At the same time, I strongly believe in the spirit of scientific inquiry, where constructive criticism is essential for progress.

Laurence O’Toole, CEO of Authoritas, who published one of the studies I critiqued, reached out to me after I published. Not only did he welcome the criticism, he said we should collaborate! I was thrilled to learn that Laurence was as interested in exploring AI Overviews as I was, and he had already begun to investigate related queries.

We have collected a broad range of data on various topics related to AI Overviews. Over the coming weeks and months, we will be sharing our findings, including upcoming studies about how AI Overviews vary across intent and industries. What you have here is Laurence’s and my first collaboration. Below, we aimed to quantify how many of the links in an AI Overview can be accounted for by including top search results responsive to related queries.

Based on our research, we offer several actionable recommendations for enhancing visibility in AI Overviews. We hope you find this informative and helpful in your understanding of Google’s AI Overviews.

Background

Google AI Overviews are AI-generated summaries that appear at the top of Google’s search results page. They use underlying large language models like Google Gemini to provide more in-depth responses to searchers’ queries. AI Overviews occupy a significant chunk of top-of-page real estate.

A Google AI Overview

Google has recently revamped the links in AI Overviews. Each statement now has a link icon next to it that opens the source(s) used to verify the statement. Links also appear on the right side of the AI Overview, along with a Show All option.

Because AI Overviews typically appear at the top of the search results and include links to content and websites, businesses, website owners, and content creators have a strong interest in establishing visibility in the summaries.

A handful of studies have been published this year examining the overlap between links in AI Overviews and search results.

Initial Findings by Authoritas

In January 2024, Authoritas published one of the earliest studies concerning links in AI Overviews, specifically within the experimental Google Labs version known as the Search Generative Experience. Laurence O’Toole, the CEO of Authoritas, found a notably low overlap in this initial study, reporting that “only 4.5% of generative URLs directly matched a page 1 organic URL” with an average of 10.2 links.

Changes in AI Overview Linking Patterns

Subsequent research by Laurence indicated a rising trend: a study in March 2024 showed an increase to 20.1% with an average of 10.75 links. By August 2024, the latest research from Laurence reported an even more significant increase, with 42.2% of generative URLs matching those on the first page of organic search results with an average of 9.52 links. This upward trajectory suggests improvements or changes in the algorithms driving AI Overviews.

Comparative Studies by Advanced Web Ranking and SE Ranking

In July 2024, Advanced Web Ranking published a comprehensive study that found 33.4% of links in AI Overviews are in the top 10 results for the query. On average they found AI Overviews to contain just over 7 links.

SE Ranking’s latest AI Overview study showed a significant increase in the average number of links, rising from 4 to 9 per overview. Moreover, SE Ranking found that 73% of the links in AI Overviews aligned with the top 10 search results, highlighting a stronger connection between AI Overviews and organic rankings.

Deeper Investigation and Article Overview

In my article, ‘Google AI Overviews: Do Ranking Studies Tell the Whole Story?’, I contended that AI Overview ranking studies overlooked a vital aspect of how AI Overviews function. These studies focused solely on quantifying how many of the links from AI Overviews appeared in the organic search results for the specific query entered.

Google’s patent on how the AI Overview works explicitly states that the system doesn’t just select documents from search results responsive to a specific query. Instead, if the top results lack diversity, it expands its search to include documents from results related to other relevant queries, aiming to find unique information. It determines related queries by embed distance.

The patent also describes the system’s ability to utilize recent queries made by the searcher and implied queries based on search context and user profile. However, as of this writing, the live version of AI Overviews has yet to incorporate any form of personalization.

Here is my interpretation of how Google AI Overviews work based on Google’s patent.

Diagram of how Google AI Overviews work, created by Rich Sanger.

This is only part of the story. The AI Overview does not add links until the end of its process, as depicted above. Initially, the system creates the summary and then revisits various documents to find content that verifies its statements. Consequently, the documents linked in the final summary may not be the same as those used as sources during the summary creation.

Foundation for Collaborative Research

This background lays the foundation for the collaborative research between Laurence and me. By extending our study to incorporate various related query types, and by examining the underlying technology through Google’s patent, we aim to capture a more comprehensive picture of how AI Overviews link to search result documents.

Study Overview

The aim of this study is to account for all documents linked in Google AI Overviews in the organic search results. To do this we sought to include search result documents from both the top results for direct match keywords and related keywords.

This is our hypothesis:

AI Overviews incorporate documents from top search results for both direct match and related queries to provide a more diversified and contextually relevant summary.

You can access the underlying data here.

Direct Match Documents

The initial step involved identifying how many of the linked documents in an AI Overview corresponded with the top search results for the query. We focused on the first page of Google search results, typically comprising the top 8 to 10 entries. To calculate the percentage, we divided the number of links found in these top search results by the total number of links in the AI Overview. We anticipated this percentage to fall within the range previously reported by Advanced Web Ranking (33.4%) and Authoritas (42.2%) as we are following the same methodology. Laurence refers to this measure in proportion form (e.g., 0.334) as the GOA Score™ (Generative to Organic Alignment Score).
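The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names, the URL normalization rules (dropping the scheme, a leading "www.", the query string, and a trailing slash), and the de-duplication of links are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Assumed matching rule: compare host + path, ignoring scheme, 'www.' and trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def goa_score(overview_links, organic_urls):
    """Proportion of AI Overview links that also appear in the first-page organic results."""
    overview = {normalize_url(u) for u in overview_links}
    organic = {normalize_url(u) for u in organic_urls}
    if not overview:
        return 0.0
    return len(overview & organic) / len(overview)


# Hypothetical data: three links in the overview, one of which is on page 1.
overview_links = [
    "https://www.example.com/a/",
    "https://example.org/b",
    "https://example.net/c",
]
first_page = ["https://example.com/a", "https://example.org/x"]

print(round(goa_score(overview_links, first_page), 3))  # 0.333
```

Stricter or looser URL matching would change the score, so any comparison across studies depends on how each one defines a match.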

Documents Responsive to Related Queries

Next, we aimed to analyze documents from the top search results that responded to related queries. This involved examining documents that appear in the top 8 to 10 search results for related queries. To achieve this, we utilized queries from Google’s “People also search for” section found on the search results page. For instance, for the query “can you buy a dog a seat on a plane,” we investigated documents responsive to related queries listed under “People also search for.” These included “Can you buy a dog a seat on a plane internationally?” and “Which airline will let you buy a seat for your dog?”

Queries from Google’s “People also search for” section on the search results page.

Reformulated Match Documents

Our third step was to examine reformulated queries. This process involves leveraging a large language model (LLM) to evaluate the initial query and URLs to predict reformulated versions of the query. For example, the LLM produced the following reformulated queries for the direct match query “how do I start my online business”:

  • steps to start an online business
  • how to launch an online business
  • online business setup guide
  • beginning an online business
  • starting a successful online business

This approach helps us understand how variations of a query play a part in the documents linked in Google’s AI Overviews.

Combining Direct Match, Related and Reformulated Queries

In our final step, we included each direct match query with its related and reformulated queries. This allowed us to determine the overall percentage of documents that are accounted for in the top search results.
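To make the combination step concrete, here is a small sketch under stated assumptions: first-page results are pooled across whichever queries are included, and a link counts as accounted for if it appears anywhere in the pool. The `coverage` function, the query labels, and the single-letter "URLs" are hypothetical and only illustrate the idea.

```python
def coverage(overview_links, results_by_query, queries):
    """Share of overview links found in the first-page results of any listed query."""
    pool = set()
    for query in queries:
        pool.update(results_by_query.get(query, []))
    links = set(overview_links)
    return len(links & pool) / len(links) if links else 0.0


# Toy first-page results keyed by query (letters stand in for URLs).
results_by_query = {
    "direct": ["a", "b", "c"],
    "related 1": ["c", "d"],
    "reformulated 1": ["e", "a"],
}
overview_links = ["a", "d", "e", "z"]

print(coverage(overview_links, results_by_query, ["direct"]))                    # 0.25
print(coverage(overview_links, results_by_query, ["direct", "related 1"]))       # 0.5
print(coverage(overview_links, results_by_query, ["direct", "reformulated 1"]))  # 0.5
print(coverage(overview_links, results_by_query, list(results_by_query)))        # 0.75
```

Because the pools overlap, the combined figure is not the sum of the individual gains, which is consistent with the combined percentage in the results being lower than the related and reformulated gains added together.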

Results

Overview of AI Overview Triggers and Link Composition

Our study analyzed 11,163 queries, out of which 21% (2,358) triggered an AI Overview. These summaries included, on average, 8.9 links each.

Pie chart showing the percentage of queries (21%) triggering an AI Overview.

The most frequently linked domains included YouTube with 6.46% (including 3.88% from youtube.com and 2.59% from m.youtube.com), Wikipedia (3.76%), and Southern Living (1.53%).
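For readers who want to reproduce this kind of tally on their own data, the sketch below counts links per domain and folds mobile and "www." hosts into one entry, as the YouTube figure above does. The prefix-stripping rule and the function names are assumptions for illustration, and the URLs are made up.

```python
from collections import Counter
from urllib.parse import urlsplit


def canonical_host(url):
    """Crude assumed rule: lowercase the host and strip a leading 'www.' or 'm.'."""
    host = urlsplit(url).netloc.lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    return host


def domain_share(links):
    """Fraction of all links pointing at each host, most frequent first."""
    counts = Counter(canonical_host(u) for u in links)
    total = sum(counts.values())
    return {host: count / total for host, count in counts.most_common()}


# Hypothetical AI Overview links.
links = [
    "https://www.youtube.com/watch?v=1",
    "https://m.youtube.com/watch?v=2",
    "https://en.wikipedia.org/wiki/Dog",
    "https://www.southernliving.com/x",
]
print(domain_share(links))
# {'youtube.com': 0.5, 'en.wikipedia.org': 0.25, 'southernliving.com': 0.25}
```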

Chart showing the top domains by frequency in AI Overviews.

SERP Position and Link Probability

We also examined the likelihood of a search result document being featured in an AI Overview based on its SERP position. A link in the first position had a 53% chance of appearing in an AI Overview, while this probability decreased to 36.9% for content ranked in the tenth position.
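One way to estimate such a probability is sketched below. It assumes the figure is the fraction of results at a given position, across queries that triggered an AI Overview, whose URL is also linked in that overview; the study does not spell out the exact computation, so treat this as an illustration. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def inclusion_rate_by_position(serps):
    """serps: iterable of (organic_urls_in_rank_order, overview_links) pairs."""
    shown = defaultdict(int)   # results seen at each position
    linked = defaultdict(int)  # results at that position also linked in the overview
    for organic_urls, overview_links in serps:
        overview = set(overview_links)
        for position, url in enumerate(organic_urls, start=1):
            shown[position] += 1
            if url in overview:
                linked[position] += 1
    return {pos: linked[pos] / shown[pos] for pos in sorted(shown)}


# Three toy queries (letters stand in for URLs).
serps = [
    (["a", "b", "c"], ["a", "c"]),
    (["d", "e"], ["e"]),
    (["f", "g", "h"], ["f"]),
]
rates = inclusion_rate_by_position(serps)
print({pos: round(rate, 2) for pos, rate in rates.items()})  # {1: 0.67, 2: 0.33, 3: 0.5}
```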

Chart depicting the likelihood of documents being featured in an AI Overview by SERP position for direct match queries.

Analysis of Direct, Related, and Reformulated Queries

Now let’s talk about the related and reformulated links!

Our first task was to determine the percentage of links in the AI Overview that overlapped with the first page search results for the exact match query. We found that 46.3% of the documents linked in the summaries were from the top organic search results for the exact match query.

Next, we added related queries from the “People also search for” section to the direct match queries to measure how many of the links are accounted for. On average, this added approximately 7.4 related queries per direct match query. The percentage of links from both the direct match and related queries in the top search results increased to 60.4%.

Our third step was to combine direct match queries with the reformulated queries. An average of 5 reformulated queries were generated per direct match query. This resulted in a 61.3% overlap between the links in AI Overviews and the top search results.

Finally, we included the direct match, related, and reformulated queries. By including all query types together, a total of 67.3% of the links in the AI Overviews were accounted for in the top search results.

GOA Score™ (Generative to Organic Alignment Score) Summary

As previously mentioned, Laurence defines the proportion of links in AI Overviews that correspond to the top search results as the GOA Score™ (Generative to Organic Alignment Score). The data presented below shows how GOA increases significantly from 0.463 to 0.673, a 45% improvement, when related and reformulated results are combined with direct match results.
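The 45% figure is a relative change, not a difference in percentage points. The short check below works through the arithmetic using the two GOA values reported above; the variable names are only for illustration.

```python
direct_goa = 0.463    # direct match queries only
combined_goa = 0.673  # direct match + related + reformulated queries

absolute_gain = combined_goa - direct_goa   # gain in GOA points
relative_gain = absolute_gain / direct_goa  # gain relative to the direct match baseline
unaccounted = 1 - combined_goa              # share of links still unaccounted for

print(f"{absolute_gain:.3f}")  # 0.210
print(f"{relative_gain:.1%}")  # 45.4%
print(f"{unaccounted:.1%}")    # 32.7%
```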

Bar chart comparing the average GOA Score™ (Generative to Organic Alignment Score) for normal, related, reformulated, and combined search queries.

Evaluation of Unaccounted Links

We next analyzed the remaining URLs for any trends. A total of 2,615 domains did not appear in the first-page results for the direct match, related, or reformulated queries. YouTube was the most common, with youtube.com (8.73%) and m.youtube.com (6.24%) together accounting for 14.97%. Wikipedia was the next most common at 2.1%.

Chart showing the top 20 domains unaccounted for in top results, by frequency in AI Overviews.

We also analyzed the probability of a search result document appearing in an AI Overview based on its SERP position for related and reformulated queries. In both cases, the trend mirrored that of direct matches: the higher a document ranks in search results, the greater its likelihood of being included in an AI Overview.

Charts comparing the likelihood of documents being featured in an AI Overview by SERP position, segmented for reformulated queries (left) and related queries (right).

Discussion

This study offers valuable insights into how AI Overviews link to documents. The overlap between AI Overview links and search results for direct match queries was 46.3%, higher than the findings from earlier studies by Authoritas and Advanced Web Ranking but lower than SE Ranking’s latest research. Given that we followed the same methodology as Authoritas, the results suggest a continuous trend of increasing alignment between AI Overview links and organic search results. This indicates ongoing adjustments in Google’s algorithm, though differences in sample data may also impact these findings.

Improved Understanding with Related and Reformulated Queries

By including related and reformulated queries, we were able to account for over 67% of the links, significantly more than earlier studies, which accounted for up to 42.2% of links. However, nearly one-third of the links remain unaccounted for. This suggests that increasing the number of related and reformulated queries could further raise the percentage of accounted links, and it underscores the need for deeper analysis.

Our findings also show that a search result document’s likelihood of being featured in an AI Overview is closely tied to its SERP position for both related and reformulated queries, following the same trend as direct matches: the higher the ranking, the greater the visibility in an AI Overview.

Further investigation into these unaccounted links revealed a predominance of specific domains, notably YouTube and Wikipedia, with YouTube alone representing 6.46% of all links and almost 15% of the links that remain unaccounted for.

Technical Aspects and Methodological Challenges

According to Google’s patent, the system identifies related queries through embed distance, which involves converting text into numerical vectors that represent the semantic meaning of words or phrases. It’s possible that the related queries and reformulated queries we used in this study are not perfectly aligned with the system’s method of identifying related queries based on its embedding techniques. This could potentially influence the accuracy and relevance of the AI Overview links we analyzed.
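To illustrate what selecting related queries by distance between vectors could look like, here is a toy sketch. Everything in it is an assumption: the two-dimensional vectors are hand-made rather than produced by a real embedding model, cosine distance is only one possible metric, and the threshold is arbitrary. It is not a description of Google’s implementation.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def related_queries(query_vec, candidates, max_distance):
    """Candidate queries whose vector lies within max_distance, nearest first."""
    scored = [(cosine_distance(query_vec, vec), text) for text, vec in candidates.items()]
    return [text for dist, text in sorted(scored) if dist <= max_distance]


# Hand-made toy vectors standing in for real query embeddings.
query_vec = [1.0, 0.0]  # "can you buy a dog a seat on a plane"
candidates = {
    "which airline will let you buy a seat for your dog": [0.9, 0.1],
    "flying with pets": [0.5, 0.5],
    "best dog food brands": [0.0, 1.0],
}

print(related_queries(query_vec, candidates, max_distance=0.3))
# ['which airline will let you buy a seat for your dog', 'flying with pets']
```

A query set drawn from “People also search for” or an LLM may or may not fall inside whatever distance threshold the live system uses, which is the alignment concern raised above.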

Conclusion

The study revealed a notable increase in the percentage of links from AI Overviews that could be matched with top organic search results. However, approximately one-third of the links remain unaccounted for, which calls for further investigation.

Complexity in Link Selection

The patent indicates that the system does not add links until after the summary is created, matching content in the summary to content in relevant documents to verify statements through embed distance. This adds another layer of complexity to reverse-engineering how the system selects links. The AI Overview system has access to the LLM’s training data, Google’s index, the Knowledge Graph, and owned assets such as YouTube.

Improving the alignment of our research methodologies with Google’s actual embedding techniques could increase the precision of our results. Further exploration into how AI Overviews utilize Google’s extensive data resources, including LLM training data and the Knowledge Graph, might provide deeper insights. For more, see my follow-up research on search intent and Google AI Overviews.
