Google’s AI Overview has officially gone live, and I’m here to answer the question on everyone’s mind: How Does Google AI Overview Work? This feature, which evolved from the Google Search Generative Experience (SGE) previously available only in Google Labs, has now been rolled out to all Google users across the United States.

Semrush Sensor showing AI Overview traffic.

As of this writing, about one week after launch, the AI Overview is showing in less than 1% of searches, according to Semrush’s Sensor. So far, its impact has been minimal, but its potential remains significant. Google has also shared that it will not offer separate reporting on impressions or clicks from the AI Overview; that data will simply be folded into your existing Google Search Console data.

A few months back, I published an article on how to optimize for SGE through reverse engineering. Much of what I discussed there is still quite valid. Building on that foundation, this article will get into how the AI Overview works and its implications for search engine optimization.

For even more detailed insights on optimizing for Google’s AI Overview based on learnings from the patent, check out the second part of my patent analysis, “AI Overview Optimization: Insights from Google’s Patent.”

How do we know how the AI Overview works? Fortunately, Google filed a patent in September 2023 for its AI Overview called “Generative summaries for search results.” It goes into great detail about how these generative AI summaries work. Keep in mind that while the overarching ideas here are likely still relevant, the algorithms are continually updated to produce (hopefully) better results.

This article will answer the question, “How Does Google AI Overview Work?” by drawing on detailed explanations from their patent and highlighting what you need to know for SEO. Let’s get into it.

What is Google’s AI Overview?

Google’s AI Overview is a generative AI-augmented summary for specific queries. It uses AI models such as large language models (LLMs) to provide more accurate, relevant, and context-aware answers to user queries.

See an example below for the query “what is Dune?”

An AI Overview for the query "what is Dune?"

In this AI Overview, there is a text summary along with links to sources. The AI Overview summary can take many different formats and layouts depending on the queries.

How Does Google AI Overview Work?

Google’s AI Overview works by leveraging large language models (LLMs) to generate contextually relevant and accurate summaries for search queries. It retrieves content from various sources, processes this information, and dynamically adapts its responses based on user interactions and feedback to continually refine and improve the quality of its summaries.

Now that you have a general overview of how Google’s AI Overview works, let’s get into the specifics.

How Google AI Overview Understands the Query

When you enter a query and select search, the first step is an analysis which may include one, multiple, or no LLMs. Google uses a variety of LLMs to help understand and interpret the intent and context of the query. For a deeper dive into how LLMs power AI Overviews, including the role of Google Gemini, check out my article, ‘Google AI Overviews: The Role of Large Language Models and Google Gemini.’

When selecting an LLM, the system considers whether the query is informational, creative, or a request to create an image; the presence or absence of particular words; the types of search result documents responsive to that query; and more.

For example, if you search for “apple,” the LLM(s) evaluates the context of your query based on what you searched for recently. If you’ve been searching for tech news or Apple products, the LLM interprets your query as related to Apple Inc. On the other hand, if your recent searches include recipes or health topics, it’s more likely to understand that you’re referring to the fruit.

Visit the Google AI Overview Library


How the AI Overview Selects a Large Language Model

The patent mentions a variety of LLMs that could be selected based on the query, including informational LLMs, creative LLMs, and text-to-image diffusion models. Specifically, the patent mentions Google’s Pathways Language Model (PaLM) and Language Model for Dialogue Applications (LaMDA).

From the patent, “…the system selects, from multiple candidate generative models: none, one, or multiple to utilize in generating response(s) to render responsive to the query.”

As far as which LLM(s), if any, are selected, the patent outlines several criteria that guide the decision.

The system also has the option of not selecting an LLM at all. This occurs when there are “one or more objective criteria indicating that such utilization is not needed to satisfy the needs of the query.” In these situations, an AI Overview would likely not be provided as a response.

Another aspect highlighted in the patent is resource conservation, with a goal of selecting LLMs in a way that minimizes the use of “computational resources.”

These criteria are intended to select the most appropriate LLM to generate a relevant and accurate response to your query while conserving available resources.
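As a rough illustration, this selection step can be pictured as a routing function over query features. The following is a hypothetical sketch only: the model names and keyword rules are invented for illustration and are not Google’s actual logic.

```python
# Hypothetical sketch of the model-selection step described in the patent:
# classify the query, then pick none, one, or multiple generative models.
# Model names and keyword rules are illustrative, not Google's implementation.

def select_models(query: str, has_image_request: bool = False) -> list[str]:
    query_lower = query.lower()
    selected = []

    # Creative requests (e.g. "write a poem") may route to a creative LLM.
    if any(word in query_lower for word in ("write", "compose", "imagine")):
        selected.append("creative-llm")

    # Image-generation requests may route to a text-to-image diffusion model.
    if has_image_request:
        selected.append("text-to-image-diffusion")

    # Informational queries may route to an informational LLM.
    if any(word in query_lower for word in ("what", "how", "why", "who")):
        selected.append("informational-llm")

    # If no criteria match, no model is selected and no AI Overview is
    # generated, which conserves computational resources.
    return selected

print(select_models("what is Dune?"))       # ['informational-llm']
print(select_models("best pizza near me"))  # []
```

Note how the empty case falls out naturally: if no objective criteria are met, no model runs at all, matching the patent’s resource-conservation goal.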

How the AI Overview Selects Its Sources

After determining the intent and context of the query, the LLM drafts a summary after selecting the source articles, web pages, images, or videos. You can see this process depicted below in FIG. 2.

There are three main criteria for selecting the documents that form the basis of the generated summary.

The system first assesses how relevant each potential source document is to the specific query you’ve entered. It considers query-dependent measures, such as positional ranking and selection rate of the search result document for the query, as well as the relevance to the query’s location and language, ensuring that results are geographically and linguistically appropriate.

Additionally, it assesses query-independent measures for search result documents, including a document’s selection rate across multiple queries, trustworthiness (based on the author, domain, or inbound links), overall popularity, and freshness (how recently it was created or updated). It tends to prefer sources that are frequently accessed and from reputable sites.

The system considers user-dependent measures to fine-tune results based on the user’s profile and past interactions with the search engine, including recent queries and recent non-query interactions. For example, if a user consistently clicks on video content, the AI is more likely to prioritize video results in future searches.

These criteria help Google provide personalized, accurate, and contextually relevant search results to each user. The user’s familiarity with potential search result documents impacts the documents selected and the summary provided. You can see the system’s decision-making process in FIG. 6 below.
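To make the three measure types concrete, here is a hypothetical scoring sketch in Python. The fields and weights are assumptions made for illustration; the patent does not specify how the measures are combined.

```python
# Illustrative scoring of candidate source documents using the three measure
# types described in the patent: query-dependent, query-independent, and
# user-dependent. Weights and field names are assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class Document:
    query_relevance: float  # query-dependent: rank position, selection rate
    trustworthiness: float  # query-independent: domain, links, freshness
    user_affinity: float    # user-dependent: fit with recent interactions

def score_document(doc: Document,
                   w_query: float = 0.5,
                   w_trust: float = 0.3,
                   w_user: float = 0.2) -> float:
    """Combine the three measure types into a single selection score."""
    return (w_query * doc.query_relevance
            + w_trust * doc.trustworthiness
            + w_user * doc.user_affinity)

docs = [
    Document(query_relevance=0.9, trustworthiness=0.8, user_affinity=0.2),
    Document(query_relevance=0.7, trustworthiness=0.9, user_affinity=0.9),
]
# Rank candidates by combined score, highest first.
ranked = sorted(docs, key=score_document, reverse=True)
```

Under these invented weights, a document with strong trust and user-affinity signals can outrank one with higher raw query relevance, which mirrors how personalization can shift source selection.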

Google’s AI Overview further refines document selection by considering several factors to ensure the relevance and completeness of the information presented. If it finds that the top-ranking documents for the specific query are not diverse or are low quality, it then seeks out documents responsive to related queries, meaning queries similar to the main one. To determine the relevance of these related queries, it uses vector embeddings and correlational thresholds.

Secondly, it takes into account recent queries made by the user which can provide more timely and contextually relevant results. Finally, the system considers implied queries—these are queries that the system generates based on the user’s context or indicated interests, even if the user hasn’t explicitly searched for them. This proactive approach allows Google to anticipate and meet user needs more effectively.

Google AI Overview Source Selection

As the system progresses through each section of the search process — whether from the specific query, related queries, or recent queries — the system evaluates the top-ranked documents for relevance, quality, and diversity.

This aligns with recent research by Authoritas, Advanced Web Ranking, and SE Ranking showing that the AI Overview is not just selecting its sources from the top organic search results.

To go even deeper into why the system selects low or unranked documents for AI Overviews, read my article, “Google AI Overviews: Do Ranking Studies Tell the Whole Story?”

Adding Citations to the AI Overviews

After the AI Overview summary is created, the system then goes back and adds links that verify each portion of the content in the AI Overview. You can see this process depicted in FIG. 3.

The system seeks out documents that are high-ranking and close in the vector embedding space. According to the patent, “the candidate document can be determined based on its corresponding to the top search result for such a search, or being in the top N search results.” Take note of this statement. This part — “being in the top N search results” — implies that Google may limit how deep in the SERPs the system goes to select documents.

In determining whether a document verifies a portion of the NL-based content, “the system can determine a distance measure between the content embedding and the document content embedding and determining, based on the distance measure, whether the document verifies the portion of the NL-based content (e.g., verifies only if distance measure is less than a threshold).”
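A minimal sketch of that check, assuming cosine distance and an arbitrary threshold (the patent does not disclose the actual embedding model or threshold value):

```python
# Sketch of the verification check quoted above: embed the summary portion
# and the candidate document, then verify only if the distance measure is
# less than a threshold. Embedding values and threshold are placeholders.
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def document_verifies_portion(content_embedding: list[float],
                              document_embedding: list[float],
                              threshold: float = 0.2) -> bool:
    """Verify only if the distance measure is less than the threshold."""
    return cosine_distance(content_embedding, document_embedding) < threshold

# Near-identical embeddings -> distance near 0 -> portion is verified.
print(document_verifies_portion([0.1, 0.9, 0.3], [0.1, 0.9, 0.3]))  # True
```

The practical takeaway: a document only earns a citation for a portion of the summary if its content sits close enough to that portion in the embedding space.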

In addition to ranking and distance, the system can use the same query-dependent, query-independent, and user-dependent measures used during the initial summary creation.

An important note here: the documents the system uses to build the initial AI Overview summary may not be the same documents it uses to verify and link to in the final step. This means the AI Overview may actually use your document as a reference point to build the summary but not link to it. This can happen if the system finds another document that ranks higher and has relevant on-page text that is closer in the embedding space.

Is the AI Overview Searching the Internet in Real-Time for the Documents?

In the patent, there is no direct mention of the documents being selected from a cached database. Instead, the AI Overview provides information from the LLMs’ training data and Google’s index.

The LLMs use information already existing in their knowledge base from their training. The patent states:

“In some implementations or situations, the NL based summary that is generated can also include content that is not directly (or even indirectly) derivable from the content processed using the LLM, but is relevant to the content and is generated based on world knowledge of the LLM (obtained through prior training).”

If the LLM is providing information from its training data, it is certainly possible that it would include URLs or references to sources that are no longer active. This could include web content, articles, and other resources that were available at the time of training. If some of these sources have since become inactive or outdated, the LLM might still reference them because it does not have real-time internet access to verify their current status.

Additionally, the LLMs do have access to Google’s index to search and select relevant documents. In a recent statement addressing quality control issues with the AI Overview, Liz Reid, the VP, Head of Google Search, stated, “While AI Overviews are powered by a customized language model, the model is integrated with our core web ranking systems and designed to carry out traditional “search” tasks, like identifying relevant, high-quality results from our index.”

The system also has access to Google’s Knowledge Graph. This is Google’s understanding of entities such as specific people, places, and things. This information is also used to build the AI Overviews.

So, is the AI Overview searching the internet in real time for documents? No. Instead, it relies on pre-existing training data, Google’s index, and the Knowledge Graph to provide information.

The Impact of User Interactions with the AI Overview

The Google patent provides some insights into how user interactions within the AI Overview play a part in its source selection and refinement.

The LLM generates a summary (AI Overview) that tries to answer your question based on the content it finds online. As you interact with the search results—maybe by clicking on a link, spending time reading a specific document, or even just pausing over particular content—Google observes these actions. It uses your interactions to adjust and improve the responses it gives you.

This feedback loop allows Google’s search AI to learn over time which types of responses are most useful to users, leading to more personalized and accurate information being presented in the form of a revised NL summary.
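The feedback loop can be pictured as a running tally of engagement signals per document. This is purely a hypothetical sketch: the signal names and weights are invented, and the patent does not describe the actual update mechanics.

```python
# Hypothetical sketch of the feedback loop: user interactions (clicks, dwell
# time, hovers) accumulate into per-document engagement scores that could
# influence which sources a revised summary favors. Weights are illustrative.
from collections import defaultdict

SIGNAL_WEIGHTS = {"click": 1.0, "dwell": 2.0, "hover": 0.3}

engagement = defaultdict(float)

def record_interaction(doc_id: str, signal: str) -> None:
    """Accumulate a weighted engagement signal for a document."""
    engagement[doc_id] += SIGNAL_WEIGHTS.get(signal, 0.0)

record_interaction("doc-a", "click")
record_interaction("doc-a", "dwell")
record_interaction("doc-b", "hover")

# Documents with stronger engagement are more likely to be surfaced again.
preferred = max(engagement, key=engagement.get)  # "doc-a"
```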

Google is able to monitor user behavior in multiple ways in the SERPs. These interactions influence what is shown in the AI Overviews and likely in the organic search results. It’s not just clicks; the patent document describes several of these signals.

Color Coding in AI Overviews and Trust in Source Documents

You may have noticed that AI Overviews display in different colors. You can see one below in green. This piqued my interest.

AI Overview

The patent discusses using color-coded confidence measures to assess the trustworthiness of search result documents:

  • Green: High confidence
  • Orange: Medium confidence
  • Red: Low confidence
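
Sketched as code, the mapping could look like the following. The numeric cutoffs are assumptions made for illustration; the patent describes only the three color bands, not threshold values.

```python
# Map a confidence measure to the color bands described in the patent.
# The 0.75 / 0.4 cutoffs are illustrative assumptions, not patent values.
def confidence_color(confidence: float) -> str:
    if confidence >= 0.75:
        return "green"   # high confidence
    if confidence >= 0.4:
        return "orange"  # medium confidence
    return "red"         # low confidence

print(confidence_color(0.9))  # green
```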

I was curious if this is what we were seeing in the AI Overviews. Were the colors indicative of trust in the source documents?

I asked John Mueller if this was actually the case in the AI Overview. John said no.

Barry Schwartz at Search Engine Roundtable shared that the colors were based on the query and user journey.

There is more to the story here, though. The patent highlights a Response Confidence Engine in Fig. 1:

This engine is intended to ascertain confidence in all or part of the response.

We see this later referred to in Fig. 2, where it indicates “Render with Confidence Annotations” as part of the summary generation process.

In this section of the patent, it says that confidence annotations can be included in the generated summary, either with text or colors.

The Response Confidence Engine section only refers to Fig. 2 and does not refer at all to Fig. 3 or Fig. 4. In the sections of the patent that discuss Figs. 3 and 4, there’s no mention of confidence annotations. This raises an interesting question: Why is the level of confidence indicated only for the LLM in Fig. 2?

I am very curious as to why Google elected not to include these annotations, whether with text stating “high confidence” or with the color-coded system. I will ask the Google team, and I’ll keep you posted if I get any answers.

Key SEO Considerations with Google’s AI Overview

With a general idea of how the AI Overview works, we can now get into what you should focus on for SEO. Based on the Google Generative AI patent, the most important aspects for SEOs to consider include:

  1. Query Understanding and Intent: Focus on aligning content with user intent and context, as the AI evaluates beyond just keywords.
  2. Content Relevance and Diversity: Aim to cover topics from various angles, since the AI sources information broadly to ensure comprehensive responses.
  3. Engagement and User Interaction: Optimize content to improve metrics like click-through rates and dwell time, as these influence the AI’s learning and response adjustments.
  4. Quality and Trustworthiness: Produce authoritative, accurate content that adds value, as the AI prioritizes trustworthy sources.
  5. Local and Personalized Content: When relevant, implement local SEO strategies and personalize content to cater to specific audience demographics and locations.

These aspects are important not just for AI Overviews, but first and foremost for your overall SEO. Applying these strategies to your content will not negatively impact your organic performance. They will help it.

For more on optimizing your content for visibility in Google’s AI Overview based on learnings from the patent, read my article “AI Overview Optimization: Insights from Google’s Patent.”

AI Overview: Implications and Future Outlook

Understanding how Google AI Overview works is necessary for predicting its full impact, though it’s still too early to determine what that impact will be. Google does seem committed to the AI-enhanced version of its search results page, regardless of how users feel about it. That said, Google’s AI Overview marks a significant change in what the search results page offers, using AI technologies to enhance the user experience.

For SEOs, it’s important to know how the AI Overview works. It not only interprets the intent and context of user queries with LLMs but also adjusts and refines content based on user interactions and query specifics. For optimal SEO outcomes, focus on creating content that is not only relevant and authoritative but also tailored to meet the diverse needs of users in different contexts. This will also help your traditional organic search performance.

For more details on optimizing your content for Google’s AI Overview, check out the second part of my patent analysis, “AI Overview Optimization: Insights from Google’s Patent.”


Further Reading on Google AI Overviews

Subscribe to the SEO, AI & Pizza Newsletter

Receive weekly updates on the intersection of SEO, Search, and AI, directly in your inbox.
