Upgrade your site search: Contextual answers with generative AI

Generative AI refers to the use of
artificial intelligence to create new content, like text, images, music, audio,
and videos. Generative AI relies on a machine learning (ML) model to learn the
patterns and relationships in a dataset of human-created content.

This technology has shown incredible capabilities through applications like
Gemini. You may be wondering: how do I implement generative AI tools in my web
products?

One common use case is to give users a better interface for asking questions
about a website’s content. With the help of machine learning, you can greatly
improve your users’ search results.

You could create an interface where users write their question, send it to a
large language model (LLM), such as
Gemini,
and then display the answer to your users.
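
As a rough sketch of that flow, the following example sends a question to Gemini
with the @google/generative-ai Node SDK and returns the generated text. The model
name, helper function, and environment variable are illustrative assumptions, not
requirements of any particular setup.

import { GoogleGenerativeAI } from "@google/generative-ai";

// Assumption: an API key is available in the GEMINI_API_KEY environment variable.
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// Send the user's question to the LLM and return the generated answer as text.
async function answerQuestion(question: string): Promise<string> {
  const result = await model.generateContent(question);
  return result.response.text();
}

You would call answerQuestion() with whatever the user typed into your search box
and render the returned string in your results UI.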

Suppose such a feature existed on this site. A user wants to know which APIs are
included in Interop 2024, and they input the following query:

What are the features included in Interop 24?

Unfortunately, the output will likely be incorrect, for a couple of reasons:

  • The user has given the LLM little context for the question, so the LLM is
    more prone to return wrong answers or hallucinations.
  • The LLM was likely trained before Interop 2024 was created, or its features
    decided, so it’s unaware of that information.

While it’s possible for LLMs to find more current information, LLM training
datasets are inherently outdated. Maintaining fresh results can be incredibly
time-consuming and expensive.

Use prompt engineering

Prompt engineering
is a set of techniques to get the best output out of an LLM.

One technique is to provide additional context in the prompt, making the LLM
more likely to output content that is related to the context.

Continuing with our Interop example, our first step is to provide the full
contents of the article as context. Then add the question as the input for the
LLM to answer. For example:

Context:
Following on from the success of Interop 2022 and Interop 2023, we
are excited about the opportunity to collaborate once again with
all key browser vendors and other relevant stakeholders...
(trimmed to fit in this article)

Input:
What are the features included in Interop 2024?

You can expect Gemini to output something like the following:

The features included in Interop 24 are Accessibility, CSS Nesting, Custom
Properties, Declarative Shadow DOM, font-size-adjust, HTTPS URLs for
WebSocket, IndexedDB, Layout, Pointer and Mouse Events, Popover, Relative
Color Syntax, requestVideoFrameCallback, Scrollbar Styling, @starting-style
and transition-behavior, Text Directionality, text-wrap: balance, URL

This answer is likely much better than the one produced without context, because
it is grounded in the article you provided.
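
As a minimal sketch, the same SDK call can be wrapped in a helper that prepends
the article’s text to the question, mirroring the Context/Input format shown
above. The helper name and model choice are assumptions for illustration.

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// Build a prompt that places the article's full text before the user's question,
// so the LLM answers from the provided context rather than from its training data.
async function answerWithContext(articleText: string, question: string): Promise<string> {
  const prompt = `Context:\n${articleText}\n\nInput:\n${question}`;
  const result = await model.generateContent(prompt);
  return result.response.text();
}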

Scale up with RAG

Suppose that, instead of answering a question about a single article, we want
the LLM to answer more questions about web.dev, using any article as the
additional context. While this may be possible for smaller sites, given
Gemini 1.5’s context window of 1 million tokens,
larger prompts are slower and more expensive to run.

LLM input and output lengths are measured and charged in tokens, which are a
way to represent a common sequence of characters found in a text input. The
number of tokens will generally be larger than the number of words. As an
example, the input in the first example had 775 words, represented by 1097
tokens. Different LLMs might calculate tokens differently, and most provide
an API or an endpoint to calculate the number of tokens for text input.
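
For instance, the @google/generative-ai SDK exposes a countTokens() method on
the model object. A minimal sketch, assuming the same setup as the earlier
example:

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

// Measure how many tokens a prompt consumes before sending it, to estimate
// cost and stay within the model's context window.
const prompt = "Context:\n...article text...\n\nInput:\nWhat are the features included in Interop 2024?";
const { totalTokens } = await model.countTokens(prompt);
console.log(`This prompt is ${totalTokens} tokens long.`);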

One solution is to provide the LLM with prompt-relevant articles. This task can
be split into two parts:

  1. Add the content of the top articles as context when prompting the LLM.
  2. Search the content for articles related to “What are the features included in
    Interop 2024?”.

We want the Gemini results to return content based on the following articles:

  • Article 1: web.dev/blog/submit-your-proposals-for-interop-2024
  • Article 2: web.dev/blog/interop-2023-wrapup
  • Article 3: web.dev/blog/interop-2024

The input should look as follows:

Context:
Article 1:
Over the past two years... (trimmed)

Article 2:
At the end of last year Interop 2023 wrapped up. This effort... (trimmed)

Article 3:
Following on from the success of Interop 2022... (trimmed)

Input:
What are the features included in Interop 2024?

This context produces our expected output.

* Accessibility * CSS Nesting * Custom Properties
* Declarative Shadow DOM * font-size-adjust
* HTTPS URLs for WebSocket * IndexedDB * Layout
* Pointer and Mouse Events * Popover * Relative Color Syntax
* requestVideoFrameCallback * Scrollbar Styling
* @starting-style and transition-behavior * Text Directionality
* text-wrap: balance * URL

For those familiar with AI techniques, this approach uses Retrieval-Augmented
Generation (RAG), a common practice to improve the likelihood of factual answers
from generative AI tools.
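
A minimal sketch of that RAG flow might look like the following. Here,
searchArticles() is a hypothetical stand-in for whatever retrieval backend you
use; everything else assumes the same @google/generative-ai setup as the earlier
examples.

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

interface Article {
  url: string;
  content: string;
}

// Hypothetical retrieval step: replace with your own search backend
// (full text or semantic) that returns the top-N relevant articles.
async function searchArticles(query: string, limit: number): Promise<Article[]> {
  // Placeholder: wire this up to your site's search index.
  return [];
}

// Retrieve relevant articles, add them to the prompt as context, then ask the question.
async function answerFromSite(question: string): Promise<string> {
  const articles = await searchArticles(question, 3);
  const context = articles
    .map((article, i) => `Article ${i + 1} (${article.url}):\n${article.content}`)
    .join("\n\n");
  const prompt = `Context:\n${context}\n\nInput:\n${question}`;
  const result = await model.generateContent(prompt);
  return result.response.text();
}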

While the RAG technique can work with regular full text search, there are
shortcomings to the approach.

  • Full text search helps AI find exact keyword matches. However, LLMs are unable
    to determine the intended meaning behind a user’s query. This can lead to
    outputs that are incomplete or incorrect.
  • There may be problems when words have multiple meanings or the queries use
    synonyms. For example, “bank” (financial institution versus riverbank) can lead
    to irrelevant results.
  • Full text search may output results that happen to contain the keywords but
    don’t align with the user’s objective.

Semantic search
is a technique to improve search accuracy by focusing on these key aspects:

  • Searcher’s intent: It tries to understand the reason why a user is searching
    for something. What are they trying to find or accomplish?
  • Contextual meaning: It interprets words and phrases in relation to their
    surrounding text, as well as other factors like the user’s location or search
    history.
  • Relationship between concepts: Semantic search uses knowledge graphs (large
    networks of related entities) and natural language processing to understand
    how words and ideas are connected.

As a result, when you build tools with semantic search, the search output relies
on the overall purpose of the query, instead of keywords. This means a tool can
determine relevant documents, even when the exact keyword is not present. It can
also avoid results where the word is present, but has a different meaning.

Right now, you can implement two search tools which employ semantic search:
Vertex AI Search and
Algolia AI Search.
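
If you want to prototype semantic search yourself rather than use one of those
products, a common approach is to compare text embeddings: documents whose
embeddings are close to the query’s embedding tend to share its meaning, even
without shared keywords. The following sketch assumes the @google/generative-ai
embedding model; the model name and ranking logic are illustrative only.

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY ?? "");
const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" });

// Turn a piece of text into a numeric vector that captures its meaning.
async function embed(text: string): Promise<number[]> {
  const result = await embedder.embedContent(text);
  return result.embedding.values;
}

// Cosine similarity: values near 1 mean very similar meaning, near 0 unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, value) => sum + value * value, 0));
  const normB = Math.sqrt(b.reduce((sum, value) => sum + value * value, 0));
  return dot / (normA * normB);
}

// Rank documents by how close their meaning is to the query, not by keyword overlap.
async function rankBySimilarity(query: string, documents: string[]): Promise<string[]> {
  const queryVector = await embed(query);
  const scored = await Promise.all(
    documents.map(async (doc) => ({
      doc,
      score: cosineSimilarity(queryVector, await embed(doc)),
    }))
  );
  return scored.sort((a, b) => b.score - a.score).map((item) => item.doc);
}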

Draw answers from published content

You’ve learned how to use prompt engineering to enable an LLM to provide answers
related to content it’s never seen by adding context to the prompt. And, you’ve
learned how to scale this approach from individual articles to an entire corpus
of content using the
Retrieval-Augmented Generation (RAG)
technique. You’ve also learned how semantic search can further improve results
for user search queries, helping you better implement RAG in your product.

It’s a known problem that generative AI tools can “hallucinate,” which makes
them, at best, sometimes unreliable or, at worst, actively harmful for a
business. With these techniques, both users and developers can improve the
reliability and, perhaps, build trust in the output from these applications.
