Many of us are surprised by how accurately search engines understand our intent and return good results, even when we write complex queries (questions or phrases).
But these results did not come about overnight; they took a lot of study and research.
Search engines used to follow a simple algorithm: the user writes a query, and the search engine looks for results that contain it. For instance, if you are searching a list of document titles and you write "cats food" in the search box, the system will only return the documents that contain the phrase "cats food" in their titles, and if there is no such title, the search results will be irrelevant.
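The literal matching described above can be sketched in a few lines of Python. This is only an illustration: the function name keyword_search, the sample titles, and the choice to ignore letter case are assumptions made for the example, not how any particular search engine works.

```python
def keyword_search(query, titles):
    """Return the titles that contain the query as a literal phrase."""
    needle = query.lower()  # assumption: matching ignores letter case
    return [title for title in titles if needle in title.lower()]


# Hypothetical document titles, made up for this example.
titles = [
    "Best cats food brands",
    "Dogs food buying guide",
    "Feeding your kitten",
]

print(keyword_search("cats food", titles))     # ['Best cats food brands']
print(keyword_search("kitten meals", titles))  # []
```

The second query is clearly related to the third title, but because the exact phrase never appears, the literal search finds nothing.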
But with the birth of semantic search, the search process has become more complex and sophisticated.
Now, when the user writes a query, the search engine tries to analyze it, understand its context and intent, and perhaps extract entities or look for synonyms, so users get better results.
For instance, in the previous example, the search becomes more meaningful because the system analyzes the query and recognizes that the word "cats" refers to a kind of animal and the word "food" refers to things we eat. The system then returns the most relevant results; even if it can't find documents with the word "cats", it can still return reasonably good results such as dog food, animal food, etc.
What is semantic search?
Normal search (keyword search) is quite literal in finding results because it just matches the keywords to the pages, so the search results can be disappointing.
Semantic search, however, is a way of searching where the goal is to understand the meaning of the query and retrieve relevant results for that meaning instead of simply finding literal matches.
Semantic search aims to determine the intent and contextual meaning of the words written by the user and then tries to find the most relevant results for that meaning. For example, when the user enters “Lenovo laptop 2022”, a keyword search may return all the products from Lenovo even if they are not laptops, whereas a semantic search takes the search’s intent into consideration and gives more accurate results, so the latest laptops from Lenovo appear.
Why is semantic search important?
1- Improve the user experience
The main reason for using search engines is to make it easier for users to find what they are looking for. That's why search engines are supposed to understand what users want and provide a good search experience for them.
Semantic search enhances the user experience because it understands the user's intent and fetches results that are relevant to the query.
2- Influence your business
Semantic search also benefits the business: users can reach data and information that they would not be able to find without it, which can lead to a higher level of engagement and traffic.
Understanding semantic search is essential if you want to get the most traffic and conversions. You want to understand how the space works before you play a role in it. The more relevant the results are and the better user searches are understood, the better your conversion rates tend to be.
Semantic search has a great effect on e-commerce. It can boost online revenue by leading customers to products even if they can’t describe them well, by showing them seasonal products or sale items as product suggestions, and by classifying the customer’s results by price, brand, release date, etc.
How does semantic search work?
Semantic search is search with meaning. This means that to perform it, you have to represent your query and the documents that you want to search in a meaningful way.
The main idea behind semantic search is to represent both the query and the documents we want to search in a shared vector space using some sort of encoder, and then to perform a similarity search over the database of document vectors using cosine similarity, Euclidean distance, or another similarity measure.
A high-level overview is illustrated below:
As the previous picture shows, we encode the query and the documents using the text encoder, which gives us a vector representation of each. We then apply similarity search by calculating the distance between the query vector and each of the document vectors. Finally, we choose the documents whose vector representations are close to that of the query and order them from closest to furthest to form the final search results.
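The ranking step can be sketched as follows, assuming the query and the documents have already been encoded. The vectors below are hypothetical three-dimensional values invented for illustration (real encoders typically produce vectors with hundreds of dimensions), and the function names are placeholders rather than part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, document_vectors):
    """Return (title, score) pairs ordered from most to least similar."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in document_vectors.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical document vectors, standing in for the encoder's output.
document_vectors = {
    "Dogs food buying guide": [0.8, 0.6, 0.1],
    "Best cats food brands": [0.9, 0.4, 0.0],
    "Laptop repair tips": [0.0, 0.1, 0.9],
}
# Hypothetical vector standing in for the encoded query "cats food".
query_vector = [1.0, 0.5, 0.0]

for title, score in rank_documents(query_vector, document_vectors):
    print(f"{score:.3f}  {title}")
```

With these made-up vectors, the two pet food titles score close to 1 and the laptop title scores close to 0, so the pet food documents come first in the results. In practice, a vector database or an approximate nearest-neighbor index usually replaces the simple loop over all documents.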
But the main question here is: how do we represent the query and the documents in a meaningful way?
This is where text encoding comes in.
Text is commonly referred to as unstructured data that is hard for a computer to understand or represent, but this is not quite accurate: if text were unstructured, we wouldn't be able to understand each other when we speak, and we wouldn't be able to read and understand books.
There is definitely a structure behind text, but it is so complex and arcane that a computer cannot understand it directly, and it is the data scientist's job to parse it and pull out meaningful information from it.
One of the best ways to represent text meaningfully is sentence embedding, an NLP technique in which sentences are mapped to vectors of real numbers in a shared semantic vector space.
In sentence embedding, sentences that have similar meanings have similar representations in the shared semantic vector space, which is exactly what we want, since our goal is to represent text in a meaningful way.
This is how the text encoder that we mentioned in the high-level overview works: it is basically a sentence embedding model that takes a sentence as input and outputs a vector representation of it.
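To make the idea concrete, here is a toy stand-in for a text encoder. It is not a real sentence embedding model: it simply averages hand-made word vectors, and both the tiny vocabulary and its values are hypothetical, chosen only to show that sentences with similar meanings can end up close together even when they share no words. A trained model learns such representations from large amounts of text.

```python
import math

# Hypothetical hand-made word vectors with three dimensions:
# (animal, food, electronics). A trained model learns such values from data.
WORD_VECTORS = {
    "cats": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "food": [0.0, 1.0, 0.0],
    "meals": [0.1, 0.9, 0.0],
    "laptop": [0.0, 0.0, 1.0],
    "charger": [0.0, 0.0, 0.9],
}


def encode(sentence):
    """Toy text encoder: average the vectors of the words we know."""
    words = sentence.lower().split()
    vectors = [WORD_VECTORS[word] for word in words if word in WORD_VECTORS]
    if not vectors:
        return [0.0, 0.0, 0.0]  # no known words: fall back to the zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query = encode("cats food")
print(cosine_similarity(query, encode("kitten meals")))    # close to 1
print(cosine_similarity(query, encode("laptop charger")))  # close to 0
```

Note that "cats food" and "kitten meals" have no words in common, so a keyword search would never connect them, yet their vectors point in almost the same direction.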
To summarize, we talked about semantic search: what it is, why it is important, and how it works.
We also saw how it plays an important role in business.
And as we learned, by representing the data we want to search in a meaningful way, we can achieve great search results. This applies not only to textual data but to all types of data (image, audio, video, etc.).