Making Azure Cognitive Search Work Better with Kentico Xperience

Introduction

Almost nothing is more frustrating for a developer or marketer than receiving feedback that their website search experience is broken, hearing the words “When I search for this term, the wrong results are returned, or [worse yet] no results are returned at all.” I know it sure pains me to hear this.

Ultimately, a faulty search can have lots of causes. Some issues are severe, and some are just small nuances. It can come down to how content is written, how content is indexed, or how the code calls the search service.

This article contains a few tips and tricks to make the native integration between Kentico Xperience and Azure Cognitive Search work better than the out-of-the-box configuration. Following one or more of them may finally help you hear “Wow, search is working great on the site”.

The Setup

As a baseline for this post, I am going to assume that your Kentico Xperience instance is set up correctly to use Smart Search with Azure indexes. If you don't already have that set up, you can check out the Xperience documentation for Using Azure Cognitive Search to configure your Smart Search indexes. Again, this post is about making the experience better for your end users, plus a few quick tips for working with the indexes themselves as a Kentico developer.

Also, if you are not sure about using Azure Cognitive Search with Kentico, did you know that you can easily try it out without spending any money? AND without having to write any code? Yes, you read that correctly. There is basically no barrier to giving it a shot. To get started for free, you can use any subscription that you have with Azure, or you can create a new Azure subscription in a matter of minutes. Simply log into that subscription and create an Azure Cognitive Search service resource using the free tier. That's right, everything from the Kentico side will work with that free tier from a prototype standpoint. Once you have that resource created, you can also generate a complete working UI for performing a search on your CMS data (I will show that later on in this post).

Let's start tackling a few items that we tend to run into when using Azure Cognitive Search with Kentico Xperience. The following items are meant to be quick tips that make working with Azure Search better in Kentico Xperience.

Why Are there So Many Fields in my Azure Search Index by Default?

This was one of the very first things I noticed when starting my learning journey with Kentico and Azure Search. Even with only one Page Type configured to be searchable by the Azure Search index, I had what seemed like hundreds of fields added into my index by default. The default result looks like this in JSON:

My immediate thought was: well, that seems like overkill. I also happen to know that Azure Cognitive Search (like almost every other search service out there) has a limit on the number of fields per index. If you need to have a whole ton of content searched, chances are you have multiple Page Types with multiple fields that you want to include in that search. One quick way to make sure you don't hit that limit is to control which fields are searchable and which are not. Luckily, Kentico Xperience has a great built-in way of doing this. The only problem is that it is not obvious how to eliminate the "base" fields of the index that are not part of your custom Page Type.

The key here is to understand that, when it comes to Pages, everything in Xperience is a Document / TreeNode at the end of the day. That means the base class (or ClassDefinition in the database, if you want to get technical) includes all of the base fields of the CMS, even if you only see a few fields in the Search configuration of your Page Type. Below is an example: my Article Page Type for Mcbeev.com. It doesn't really have that many fields.

That means that the default fields have to come from somewhere else, right? They do. The quick tip here is that you can configure the Search fields of the base Document object in the Modules application. Here is the path to follow:

Navigate to Modules in the menu -> click edit on the Pages module -> click the Classes vertical tab -> click edit on the Pages class -> click the Search vertical tab.

You can see above that many of the fields (in fact, most likely all of the ones you see in your JSON results) are marked as Retrievable or Content. This is the reason they are included in the indexing process and in the Search Results document. The fix here is to simply uncheck them (yes, you do have to click the customize button first on this class). This is a safe operation, and it will cut down on the fields that are passed into the index. It will most likely speed up the indexing process itself as well. The unchecked version looks like:

Now go ahead and rebuild your index all the way, and you won't see these fields as part of the index nor will they be in the results muddying up your Search Documents JSON. It will also save you on the index size and keep you from hitting that field limit of the index.

When I Search for Part of a Word Instead of the Whole Word Nothing is Returned?

This is a fun one. And really this question was the inspiration for this blog post. As it turns out most people expect to receive results from a search engine when typing in part of a word or phrase. Logically that makes sense to us all, as we have all been using Google, Amazon, Bing, and Yahoo for years now. Partial word matching is just something that these popular search engines do. People expect it to "just" work like that. Well you have some options here.

By default, when you perform a search through Kentico Xperience's implementation of the Azure Search SDK, you are using a SearchIndexClient instance and passing in a SearchString (the query term the user typed) and a SearchParameters object like so:

var result = await searchIndexClient.Documents.SearchAsync(searchString, searchParams);

The first thing to try is, of course, the wildcard character in this simple mode of searching. By appending a '*' character you can get some wildcard behavior to work, but it is often too aggressive and does not work in all scenarios (especially if you are trying to search just one specific field, or in combination with Filters and Facets). I'll come back to the wildcard character at the end of this section (bear with me).

As we go deeper to solve the problem, all of the documentation focuses on setting the Facets, FilterText, or Highlights fields on that object, which is well and good, but they don't mention the queryType that much. The queryType parameter defaults to simple (which is good for general full text search). However, that is not good for when you want to do partial matching or fuzzy matching on the user's search term. Let's run through some examples that illustrate my point. 

If you do not specify queryType and only perform a normal search with just a searchString value, that is the same thing as calling the direct REST endpoint for your Azure Search service. If you are searching for a whole word it works great by default. It looks like this when I search for the term "blazor" across all of my blog post content here on mcbeev.com:

Since I am specifying the count property set to true, we can see that 7 results come back for the term "blazor" with a queryType of simple. But things fall apart if I change the term to "blaz" or "blazo". 0 results are returned, which is not exactly what we really want.

The way you can get results to return for partial matches in Azure Cognitive Search is to change the queryType parameter to full. This enables the full Lucene query syntax. When full mode is turned on, you can use Fuzzy search matching as another option in your search toolbox. Fuzzy search matching is meant to compensate for typos and misspelled terms in the input string. This can be helpful in scenarios where you are worried about misspelled terms like "blazr" instead of the correct "blazor" spelling. However, Fuzzy search is slower to perform, so you want to be a little careful with it. To enable Fuzzy search mode in the Kentico Xperience call, the C# would look like this (basically you just add the '~' character at the end of the string and set the queryType to full):

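// QueryType is an enum in the Microsoft.Azure.Search SDK, not a string; the trailing '~' asks for fuzzy matching
var result = await searchIndexClient.Documents.SearchAsync(searchString + "~", new SearchParameters() { QueryType = QueryType.Full });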

That C# code again gets translated by the SDK and ends up with this REST call which now returns multiple results for the misspelling of "blazor" and partial matching looks like it is starting to work:

But hold on, now I actually have 34 matches / results. I know I like Blazor, but I also know I have not written that much about it. I'm getting too many matches now. That's because basically the Fuzzy match is too Fuzzy for my needs. There are too many other words that match to be useful. That's why we don't normally end up using Fuzzy.
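
If you still want some typo tolerance, one thing that may be worth a try before giving up on Fuzzy is a smaller edit distance. The full Lucene syntax accepts an optional number from 0 to 2 after the tilde (2 is the default), and the tilde only applies to the single word it is attached to. Below is a minimal sketch of a helper that does this per word. The FuzzyQuery class and its Build method are names I made up for illustration, they are not part of Xperience or the Azure SDK, and the sketch simply drops punctuation rather than escaping it.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Kentico Xperience or the Azure SDK).
public static class FuzzyQuery
{
    // Appends "~<maxEdits>" to every word so each term gets its own fuzzy operator.
    public static string Build(string userInput, int maxEdits = 1)
    {
        if (maxEdits < 0 || maxEdits > 2)
            throw new ArgumentOutOfRangeException(nameof(maxEdits), "Edit distance must be 0, 1, or 2.");

        var terms = (userInput ?? string.Empty)
            // Passing null splits on any whitespace.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            // Assumption: dropping punctuation is acceptable here; it avoids
            // having to escape characters that are special in Lucene syntax.
            .Select(word => new string(word.Where(char.IsLetterOrDigit).ToArray()))
            .Where(word => word.Length > 0)
            .Select(word => word + "~" + maxEdits);

        return string.Join(" ", terms);
    }
}
```

The result would be passed as the search text with the queryType still set to full, for example FuzzyQuery.Build(searchString) in place of searchString + "~". Whether an edit distance of 1 trims the noise enough will depend on your content.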

The best method is to actually use a contains syntax in Lucene with the full queryType mode. This is a tad more complex, but not too bad. You just have to use Regular Expression search matching with Lucene syntax in mind, which I know sounds very dangerous (unless you are Mark Schmidt, who absolutely loves Regular Expressions).

To do a Regular Expression search that matches something that contains a partial word we recommend this:

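// "/.*term.*/" is a Lucene regular expression that matches any indexed term containing the text
var result = await searchIndexClient.Documents.SearchAsync("/.*" + searchString + ".*/", new SearchParameters() { QueryType = QueryType.Full });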

This gets me my partial match to "blazo" for anything with a real word of "blazor" in the content. 
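
One caveat with dropping raw user input between the slashes: characters like '.', '*', '#' or '/' mean something inside a Lucene regular expression, and a regular expression is matched against single terms in the index, so a multi-word phrase will not match as one expression. Here is a rough sketch of how that could be handled, assuming it is acceptable to turn each word into its own contains clause. The ContainsQuery class, its method names, and the list of reserved characters are my own for illustration and are not something that ships with Xperience or the Azure SDK.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper (not part of Kentico Xperience or the Azure SDK).
public static class ContainsQuery
{
    // Characters treated as special inside a Lucene regular expression,
    // plus '/' because it delimits the expression in the query string.
    private const string Reserved = ".?+*|{}[]()\"\\#@&<>~/";

    // Puts a backslash in front of every reserved character so it is matched literally.
    public static string Escape(string word)
    {
        var sb = new StringBuilder(word.Length * 2);
        foreach (char c in word)
        {
            if (Reserved.IndexOf(c) >= 0) sb.Append('\\');
            sb.Append(c);
        }
        return sb.ToString();
    }

    // Builds one "/.*word.*/" clause per whitespace-separated word.
    public static string Build(string userInput)
    {
        if (string.IsNullOrWhiteSpace(userInput)) return "*"; // "*" matches every document

        var clauses = userInput
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Select(word => "/.*" + Escape(word) + ".*/");

        return string.Join(" ", clauses);
    }
}
```

With something like this in place, the call above would take ContainsQuery.Build(searchString) as its first argument instead of concatenating the strings inline. Whether the clauses are combined as "any" or "all" is governed by the search mode, which this sketch leaves at its default.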

With great power comes great responsibility, though. This is a slower operation for sure. If you have a very large index you need to be a bit careful about when and where you use this, but it does work well in our usage. There are a few other higher-end options, but they require moving to a different analyzer type, which is more complex. The important thing here is that you can use this either generally across the entire set of index fields with the searchText property, or on specific fields like tag values (which could come in handy when trying to match a category or tag name that a user is searching for). For example:
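
In the full Lucene syntax, a clause can be scoped to one field by prefixing it with the field name and a colon. Here is a minimal sketch of what building that string could look like. The FieldScopedQuery class is a made-up helper, and "documenttags" is only a placeholder for whatever searchable field your own index contains. To keep the sketch short it only accepts a single alphanumeric word, so nothing needs escaping.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not part of Kentico Xperience or the Azure SDK).
public static class FieldScopedQuery
{
    // Returns "<fieldName>:/.*<word>.*/", a contains-style regular expression
    // that is only evaluated against the named field.
    public static string Contains(string fieldName, string word)
    {
        if (string.IsNullOrEmpty(fieldName))
            throw new ArgumentException("A field name is required.", nameof(fieldName));

        // Assumption: a single alphanumeric word, so no escaping is required.
        if (string.IsNullOrEmpty(word) || !word.All(char.IsLetterOrDigit))
            throw new ArgumentException("Expected a single alphanumeric word.", nameof(word));

        return fieldName + ":/.*" + word + ".*/";
    }
}
```

Calling FieldScopedQuery.Contains("documenttags", "blaz") produces documenttags:/.*blaz.*/, which would then be sent as the search text with the queryType set to full.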

Remember the simple wildcard possibility that I started with? If we had just used that, no results would have come back for the above scenario. That's why it doesn't always "just" work. Example of it NOT working:

Test the Search Service Away From Your Code

Azure Cognitive Search has one of the best debugging tools available built right into the Azure Portal: the Search Explorer. In fact, that's what I have been using above to do some quick searching and show the examples. If you are ever having an issue with your Kentico site not showing search results for content that you think it should, start with the Search Explorer. We always use it to isolate whether the problem is in our code on the MVC Core live site, or whether the document is not even present in the index itself. I can't recommend the Search Explorer enough. There is also a side benefit of using it: it will teach you the REST syntax for querying, such as &$count=true, &$select=, &$filter=, and more.
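
To make that REST syntax a little more concrete, here is a rough sketch that assembles the kind of GET URL you end up with when those parameters are combined. The SearchUrlBuilder class, the service name, and the index name are placeholders of mine, and I am assuming the 2020-06-30 api-version here. A real request also needs your query key sent in an api-key header, which this sketch leaves out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Kentico Xperience or the Azure SDK).
public static class SearchUrlBuilder
{
    // Builds a query URL for the "docs" endpoint of a search index.
    public static string Build(string serviceName, string indexName, string searchText,
                               string select = null, string filter = null)
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("api-version", "2020-06-30"), // assumed version
            new KeyValuePair<string, string>("search", searchText ?? "*"),
            new KeyValuePair<string, string>("$count", "true")             // include the total hit count
        };

        // $select limits which fields come back; $filter takes an OData expression.
        if (!string.IsNullOrEmpty(select))
            parameters.Add(new KeyValuePair<string, string>("$select", select));
        if (!string.IsNullOrEmpty(filter))
            parameters.Add(new KeyValuePair<string, string>("$filter", filter));

        // Values are URL-encoded; the parameter names are left as-is.
        var query = string.Join("&",
            parameters.Select(p => p.Key + "=" + Uri.EscapeDataString(p.Value)));

        return "https://" + serviceName + ".search.windows.net/indexes/" + indexName + "/docs?" + query;
    }
}
```

Comparing a URL built this way against what Search Explorer shows for the same query is a quick way to check whether your code and the portal are really asking the service the same question.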

Generate an Automatic UI for the Azure Search Service

Want your own playground to work with results a bit more than what Search Explorer provides? Well, you will be a fan of another newer feature of Azure Cognitive Search: the automatic scaffolding UI that creates a demo app for you to use on your very own index. Clicking the Create Demo App button inside of the Azure portal generates a React-based SPA client that downloads as a ready-to-go HTML file with code already hooked up to your service. The Create Demo App process even lets you map which fields should show up as search results in your UI. I was blown away at how easy it was to get things working. Start by clicking the Create Demo App button when viewing one of your index's details (at the top of the portal). It opens this:

You can see that I've mapped the three preview fields to the right fields of my content in the index. Finishing this gives you a few more options, but ultimately downloads the HTML file that is your demo.

By the way to get my thumbnail images to work, I did have to slightly tweak the code sample that downloads to replace the correct image path. If anyone is interested in that just let me know and I'll post it. 

When Out-of-the-box Fails, Build a Custom Azure Index

One of my fellow Kentico Xperience MVPs, Sean G. Wright, often says that one of his favorite parts of working with Xperience and Azure Search is building custom indexes. I agree with him; this is a place where Kentico Xperience shines with customization. You can create a custom module class in a class library and register it in the CMSApp project to customize what happens during the indexing process of your content. Inside of this custom class you can do things like combine child nodes in the Tree with a parent node and consider them as one single SearchDocument. This is useful when dealing with complex content models that drive large product detail pages. That in itself would be its own blog post idea, but I'm out of time for today. If you are considering this, know that it is the most powerful way to craft an index in Kentico. You can check out how to do that via the documentation at customizing Azure Search for Kentico Xperience websites.

Take It to the Next Level with Semantic Search

Azure Cognitive Search also recently added the ability to perform semantic search, meaning the search engine can consider the intent and contextual meaning of search phrases. Semantic search is different from the standard way of just analyzing the exact phrasing of a search term. Without semantic search, ACS (and local Smart Search) would consider the search phrase "past episodes of Kentico Rocks" as 4 different words ["of" would most likely be ignored], and present matches that match any of those words individually. But with semantic search, the search engine has a good chance of returning the episodes of the Kentico Rocks podcast in reverse chronological order.

Now, the out-of-the-box integration with Kentico Xperience can't quite utilize this new feature yet. The queryType of semantic is not yet supported by the Xperience NuGet packages. However, you could roll your own queries to your search service if you had this feature available to you. And maybe, just maybe, the Kentico Xperience product team would consider adding this new feature in a future Refresh of Kentico Xperience, just like they improved the Azure Search integration recently in Refresh 1 of Kentico Xperience.

Conclusion

As you can see, there is a lot you can do with Azure Cognitive Search and Kentico Xperience. Hopefully this post helps you to no longer hear those cringeworthy words of "Our Search doesn't work on our website". If you are using Azure Search with your Kentico site, let me know on Twitter at @mcbeev. I'd love to hear about more real-world scenarios of where these techniques are deployed and how.
