Growing into search: MS Azure Search

Introduction

This post shares some of our experience working with enterprise search technology, and points out some possible stepping stones towards providing enterprise-grade search features in your online systems.

What is Azure Search?

Azure Search is a hosted search service provided by Microsoft within their Azure suite of web services.

Azure Search lets you index a body of data. Your data might be a set of PDF documents, a corpus of emails, the pages of your website, or records within a database. Broadly, you define what fields you want in the index, then submit information from your data into the index. For instance, you might choose to index the URL, page title, author name, author location, meta tag content, and the first 500 characters in the body of a web page.
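As a rough sketch of the web page example above, here is how the index definition and one document might look in Python. The JSON shapes follow the Azure Search REST API as we understand it, but the index name, the field names, the extra `id` key field and the `to_index_document` helper are our own assumptions for illustration. Nothing here calls the service; it only builds the payloads you would send.

```python
import base64

# Hypothetical index definition for the web page example.
# Index name and field names are illustrative assumptions.
INDEX_DEFINITION = {
    "name": "web-pages",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "url", "type": "Edm.String", "searchable": False},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "author", "type": "Edm.String", "searchable": True, "facetable": True},
        {"name": "authorLocation", "type": "Edm.String", "filterable": True, "facetable": True},
        {"name": "metaTags", "type": "Collection(Edm.String)", "searchable": True},
        {"name": "bodyExcerpt", "type": "Edm.String", "searchable": True},
    ],
}


def to_index_document(page: dict) -> dict:
    """Map a crawled page (a plain dict) onto the fields defined above."""
    # Document keys allow only a limited character set, so the URL is
    # encoded with URL-safe Base64 rather than used directly.
    key = base64.urlsafe_b64encode(page["url"].encode("utf-8")).decode("ascii")
    return {
        "@search.action": "mergeOrUpload",
        "id": key,
        "url": page["url"],
        "title": page["title"],
        "author": page["author"],
        "authorLocation": page["author_location"],
        "metaTags": page["meta_tags"],
        # Only the first 500 characters of the body go into the index.
        "bodyExcerpt": page["body"][:500],
    }


example_page = {
    "url": "https://example.com/blog/growing-into-search",
    "title": "Growing into search",
    "author": "A. Writer",
    "author_location": "London",
    "meta_tags": ["search", "azure"],
    "body": "x" * 1200,
}

document = to_index_document(example_page)
# Body for a batch upload request: a list of documents under "value".
upload_body = {"value": [document]}
```

Choosing which fields are searchable, filterable or facetable is where most of the early design decisions sit, so it is worth keeping this definition small at first and growing it as you learn what your users search for.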

Then, you’re able to submit search requests which search your index for matching content. You can define which fields should be returned; and importantly, you can also define how Azure Search decides which results are most relevant.

You can also ask for a particular page of results; control the number of results per page; get information to help your developers build results navigation features; and get lists of sub-categories, with the number of matched results in each category. These are called facets: imagine searching for a brand name on a clothes store site, and getting clickable links for shoes, shirts and accessories.
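To make paging and facets a little more concrete, here is a minimal sketch of a search request body in Python. The parameter names (`search`, `select`, `top`, `skip`, `count`, `facets`) are those of the Azure Search REST API as we understand it; the helper functions and the `title`, `url` and `category` field names are hypothetical.

```python
import math


def build_search_request(text, page=1, page_size=10,
                         fields=("title", "url"), facets=("category",)):
    """Build the JSON body for a search request (no network call is made)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "search": text,
        "select": ",".join(fields),      # which fields to return
        "top": page_size,                # results per page
        "skip": (page - 1) * page_size,  # results to skip to reach this page
        "count": True,                   # ask for the total number of matches
        "facets": list(facets),          # ask for per-category counts
    }


def total_pages(total_count, page_size):
    """Number of pages needed to show total_count results."""
    return math.ceil(total_count / page_size)


request_body = build_search_request("acme", page=3, page_size=20)
```

With `count` switched on, the response carries the total number of matches, which is what `total_pages` needs to drive "next" and "previous" links; the facet counts come back alongside the results, ready to render as clickable categories.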

How does Azure Search beat my database? RELEVANCE

We’re used, with search engines like Google or Ecosia, to seeing relevant content at the top of search results. Relational databases can do a simple version of relevance sorting, but on a budget it’s not easy to achieve a great deal of control over how they do it – they’re much better at sorting alphabetically, by category or by value.

Search indexing services like Azure Search were designed for relevance search from the ground up – so they give you plenty of options for deciding, for instance, how much importance to give each field when weighing how relevant a search result match is: if a keyword appears in the page title, maybe that’s worth more in relevance terms than if the same keyword matches the body content.
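In Azure Search, this kind of field weighting is expressed as a scoring profile on the index. The sketch below assumes hypothetical `title` and `bodyExcerpt` fields and an arbitrary 5:1 weighting; the helper function is ours, and the right numbers are something you would tune by experiment.

```python
def build_scoring_profile(name, weights):
    """Build a scoring profile that weights matches by field."""
    for field, weight in weights.items():
        # A sanity check for this sketch: a boost of zero or less makes no sense here.
        if weight <= 0:
            raise ValueError(f"weight for {field!r} must be positive")
    return {"name": name, "text": {"weights": dict(weights)}}


# A title match counts five times as much as a body match (illustrative values).
profile = build_scoring_profile("boost-title", {"title": 5, "bodyExcerpt": 1})

# The profile lives in the index definition under "scoringProfiles"...
index_fragment = {"scoringProfiles": [profile]}

# ...and a search request opts into it by name.
request_body = {"search": "enterprise search", "scoringProfile": profile["name"]}
```

Because the profile is selected per request, you can keep several profiles on one index and compare them side by side while you tune.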

You might also want to favour results where the author’s location is close to your user’s location. Or, you might want to cope with typos, by returning whole-word matches first, but allowing partial-word matches underneath. Or there might be synonyms that are important to you – if someone types in “footwear” you might want to show them records tagged with “shoes”.
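The synonym case can be sketched as a synonym map, which Azure Search accepts in the same rule format Solr uses. The map name, the `tags` field and the word groups below are made up for illustration, and the helper is our own.

```python
def build_synonym_map(name, groups):
    """Build a synonym map body from groups of equivalent terms."""
    # Each rule is a comma-separated list of equivalent terms, one rule per line.
    rules = [", ".join(group) for group in groups]
    return {"name": name, "format": "solr", "synonyms": "\n".join(rules)}


synonym_map = build_synonym_map(
    "clothing-synonyms",
    [["footwear", "shoes"], ["top", "shirt", "tee"]],
)

# A searchable field opts into the map by name in the index definition.
tags_field = {
    "name": "tags",
    "type": "Collection(Edm.String)",
    "searchable": True,
    "synonymMaps": [synonym_map["name"]],
}
```

A search for “footwear” against that field would then also match records tagged “shoes”, without anyone having to retag the data.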

Azure Search isn’t Google, but it gives you many, many options that go far beyond what you can easily achieve with a traditional database. I consulted for a City of London recruitment company who were using SQL Server – a traditional, relational database – and struggling with both the performance and the sheer usefulness of their existing search system. I found that even the first step of their search process involved dozens of database queries, hammering their poor server and wasting their agents’ time while it wrangled results into something approaching a meaningful order. Azure Search would have been an excellent fit for them, if it had existed at the time (if you’re interested, my recommended solution was Apache Solr – see the Alternatives section below).

Other benefits

Separate search from content creation

Sometimes it’s good to separate the data store your users search, from the data store you write your information to.

This idea’s similar to a concept called command query responsibility segregation (CQRS). In simple terms, it can become unwieldy to have a team – and maybe automated processes, too – writing complex changes into your data at the same time as potentially tens of thousands of users want to query that data. With Azure Search, you write some subset of your data to the search index; then those users hit the search index more or less independently of your own team writing to your database. That’s not an issue for some businesses; it’s immensely beneficial for others, especially if searching via your core database is becoming hard work.

It’s playful

Building a high-quality search experience on a relational database, for example, is hard work: any change is likely to take a lot of labour rewriting and restructuring tables, queries and stored procedures; plus an amount of input filtering and results processing in code… not to mention the challenges that come from launching changes to a database your whole enterprise might rely on.

With Azure Search, there’s usually a much simpler task of extracting the relevant data from your database and creating the index; then a lighter, iterative process of tuning relevance weights and turning on features like geospatial search. It’s much easier to grow into Azure Search, and the ceiling of what you can achieve is potentially way, way higher.

Searches fast, scales high

Azure Search is blazing fast. If you’re an SME, even one of the lower pricing tiers of Azure Search wipes the floor with a relational database for search, particularly if relevance is important.

And if you grow, you can expand to multiple copies of your search index, to serve your users at true global scale.

…And you can start for free

There’s a generous free tier, which lets you set up with a modest search index and play with how you want your search to work.

If you’re lucky, you might even be able to support your business within the free tier for a while before scaling up.

Alternatives

My recommendation to the City recruitment company I mentioned earlier, before Azure Search was available, was Apache’s Solr search software.

Solr and Azure Search share a search syntax (the way you phrase queries): Azure Search’s full query mode uses the Lucene query syntax, and Solr is built on Lucene. The two solutions have many features in common, too – like relevance weighting, faceting, and the ability to process data on its way into the index to let you match parts of words. At a simple level, Azure Search seems almost interchangeable with Solr.

Solr’s all over the e-commerce world: when I trained to use it, I sat next to the search team from a US online apparel company, since acquired by Amazon for $1bn.

There are other alternatives: some, like Elasticsearch, are built on the same Lucene library as Solr; others, like Algolia, are service-based solutions.

Azure Search has several features I love, though – and which, taken together, set it apart from other solutions:

  • There’s no infrastructure maintenance; you don’t even need to worry about the software configuration on a cloud server. It’s just there, waiting as a service.
  • You have full control over the parameters of your search index.
  • There are lots of advanced features you can grow into – for example, you could explore AI as part of your indexing process, using OCR (optical character recognition) to extract searchable text from scanned documents, or natural language processing to pull keywords and categories from unstructured text (such as a news story).
  • It’s part of the overall Azure platform, which lets you set up cloud file storage, content delivery, database and AI services within the same account.
  • You can start for free.
  • If you want to move away from Azure Search to your own installation of Elasticsearch or Solr, the similarities in search syntax and features mean your team’s skills will be portable.

Summary

I think the key benefit of Azure Search is that you can get into the game of real, relevance-based search for minimal expense and configuration: you can focus on making the search feature work right in itself.

If you think we can help you decide if search is something you want to improve, or take the first step and get started, get in touch – it’s one of our favourite technologies, so we’d love to talk about it.

Thanks for reading!