A few colleagues and I have been quietly working away at a little hackathon project to try to help drive consumers towards Irish retailers in the run-up to Christmas. The premise was simple: could we use our knowledge of product data acquisition and web crawling to create a meta-search engine that lets a consumer search for products and gift ideas (including price and availability) across lots of Irish retailers?

The result! https://buyirish.com. Follow us on Twitter for speedy updates about the website.

Gathering the Data

The first major challenge was acquiring the data. Typically, we need to tailor our crawlers on a per-site basis, and we also need to consider:

  • Is the info in the page source, or dynamically injected by JavaScript?
  • How is that data presented? 
  • Do we need to identify specific places on the page to extract content? 
  • Do we need to get around anti-bot measures or use web proxies? 

Thankfully, Alan (Technical Lead at ChannelSight) and our DAX (Data Acquisition) team came up with a clever solution that combines several services to gather the data in a generic manner. They maintained a central “list of lists” of all the retailers, and a .NET Core application handled extracting the data from our crawler services and bulk loading it into the BuyIrish platform.
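To illustrate what loading every retailer into “one common shape” can look like, here is a minimal Python sketch that normalises a single crawler record into a product document ready for bulk loading. The real pipeline was a .NET Core application. The record keys (title, price, availability, url), the ProductDocument fields and the use of retailer as the partition key are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ProductDocument:
    # Hypothetical document shape; "retailer" also serves as the partition key.
    id: str
    retailer: str
    name: str
    description: str
    price: Optional[float]
    in_stock: bool
    url: str


def stable_id(retailer: str, url: str) -> str:
    """Deterministic id: re-crawling the same product replaces, not duplicates."""
    return hashlib.sha256(f"{retailer}|{url}".encode("utf-8")).hexdigest()


def parse_price(text: str) -> Optional[float]:
    """Parse prices like "€1,299.00"; assumes a dot as the decimal separator."""
    cleaned = text.replace("€", "").replace(",", "").strip()
    try:
        return float(cleaned) if cleaned else None
    except ValueError:
        return None  # leave unparseable prices empty rather than guessing


def normalise(retailer: str, raw: dict) -> ProductDocument:
    """Map one raw crawler record (hypothetical keys) to the common shape."""
    return ProductDocument(
        id=stable_id(retailer, raw["url"]),
        retailer=retailer,
        name=raw.get("title", "").strip(),
        description=raw.get("description", "").strip(),
        price=parse_price(str(raw.get("price", ""))),
        in_stock=str(raw.get("availability", "")).strip().lower()
        in {"in stock", "instock", "true"},
        url=raw["url"],
    )


# Cosmos DB expects plain JSON-serialisable dicts.
doc = asdict(normalise("ExampleRetailer", {
    "url": "https://www.example-retailer.ie/p/1",
    "title": " ACME Phone X ",
    "price": "€1,299.00",
    "availability": "In Stock",
}))
```

Deriving the id from the retailer and the product URL is worth considering. When the same product is crawled again, the upsert replaces the existing document instead of creating a duplicate.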

A Rough Design

Given the hackathon’s time constraints, we didn’t want the complexity of a SQL instance, with modelling and pushing migrations through Entity Framework. We use Azure Cosmos DB extensively in our core platform, and our production APIs serve millions of requests daily from it, so it was a great candidate for direct storage that could handle requests at scale. However, we also knew that result relevancy was going to be important, and the equivalent of a basic SQL SELECT statement wasn’t going to cut it.

After a little research, we had our answer. We could bulk load the data directly into Azure Cosmos DB and use Azure Cognitive Search (ACS) to connect to the container. ACS keeps its own index up to date based on an hourly check. This gave us relevance scoring on results and the ability to tweak the scoring profiles if needed.

Getting the data into Azure Cosmos DB proved very easy with the new v3 Cosmos SDK. In one of our first tests, a bulk upsert of 50,000 items into the container took only fractions of a second. It’s also very easy to scale the provisioned throughput up and down on the fly via code.
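The load pattern described above is to scale throughput up, upsert everything, then scale back down. It can be sketched in Python as follows. In .NET, the v3 SDK enables bulk support through `CosmosClientOptions.AllowBulkExecution`. The `upsert_item` and `replace_throughput` method names here mirror the Python azure-cosmos `ContainerProxy`. The RU/s figures and worker count are illustrative assumptions, not the values we used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Protocol


class UpsertContainer(Protocol):
    # The two ContainerProxy-style methods this sketch relies on.
    def upsert_item(self, body: dict) -> dict: ...
    def replace_throughput(self, throughput: int) -> object: ...


def bulk_upsert(container: UpsertContainer, items: Iterable[dict],
                load_ru: int = 10_000, idle_ru: int = 400,
                workers: int = 16) -> int:
    """Scale throughput up, upsert all items concurrently, then scale back down.

    RU/s values are illustrative assumptions, not production figures.
    """
    container.replace_throughput(load_ru)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for every upsert and re-raises the first failure.
            results = list(pool.map(container.upsert_item, items))
        return len(results)
    finally:
        # Always drop back to the idle level, even if an upsert failed.
        container.replace_throughput(idle_ru)
```

The `finally` block matters here. Without it, a failed load could leave the container provisioned at the expensive level. Any object that exposes those two methods will work, so the function is also easy to test with an in-memory stand-in.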

Creating an Azure Cognitive Search Index

With the data safely in Azure Cosmos DB, our next step was to set up the Azure Cognitive Search (ACS) instance. This was a breeze. You can create a new instance directly from the Cosmos DB resource, and a walkthrough wizard sets up the indexer with an hourly high-watermark check on the _ts timestamp.
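Behind the scenes, the wizard generates a data source and an indexer. For reference, you could create equivalent definitions through the REST API roughly as in the sketch below. The resource names, the connection-string placeholders and the API version are assumptions. The code only builds the requests and does not send them.

```python
import json
import urllib.request
from typing import Dict, Tuple

API_VERSION = "2020-06-30"  # assumed GA REST API version


def cosmos_indexer_definitions(connection_string: str, container: str,
                               index: str) -> Tuple[Dict, Dict]:
    """Data source + indexer similar to the wizard's output (names hypothetical)."""
    datasource = {
        "name": f"{container}-cosmos",
        "type": "cosmosdb",
        # Format: "AccountEndpoint=...;AccountKey=...;Database=..."
        "credentials": {"connectionString": connection_string},
        "container": {"name": container, "query": None},
        # Only documents whose _ts is newer than the last run are re-indexed.
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": "_ts",
        },
    }
    indexer = {
        "name": f"{index}-indexer",
        "dataSourceName": datasource["name"],
        "targetIndexName": index,
        "schedule": {"interval": "PT1H"},  # ISO 8601 duration: hourly
    }
    return datasource, indexer


def put_request(service: str, admin_key: str, collection: str,
                body: Dict) -> urllib.request.Request:
    """Build (but do not send) a PUT to /datasources/{name} or /indexers/{name}."""
    url = (f"https://{service}.search.windows.net/{collection}/{body['name']}"
           f"?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "api-key": admin_key},
    )
```

The `PT1H` schedule interval provides the hourly refresh. Because the high-water-mark policy tracks `_ts`, each run picks up only the documents changed since the previous run.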

One feature that ACS is sorely missing is the concept of consistently random results. This would have been useful for giving a fair distribution of views to similar products across multiple retailers, or for building an “Inspire Me” function that returns completely random results from a wildcard (*) search.
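One possible workaround, which we did not build, is to impose a consistent pseudo-random order on the client side. You hash each document id together with a seed (for example, a session id) and sort by the hash. The same seed always produces the same order, and a different seed produces a different one. The sketch assumes results shaped like ACS responses, with `id` and `@search.score` fields.

```python
import hashlib
from typing import List, Sequence


def _rank_key(seed: str, doc_id: str) -> str:
    # Hash of seed + id: stable for a given seed, effectively random across seeds.
    return hashlib.sha256(f"{seed}:{doc_id}".encode("utf-8")).hexdigest()


def consistent_shuffle(results: Sequence[dict], seed: str) -> List[dict]:
    """Pseudo-random order that is identical every time for the same seed."""
    return sorted(results, key=lambda r: _rank_key(seed, r["id"]))


def fair_tiebreak(results: Sequence[dict], seed: str) -> List[dict]:
    """Keep relevance order, but break ties between equal scores consistently at random."""
    return sorted(results, key=lambda r: (-r["@search.score"], _rank_key(seed, r["id"])))
```

The obvious limitation is that this reorders only the page of results already returned. It cannot spread views to products that ACS ranked outside that page.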

Serving the results

While I continued with Azure Cosmos DB and ACS, my colleague Daniel was busy getting the website up and running. The front end is a vanilla .NET Framework 4.8 MVC 5 project with a Bootstrap theme and some custom CSS/JS.

The use case is pretty simple. When consumers arrive on the site, they can search for a term, and ACS returns the most relevant products, ranked by its internal scoring algorithm combined with a tweaked scoring profile we’ve provided.
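Each search amounts to one call to the ACS Search Documents REST endpoint that names the scoring profile. The minimal Python sketch below builds that request. The service name, index name, selected fields and the `name-boost` profile name are hypothetical. The request is constructed but not sent.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumed GA REST API version


def build_search_request(service: str, query_key: str, index: str, term: str,
                         scoring_profile: str = "name-boost",
                         page: int = 0, page_size: int = 24) -> urllib.request.Request:
    """POST body for /indexes/{index}/docs/search; field names are hypothetical."""
    body = {
        "search": term,
        "scoringProfile": scoring_profile,  # apply the tweaked weighting
        "select": "id, name, retailer, price, url",
        "top": page_size,
        "skip": page * page_size,
        "count": True,  # ask ACS for the total match count, for paging
    }
    url = (f"https://{service}.search.windows.net/indexes/{index}/docs/search"
           f"?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "api-key": query_key},
    )
```

Matches come back in a `value` array, ordered by `@search.score`. A read-only query key is enough for this call, so the admin key never needs to reach the web tier.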

Consumers can also press “Inspire Me”, which picks a random keyword to surface a selection of different products. Finally, you can browse a listing of the retailers.
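The “Inspire Me” button only needs to pick a keyword and run the normal search. The small sketch below assumes a hypothetical hard-coded keyword list, since the real one isn't described here, and avoids repeating the previous keyword immediately.

```python
import random
from typing import Optional, Sequence

# Hypothetical keyword list; the real one isn't described in this post.
INSPIRE_KEYWORDS = ["candle", "jumper", "whiskey glasses", "board game", "scarf", "cookbook"]


def inspire_me(previous: Optional[str] = None,
               rng: Optional[random.Random] = None,
               keywords: Sequence[str] = INSPIRE_KEYWORDS) -> str:
    """Pick a random keyword for the normal search, avoiding an immediate repeat."""
    candidates = [k for k in keywords if k != previous] or list(keywords)
    return (rng or random.Random()).choice(candidates)
```

Passing in a seeded `random.Random` keeps the choice reproducible in tests.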

Tweaking the algorithm

We did notice some discrepancies in the results. The problem was that some retailers provided extremely verbose product descriptions which might repeat a search term multiple times, while another retailer with a more relevant product might only mention the term once in the product title.

For example, if a user searched for “ACME Phone”, which is the more relevant product?

– ACME Phone X 

– ACME Phone compatible Phone Cover. Description: Works with ACME Phone Model X, ACME Phone Model Y, ACME Phone Model Z, ACME Phone Model Q

The solution was to provide a scoring profile that overrides the default field weights, giving term occurrences in the product name a much higher weighting than those in the product description.
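In the index definition, a scoring profile of this kind is a `text.weights` map from each searchable field to its relative weight. The sketch below builds one as a Python dict. The field names `name` and `description`, the profile name and the 5:1 ratio are assumptions for illustration, not the values we settled on.

```python
from typing import Dict


def name_boost_profile(name_weight: int = 5, description_weight: int = 1) -> Dict:
    """A "scoringProfiles" entry weighting name matches above description matches."""
    if name_weight <= 0 or description_weight <= 0:
        raise ValueError("field weights must be positive")
    return {
        "name": "name-boost",  # hypothetical profile name
        "text": {
            "weights": {
                "name": name_weight,                # hypothetical searchable field
                "description": description_weight,  # hypothetical searchable field
            }
        },
    }


# Fragment of an index definition (the "fields" array and other properties omitted).
index_fragment = {
    "scoringProfiles": [name_boost_profile()],
    "defaultScoringProfile": "name-boost",
}
```

Setting `defaultScoringProfile` applies the profile to every query. Alternatively, you can leave it unset and pass `scoringProfile` per request, which makes it easier to compare different weightings side by side.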

Reports and Stats

We wanted to get some very lightweight metrics on two things.

1. What are people searching for?

2. What are people clicking on?

To achieve this, we created a very lightweight click handler that redirects to the retailer site. As the consumer is redirected, we capture some anonymous statistics about the retailer and product they clicked and persist them directly to Application Insights. This is a nice, quick solution because it lets us write simple Kusto queries to see how things are performing. On the front end, we also push some basic user analytics to Google Analytics.
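A click handler like this can be very small. Below is a minimal WSGI sketch in Python; our actual handler lived in the .NET site. It records an anonymous event through an injected `track_event` callback, which stands in for something like Application Insights’ `TelemetryClient.TrackEvent`, and then issues a 302 redirect. The query parameter names and the host allow-list are hypothetical.

```python
from typing import Callable, Dict, Iterable, List
from urllib.parse import parse_qs, urlparse

# Hypothetical allow-list of retailer hosts we are willing to redirect to.
ALLOWED_HOSTS = {"www.example-retailer.ie"}

TrackEvent = Callable[[str, Dict[str, str]], None]


def make_click_app(track_event: TrackEvent, allowed_hosts: Iterable[str] = ALLOWED_HOSTS):
    allowed = set(allowed_hosts)

    def app(environ, start_response) -> List[bytes]:
        params = parse_qs(environ.get("QUERY_STRING", ""))

        def first(key: str) -> str:
            return params.get(key, [""])[0]

        target = first("url")
        parsed = urlparse(target)
        # Refuse anything that isn't a known retailer link (no open redirect).
        if parsed.scheme not in ("http", "https") or parsed.hostname not in allowed:
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"Unknown retailer link"]
        try:
            # Anonymous stats only: which retailer and product, no user identifiers.
            track_event("ProductClick", {
                "retailer": first("retailer"),
                "product": first("product"),
                "host": parsed.hostname,
            })
        except Exception:
            pass  # telemetry must never block the redirect
        start_response("302 Found", [("Location", target)])
        return [b""]

    return app
```

The allow-list check stops the endpoint from becoming an open redirect. Swallowing telemetry errors means a metrics outage never prevents the consumer from reaching the retailer.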

Some outstanding bugbears

One thing that is still causing some headaches is fuzzy search. Azure Cognitive Search supports the Lucene query syntax, so it should be possible to use modifiers like ~ to request fuzzy matching on certain words. In practice, however, this led to spurious results. It helped searches like tshirt~ find results for t-shirts, but it produced much poorer results for misspellings or for keywords that clearly weren’t covered by any retailer. hurling~ returned hits for Haflinger horse products, and supplying numeric modifiers like hurling~1 tanked the results entirely.
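Fuzzy operators are part of the full Lucene syntax, so requests that use them need `queryType` set to `full`. The sketch below shows one possible mitigation, untested against the BuyIrish index. It escapes user input, adds `~` only to longer terms, and falls back to the fuzzy query only when the plain query returns nothing. The length threshold is an assumption. This approach would help the tshirt case, but it would not stop odd matches for terms that no retailer stocks.

```python
import re
from typing import Callable, Dict, List, Optional

# Characters with special meaning in the full Lucene syntax.
LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term: str) -> str:
    """Backslash-escape special characters so user input can't alter the query."""
    return LUCENE_SPECIAL.sub(r"\\\1", term)


def build_query(user_input: str, fuzzy: bool, min_length: int = 6,
                distance: Optional[int] = None) -> Dict[str, str]:
    """Full-Lucene search body; adds ~ (or ~N) only to terms of min_length or more."""
    suffix = "~" if distance is None else f"~{distance}"
    terms = []
    for term in user_input.split():
        escaped = escape_term(term)
        terms.append(escaped + suffix if fuzzy and len(term) >= min_length else escaped)
    return {"search": " ".join(terms), "queryType": "full"}


def search_with_fallback(run_search: Callable[[Dict[str, str]], List],
                         user_input: str) -> List:
    """Try the plain query first; retry with fuzzy terms only if nothing matched."""
    results = run_search(build_query(user_input, fuzzy=False))
    if results:
        return results
    return run_search(build_query(user_input, fuzzy=True))
```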

New Tech is Fun

Overall, this was a fun little project. It’s always nice to get out of the day-to-day Jira backlog and explore some new technology, and I can definitely see us using Azure Cognitive Search at some point on our product roadmap. Thanks to my colleagues Daniel P, Alan, Daniel G, Bogdan, Dorothy, Enda and John, who chipped in to help get this live.

[This post was originally published on Eoin’s personal site (https://trycatch.me/building-buyirish-dot-com/), where you can find an expanded version with additional technical details, links to Azure documentation and code samples.]

Learn More about ChannelSight

ChannelSight has a team of experts who can help you optimise your eCommerce strategy. Contact us today to learn more.