Smart search keeps your database clean

#RPA

#smartsearch

#predictivedatabase

The correct company information is found even with typos in the name and a changed address.

Keeping your database clean

Let’s say you want to add a new company, “ABC Holdings”, to your customer database. Which version would you type in?

  1. ABC Holdings - Street 15, New York
  2. ABC Holdings Inc - 15 Street, New York
  3. ABC Holdings, Inc. - 15 Street, NY

You probably have a standard format you always use. But is it the same as what your colleagues use?

Now we get to the problem: How do we know for sure whether or not this company has already been contacted by our colleagues?
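To make the problem concrete, here is a minimal sketch in plain Python (nothing Aito-specific) that cleans up the three versions above and checks whether they collapse into a single entry. The `normalize` helper is a made-up illustration, not part of any library.

```python
import re

def normalize(entry: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", entry.lower())
    return " ".join(cleaned.split())

variants = [
    "ABC Holdings - Street 15, New York",
    "ABC Holdings Inc - 15 Street, New York",
    "ABC Holdings, Inc. - 15 Street, NY",
]

# Exact matching on the cleaned-up strings still sees three different companies.
distinct = {normalize(v) for v in variants}
print(len(distinct))  # 3
```

Even after stripping case and punctuation, an exact comparison treats the three versions as separate companies.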

Unfortunately, the common solution is to implement strict and rigorous guidelines for adding new entries. In practice this usually means endless drop-down menus and dozens of required fields asking for information you had no idea existed.

A more pleasant solution is to implement a smart search that finds possible duplicates when a new entry is being registered. A simple keyword search already helps a lot, but for a more reliable and robust search you’ll need to consider the relationships between data points and their different variations. That’s when you need machine learning. I’ll show you how to get it done with Aito, and it’ll take less than ten minutes, I promise.

Upload a dataset

I picked up the data for this experiment at the USA public catalogue. It contains the basic information of 3,745 American companies in a tab-separated .txt format. After wrestling with the file for a bit, I turned it into a nice and smooth .csv, which you can get here.
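The exact cleanup steps aren't shown here, but the core of a tab-separated-to-CSV conversion can be sketched with Python's standard `csv` module. The function name `tsv_to_csv` is an illustrative assumption, and the real file may well need more cleanup than this.

```python
import csv
import io

def tsv_to_csv(tsv_text: str) -> str:
    """Convert tab-separated text to CSV, trimming stray whitespace in cells."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([cell.strip() for cell in row])
    return out.getvalue()

# Names containing commas are quoted automatically by the csv writer.
sample = "ID\tName\n1904\tABRAHAM & CO., INC.\n"
print(tsv_to_csv(sample))
```

Letting the `csv` module handle the quoting matters with this dataset, because many company names contain commas.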

ID   | Name                             | Zip_Code | Street                 | Building          | City       | State | Number
1904 | ABRAHAM & CO., INC.              | 829452   | 3724 47TH STREET       | -                 | GIG HARBOR | WA    | 98335
2303 | ROSPERA FINANCIAL SERVICES, INC. | 828164   | 5429 LBJ FREEWAY       | SUITE 400         | DALLAS     | TX    | 75240
2554 | AEI SECURITIES, INC.             | 816750   | 1300 WELLS FARGO PLACE | 30 SEVENTH STREET | ST. PAUL   | MN    | 55101-4901
...  | ...                              | ...      | ...                    | ...               | ...        | ...   | ...

Pretty normal-looking stuff. Before we get to try out the smart search, we need to upload the dataset into your Aito instance to serve as the learning data. By far the easiest way to do this is to use the Quick Upload feature on the super secret instance management page. Sign up here to get invited and get your own instance.

You can also use the Aito Python SDK and CLI, or go straight to our REST API, to upload your data.

Check the data

The first thing you should do after uploading is to have a quick look at your data to check for any errors or shenanigans. You can run the following commands in any cURL-friendly terminal, but using a REST client like Insomnia is way more convenient. Remember to replace the API URL and key with your own. Here's the first cURL with our public instance:

curl -X POST \
https://public-1.aito.app/api/v1/_query \
-H 'content-type: application/json' \
-H 'x-api-key: bvss2i2dIkaWUfBCdzEO89LpxUkwO3A24hYg8MBq' \
-d '
{
  "from": "company_info",
  "limit": 1
}'

And the response below looks all good:

{
  "Building": "SUITE 210",
  "City": "CHERRY HILL",
  "ID": 9319,
  "Name": "BCG SECURITIES, INC.",
  "Number": "08002",
  "State": "NJ",
  "Street": "51 HADDONFIELD ROAD",
  "Zip_Code": 812680
}
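If you'd rather stay in Python than use cURL, the same call can be sketched with nothing but the standard library. The helper names `build_request` and `send` are made up for this example; the URL, headers and body simply mirror the cURL command above.

```python
import json
import urllib.request

API_URL = "https://public-1.aito.app/api/v1"  # replace with your own instance URL
API_KEY = "bvss2i2dIkaWUfBCdzEO89LpxUkwO3A24hYg8MBq"  # public demo key from the cURL above

def build_request(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build (but don't send) a POST request mirroring the cURL call."""
    return urllib.request.Request(
        url=f"{API_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json", "x-api-key": API_KEY},
        method="POST",
    )

def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)

request = build_request("_query", {"from": "company_info", "limit": 1})
print(request.get_method(), request.full_url)
# To actually run the query: print(send(request))
```

Building the request separately from sending it makes it easy to inspect exactly what will go over the wire before you hit the API.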

Now onto the fun part!

Smart search

Aito offers the _similarity API endpoint specifically designed for identifying similar entries in the database.

Let’s use the above company information and give it a small twist. We’ll leave out some of the data points and remove the “, Inc.” from the company name.

curl -X POST \
https://public-1.aito.app/api/v1/_similarity \
-H 'content-type: application/json' \
-H 'x-api-key: bvss2i2dIkaWUfBCdzEO89LpxUkwO3A24hYg8MBq' \
-d '
{
  "from": "company_info",
  "similarity": {
    "Building": "SUITE 210",
    "City": "CHERRY HILL",
    "Name": "BCG SECURITIES",
    "State": "NJ",
    "Street": "51 HADDONFIELD ROAD"
  },
  "limit": 1
}'

Ta-da! Aito returns exactly the company we were looking for. As you can see in the response below, Aito also attaches a “$score” that indicates the strength of the match. This one was pretty easy; the score drops considerably when the queries get more difficult.

{
  "$score": 1226332.030858179,
  "Building": "SUITE 210",
  "City": "CHERRY HILL",
  "ID": 9319,
  "Name": "BCG SECURITIES, INC.",
  "Number": "08002",
  "State": "NJ",
  "Street": "51 HADDONFIELD ROAD",
  "Zip_Code": 812680
}
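In a real registration form, some fields will be left blank. Here is a minimal sketch of how the query body above could be assembled from whatever the user actually filled in. The `similarity_query` function is a hypothetical helper, not part of Aito's SDK; only the shape of the body follows the cURL example.

```python
import json

def similarity_query(table: str, fields: dict, limit: int = 1) -> dict:
    """Build a _similarity query body from the non-empty form fields."""
    known = {key: value.strip() for key, value in fields.items()
             if value and value.strip()}
    return {"from": table, "similarity": known, "limit": limit}

form = {
    "Name": "BCG SECURITIES",
    "Street": "51 HADDONFIELD ROAD",
    "Building": "SUITE 210",
    "City": "CHERRY HILL",
    "State": "NJ",
    "Number": "",  # left blank by the user, so it is dropped from the query
}

body = similarity_query("company_info", form)
print(json.dumps(body, indent=2))
```

Dropping empty fields keeps blank values from being sent as if they were data to match against.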

Trial by fire

Now we’ll make things much more complex. What if the company moved to a completely different location and there’s a typo in the name?

curl -X POST \
https://public-1.aito.app/api/v1/_similarity \
-H 'content-type: application/json' \
-H 'x-api-key: bvss2i2dIkaWUfBCdzEO89LpxUkwO3A24hYg8MBq' \
-d '
{
  "from": "company_info",
  "similarity": {
    "Building": "A 30",
    "City": "NEW YORK",
    "Name": "BCG SECURTY",
    "State": "NY",
    "Street": "92 HELM STREET"
  },
  "limit": 1
}'

Aito still finds the right company. As expected, the score is significantly lower this time, but it is still several times higher than that of the next closest match. To see more suggestions and their scores, raise “limit” in the query from 1 to a higher number.

{
  "$score": 15.109117592387861,
  "Building": "SUITE 210",
  "City": "CHERRY HILL",
  "ID": 9319,
  "Name": "BCG SECURITIES, INC.",
  "Number": "08002",
  "State": "NJ",
  "Street": "51 HADDONFIELD ROAD",
  "Zip_Code": 812680
}
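One way to turn that score gap into a yes/no duplicate warning is to compare the best score with the runner-up. The sketch below is an assumption-heavy illustration: `likely_duplicate` and the threshold of 3.0 are invented for this example, and the runner-up score in the sample is made up rather than taken from an actual Aito response.

```python
def likely_duplicate(hits: list, min_ratio: float = 3.0) -> bool:
    """True if the best hit's $score clearly dominates the runner-up."""
    scores = sorted((hit["$score"] for hit in hits), reverse=True)
    if len(scores) < 2:
        return False  # nothing to compare against, so don't guess
    if scores[1] <= 0:
        return scores[0] > 0
    return scores[0] / scores[1] >= min_ratio

sample_hits = [
    {"$score": 15.109117592387861, "ID": 9319},   # top hit from the response above
    {"$score": 2.5, "ID": None},                  # hypothetical runner-up, for illustration only
]
print(likely_duplicate(sample_hits))
```

To use it, you'd request at least two results by raising the limit, then tune `min_ratio` against your own data.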

There are plenty more scenarios to try, and I encourage you to test them yourself. Copy any of the above queries to a REST client, change the values and see what happens to the score.

Wrapping up

What you probably really care about is how this would work with your own data. There’s only one way to find out. Request access to Aito and you’ll swiftly get your very own instance to test with. And it’s completely free.

And by the way, I made a simple UiPath demo for you to play around with. You'll need to enable UiPath Web Activities in the Manage Packages console. Have fun!
