Quepinch

Fuzzy string matching

May 20, 2020

What is Fuzzy string matching?

Fuzzy string matching algorithm tries to give a value to a word matching with the word which the user wants to find. The value shows how much the word matches the word by calculating dissimilarity between the strings.

Metrics used to calculate similarity:

  • Hamming Distance - If we want to compare words of the same length.
  • Levenstein - the distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other.\

Algorithms that can be used to calculate the values:

  • N-gram
  • NLTK/ word2Vec
  • Soundex - It’s a phonetic algorithm that is used for checking words with similar pronunciation. The hash function can be used to calculate the score for each soundex word comparison but it’s not reversible.

The package used: Fuzzywuzzy

Module: process

Reference: https://www.youtube.com/watch?v=6i5bHQZ-c5U, 
https://www.youtube.com/watch?v=4L0Py4GkmPU&t=47s, 
https://www.youtube.com/watch?v=NRAqIjXaZvw&t=1383s

Code to implement:

from fuzzywuzzy import process
with open("cities.txt", 'r') as f:
    cities = f.read().split("\n")

def get_matches(query, choices, limit = 3):
result = process.extract(query, choices, limit = limit)
Return results

get_matches("deli", cities)
[("Delhi", 89), ("Tirunelveli', 68), ("New Delhi", 70)]

get_matches("ahmedbad", cities)
[("Ahmedabad", 94), ("Ahmednagar', 67), ("Adilabad", 62)]
algorithms
create-fuzzywuzzy
machine-learning
matching
natural-language-processing
neural-networks
string-compare
Author
Ishika Saha

Biryani eater.