Levenshtein Distance

May 29, 2024 | seedling, permanent

Levenshtein Distance #

Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
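The definition above can be sketched as a small dynamic-programming routine. This is a minimal illustration in Python (a language chosen here only so the example runs without a database); the function name `levenshtein` and the unit cost of 1 per edit are assumptions of this sketch, not something the note prescribes.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (unit cost per edit)."""
    # dist[i][j] holds the distance between a[:i] and b[:j].
    dist = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dist[i][0] = i  # delete all i characters of a[:i]
    for j in range(len(b) + 1):
        dist[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(a)][len(b)]


print(levenshtein("kitten", "sitting"))  # 3
```

Turning "kitten" into "sitting" takes two substitutions (k to s, e to i) and one insertion (g), hence a distance of 3.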

Levenshtein Distance in Postgres #

Unlike pg_trgm, which scores strings by the trigrams they share, Levenshtein distance measures how different two strings are. As defined in Wikipedia, “Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.” In Postgres, the levenshtein function is provided by the fuzzystrmatch extension.

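The smaller the Levenshtein distance, the more similar the two strings are. In the following example, the distance between ‘Cat Toys’ and ‘pet toy’ is 4, while the distance between ‘Dog Toys’ and ‘pet toy’ is 5, so ‘Cat Toys’ is the closer match. The comparison is case-sensitive, so each capital letter in the stored values counts as a substitution against the lowercase search term.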

CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Rank each distinct product_sub_species by its edit distance to 'pet toy'.
SELECT DISTINCT
    product_sub_species,
    levenshtein(product_sub_species, 'pet toy')
FROM ct
ORDER BY levenshtein(product_sub_species, 'pet toy') ASC;
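To check the ranking by hand, the query's logic can be mirrored outside the database. The following is a rough Python sketch, not the Postgres implementation: the helper `rank`, its `ignore_case` flag, and the sample list `products` are all hypothetical names introduced for illustration, and the distance function is a two-row variant of the standard dynamic-programming approach.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        prev = curr
    return prev[-1]


def rank(candidates, target, ignore_case=False):
    """Hypothetical helper: distinct candidates sorted by distance to target."""
    def distance(s):
        if ignore_case:
            return levenshtein(s.lower(), target.lower())
        return levenshtein(s, target)
    # set() plays the role of SELECT DISTINCT; ties are broken by name.
    return sorted(((s, distance(s)) for s in set(candidates)),
                  key=lambda pair: (pair[1], pair[0]))


products = ["Cat Toys", "Dog Toys", "Cat Toys"]  # assumed sample rows
print(rank(products, "pet toy"))
# [('Cat Toys', 4), ('Dog Toys', 5)]
print(rank(products, "pet toy", ignore_case=True))
# [('Cat Toys', 3), ('Dog Toys', 4)]
```

The case-sensitive distances match the query output (4 and 5). Folding case first lowers both by one, because the capital T no longer counts as a substitution; in Postgres, the analogous step would be wrapping both arguments in lower().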

OCR of Images #

2024-05-23_22-04-33_screenshot.png #

Screenshot of the query editor running the query above. Output:

product_sub_species (text) | levenshtein (integer)
Cat Toys | 4
Dog Toys | 5

'Cat Toys' is the closest match with the smallest Levenshtein distance.

