pg_trgm
tags :
Postgres Extension #
- The pg_trgm module from Postgres searches for similar strings based on trigram matching.
- Trigrams in pg_trgm are groups of three characters that ignore non-word characters.
- The similarity of two given strings can be indicated by how many trigrams they share.
The pg_trgm module provides functions and operators for determining the similarity of alphanumeric text based on trigram matching, as well as index operator classes that support fast searching for similar strings.
This module is considered “trusted”, that is, it can be installed by non-superusers who have CREATE privilege on the current database.

Trigrams in a given string #
It is always easier to understand a concept with examples. The set of trigrams in the string “mango” is
m, ma, man, ang, ngo, go
The ‘show_trgm’ function returns all trigrams in a given string, and it is often used for debugging in practice.

In another example, “the|bay”, we can see that only alphanumeric text is considered, which are
t, th, the, he, b, ba, bay, ay

Similairty score by pg_trgm #
The similarity score of two given strings can be measured by the number of trigrams they share. For example, the similarity score between “mango cake” and a string in a different order “cake with mango” is 0.68, “cat food” and “pet food” is 0.38, “coconuts” and the misspelled word “ciconuts” is 0.5.

Fuzzy search with pg_gram #
First, create an extension for pg_trgm if it hasn’t been loaded. Then search for targets that are similar to ‘coffee machine’ and sort the results in descending order on similarity scores. As illustrated in the screenshot below, ‘coffee maker’ is the most similar product to ‘coffee machine’.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
select distinct product_sub_species,
similarity(product_sub_species, 'coffee machine') as score from
ct where product_sub_species % 'coffee machine'
order by score desc;
Let’s take a look at another example with a typo in the query target. The user wanted to search for “chopping board” but entered “chapping board” instead. We can see that the pg_trgm search is able to return “cutting boards” as the closest match and it is exactly what the user wants.

What are the limitations of fuzzy search in Postgres #
There are two main limitations in Postgres’s fuzzy search methods. First, the string matching in Postgres is not a semantic-based (Semantic Search) approach. Both pg_trgm and Levenshtein distance, or soundex, metaphone are character-level comparisons. For example, a SQL query search for “Dog Chews” will not find “Pet Treat” due to the lack of common trigrams.
Second, the fuzzy match in Postgres cannot be applied in a supervised learning scenario. For example, there are 2 datasets with some common columns: “employee_name”, “email”, and “address”. And we need to pair the employees from these datasets to join info about the same person. A typical solution is to compute the similarity score for each column, then run a random forest or other classifiers to learn the underlying rules to detect the same person from different datasets. This is beyond Postgres.
Extension present In Fully Managed services by #
GCP #
DO #
OCI #
Implementing Autocompletion Django #
OCR of Images #
2024-05-23_21-40-30_screenshot.png #

Function Description similarity (text, text ) + real Returns a number that indicates how similar the two arguments are. The range of the result is zero (indicating that the two strings are completely dissimilar) to one (indicating that the two strings are identical). show_trgm(text) + text[] Returns an array of all the trigrams in the given string. (In practice this is seldom useful except for debugging.) word_similarity ( text, text) + real Returns a number that indicates the greatest similarity between the set of trigrams in the first string and any continuous extent of an ordered set oft trigrams in the second string. For details, see the explanation below. strict_word_similarity (text, text) + real Same as word_similarity, but forces extent boundaries to match word boundaries. Since we don't have cross-word trigrams, this function actually returns greatest similarity between first string and any continuous extent of words of the second string. show_ limit0 + real Returns the current similarity threshold used by the % operator. This sets the minimum similarity between two words for them to be considered similar enough to be misspellings of each other, for example. (Deprecated; instead use SHOW pg_trgm. similarity.threshold) set_limit ( real) - real Sets the current similarity threshold that is used by the % operator. The threshold must be between 0 and 1 (default is 0.3). Returns the same value passed in. (Deprecated; instead use SET pg_trgm. similarity.threshold)
2024-05-23_21-55-14_screenshot.png #

Query Editor Query History 1 SELECT show trgm - mango') ; a Output show_trgm text] I - f" m"," ma",ang,'go man,ngo)
2024-05-23_21-56-27_screenshot.png #

Query Editor Query History 1 SELECT show_ trgm ('the bay'); a Output show_trgm text] I - (" b", t", ba", th","ay "bay"he the)
2024-05-23_21-56-55_screenshot.png #

Query Editor Query History 1 select similarity('mango cake', 'cake with mango'); a Output similarity real 0.6875 function similarity Query Editor Query History 1 select similarity('cat food', 'pet food'); a Output similarity real - 0.3846154 similar strings
2024-05-23_22-01-29_screenshot.png #

Query Editor Query History 1 CREATE EXTENSION IF NOT EXISTS pg_trgm; 2 select distinct product.sub.species, 3 similarity(product.sub.species, 'coffee machine') as score from 4 ct where product.sub.species % 'coffee machine' 5 order by score desc; 6 a Output product.sub.species text score real a 1 Coffee Maker 0.47368422 0.36363637 2 Machine Screws 3 Coffee Prep 0.35 4 Cold Beverage Machines 0.31034482 n D fuzzy search with pg_trgm
2024-05-23_22-01-58_screenshot.png #

Query Editor Query History 1 CREATE EXTENSION IF NOT EXISTS pg_trgm; 2 select distinct product.sub.species, 3 similarity(product.sub.species, 'chapping board') as score from 4 ct where product.sub.species % 'chapping board' 5 order by score desc; b a Output productL.sub.species, score text real 1 Cutting Boards 0.36363637 2 Ironing Boards 0.3043478 fuzzy search with pg_trgm
2024-05-23_22-47-43_screenshot.png #

trigram in postgl - AA trigram postgresql ils trigram postgres 8.3 ADVERTISEMENT
OCR of Images #
2024-05-23_21-40-30_screenshot.png #

Function Description similarity (text, text ) + real Returns a number that indicates how similar the two arguments are. The range of the result is zero (indicating that the two strings are completely dissimilar) to one (indicating that the two strings are identical). show_trgm(text) + text[] Returns an array of all the trigrams in the given string. (In practice this is seldom useful except for debugging.) word_similarity ( text, text) + real Returns a number that indicates the greatest similarity between the set of trigrams in the first string and any continuous extent of an ordered set oft trigrams in the second string. For details, see the explanation below. strict_word_similarity (text, text) + real Same as word_similarity, but forces extent boundaries to match word boundaries. Since we don't have cross-word trigrams, this function actually returns greatest similarity between first string and any continuous extent of words of the second string. show_ limit0 + real Returns the current similarity threshold used by the % operator. This sets the minimum similarity between two words for them to be considered similar enough to be misspellings of each other, for example. (Deprecated; instead use SHOW pg_trgm. similarity.threshold) set_limit ( real) - real Sets the current similarity threshold that is used by the % operator. The threshold must be between 0 and 1 (default is 0.3). Returns the same value passed in. (Deprecated; instead use SET pg_trgm. similarity.threshold)
2024-05-23_21-55-14_screenshot.png #

Query Editor Query History 1 SELECT show trgm - mango') ; a Output show_trgm text] I - f" m"," ma",ang,'go man,ngo)
2024-05-23_21-56-27_screenshot.png #

Query Editor Query History 1 SELECT show_ trgm ('the bay'); a Output show_trgm text] I - (" b", t", ba", th","ay "bay"he the)
2024-05-23_21-56-55_screenshot.png #

Query Editor Query History 1 select similarity('mango cake', 'cake with mango'); a Output similarity real 0.6875 function similarity Query Editor Query History 1 select similarity('cat food', 'pet food'); a Output similarity real - 0.3846154 similar strings
2024-05-23_22-01-29_screenshot.png #

Query Editor Query History 1 CREATE EXTENSION IF NOT EXISTS pg_trgm; 2 select distinct product.sub.species, 3 similarity(product.sub.species, 'coffee machine') as score from 4 ct where product.sub.species % 'coffee machine' 5 order by score desc; 6 a Output product.sub.species text score real a 1 Coffee Maker 0.47368422 0.36363637 2 Machine Screws 3 Coffee Prep 0.35 4 Cold Beverage Machines 0.31034482 n D fuzzy search with pg_trgm
2024-05-23_22-01-58_screenshot.png #

Query Editor Query History 1 CREATE EXTENSION IF NOT EXISTS pg_trgm; 2 select distinct product.sub.species, 3 similarity(product.sub.species, 'chapping board') as score from 4 ct where product.sub.species % 'chapping board' 5 order by score desc; b a Output productL.sub.species, score text real 1 Cutting Boards 0.36363637 2 Ironing Boards 0.3043478 fuzzy search with pg_trgm
2024-05-23_22-47-43_screenshot.png #

trigram in postgl - AA trigram postgresql ils trigram postgres 8.3 ADVERTISEMENT
OCR of Images #
2024-05-23_21-40-30_screenshot.png #

Function Description similarity (text, text ) + real Returns a number that indicates how similar the two arguments are. The range of the result is zero (indicating that the two strings are completely dissimilar) to one (indicating that the two strings are identical). show_trgm(text) + text[] Returns an array of all the trigrams in the given string. (In practice this is seldom useful except for debugging.) word_similarity ( text, text) + real Returns a number that indicates the greatest similarity between the set of trigrams in the first string and any continuous extent of an ordered set oft trigrams in the second string. For details, see the explanation below. strict_word_similarity (text, text) + real Same as word_similarity, but forces extent boundaries to match word boundaries. Since we don't have cross-word trigrams, this function actually returns greatest similarity between first string and any continuous extent of words of the second string. show_ limit0 + real Returns the current similarity threshold used by the % operator. This sets the minimum similarity between two words for them to be considered similar enough to be misspellings of each other, for example. (Deprecated; instead use SHOW pg_trgm. similarity.threshold) set_limit ( real) - real Sets the current similarity threshold that is used by the % operator. The threshold must be between 0 and 1 (default is 0.3). Returns the same value passed in. (Deprecated; instead use SET pg_trgm. similarity.threshold)
2024-05-23_21-55-14_screenshot.png #

Query Editor Query History 1 SELECT show trgm - mango') ; a Output show_trgm text] I - f" m"," ma",ang,'go man,ngo)
2024-05-23_21-56-27_screenshot.png #

Query Editor Query History 1 SELECT show_ trgm ('the bay'); a Output show_trgm text] I - (" b", t", ba", th","ay "bay"he the)
2024-05-23_21-56-55_screenshot.png #

Query Editor Query History 1 select similarity('mango cake', 'cake with mango'); a Output similarity real 0.6875 function similarity Query Editor Query History 1 select similarity('cat food', 'pet food'); a Output similarity real - 0.3846154 similar strings
2024-05-23_22-01-29_screenshot.png #

Query Editor Query History 1 CREATE EXTENSION IF NOT EXISTS pg_trgm; 2 select distinct product.sub.species, 3 similarity(product.sub.species, 'coffee machine') as score from 4 ct where product.sub.species % 'coffee machine' 5 order by score desc; 6 a Output product.sub.species text score real a 1 Coffee Maker 0.47368422 0.36363637 2 Machine Screws 3 Coffee Prep 0.35 4 Cold Beverage Machines 0.31034482 n D fuzzy search with pg_trgm
2024-05-23_22-01-58_screenshot.png #

Query Editor Query History 1 CREATE EXTENSION IF NOT EXISTS pg_trgm; 2 select distinct product.sub.species, 3 similarity(product.sub.species, 'chapping board') as score from 4 ct where product.sub.species % 'chapping board' 5 order by score desc; b a Output productL.sub.species, score text real 1 Cutting Boards 0.36363637 2 Ironing Boards 0.3043478 fuzzy search with pg_trgm
2024-05-23_22-47-43_screenshot.png #

trigram in postgl - AA trigram postgresql ils trigram postgres 8.3 ADVERTISEMENT