1. Intro to PostreSQL extensions
You've learned how basic full text search is a better option when searching text data for a string. Next we're going to learn about some common extensions, fuzzystrmatch and pg_trgm, that enhance the full text search capabilities of PostgreSQL.
2. Intro to PostgreSQL extensions
But first we will learn about the PostgreSQL extension framework in more detail. Most PostgreSQL distributions come bundled with a common set of widely used and supported extensions from the community that can be used by simply enabling them.
Here are a few common extensions:
PostGIS adds support for allowing location queries to be run in SQL.
PostPic allows for image processing within the database.
fuzzystrmatch and pg_trgm provide functions that extend full text search capabilities by finding similarities between strings.
3. Querying extension meta data
To help you discover what extensions are available in your specific PostgreSQL distribution, you can query the pg_available_extensions system view, as shown in this example, to determine a list of extensions that are available to be installed and enabled for use. The results return the name of the first two available extensions.
A similar query of the pg_extension system table will tell you which extensions have already been enabled in your database and are currently available for your use. Here we see only one result, the extension plpgsql.
4. Loading extensions into your database
Any of the extensions that are returned from the pg_available_extensions system view can be loaded into your database and enabled with a simple query using the CREATE EXTENSION command, an example of which is shown here.
The IF NOT EXISTS commands can be used to ensure that if the extension has previously been enabled, the query will not generate an error message.
Now if we query the pg_extension table again by selecting the extname column for all records, we should see that fuzzystrmatch is now listed with plpgsql.
5. Using fuzzystrmatch or fuzzy searching
When preforming a full text search based on user input or looking to perform an analysis and comparison of text data in a natural language processing exercise, a function that you will use often is levenshtein from the fuzzystrmatch extension. The levenshtein function calculates the levenshtein distance between two strings which is the number of edits required for the strings to be a perfect match.
In this example, you see the distance returned is 2 because in order to convert GUMBO to GAMBOL it would require replacing the U with an A and adding an L to the end of the word or two edits.
6. Compare two strings with pg_trgm
The pg_trgm extension provides functions and operators to determine the similarity of two strings using trigram matchings. Trigrams are groups of 3 consecutive characters in a string and based on the number of matching trigrams in two strings will provide a measurement of how similar they are.
This measurement can be calculated using the similarity function of this extension. The similarity function accepts two parameters; the first being the string you wish to compare and the second being the string you wish to compare against. This function will return a number between 0-1 with zero representing no matching trigrams at all and 1 representing a perfect match. In this example we see that using similarity on GUMBO and GAMBOL returns a value of 0.181818.
7. Let's practice!
Great job! Let's practice