Comparing names with DIFFERENCE()
In the previous exercise, you used SOUNDEX() to check the names of the statisticians from the flight_statistics table.
This time, you want to do something similar, but using the DIFFERENCE() function. DIFFERENCE() compares the SOUNDEX() values of two strings and returns an integer from 0 to 4: 4 when the strings sound similar or identical, and 0 when there is little or no similarity.
If the result of DIFFERENCE() between two strings is 4 but the texts you are comparing are different, you have found data you need to clean.
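To make the 0-to-4 scale concrete, here is a minimal sketch using literal strings instead of the course table. The names are the ones used in Microsoft's documentation for DIFFERENCE(), not values from flight_statistics.

```sql
-- Two names that sound alike: both SOUNDEX codes are G650,
-- so DIFFERENCE returns 4 even though the spellings differ.
SELECT SOUNDEX('Green')              AS soundex_a,
       SOUNDEX('Greene')             AS soundex_b,
       DIFFERENCE('Green', 'Greene') AS diff_score;   -- G650, G650, 4

-- Two names that sound nothing alike: the codes are B432 and G650,
-- so DIFFERENCE returns 0.
SELECT SOUNDEX('Blotchet-Halls')              AS soundex_a,
       SOUNDEX('Greene')                      AS soundex_b,
       DIFFERENCE('Blotchet-Halls', 'Greene') AS diff_score;   -- B432, G650, 0
```

A pair like 'Green' and 'Greene' is the case this exercise looks for: a score of 4 with different spellings.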
This is a part of the course “Cleaning Data in SQL Server Databases”.
Exercise instructions
- Select the distinct values of the statistician_name and statistician_surname columns from S1.
- Inner join the flight_statistics table as S2 on similar-sounding first names and surnames, in instances where the DIFFERENCE between each table's column is 4.
- Keep only the rows where the statistician_name or statistician_surname column differs between S1 and S2.
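The self-join pattern these steps describe can be sketched on a small, hypothetical table variable. The @people table, its first_name column, and the sample values are assumptions made for illustration only; they are not part of the course dataset, and this is not the exercise solution.

```sql
-- Hypothetical sample data (assumption, not from flight_statistics)
DECLARE @people TABLE (first_name VARCHAR(50));
INSERT INTO @people (first_name) VALUES ('Smith'), ('Smyth'), ('Jones');

-- Join the table with itself and pair up names that sound alike
SELECT DISTINCT P1.first_name, P2.first_name AS similar_name
FROM @people P1
INNER JOIN @people P2
    -- Keep pairs whose SOUNDEX codes match in all four positions
    ON DIFFERENCE(P1.first_name, P2.first_name) = 4
-- Drop pairs where the name is simply matched with itself
WHERE P1.first_name <> P2.first_name;
```

'Smith' and 'Smyth' share the SOUNDEX code S530, so they should pair with each other, while 'Jones' should match nothing. Without the WHERE clause every row would also match itself, because DIFFERENCE() of a string with itself is 4.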
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
SELECT
-- First names and surnames of the statisticians
DISTINCT S1.___, S1.___
-- Join flight_statistics with itself
FROM ___ S1 INNER JOIN ___ S2
-- The DIFFERENCE of the first names and of the surnames must equal 4
ON ___(S1.___, S2.___) = 4
AND ___(S1.___, S2.___) = 4
-- The texts of the first name or the texts of the surname have to be different
WHERE S1.___ <> S2.___
OR S1.___ <> S2.___