Comparing names with DIFFERENCE()

In the previous exercise, you used SOUNDEX() to check the names of the statisticians from the flight_statistics table.

This time, you want to do something similar using the DIFFERENCE() function. DIFFERENCE() compares the SOUNDEX() codes of two strings and returns an integer from 0 to 4: 4 means the strings sound similar or identical, and 0 means there is little or no similarity.
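As a quick illustration of the scale, here is a minimal sketch in T-SQL. The name pairs are made-up sample values, not rows from flight_statistics. The first pair shares the same SOUNDEX() code, so DIFFERENCE() should return 4; the second pair sounds quite different, so a low score is expected.

```sql
-- Sample strings chosen for illustration only (assumed values, not course data)
SELECT
    SOUNDEX('Green')             AS soundex_a,   -- G650
    SOUNDEX('Greene')            AS soundex_b,   -- G650
    DIFFERENCE('Green', 'Greene') AS similar_score;  -- identical codes, so 4

-- Two names that do not sound alike should score low
SELECT DIFFERENCE('Blotchet-Halls', 'Greene') AS dissimilar_score;
```

Checking the SOUNDEX() codes alongside the DIFFERENCE() score makes it easier to see why two strings are reported as similar.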

If DIFFERENCE() returns 4 for two strings whose text is not identical, the two values likely represent the same name spelled in different ways. Those are the rows you need to clean.
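The idea can be sketched on a small stand-in table. Everything here is hypothetical: the @people table variable, its columns, and the sample names are assumptions for illustration and do not come from the course database. The self-join pairs rows whose names sound alike, and the WHERE clause discards pairs that are spelled identically (including each row matched with itself).

```sql
-- Hypothetical sample data; table, columns, and names are made up
DECLARE @people TABLE (first_name VARCHAR(50), surname VARCHAR(50));

INSERT INTO @people (first_name, surname)
VALUES ('Jon', 'Smith'), ('John', 'Smyth'), ('Maria', 'Lopez');

SELECT DISTINCT p1.first_name, p1.surname
FROM @people AS p1
INNER JOIN @people AS p2
    -- Both the first names and the surnames must sound alike
    ON DIFFERENCE(p1.first_name, p2.first_name) = 4
    AND DIFFERENCE(p1.surname, p2.surname) = 4
-- Keep only pairs whose spelling actually differs
WHERE p1.first_name <> p2.first_name
    OR p1.surname <> p2.surname;
```

With this data, the query should surface the two spelling variants of the same person, while the row that only matches itself is dropped by the WHERE clause.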

This is a part of the course

“Cleaning Data in SQL Server Databases”

Exercise instructions

  • Select the distinct values of statistician_name and statistician_surname columns from S1.
  • Inner join the flight_statistics table to itself as S2, matching rows where the DIFFERENCE() between the first names in S1 and S2 is 4 and the DIFFERENCE() between the surnames in S1 and S2 is 4.
  • Keep only the rows where statistician_name or statistician_surname differs between S1 and S2.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

SELECT
    -- First names and surnames of the statisticians
    DISTINCT S1.___, S1.___
-- Join flight_statistics with itself
FROM ___ S1 INNER JOIN ___ S2
    -- The DIFFERENCE of the first names and of the surnames has to equal 4
    ON ___(S1.___, S2.___) = 4
    AND ___(S1.___, S2.___) = 4
-- The first names or the surnames have to be different
WHERE S1.___ <> S2.___
    OR S1.___ <> S2.___
This exercise is part of the course

Cleaning Data in SQL Server Databases

Skill level: Intermediate

Develop the skills you need to clean raw data and transform it into accurate insights.

To begin the course, you will learn why cleaning data is important. You will solve simple problems such as leading and trailing spaces in strings, unifying formats for flight registrations, combining strings and more.

  • Exercise 1: Introduction to Cleaning Data
  • Exercise 2: Unifying flight formats I
  • Exercise 3: Unifying flight formats II
  • Exercise 4: Cleaning messy strings
  • Exercise 5: Trimming strings I
  • Exercise 6: Trimming strings II
  • Exercise 7: Unifying strings
  • Exercise 8: Comparing the similarity between strings
  • Exercise 9: SOUNDEX() and DIFFERENCE()
  • Exercise 10: Comparing names with SOUNDEX()
  • Exercise 11: Comparing names with DIFFERENCE()
