Unifying flight formats I
Cleaning data is important because frequently, you may acquire messy data that is not ready for analysis.
In this exercise, you need to get every register with more than 100 delays from the flight_statistics
table. In a unique column, you have to concatenate the carrier_code
, registration_code
, and airport_code
, having a similar format to this one: "AA - 000000119, JFK"
.
When analyzing the flight_statistics
table, you realize that some registration_code
values have different formats. A correct registration_code
must have nine digits, and if it has fewer, you need to complete it with leading zeros.
To do this, you can use the REPLICATE()
function in combination with LEN()
and CONCAT()
.
This exercise is part of the course
Cleaning Data in SQL Server Databases
Exercise instructions
- Use the appropriate function to concatenate the
carrier_code
, the leading zeros before a registration code, theregistration_code
, andairport_code
columns. - Replicate as many zeros as needed by subtracting 9 from the length of each
registration_code
. - Filter the registers where the
delayed
column is more than 100.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
SELECT
-- Concat the strings
___(
carrier_code,
' - ',
-- Replicate zeros
___('___', 9 - ___(registration_code)),
registration_code,
', ',
airport_code)
AS registration_code
FROM flight_statistics
-- Filter registers with more than 100 delays
WHERE ___ > 100