Pipeline data issue
After creating your quick pipeline, you provide the json file to an analyst on your team. After loading the data and performing a couple exploratory tasks, the analyst tells you there's a problem in the dataset while trying to sort the duration data. She's not sure what the issue is beyond the sorting operation not working as expected.
Date Flight Number Airport Duration ID
09/30/2015 2287 ANC 409 107962
12/28/2015 1408 OKC 41 141917
08/11/2015 2287 ANC 410 87978
After analyzing the data, which command would fix the issue?
This exercise is part of the course
Cleaning Data with PySpark
Hands-on interactive exercise
Turn theory into action with one of our interactive exercises
