Exercise

Customizing your pandas import

The pandas package also handles many of the issues you will encounter when importing data as a data scientist, such as comments in flat files, empty lines, and missing values. Note that missing values are commonly referred to as NA or NaN. To wrap up this chapter, you're now going to import a slightly corrupted copy of the Titanic dataset, titanic_corrupt.txt, which

  • contains comments after the character '#';
  • is tab-delimited.
Instructions
  • Complete the sep (the pandas version of delim), comment, and na_values arguments of pd.read_csv(). comment takes the character after which comments occur in the file, which in this case is '#'. na_values takes a list of strings to recognize as NA/NaN.
  • Execute the rest of the code to print the head of the resulting DataFrame and plot the histogram of the 'Age' of passengers aboard the Titanic.
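
The three arguments described above can be tried out on a small in-memory sample before touching the real file. The following is a minimal sketch, not the exercise solution: the sample text, its column names, and the missing-value marker 'Nothing' are assumptions made for illustration and are not taken from titanic_corrupt.txt.

```python
import io

import pandas as pd

# Hypothetical stand-in for a corrupted, tab-delimited file:
# one row ends in a '#' comment, one row uses a placeholder string
# for a missing value, and there is one empty line.
raw = (
    "PassengerId\tSurvived\tAge\n"
    "1\t0\t22.0#comment to be dropped\n"
    "2\t1\tNothing\n"
    "\n"
    "3\t1\t26.0\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",               # fields are separated by tabs
    comment="#",            # ignore everything after '#' on a line
    na_values=["Nothing"],  # assumed marker, read as NaN
)

# Show the first rows of the resulting DataFrame
print(df.head())
```

With the real file, the io.StringIO object would be replaced by the file name. Once the placeholder string is read as NaN, the 'Age' column is numeric, so a call such as df['Age'].plot(kind='hist') can draw the histogram (this requires matplotlib to be installed).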