Exercise

Splitting into columns

You've cleaned up your data considerably by removing the invalid rows from the DataFrame. Now you want to perform some further transformations by generating specific meaningful columns based on the DataFrame content.

The Spark context and the latest version of the annotations_df DataFrame are available to you. pyspark.sql.functions is available under the alias F.

Instructions
100 XP
  • Split the content of the '_c0' column on the tab character and store the result in a variable called split_cols.
  • Create a DataFrame named split_df by adding the following columns, taken from the first four entries of split_cols: folder, filename, width, height.
  • Add the split_cols variable itself to split_df as a column.