Feature Engineering for Machine Learning in Python

Exercise

Dealing with stray characters (II)

In the last exercise, a quick look at the output of df.head() was enough to show which characters were causing problems. In many cases it will not be so apparent: values deep within a column can prevent you from casting it to a numeric type for use in a model or in further feature engineering.

One approach to finding these values is to force the column to the desired data type using pd.to_numeric(), coercing any values that cause issues to NaN, and then filtering the DataFrame down to just the rows containing NaN values.
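The coerce-and-filter approach can be sketched as follows. This is a minimal illustration, not the exercise's data: the small so_survey_df built here and its values are made up, and the name failed is a hypothetical helper variable.

```python
import pandas as pd

# Hypothetical stand-in for so_survey_df; the values are invented for illustration.
so_survey_df = pd.DataFrame(
    {"RawSalary": ["50000.00", "72000.50", "£41000.00", None, "98000"]}
)

# Coerce anything that cannot be parsed as a number to NaN instead of raising.
numeric_vals = pd.to_numeric(so_survey_df["RawSalary"], errors="coerce")

# Boolean mask of the rows that are NaN after coercion.
idx = numeric_vals.isna()

# Look at the original strings in those rows to spot the offending characters.
print(so_survey_df["RawSalary"][idx])

# Rows that were already missing also come back as NaN, so it can help to
# keep only the rows that had a value but still failed to parse.
failed = idx & so_survey_df["RawSalary"].notna()
print(so_survey_df.loc[failed, "RawSalary"])
```

Note that the first mask also picks up values that were missing to begin with; the second mask narrows the result to values that were present but could not be parsed.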

Try to cast the RawSalary column as a float; the cast will fail because the column now contains an additional stray character. Find the character and remove it so that the column can be cast as a float.
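As a rough sketch of the fail-then-fix pattern, the snippet below uses a made-up Series in which the stray character is assumed to be "£". That character is only an assumption for illustration; in the exercise you need to discover the actual character yourself.

```python
import pandas as pd

# Hypothetical data; "£" is an assumed stray character, not taken from the exercise.
raw = pd.Series(["50000.00", "£41000.00", "98000"], name="RawSalary")

# A direct cast fails because one value is not a valid number.
try:
    raw.astype("float")
except ValueError as err:
    print("Cast failed:", err)

# Remove the stray character as a literal string (regex=False), then cast.
cleaned = raw.str.replace("£", "", regex=False)
salary = cleaned.astype("float")
print(salary.dtype)
```

Passing regex=False treats the pattern as a literal string, which avoids surprises when the character to remove (such as "$") has a special meaning in regular expressions.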

Instructions 1/2

  • Attempt to convert the RawSalary column of so_survey_df to numeric values, coercing all failures into null values.
  • Find the indexes of the rows containing NaNs.
  • Print the rows in RawSalary based on these indexes.