Noun usage in fake news
In this exercise, you have been given a dataframe headlines that contains news headlines that are either fake or real. Your task is to generate two new features, num_propn and num_noun, that represent the number of proper nouns and other nouns contained in the title feature of headlines.
Next, we will compute the mean number of proper nouns and other nouns used in fake and real news headlines and compare the values. If the means differ markedly between the two classes, then there is a good chance that adding the num_propn and num_noun features to fake news detectors will improve their performance.
To help you accomplish this task, the functions proper_nouns and nouns that you built in the previous exercise have already been made available to you.
This exercise is part of the course Feature Engineering for NLP in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
headlines[____] = headlines['title'].apply(____)
# Compute mean of proper nouns
real_propn = headlines[headlines['label'] == 'REAL']['num_propn'].mean()
fake_propn = headlines[headlines['label'] == 'FAKE']['num_propn'].____
# Print results
print("Mean no. of proper nouns in real and fake headlines are %.2f and %.2f respectively"%(real_propn, fake_propn))
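
The feature-generation and comparison steps described above can be sketched end to end on toy data. This is only an illustration under stated assumptions: the proper_nouns and nouns functions below are hypothetical stand-ins that look words up in small hand-written sets (the course versions are not shown here and presumably rely on a part-of-speech tagger), and the four headlines are made up, so the resulting numbers say nothing about real fake-news data.

```python
import pandas as pd

# Hypothetical stand-ins for the course's proper_nouns and nouns helpers.
# They only check membership in tiny hand-written word sets.
TOY_PROPER_NOUNS = {"Alice", "Bob", "Paris", "Mars"}
TOY_NOUNS = {"council", "budget", "storm", "report", "rocket"}


def proper_nouns(text):
    """Count tokens found in the toy proper-noun set (case-sensitive)."""
    return sum(token in TOY_PROPER_NOUNS for token in text.split())


def nouns(text):
    """Count tokens found in the toy common-noun set (case-insensitive)."""
    return sum(token.lower() in TOY_NOUNS for token in text.split())


# Made-up headlines, for illustration only
headlines = pd.DataFrame({
    "title": [
        "Paris council approves budget",
        "Storm report delays rocket",
        "Alice and Bob land on Mars",
        "Bob blames Alice for storm",
    ],
    "label": ["REAL", "REAL", "FAKE", "FAKE"],
})

# Create one count feature per noun type from the title column
headlines["num_propn"] = headlines["title"].apply(proper_nouns)
headlines["num_noun"] = headlines["title"].apply(nouns)

# Compare the mean counts of both features across the two labels
means = headlines.groupby("label")[["num_propn", "num_noun"]].mean()
print(means)
```

Grouping by label computes all four means in one step, which is an alternative to filtering the dataframe once per label and per feature as in the sample code. Either approach yields the same values.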