Hashtags and mentions in Russian tweets

Let's revisit the tweets dataframe containing Russian tweets. In this exercise, you will compute the number of hashtags and mentions in each tweet by defining two functions, count_hashtags() and count_mentions(), and applying them to the content feature of tweets.

As a reminder, the tweets themselves are stored in the content feature of tweets.

This exercise is part of the course

Feature Engineering voor NLP in Python

Try this exercise by completing the sample code below.

import matplotlib.pyplot as plt  # plt is used for the histogram below

# Function that returns number of hashtags in a string
def count_hashtags(string):
    # Split the string into words
    words = string.split()
    
    # Create a list of words that are hashtags
    hashtags = [word for word in words if ____.____(____)]
    
    # Return number of hashtags
    return len(hashtags)

# Create a feature hashtag_count and display distribution
tweets['hashtag_count'] = tweets['content'].apply(count_hashtags)
tweets['hashtag_count'].hist()
plt.title('Hashtag count distribution')
plt.show()
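
The blank in the scaffold above is left for you to fill in. As a rough illustration of the idea, here is one possible way to count hashtags and mentions, assuming a hashtag is any whitespace-separated word that starts with '#' and a mention is any word that starts with '@'. The sample strings in sample_tweets are made up for this sketch and are not taken from the tweets dataframe.

```python
# A minimal sketch, not necessarily the course's reference solution.
# Assumption: a hashtag is a word starting with '#', a mention a word starting with '@'.

def count_hashtags(string):
    # Split the string into whitespace-separated words
    words = string.split()
    # Keep only the words that begin with '#'
    hashtags = [word for word in words if word.startswith('#')]
    return len(hashtags)


def count_mentions(string):
    # Split the string into whitespace-separated words
    words = string.split()
    # Keep only the words that begin with '@'
    mentions = [word for word in words if word.startswith('@')]
    return len(mentions)


# Hypothetical sample data standing in for tweets['content']
sample_tweets = [
    'Breaking #news from @reporter about #politics',
    'No tags here',
    '@alice @bob hello',
]

hashtag_counts = [count_hashtags(t) for t in sample_tweets]
mention_counts = [count_mentions(t) for t in sample_tweets]
print(hashtag_counts)
print(mention_counts)
```

With the real dataframe, the same functions would be applied through tweets['content'].apply(...), as the scaffold already does for hashtag_count. Note that this simple prefix check also counts a bare '#' or '@' as a match and misses hashtags glued to preceding punctuation, so a stricter rule may be needed for messy text.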
