Manual calculations

Consider the four cleaned statements below:

t1 <- "government turtle blue ocean"
t2 <- "crazy turtle ocean waves"
t3 <- "massive turtle washington lion"
t4 <- "lion pride massive ocean dinner"

The \(TFIDF\) for "lion" in t4 can be calculated as follows:

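  • \(TF = \frac{1}{5} = 0.2\) ("lion" appears once among the 5 words of t4)

  • \(IDF = \log(4/2) = 0.693\) (4 statements in total, 2 of which contain "lion"; natural log)

  • \(TFIDF = 0.2 \times 0.693 = 0.139\)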
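To check the arithmetic above, here is a minimal R sketch that recomputes the weights for "lion" in t4. It assumes each statement is tokenized by splitting on single spaces, and it uses the natural log for IDF, which is what produces 0.693. The helper name `tf_idf` is hypothetical and is not a function from the course or from any package.

```r
# The four cleaned statements from above, stored as a named vector
docs <- c(
  t1 = "government turtle blue ocean",
  t2 = "crazy turtle ocean waves",
  t3 = "massive turtle washington lion",
  t4 = "lion pride massive ocean dinner"
)

# Assumed tokenization: split each statement on single spaces.
# lapply keeps the names t1..t4 on the resulting list.
tokens <- lapply(docs, function(s) strsplit(s, " ", fixed = TRUE)[[1]])

# Hypothetical helper: TF, IDF (natural log) and their product for one term
tf_idf <- function(term, doc, tokens) {
  words <- tokens[[doc]]
  # TF: occurrences of the term divided by the number of words in the document
  tf <- sum(words == term) / length(words)
  # n_t: number of documents that contain the term at least once
  n_t <- sum(vapply(tokens, function(w) term %in% w, logical(1)))
  # IDF = log(N / n_t), with N the total number of documents
  idf <- log(length(tokens) / n_t)
  c(tf = tf, idf = idf, tfidf = tf * idf)
}

round(tf_idf("lion", "t4", tokens), 3)
# tf = 0.2, idf = 0.693, tfidf = 0.139
```

The same call with a different term and statement name gives the corresponding weights for any word in the four statements.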

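Calculate the \(TF\) and \(IDF\) weights for "turtle" in t1. Use \(IDF = \log \frac{N}{n_{t}}\), where \(N\) is the total number of statements and \(n_{t}\) is the number of statements that contain the term.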

This exercise is part of the course Introduction to Natural Language Processing in R.
