Manual calculations
Given the following four cleaned statements:
t1 <- "government turtle blue ocean"
t2 <- "crazy turtle ocean waves"
t3 <- "massive turtle washington lion"
t4 <- "lion pride massive ocean dinner"
The \(TFIDF\) for "lion" in t4 can be calculated as follows:
\(TF = \frac{1}{5} = 0.2\)
\(IDF = \log(4/2) = 0.693\) (natural log; "lion" appears in 2 of the 4 statements)
\(TFIDF = 0.2 \times 0.693 \approx 0.139\)
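The same arithmetic can be reproduced in base R as a quick check. This is only a minimal sketch: `tf_idf()` and `docs` are hypothetical names introduced here for illustration (they are not from the course or from any package), words are split on single spaces, and the term is assumed to appear in at least one statement.

```r
# The four cleaned statements from the exercise
t1 <- "government turtle blue ocean"
t2 <- "crazy turtle ocean waves"
t3 <- "massive turtle washington lion"
t4 <- "lion pride massive ocean dinner"

# Split each statement into a character vector of words (names are kept by lapply)
docs <- lapply(
  list(t1 = t1, t2 = t2, t3 = t3, t4 = t4),
  function(s) strsplit(s, " ", fixed = TRUE)[[1]]
)

# Hypothetical helper: TF, IDF and TF-IDF of `term` in the statement `doc_name`
tf_idf <- function(term, doc_name, docs) {
  words <- docs[[doc_name]]
  # TF: occurrences of the term divided by the number of words in the statement
  tf <- sum(words == term) / length(words)
  # n_t: number of statements that contain the term at least once
  n_t <- sum(vapply(docs, function(w) term %in% w, logical(1)))
  # IDF = log(N / n_t), natural log, with N the number of statements
  idf <- log(length(docs) / n_t)
  c(tf = tf, idf = idf, tfidf = tf * idf)
}

lion_t4 <- tf_idf("lion", "t4", docs)
print(round(lion_t4, 3))
```

Rounded to three decimals, the result agrees with the manual values (0.2, 0.693 and about 0.139). Calling the helper with another term and statement name can be used to check a hand calculation in the same way.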
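Calculate the \(TF\) and \(IDF\) weights for "turtle" in t1. Use \(IDF = \log \frac{N}{n_{t}}\), where \(N\) is the total number of statements and \(n_{t}\) is the number of statements that contain the term.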
This exercise is part of the course Introduction to Natural Language Processing in R.
