Trying out different methods
You have now learned several methods of calculating string distances. Which method to use depends on the circumstances, so it's a good idea to experiment with the different methods and their parameters to get to know them better. For this exercise you'll use the search term "Marya Carey", a mistyped version of the name "Mariah Carey". How similar is the mistyped name to the real one under different string distance methods?
The goal is to find parameters that yield a low distance between the two names above while keeping a large distance to the other names in the list, which belong to people you are not searching for.
This exercise is part of the course
Intermediate Regular Expressions in R
Exercise instructions
- Generate the q-grams for substring length values of 1 and 2.
- Calculate the string distance between search and names using the q-gram method for substring length values of 1 and 2.
- Calculate the string distance between search and names by using the "osa" method.
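To make the q-gram method from the instructions above more concrete, here is a minimal base R sketch of the idea: split each string into overlapping substrings of length q, count them, and sum the absolute differences of the counts. The helper names qgram_split and qgram_distance are hypothetical and written only for illustration; they are not part of the stringdist package, and the exercise itself should still be solved with qgrams() and stringdist().

```r
# Hypothetical helper: all overlapping substrings of length q in s
qgram_split <- function(s, q) {
  n <- nchar(s)
  if (n < q) return(character(0))
  starts <- seq_len(n - q + 1)
  substring(s, starts, starts + q - 1)
}

# Hypothetical helper: q-gram distance, taken here as the sum of
# absolute differences between the q-gram counts of a and b
qgram_distance <- function(a, b, q) {
  ga <- qgram_split(a, q)
  gb <- qgram_split(b, q)
  grams <- union(ga, gb)
  count_a <- vapply(grams, function(g) sum(ga == g), numeric(1))
  count_b <- vapply(grams, function(g) sum(gb == g), numeric(1))
  sum(abs(count_a - count_b))
}

# Compare the mistyped name with the real one for q = 1 and q = 2
qgram_distance("Marya Carey", "Mariah Carey", q = 1)
qgram_distance("Marya Carey", "Mariah Carey", q = 2)
```

Because only the counts matter, the order of the q-grams within a string is ignored. This is one reason the q-gram method can behave quite differently from an edit-based method such as "osa", and why the choice of q is worth experimenting with.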
Hands-on interactive exercise
Try this exercise by completing this sample code.
search <- "Mariah Carey"
names <- c("M. Carey", "Mick Jagger", "Michael Jackson")
# Pass the values 1 and 2 as "q" and inspect the qgrams
qgrams("Mariah Carey", "M. Carey", q = ___)
qgrams("Mariah Carey", "M. Carey", q = ___)
# Try the qgram method on the variables search and names
stringdist(___, ___, method = "___", q = 1)
stringdist(___, ___, method = "___", q = 2)
# Try the default method (osa) on the same input and compare
stringdist(___, ___, method = "___")