1. Explore cuisines
In this video, we will build an app to explore distinctive ingredients across cuisines.
2. Explore data
It is always good to start by exploring the data. The recipes dataset has three columns: recipe_id, cuisine, and ingredient. It is a long-form dataset: each row holds one ingredient of one recipe, so the ingredients in a recipe are spread across multiple rows.
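To make the long-form shape concrete, here is a minimal sketch using a toy stand-in for the recipes dataset. The values are made up for illustration; only the three column names come from the course data.

```r
# Toy stand-in for the recipes dataset (values are invented for illustration).
recipes <- data.frame(
  recipe_id  = c(1, 1, 1, 2, 2, 3, 3),
  cuisine    = c("indian", "indian", "indian", "italian", "italian",
                 "indian", "indian"),
  ingredient = c("salt", "onions", "garam masala", "salt", "basil",
                 "salt", "turmeric"),
  stringsAsFactors = FALSE
)

# Each row is one (recipe, ingredient) pair, so recipe 1 spans three rows.
head(recipes)
str(recipes)
```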
3. Preview app
Our app has two inputs, placed in a sidebar, and three outputs, laid out as tabs in a main panel.
4. Add UI: code
Let us scaffold the layout of the UI first, using a sidebarLayout with a sidebarPanel and a mainPanel, containing a tabsetPanel to hold tabPanels.
While we have advocated for handling layouts at a later stage, doing this upfront for complex apps leads to more useful previews.
Let us add a selectInput to select the cuisine. We can use the function unique() to get a vector of cuisines.
Next we add a sliderInput to control the number of ingredients displayed.
Now, we will add the outputs, each one wrapped inside a tabPanel.
The first is a d3wordcloudOutput to house the wordcloud.
The second is a plotlyOutput to display an interactive plot of the top ingredients.
Finally, we add a DTOutput for the interactive table of top ingredients.
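The UI steps above could be sketched roughly as follows. The input ids cuisine and nb_ingredients match the ones used later in the server code; the output ids, labels, slider range, and the toy recipes data frame are assumptions made for this sketch. It also assumes the shiny, plotly, DT, and d3wordcloud packages are installed (d3wordcloud is typically installed from GitHub rather than CRAN).

```r
library(shiny)

# Toy stand-in for the recipes dataset (values are invented for illustration).
recipes <- data.frame(
  recipe_id  = c(1, 1, 2, 2),
  cuisine    = c("indian", "indian", "italian", "italian"),
  ingredient = c("salt", "turmeric", "salt", "basil"),
  stringsAsFactors = FALSE
)

ui <- fluidPage(
  titlePanel("Explore Cuisines"),
  sidebarLayout(
    # Two inputs in the sidebar
    sidebarPanel(
      selectInput("cuisine", "Select Cuisine",
                  choices = unique(recipes$cuisine)),
      sliderInput("nb_ingredients", "Select No. of Ingredients",
                  min = 5, max = 100, value = 20)
    ),
    # Three outputs in the main panel, one per tab
    mainPanel(
      tabsetPanel(
        tabPanel("Word Cloud",
                 d3wordcloud::d3wordcloudOutput("wc_ingredients")),
        tabPanel("Plot",
                 plotly::plotlyOutput("plot_top_ingredients")),
        tabPanel("Table",
                 DT::DTOutput("dt_top_ingredients"))
      )
    )
  )
)
```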
5. Add UI: preview
Let us preview the app at this point. Looks neat, right?
6. Add output: interactive table
Let us set up the interactive data table output using the rendering function DT::renderDT.
To get the top ingredients by selected cuisine, we first filter the recipes data for the selected cuisine given by input$cuisine.
Next, we count the number of recipes by ingredient. Note that the count function is a handy shortcut for group_by(ingredient) followed by summarize(nb_recipes = n()).
Then we arrange the ingredients in descending order of the number of recipes.
Finally, we select the top N ingredients using input$nb_ingredients.
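The four steps above (filter, count, arrange, take the top N) might look like the sketch below. The helper top_by_recipes, the output id dt_top_ingredients, and the toy data are hypothetical names introduced here so the logic can be run outside a Shiny session.

```r
library(shiny)
library(dplyr)

# Toy stand-in for the recipes dataset (values are invented for illustration).
recipes <- data.frame(
  recipe_id  = c(1, 1, 1, 2, 2, 3, 3),
  cuisine    = c("indian", "indian", "indian", "italian", "italian",
                 "indian", "indian"),
  ingredient = c("salt", "onions", "garam masala", "salt", "basil",
                 "salt", "turmeric"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: top n ingredients of one cuisine by number of recipes.
top_by_recipes <- function(data, selected_cuisine, n) {
  data %>%
    filter(cuisine == selected_cuisine) %>%
    count(ingredient, name = "nb_recipes") %>%  # one row per ingredient
    arrange(desc(nb_recipes)) %>%
    head(n)
}

server <- function(input, output, session) {
  # Interactive table of the top ingredients for the selected cuisine
  output$dt_top_ingredients <- DT::renderDT({
    top_by_recipes(recipes, input$cuisine, input$nb_ingredients)
  })
}
```

Calling top_by_recipes(recipes, "indian", 3) directly in the console is a quick way to check the data step before wiring it into the app.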
7. Compute TFIDF
Sorting ingredients by the number of recipes they occur in does NOT surface the most distinctive ingredients in a cuisine. For example, the top ingredients in Indian cuisine are salt and onions, which are ubiquitous across cuisines.
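A common way to solve this problem is to use a metric termed term frequency-inverse document frequency (TFIDF), which is used to surface distinctive words in a collection of documents. It scores a word highly when it is frequent in one document but rare across the others. Here, the cuisines play the role of documents and the ingredients play the role of words.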
We can compute tf_idf easily by counting the number of recipes in which each ingredient occurs by cuisine, and using the handy bind_tf_idf function from the tidytext package to compute the tf_idf.
Recall that the count function is a handy shortcut for group_by(ingredient, cuisine) followed by summarize(nb_recipes = n()).
Note how sorting in decreasing order of tf_idf surfaces the most distinctive ingredients of Indian cuisine.
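As a rough sketch of this computation, assuming the dplyr and tidytext packages and a toy stand-in for the recipes data: bind_tf_idf takes the term column (ingredient), the document column (cuisine), and the count column (nb_recipes), in that order.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the recipes dataset (values are invented for illustration).
recipes <- data.frame(
  recipe_id  = c(1, 1, 1, 2, 2, 3, 3),
  cuisine    = c("indian", "indian", "indian", "italian", "italian",
                 "indian", "indian"),
  ingredient = c("salt", "onions", "garam masala", "salt", "basil",
                 "salt", "turmeric"),
  stringsAsFactors = FALSE
)

recipes_enriched <- recipes %>%
  # Number of recipes per (cuisine, ingredient) pair
  count(cuisine, ingredient, name = "nb_recipes") %>%
  # Adds the columns tf, idf and tf_idf
  bind_tf_idf(ingredient, cuisine, nb_recipes)

# Most distinctive ingredients of one cuisine first
recipes_enriched %>%
  filter(cuisine == "indian") %>%
  arrange(desc(tf_idf))
```

In this toy data, salt occurs in every cuisine, so its inverse document frequency (and hence its tf_idf) is zero, while ingredients found in only one cuisine get a positive score.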
8. Add a reactive expression
Let us create a reactive expression to get the top distinctive ingredients.
We start with recipes_enriched, the dataset created previously, filter it for the selected cuisine, arrange in descending order of tf_idf, select the top ingredients using input$nb_ingredients, and finally, reorder the ingredients by their tf_idf using forcats::fct_reorder so that the bar plot displays them in sorted order.
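A minimal sketch of such a reactive expression follows. The name rval_top_ingredients is an assumption made here, and recipes_enriched is rebuilt from toy data so the snippet runs on its own.

```r
library(shiny)
library(dplyr)
library(tidytext)

# Toy stand-in for the recipes dataset (values are invented for illustration).
recipes <- data.frame(
  recipe_id  = c(1, 1, 2, 2, 3, 3),
  cuisine    = c("indian", "indian", "italian", "italian", "indian", "indian"),
  ingredient = c("salt", "onions", "salt", "basil", "salt", "turmeric"),
  stringsAsFactors = FALSE
)

recipes_enriched <- recipes %>%
  count(cuisine, ingredient, name = "nb_recipes") %>%
  bind_tf_idf(ingredient, cuisine, nb_recipes)

server <- function(input, output, session) {
  # Recomputed only when input$cuisine or input$nb_ingredients changes
  rval_top_ingredients <- reactive({
    recipes_enriched %>%
      filter(cuisine == input$cuisine) %>%
      arrange(desc(tf_idf)) %>%
      head(input$nb_ingredients) %>%
      # Order the factor levels by tf_idf so plots show sorted bars
      mutate(ingredient = forcats::fct_reorder(ingredient, tf_idf))
  })
}
```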
9. Add outputs: interactive plot and word cloud
Let us finish the app by adding the two outputs.
The first is an interactive horizontal bar plot of ingredient vs. tf_idf. We can create this using ggplot2, and wrap it in renderPlotly to make it interactive.
For the interactive wordcloud, we make use of the d3wordcloud package. It allows us to plot a wordcloud by passing a vector of words and their frequencies. In this case, the words are the ingredients and the counts are the number of recipes they occur in.
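Note how the reactive expression is shared by both outputs. Its result is cached, so the top ingredients are computed once per input change rather than once per output, leading to better performance.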
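Putting the two outputs together, the server might be sketched as below. The output ids, the reactive name rval_top_ingredients, and the toy data are assumptions of this sketch, and it presumes the ggplot2, plotly, and d3wordcloud packages are installed.

```r
library(shiny)
library(dplyr)
library(ggplot2)
library(tidytext)

# Toy stand-in for the recipes dataset (values are invented for illustration).
recipes <- data.frame(
  recipe_id  = c(1, 1, 2, 2, 3, 3),
  cuisine    = c("indian", "indian", "italian", "italian", "indian", "indian"),
  ingredient = c("salt", "onions", "salt", "basil", "salt", "turmeric"),
  stringsAsFactors = FALSE
)

recipes_enriched <- recipes %>%
  count(cuisine, ingredient, name = "nb_recipes") %>%
  bind_tf_idf(ingredient, cuisine, nb_recipes)

server <- function(input, output, session) {
  # Shared by both outputs below
  rval_top_ingredients <- reactive({
    recipes_enriched %>%
      filter(cuisine == input$cuisine) %>%
      arrange(desc(tf_idf)) %>%
      head(input$nb_ingredients) %>%
      mutate(ingredient = forcats::fct_reorder(ingredient, tf_idf))
  })

  # Horizontal bar plot of ingredient vs. tf_idf, made interactive by plotly
  output$plot_top_ingredients <- plotly::renderPlotly({
    rval_top_ingredients() %>%
      ggplot(aes(x = ingredient, y = tf_idf)) +
      geom_col() +
      coord_flip()
  })

  # Word cloud sized by the number of recipes each ingredient occurs in
  output$wc_ingredients <- d3wordcloud::renderD3wordcloud({
    ingredients_df <- rval_top_ingredients()
    d3wordcloud::d3wordcloud(
      as.character(ingredients_df$ingredient),
      ingredients_df$nb_recipes
    )
  })
}
```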
10. Preview final app
Putting it all together, we get this wonderful Shiny app.
11. Let's practice!
It is your turn to create this app from scratch.