
Mosaic plot

The spine plot you have created in the previous exercise allows you to study missing data patterns between two variables at a time. This idea is generalized to more variables in the form of a mosaic plot.

In this exercise, you will start by creating a dummy variable indicating whether the United States was involved in the production of each movie. To do this, you will use the grepl() function, which checks if the string passed as its first argument is present in the object passed as its second argument. Then, you will draw a mosaic plot to see if the subject's gender correlates with the amount of missing data on earnings for both US and non-US movies.
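To make the argument order of grepl() concrete, here is a minimal sketch on a made-up character vector. The country values below are hypothetical and are not taken from the biopics data. Note that grepl() treats its first argument as a regular expression by default and returns FALSE for missing values.

```r
# Hypothetical country strings, not taken from the biopics data
country <- c("US", "UK", "US/UK", "Canada/France", NA)

# grepl(pattern, x): the pattern comes first, the vector to search comes second
is_US <- grepl("US", country)

# One logical value per element of `country`
print(is_US)
```

Because the result is a logical vector of the same length as the input, it can be assigned directly to a new column inside mutate().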

The biopics data as well as the VIM package are already loaded for you. Let's do some exploratory plotting!

Note that a proprietary display_image() function has been created to return the output from the latest VIM package version. Make sure to expand the HTML Viewer section.

This exercise is part of the course

Handling Missing Data with Imputations in R


Exercise instructions

  • Feed the biopics data into the dplyr pipeline.
  • Create a dummy variable is_US_movie that is TRUE if country contains the string "US" and is FALSE otherwise.
  • Draw a mosaic plot that shows the amount of missing data in "earnings" split by "is_US_movie" and "sub_sex", remembering to pass variable names as strings.
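As a rough illustration of what such a mosaic plot summarizes, the share of missing values in one variable can be tabulated for every combination of two grouping variables. The sketch below uses base R only and a small toy data frame with invented values. It is not the biopics data and not the exercise solution; the column names are reused purely for readability.

```r
# Toy data (hypothetical values, not the real biopics data)
toy <- data.frame(
  is_US_movie = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  sub_sex     = c("Male", "Female", "Male", "Female", "Male", "Female"),
  earnings    = c(12.5, NA, 3.1, NA, NA, 40.2)
)

# Flag rows where earnings is missing
toy$earnings_missing <- is.na(toy$earnings)

# Count missing vs. observed earnings in each is_US_movie x sub_sex group
counts <- table(toy$is_US_movie, toy$sub_sex, toy$earnings_missing,
                dnn = c("is_US_movie", "sub_sex", "earnings_missing"))

# Share of missing earnings within each group (rows: is_US_movie, cols: sub_sex)
missing_share <- prop.table(counts, margin = c(1, 2))[, , "TRUE"]
print(missing_share)
```

In a mosaic plot, each group becomes a tile whose area reflects the group size, and the highlighted part of the tile corresponds to the missing share computed here.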

Interactive exercise

Complete the sample code to successfully finish this exercise.

# Prepare data for plotting and draw a mosaic plot
___ %>%
  # Create a dummy variable for US-produced movies
  mutate(is_US_movie = grepl(___, ___)) %>%
  # Draw mosaic plot
  mosaicMiss(highlight = ___,
             plotvars = c(___, ___))

# Return plot from latest VIM package - expand the HTML viewer section
display_image()