Get startedGet started for free

KDE with lots of data

The code supplied will make a basic KDE of the percentage_over_limit for all citations. At first glance, the settings makes sense. We have a lot of data so we set our bin width nice and narrow: just one percent. Also, the rug plot, which has been thoughtfully added, has a lowered alpha of 0.7.

Running this code, you will immediately see it's not a great plot. The density estimate looks like a porcupine and the rug plot is essentially a thick black bar due to massive overlap.

Fix it by upping the bin width a bit to 2.5 and lowering the alpha of the rug plot to 0.05 to try and get some sense of the point overlap. Don't forget to change the subtitle to reflect the change in the kernel width!

This exercise is part of the course

Visualization Best Practices in R

View Course

Exercise instructions

  • Change kernel sd to 2.5
  • Set alpha of rugplot to 0.05
  • Change the subtitle to "Gaussian kernel SD = 2.5" to reflect the new kernel width.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

ggplot(md_speeding, aes(x = percentage_over_limit)) +
    # Increase bin width to 2.5
    geom_density(fill = 'steelblue', bw = 1,  alpha = 0.7) + 
    # lower rugplot alpha to 0.05
    geom_rug(alpha = 0.5) + 
    labs(
        title = 'Distribution of % over speed limit', 
        # modify subtitle to reflect change in kernel width
        subtitle = "Gaussian kernel SD = 1"
    )
Edit and Run Code