
Hate in NY state?

1. Hate in NY state?

In this case study, we will examine whether the number of hate crimes in New York State is changing over time, and whether that change differs between counties. This case study illustrates how generalized linear mixed models can be used for repeated measures over time.

2. Overview of data

This data comes from data.gov, a federal government data clearinghouse that also includes some state data. The data spans 2010 to 2016 and is recorded for each county and year. Additionally, the data is broken down by whether the crime was against people, for example, assault, or against property, such as vandalism. The data also records which group was targeted. For this exercise, we will examine the total crimes against all people. Because breaking the data down this way produces small counts, a Poisson error term is appropriate for examining the data. I have cleaned up the dataset for analysis in the following exercises.
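To see why a Poisson error term suits small counts, here is a minimal Python sketch using only the standard library. The rate of 2.0 is a hypothetical expected count for one county in one year, chosen for illustration; it is not taken from the dataset.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of observing exactly k events when the expected count is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)

# Hypothetical expected count for one county in one year (not taken from the dataset).
rate = 2.0

# Probabilities for counts 0..49; the tail beyond that is negligible for this rate.
probabilities = [poisson_pmf(k, rate) for k in range(50)]

# For a Poisson distribution, the mean and the variance both equal the rate.
mean = sum(k * p for k, p in enumerate(probabilities))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probabilities))

for k in range(6):
    print(f"P(count = {k}) = {probabilities[k]:.3f}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
```

The distribution only puts probability on non-negative whole numbers and is skewed to the right when the rate is small, which is the shape we expect from county-level counts of rare events.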

3. Questions with data

With this data, we will examine whether the number of reported hate crimes is changing across the state. We will also examine whether hate crimes are changing differently in different counties. Also, as part of this case study, we will go over different methods for presenting our results.
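One way to write these two questions down is as a Poisson mixed model with a random intercept and a random year slope for each county. This is a sketch under that assumption, and the symbols are introduced here for illustration; the model fitted in the exercises may be specified differently.

```latex
\begin{align}
  y_{ij} &\sim \mathrm{Poisson}(\lambda_{ij}) \\
  \log \lambda_{ij} &= \beta_0 + \beta_1\,\mathrm{Year}_j + u_{0i} + u_{1i}\,\mathrm{Year}_j
\end{align}
```

Here $y_{ij}$ is the number of reported hate crimes in county $i$ in year $j$. The fixed slope $\beta_1$ addresses the statewide question, and the county-specific slopes $u_{1i}$ address whether counties are changing differently.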

4. Know your target audiences

An important part of being a data scientist is communicating results. To do this, we need to know our target audience. For example, writing a piece for a popular outlet such as FiveThirtyEight or the New York Times differs greatly from writing a scientific article. The technical details we include will vary with our target audience. Additionally, sometimes a figure works better than a table, and in other situations a table works better than a figure.

5. Presenting for "pop" audiences

When presenting to a wider audience, make sure you blend your data into your story. For example, rather than saying "we found the year effect to be statistically significant", you might say, "crime increased over time". Additionally, avoid getting bogged down with technical details. The DataCamp course, Communicating with Data in the Tidyverse, was developed by Timo Grossenbacher, a data journalist, who covers these points in greater detail.
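As a small illustration of blending a result into a story, the sketch below turns a year coefficient into a plain-language sentence. It assumes a Poisson model with a log link, where exponentiating the slope gives the multiplicative change per year. The coefficient value is hypothetical and is not a result from this case study.

```python
import math

def percent_change_per_year(log_rate_slope: float) -> float:
    """Convert a slope on the log scale into a percent change in the expected count per year."""
    return (math.exp(log_rate_slope) - 1.0) * 100.0

# Hypothetical year coefficient, for illustration only; not a result from this case study.
hypothetical_slope = 0.05

change = percent_change_per_year(hypothetical_slope)
direction = "increased" if change > 0 else "decreased"
print(f"Reported hate crimes {direction} by about {abs(change):.1f}% per year.")
```

A sentence like this carries the same information as the coefficient, without asking the reader to know what a log link is.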

6. Presenting for scientific audiences

In contrast to a broad audience, when presenting to a scientific audience, details are important. Your results should be reproducible, and you should include the technical details necessary to recreate and understand your findings. Increasingly, this involves releasing your code. Lastly, presenting for scientific audiences requires matching the style of the given academic discipline. The best way to learn this style is to read the journals and proceedings in which you want to publish.

7. Let's practice!

Now, let's look at hate in New York State!