
Dataset contents and descriptive statistics

1. Descriptive statistics with R

In this video, you will learn how to read in the abalone dataset from a CSV file, inspect it, and compute descriptive statistics. These tasks help you evaluate data quality and understand what your dataset contains.

2. Loading external CSV datasets

The abalone dataset contains 9 variables, including three size dimensions, whole weight and component weights, sex, and number of rings. Measurements are provided for 4177 abalones.

3. Loading external CSV datasets

The abalone dataset is available in CSV format. You will use the read_csv function from the readr package to import the abalone CSV dataset.

4. Importing CSV datasets

In SAS, PROC IMPORT is used to import a CSV file into the WORK library. In R, you can use the read_csv function from the readr package to import the abalone dataset into R's global environment.

5. Importing CSV datasets

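To call read_csv by its name alone, first load (attach) the readr package with the library command. Alternatively, you can write the package name, two colons, and the function name, as in readr::read_csv. This double-colon syntax calls the function directly from the package, so readr only needs to be installed, not loaded.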
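A minimal sketch of the two calling styles, assuming readr is installed. Because abalone.csv may not be at hand, the example writes a tiny temporary CSV with made-up values (the variable name csv_path is illustrative only) and reads it both ways.

```r
# Write a tiny stand-in CSV (hypothetical values, not the real abalone data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("length,diameter,height",
             "0.455,0.365,0.095",
             "0.530,0.420,0.135"), csv_path)

# Style 1: package::function, readr must be installed but need not be attached
via_colons <- readr::read_csv(csv_path)

# Style 2: attach readr with library(), then call read_csv by name
library(readr)
via_library <- read_csv(csv_path)

# Both styles call the same function, so the parsed columns match
identical(via_colons$diameter, via_library$diameter)  # TRUE
```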

6. Importing CSV datasets

Notice that in both SAS and R the filename abalone.csv is provided between quotation marks.

7. Importing CSV datasets

To save the abalone data in R's global environment, the output of the read_csv function is assigned to an object named abalone. The assignment operator, <-, is a less-than sign followed by a hyphen.
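The import-and-assign step can be sketched as follows, assuming readr is installed. The temporary file holds four made-up rows with only 6 of the 9 variables; the column names sex and rings are assumptions, since the narration names only diameter, shuckedWeight, length and height.

```r
library(readr)

# Hypothetical stand-in for abalone.csv: made-up values, 6 of the 9 columns
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sex,length,diameter,height,shuckedWeight,rings",
             "M,0.455,0.365,0.095,0.2245,15",
             "F,0.530,0.420,0.135,0.2565,9",
             "I,0.330,0.255,0.080,0.0895,7",
             "M,0.440,0.350,0.125,0.2155,10"), csv_path)

# With the course file this line would read: abalone <- read_csv("abalone.csv")
# The <- operator stores the imported tibble in an object named abalone
abalone <- read_csv(csv_path)
```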

8. Contents of dataset

Similar to PROC CONTENTS in SAS, the str (structure) function in R can be used to inspect the abalone dataset. The str function lists the variables in the dataset, their types (such as character, numeric or integer), and a quick preview of the values.
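A small base R sketch of str, assuming a hand-typed data frame in place of the imported abalone data (made-up values, 6 of the 9 variables; the column names sex and rings are assumptions). rings is stored as integer here so that str shows an integer column alongside numeric and character ones.

```r
# Hypothetical stand-in for the abalone data
abalone <- data.frame(
  sex = c("M", "F", "I", "M"),
  length = c(0.455, 0.530, 0.330, 0.440),
  diameter = c(0.365, 0.420, 0.255, 0.350),
  height = c(0.095, 0.135, 0.080, 0.125),
  shuckedWeight = c(0.2245, 0.2565, 0.0895, 0.2155),
  rings = c(15L, 9L, 7L, 10L)
)

# One header line, then one line per variable: name, type and first values
str(abalone)
```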

9. Contents of dataset

The R functions dim and names are helpful for getting details on datasets. The dim or dimension function shows 4177 rows and 9 columns in the abalone dataset. The names function shows the 9 variable names.
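The same two calls on a hypothetical four-row stand-in (made-up values; the column names sex and rings are assumptions). On the full abalone data, dim would report 4177 rows and 9 columns instead.

```r
# Hypothetical stand-in for the abalone data
abalone <- data.frame(
  sex = c("M", "F", "I", "M"),
  length = c(0.455, 0.530, 0.330, 0.440),
  diameter = c(0.365, 0.420, 0.255, 0.350),
  height = c(0.095, 0.135, 0.080, 0.125),
  shuckedWeight = c(0.2245, 0.2565, 0.0895, 0.2155),
  rings = c(15L, 9L, 7L, 10L)
)

dim(abalone)    # number of rows, then number of columns: 4 6
names(abalone)  # the variable names, in column order
```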

10. Dataset contents and variable types

In addition to the str, dim and names functions, the head and tail functions display the first or last 6 rows of the dataset by default. To display a different number of rows, pass that number as the second argument to head or tail. Here, the last 7 rows are shown.
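A base R sketch of head and tail, assuming a hypothetical 10-row stand-in in which rings simply runs from 1 to 10 so the row positions are easy to read. The second argument of head and tail is named n.

```r
# Hypothetical 10-row stand-in; rings = 1..10 marks each row's position
abalone <- data.frame(sex = rep(c("M", "F"), 5), rings = 1:10)

head(abalone)         # first 6 rows (the default)
tail(abalone)         # last 6 rows (the default)
tail(abalone, n = 7)  # last 7 rows: rings 4 through 10
```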

11. Working with data using dplyr approach

In this course, you will learn to use the dplyr package and the pipe operator syntax, which is explained on the next slide. Along with the pipe operator, you will also learn dplyr's arrange, pull and select functions.

12. dplyr arrange function and pipe %>% approach

The first argument of dplyr's arrange function is the name of the dataset abalone and the second argument is the variable diameter to arrange or sort the data by.
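A minimal sketch of arrange in function-call form, assuming dplyr is installed and using a hypothetical three-column tibble with made-up values. arrange sorts in ascending order unless told otherwise.

```r
library(dplyr)

# Hypothetical stand-in for the abalone data (made-up values)
abalone <- tibble(
  sex = c("M", "F", "I", "M"),
  diameter = c(0.365, 0.420, 0.255, 0.350),
  rings = c(15, 9, 7, 10)
)

# First argument: the dataset; second argument: the variable to sort by
arrange(abalone, diameter)
```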

13. dplyr arrange function and pipe %>% approach

The dplyr programming approach using the pipe operator, %>% (a percent sign, a greater-than sign, and another percent sign), is shown here.

14. dplyr arrange function and pipe %>% approach

To use the pipe operator, you list the dataset object first, so that the dataset is piped or sent to the arrange function to sort the abalone dataset by diameter.

15. dplyr arrange function and pipe %>% approach

This second line of code, which uses the pipe operator, can be read as: take the abalone data, and then arrange it by diameter.
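A sketch comparing the two forms, assuming dplyr is installed (it provides %>%) and a hypothetical tibble of made-up values. The pipe passes the object on its left as the first argument of the function on its right, so both lines produce the same sorted tibble.

```r
library(dplyr)

# Hypothetical stand-in for the abalone data (made-up values)
abalone <- tibble(
  sex = c("M", "F", "I", "M"),
  diameter = c(0.365, 0.420, 0.255, 0.350)
)

# Function-call form: the dataset is the first argument
arrange(abalone, diameter)

# Pipe form: take abalone, and then arrange it by diameter
abalone %>% arrange(diameter)
```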

16. Arrange abalones by diameter

Here is the abalone dataset sorted by diameter. The top 10 rows are displayed.

17. Extract one variable from abalone

The dplyr pull function extracts a single variable. Let's pull out the abalone shuckedWeight variable. The output shows the first 108 shucked weights.
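A sketch of pull, assuming dplyr is installed and a hypothetical four-row tibble (made-up values; the sex column is an assumption). Unlike select, pull returns the column as a plain vector rather than a one-column tibble.

```r
library(dplyr)

# Hypothetical stand-in for the abalone data (made-up values)
abalone <- tibble(
  sex = c("M", "F", "I", "M"),
  shuckedWeight = c(0.2245, 0.2565, 0.0895, 0.2155)
)

# Extract shuckedWeight as a numeric vector
abalone %>% pull(shuckedWeight)
```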

18. Compute mean and median shucked weight

After extracting one variable using the dplyr pull function, you can pipe that output into functions like mean and median to compute summary statistics.
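A sketch of piping the pulled vector into mean and median, assuming dplyr is installed and the same kind of made-up four-row stand-in. The results in the comments hold only for these hypothetical values.

```r
library(dplyr)

# Hypothetical stand-in for the abalone data (made-up values)
abalone <- tibble(shuckedWeight = c(0.2245, 0.2565, 0.0895, 0.2155))

# pull() yields a numeric vector, which mean() and median() accept directly
shucked_mean <- abalone %>% pull(shuckedWeight) %>% mean()
shucked_median <- abalone %>% pull(shuckedWeight) %>% median()

shucked_mean    # 0.1965
shucked_median  # 0.22, the average of the two middle values
```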

19. Select two variables from abalone

To select two or more variables, you will use the dplyr select function. Abalone length and height are shown here.
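A sketch of select, assuming dplyr is installed and a hypothetical tibble of made-up values. select keeps the named columns, in the order given, and returns a tibble.

```r
library(dplyr)

# Hypothetical stand-in for the abalone data (made-up values)
abalone <- tibble(
  sex = c("M", "F", "I", "M"),
  length = c(0.455, 0.530, 0.330, 0.440),
  height = c(0.095, 0.135, 0.080, 0.125)
)

# Keep only length and height
abalone %>% select(length, height)
```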

20. Get summary statistics of length and height

After selecting variables length and height, the summary function can be run to provide the minimum, maximum, mean, median and quartile statistics.
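A sketch of chaining select into summary, assuming dplyr is installed and a hypothetical tibble of made-up values. For each numeric column, summary reports six statistics: minimum, first quartile, median, mean, third quartile and maximum.

```r
library(dplyr)

# Hypothetical stand-in for the abalone data (made-up values)
abalone <- tibble(
  sex = c("M", "F", "I", "M"),
  length = c(0.455, 0.530, 0.330, 0.440),
  height = c(0.095, 0.135, 0.080, 0.125)
)

# Select the two variables, then summarize each one
abalone %>% select(length, height) %>% summary()
```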

21. Let's go find out about abalones

