More datasets

Welcome to the Logistic regression chapter.

Over the next exercises, we will combine, wrangle and analyse two new data sets retrieved from the UCI Machine Learning Repository, a great source of open data.

The data come from two identical questionnaires on secondary school student alcohol consumption in Portugal. Read about the data and the variables here.

R offers the convenient paste() function, which makes it easy to combine character strings. Let's use it to get our hands on the data!
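As a minimal sketch of how paste() builds a web address, the snippet below joins a base address and a file name with "/" as the separator. The names base, file_name and address, and the example.com address, are made up for illustration and are not part of the exercise.

```r
# Hypothetical base address and file name, for illustration only
base <- "https://example.com/data"
file_name <- "example.csv"

# paste() joins its arguments, inserting `sep` between them
address <- paste(base, file_name, sep = "/")
address

# The default separator is a single space, which would not give a valid address
paste(base, file_name)

# paste0() is shorthand for paste(..., sep = "")
paste0(base, "/", file_name)
```

Note that sep defaults to a single space, so it has to be set explicitly whenever the pieces should be joined by a slash.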

This exercise is part of the course

Helsinki Open Data Science


Exercise instructions

Create and print out the object url_math.
Create the object math by reading the math class questionnaire data from the web address defined in url_math.
Create and print out url_por.
Adjust the code: similarly to url_math, make url_por into a valid web address using paste() and the url object.
Create the object por by reading the Portuguese class questionnaire data from the web address defined in url_por.
Print out the names of the columns in both data sets.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

url <- "http://s3.amazonaws.com/assets.datacamp.com/production/course_2218/datasets"

# web address for math class data
url_math <- paste(url, "student-mat.csv", sep = "/")

# print out the address
url_math

# read the math class questionnaire data into memory
math <- read.table(url_math, sep = ";", header = TRUE)

# web address for Portuguese class data
url_por <- paste("replace me!", "student-por.csv", sep =" change me! ")

# print out the address


# read the Portuguese class questionnaire data into memory
por <- read.table(url_por, sep = ";", header = TRUE)

# look at the column names of both data sets
colnames(math)
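
The read.table() arguments used above can be tried without a network connection. The sketch below parses a tiny made-up table from a string; the object names raw and toy and the columns id, group and score are assumptions for illustration and do not come from the student data.

```r
# A tiny made-up table in the same layout: semicolon-separated with a header row
raw <- "id;group;score\n1;a;10\n2;b;12"

# `text =` makes read.table() parse a string instead of a file or web address
toy <- read.table(text = raw, sep = ";", header = TRUE)

# column names come from the header row
colnames(toy)

# number of rows and columns
dim(toy)
```

With header = TRUE the first line supplies the column names rather than being read as data, and sep = ";" tells read.table() where one field ends and the next begins.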
