CommencerCommencer gratuitement

Preparing the data

Unlike prior chapters, where we prepared the data for you for unsupervised learning, the goal of this chapter is to step you through a more realistic and complete workflow.

Recall from the video that the first step is to download and prepare the data.

Cet exercice fait partie du cours

Unsupervised Learning in R

Afficher le cours

Instructions

  • Use read.csv() function to download the CSV (comma-separated values) file containing the data from the URL provided. Assign the result to wisc.df.
  • Use as.matrix() to convert the features of the data (in columns 3 through 32) to a matrix. Store this in a variable called wisc.data.
  • Assign the row names of wisc.data the values currently contained in the id column of wisc.df. While not strictly required, this will help you keep track of the different observations throughout the modeling process.
  • Finally, set a vector called diagnosis to be 1 if a diagnosis is malignant ("M") and 0 otherwise. Note that R coerces TRUE to 1 and FALSE to 0.

Exercice interactif pratique

Essayez cet exercice en complétant cet exemple de code.

url <- "https://assets.datacamp.com/production/course_1903/datasets/WisconsinCancer.csv"

# Download the data: wisc.df


# Convert the features of the data: wisc.data


# Set the row names of wisc.data
row.names(wisc.data) <- wisc.df$___

# Create diagnosis vector
diagnosis <- as.numeric(wisc.df$diagnosis == ___)
Modifier et exécuter le code