Exercise

Logistic regression for breast cancer

In the last exercise, you took a first look at the data. In this exercise, you will define a training and testing split for a logistic regression model on a breast cancer dataset. Splitting the data is an important first step before fitting any machine learning model, because it lets you evaluate the model on samples it did not see during training.

The breast cancer dataset is a sample dataset from sklearn with various features measured for each patient, and a target value indicating whether the tumor is malignant or benign. The data comes in a dictionary-like format, where the features are stored in an array called data and the target values are stored in an array called target. Hence, cancer_data.data holds the features and cancer_data.target holds the targets. The sample data is loaded as cancer_data, along with pandas as pd. LogisticRegression is available via sklearn.linear_model.
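For orientation, here is a minimal sketch of how cancer_data might be created and inspected outside the exercise environment. It assumes the object comes from sklearn.datasets.load_breast_cancer(), which the exercise does not state explicitly; inside the exercise, cancer_data is already loaded for you.

```python
# Assumption: cancer_data is the object returned by load_breast_cancer().
from sklearn.datasets import load_breast_cancer

cancer_data = load_breast_cancer()

# The features are a 2-D array (samples x features); the targets are a 1-D array.
print(cancer_data.data.shape)
print(cancer_data.target.shape)

# The returned object also supports dictionary-style access.
print(cancer_data["data"] is cancer_data.data)
```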

Instructions
100 XP
  • Define both X and y using cancer_data.data and cancer_data.target, respectively.
  • Make X_train and y_train the first 300 samples of X and y, respectively, using X[:300] for X_train.
  • Make X_test and y_test the remainder of X and y, respectively (excluding those first 300 samples), using X[300:] for X_test.
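
The steps above can be sketched as follows. This is one possible way to write the split, not necessarily the reference solution; it assumes cancer_data comes from sklearn.datasets.load_breast_cancer(), which is loaded here only so the snippet runs on its own.

```python
# Assumption: cancer_data is the object returned by load_breast_cancer().
from sklearn.datasets import load_breast_cancer

cancer_data = load_breast_cancer()

# Features and targets
X = cancer_data.data
y = cancer_data.target

# The first 300 samples form the training set
X_train = X[:300]
y_train = y[:300]

# The remaining samples form the test set
X_test = X[300:]
y_test = y[300:]

print(X_train.shape, X_test.shape)
```

Slicing by position keeps the split deterministic but does not shuffle the rows. When a shuffled split is wanted, train_test_split from sklearn.model_selection is a common alternative.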