
Logistic regression for breast cancer

In the last exercise, we took a first look at the data. In this exercise, you will define a training and testing split for a logistic regression model on a breast cancer dataset. Holding out test data like this is an important first step before fitting any supervised machine learning model, because it lets you evaluate the model on samples it has not seen during training.

The breast cancer dataset is a sample dataset from sklearn containing various features measured for each patient, along with a target value indicating whether or not the patient has breast cancer. The data comes in a dictionary-like format: the features are stored in an array called data, and the target values are stored in an array called target. Hence, cancer_data.data holds the features and cancer_data.target holds the targets. The sample data is loaded as cancer_data, along with pandas as pd. LogisticRegression is available via sklearn.linear_model.
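
To make the layout described above concrete, here is a minimal sketch of how the dataset could be loaded and inspected outside the exercise environment. It assumes cancer_data was created with sklearn's load_breast_cancer(); the exercise does not say how the object is loaded, so treat that call and the variable name features as assumptions.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Assumption: the exercise's cancer_data comes from load_breast_cancer().
cancer_data = load_breast_cancer()

# The features and targets are NumPy arrays with one row/entry per sample.
print(cancer_data.data.shape)    # (number of samples, number of features)
print(cancer_data.target.shape)  # (number of samples,)

# Optional: wrap the features in a DataFrame for easier inspection.
# "features" is a hypothetical name used only for this illustration.
features = pd.DataFrame(cancer_data.data, columns=cancer_data.feature_names)
print(features.head())
```

Because the object is dictionary-like, cancer_data["data"] and cancer_data.data refer to the same array.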

This exercise is part of the course

Predicting CTR with Machine Learning in Python

Exercise instructions

  • Define both X and y using data and target, respectively.
  • Make X_train and y_train the first 300 samples of X and y, respectively, using X[:300] for X_train.
  • Make X_test and y_test the remainder of X and y, respectively (excluding those first 300 samples), using X[300:] for X_test.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Define X and y 
X = cancer_data.____
y = cancer_data.____

# Define training and testing data
X_train = X[____]
X_test = X[____]
y_train = y[____]
y_test = y[____] 
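
For reference, one possible completion of the scaffold above is sketched below, following the exercise instructions. It again assumes cancer_data comes from load_breast_cancer(). The final fitting and scoring step goes beyond what this exercise asks for and is included only to show how the split would be used; the name clf and the max_iter value are arbitrary choices made for this illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

# Assumption: the exercise's cancer_data comes from load_breast_cancer().
cancer_data = load_breast_cancer()

# Define X and y
X = cancer_data.data
y = cancer_data.target

# Define training and testing data: first 300 samples train, the rest test
X_train = X[:300]
X_test = X[300:]
y_train = y[:300]
y_test = y[300:]

print(X_train.shape, X_test.shape)

# Not part of this exercise: fit on the training split, score on the test split.
# max_iter is raised from its default only to give the solver room to converge.
clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy on the held-out samples
```

Note that slicing splits the data in its stored order without shuffling. sklearn.model_selection.train_test_split is a common alternative when a shuffled split is wanted.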