Train/test split + computing accuracy

It's time to practice splitting data into training and test sets using the churn_df dataset!

NumPy arrays have been created for you, containing the features as X and the target variable as y.

This exercise is part of the course

Supervised Learning with scikit-learn

Exercise instructions

Import train_test_split from sklearn.model_selection.
Split X and y into training and test sets, setting test_size to 20%, random_state to 42, and ensuring that the proportions of the target labels reflect those of the original dataset.
Fit the knn model to the training data.
Compute and print the model's accuracy on the test data.
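
To illustrate what "proportions of the target labels reflect those of the original dataset" means, here is a minimal sketch on made-up data. The arrays X_demo and y_demo are hypothetical placeholders, not the churn data; the only library call used is train_test_split with its stratify argument.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 zeros and 20 ones (20% positive class)
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(100).reshape(-1, 1)  # placeholder single-feature matrix

# Stratified split: 20% of the rows go to the test set, and the
# share of each label is preserved in both resulting sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fraction of positive labels in each set
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(train_rate, test_rate)
```

Because the class counts divide evenly here, both printed rates should match the original 20% share. Without stratify, the test-set rate can drift away from it, which matters most when one class is rare.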

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the module
from ____ import ____

X = churn_df.drop("churn", axis=1).values
y = churn_df["churn"].values

# Split into training and test sets
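X_train, X_test, y_train, y_test = ____(____, ____, test_size=____, random_state=____, stratify=____)

# Create a KNN classifier with 5 neighbors
# (KNeighborsClassifier is assumed to be preloaded from sklearn.neighbors)
knn = KNeighborsClassifier(n_neighbors=5)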

# Fit the classifier to the training data
____

# Print the accuracy
print(knn.score(____, ____))
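
For a classifier, score returns the accuracy: the fraction of test samples whose predicted label matches the true label. The sketch below checks this on synthetic data, since churn_df is not available outside the exercise environment. The make_classification dataset and all variable names ending in _s are stand-ins chosen for illustration, not part of the exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the churn data (assumption: roughly 15% positive class)
X_syn, y_syn = make_classification(
    n_samples=500, n_features=8, weights=[0.85, 0.15], random_state=0
)

# Hold out 20% of the rows for testing, preserving label proportions
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

# Fit a 5-nearest-neighbors classifier on the training rows only
knn_s = KNeighborsClassifier(n_neighbors=5)
knn_s.fit(X_train_s, y_train_s)

# Accuracy reported by score() versus accuracy computed by hand
score_acc = knn_s.score(X_test_s, y_test_s)
manual_acc = np.mean(knn_s.predict(X_test_s) == y_test_s)
print(score_acc, manual_acc)
```

The two printed values should agree. Evaluating on the held-out rows rather than the training rows is what makes the number an estimate of performance on unseen data.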



Skill level: Intermediate


In this chapter, you'll be introduced to classification problems and learn how to solve them using supervised learning techniques. You'll learn how to split data into training and test sets, fit a model, make predictions, and evaluate accuracy. You’ll discover the relationship between model complexity and performance, applying what you learn to a churn dataset, where you will classify the churn status of a telecom company's customers.

Exercise 1: Machine learning with scikit-learn
Exercise 2: Binary classification
Exercise 3: Supervised learning workflow
Exercise 4: The classification challenge
Exercise 5: k-nearest neighbors: fit
Exercise 6: k-nearest neighbors: predict
Exercise 7: Evaluating model performance
Exercise 8: Train/test split + computing accuracy (current exercise)
Exercise 9: Overfitting and underfitting
Exercise 10: Visualizing model complexity

In this chapter, you will be introduced to regression, and build models to predict sales values using a dataset on advertising expenditure. You will learn about the mechanics of linear regression and common performance metrics such as R-squared and root mean squared error. You will perform k-fold cross-validation, and apply regularization to regression models to reduce the risk of overfitting.

Exercise 1: Introduction to regression
Exercise 2: Creating features
Exercise 3: Building a linear regression model
Exercise 4: Visualizing a linear regression model
Exercise 5: The basics of linear regression
Exercise 6: Fit and predict for regression
Exercise 7: Regression performance
Exercise 8: Cross-validation
Exercise 9: Cross-validation for R-squared
Exercise 10: Analyzing cross-validation metrics
Exercise 11: Regularized regression
Exercise 12: Regularized regression: Ridge
Exercise 13: Lasso regression for feature importance

Having trained models, now you will learn how to evaluate them. In this chapter, you will be introduced to several metrics along with a visualization technique for analyzing classification model performance using scikit-learn. You will also learn how to optimize classification and regression models through the use of hyperparameter tuning.

Exercise 1: How good is your model?
Exercise 2: Deciding on a primary metric
Exercise 3: Assessing a diabetes prediction classifier
Exercise 4: Logistic regression and the ROC curve
Exercise 5: Building a logistic regression model
Exercise 6: The ROC curve
Exercise 7: ROC AUC
Exercise 8: Hyperparameter tuning
Exercise 9: Hyperparameter tuning with GridSearchCV
Exercise 10: Hyperparameter tuning with RandomizedSearchCV

Learn how to impute missing values, convert categorical data to numeric values, scale data, evaluate multiple supervised learning models simultaneously, and build pipelines to streamline your workflow!

Exercise 1: Preprocessing data
Exercise 2: Creating dummy variables
Exercise 3: Regression with categorical features
Exercise 4: Handling missing data
Exercise 5: Dropping missing data
Exercise 6: Pipeline for song genre prediction: I
Exercise 7: Pipeline for song genre prediction: II
Exercise 8: Centering and scaling
Exercise 9: Centering and scaling for regression
Exercise 10: Centering and scaling for classification
Exercise 11: Evaluating multiple models
Exercise 12: Visualizing regression model performance
Exercise 13: Predicting on the test set
Exercise 14: Visualizing classification model performance
Exercise 15: Pipeline for predicting song popularity
Exercise 16: Congratulations