Your first pipeline, again!

Back at the arrhythmia startup, your monthly review is coming up, and as part of it an expert Pythonista will review your code. You decide to tidy it up by following best practices and to replace your script for feature selection and random forest classification with a pipeline. You are using a training dataset available as X_train and y_train, along with a number of modules: RandomForestClassifier, SelectKBest() and f_classif() for feature selection, as well as GridSearchCV and Pipeline.

This exercise is part of the course

Designing Machine Learning Workflows in Python


Exercise instructions

Create a pipeline with the feature selector given in the sample code and a random forest classifier. Name the first step feature_selection.
Add two key-value pairs to params: one for the number of features k in the selector, with values 10 and 20, and one for n_estimators in the forest, with possible values 2 and 5.
Initialize a GridSearchCV object with the given pipeline and parameter grid.
Fit the object to the data and print the best-performing parameter combination.
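
The grid keys in the second instruction depend on scikit-learn's convention of addressing a pipeline step's parameter as the step name, a double underscore, then the parameter name. The sketch below is one way to check which names a pipeline exposes before writing the grid. The step name clf is taken from the sample code; the filtering on name suffixes is only for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# Same two steps as in the exercise
pipe = Pipeline([
    ('feature_selection', SelectKBest(f_classif)),
    ('clf', RandomForestClassifier(random_state=2)),
])

# get_params() lists every tunable name as <step name>__<parameter name>
param_names = sorted(pipe.get_params().keys())

# Keep only the two names this exercise asks about
wanted = [n for n in param_names if n.endswith(('__k', '__n_estimators'))]
print(wanted)
```

Any name returned by get_params() can be used as a key in the parameter grid passed to GridSearchCV.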

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create pipeline with feature selector and classifier
pipe = ___([
    (___, SelectKBest(f_classif)),
    ('clf', ___(random_state=2))])

# Create a parameter grid
params = {
    'feature_selection__k': ___,
    ___: [2, 5]}

# Initialize the grid search object
grid_search = ___(___, ___=params)

# Fit it to the data and print the best value combination
print(grid_search.fit(___, ___).___)
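
One possible completion of the template is sketched below. It is not the course's official solution. Because the arrhythmia data behind X_train and y_train is not included here, the sketch substitutes a synthetic dataset from make_classification; that dataset and its sizes are assumptions made only so the code runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Assumption: synthetic stand-in for the exercise's X_train and y_train.
# It has 30 features so that k=20 is a valid choice for SelectKBest.
X_train, y_train = make_classification(
    n_samples=200, n_features=30, n_informative=10, random_state=0)

# Create pipeline with feature selector and classifier
pipe = Pipeline([
    ('feature_selection', SelectKBest(f_classif)),
    ('clf', RandomForestClassifier(random_state=2))])

# Create a parameter grid; keys follow <step name>__<parameter name>
params = {
    'feature_selection__k': [10, 20],
    'clf__n_estimators': [2, 5]}

# Initialize the grid search object
grid_search = GridSearchCV(pipe, param_grid=params)

# Fit it to the data and print the best value combination
print(grid_search.fit(X_train, y_train).best_params_)
```

The printed dictionary contains one value for each key in params. Which combination wins depends on the data, so the result on the synthetic set says nothing about the arrhythmia data.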



In the previous chapters you established a solid foundation in supervised learning, including how to deploy models in production, but you always assumed that a labeled dataset would be available for your analysis. In this chapter, you take on the challenge of modeling data with few or no labels. This takes you on a journey into anomaly detection, a kind of unsupervised modeling, and into distance-based learning, where beliefs about what constitutes similarity between two examples can be used in place of labels to help you achieve accuracy comparable to a supervised workflow. By the end of this chapter, you will know which tools to use to adapt your workflow to common real-world challenges.

Exercise 1: Anomaly detection
Exercise 2: A simple outlier
Exercise 3: LoF contamination
Exercise 4: Novelty detection
Exercise 5: A simple novelty
Exercise 6: Three novelty detectors
Exercise 7: Contamination revisited
Exercise 8: Distance-based learning
Exercise 9: Find the neighbor
Exercise 10: Not all metrics agree
Exercise 11: Unstructured data
Exercise 12: Restricted Levenshtein
Exercise 13: Bringing it all together
Exercise 14: Concluding remarks