Bringing it all together

One of the engineers at your arrhythmia detection startup rushes into your office to tell you that there is a problem with the ECG sensor for overweight users. You decide to reduce the influence of examples with weight above 80 by 50%. You are also told that, since your startup targets the fitness market and makes no medical claims, needlessly scaring an athlete is costlier than missing a possible case of arrhythmia. You decide to create a custom loss function that makes each "false alarm" ten times more costly than missing a case of arrhythmia. Does downweighting overweight subjects improve this custom loss? Your training data X_train, y_train and test data X_test, y_test are preloaded, as are confusion_matrix(), numpy as np, and DecisionTreeClassifier().

This exercise is part of the course

Designing Machine Learning Workflows in Python


Try this exercise by completing this sample code.

# Create a scorer assigning more cost to false positives
def my_scorer(y_test, y_est, cost_fp=10.0, cost_fn=1.0):
    tn, fp, fn, tp = ____
    return ____
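
One possible way to fill in the blanks and answer the question is sketched below. This is a hedged illustration, not the course's reference solution: the data is a synthetic stand-in generated with NumPy, the assumption that body weight is stored in the first feature column is hypothetical, and the names `body_weight`, `sample_weights`, `clf_plain`, and `clf_weighted` are made up for this example. The scorer unpacks the confusion matrix and charges `cost_fp` per false positive and `cost_fn` per false negative; the downweighting is passed through the `sample_weight` argument of `fit()`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.tree import DecisionTreeClassifier


# Create a scorer assigning more cost to false positives
def my_scorer(y_test, y_est, cost_fp=10.0, cost_fn=1.0):
    # labels=[0, 1] keeps the matrix 2x2 even if one class is absent
    tn, fp, fn, tp = confusion_matrix(y_test, y_est, labels=[0, 1]).ravel()
    return cost_fp * fp + cost_fn * fn


# Synthetic stand-in for the preloaded data (assumption: column 0 holds
# the subject's body weight; the real exercise data may be laid out differently)
rng = np.random.default_rng(0)
n = 400
body_weight = rng.normal(75, 12, size=n)
signal = rng.normal(0, 1, size=(n, 3))
X = np.column_stack([body_weight, signal])
y = (signal[:, 0] + rng.normal(0, 1, size=n) > 1.0).astype(int)

X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# Halve the influence of training examples with weight above 80
sample_weights = np.where(X_train[:, 0] > 80, 0.5, 1.0)

# Fit once without and once with the sample weights
clf_plain = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
clf_weighted = DecisionTreeClassifier(random_state=0).fit(
    X_train, y_train, sample_weight=sample_weights
)

# Compare the custom loss on the test set (lower is better)
loss_plain = my_scorer(y_test, clf_plain.predict(X_test))
loss_weighted = my_scorer(y_test, clf_weighted.predict(X_test))
print("Loss without downweighting:", loss_plain)
print("Loss with downweighting:", loss_weighted)
```

Because the scorer is a cost rather than a score, a lower value is better. Whether downweighting helps depends on the data, so the two printed numbers should be compared on the actual exercise data rather than on this synthetic sample.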


In the previous chapters you established a solid foundation in supervised learning, complete with knowledge of deploying models in production, but always assumed that a labeled dataset would be available for your analysis. In this chapter, you take on the challenge of modeling data without any, or with very few, labels. This takes you on a journey into anomaly detection, a kind of unsupervised modeling, as well as distance-based learning, where beliefs about what constitutes similarity between two examples can be used in place of labels to help you achieve levels of accuracy comparable to a supervised workflow. Upon completing this chapter, you will stand out from the crowd of data scientists by knowing which tools to use to modify your workflow and overcome common real-world challenges.

Exercise 1: Anomaly detection
Exercise 2: A simple outlier
Exercise 3: LoF contamination
Exercise 4: Novelty detection
Exercise 5: A simple novelty
Exercise 6: Three novelty detectors
Exercise 7: Contamination revisited
Exercise 8: Distance-based learning
Exercise 9: Find the neighbor
Exercise 10: Not all metrics agree
Exercise 11: Unstructured data
Exercise 12: Restricted Levenshtein
Exercise 13: Bringing it all together
Exercise 14: Concluding remarks