Default threshold

You want to confirm that DecisionTreeClassifier() uses the same default classification threshold mentioned in the previous lesson, namely 0.5. It seems odd to you that every classifier would use the same threshold. Let's check! A fitted decision tree classifier clf has been pre-loaded, along with the training and test data under their usual names: X_train, X_test, y_train and y_test. You will need to extract the probabilities from the classifier using the .predict_proba() method.

This exercise is part of the course

Designing Machine Learning Workflows in Python

Exercise instructions

Generate scores for the test examples using the pre-loaded classifier clf.
Now extract labels from the scores. Remember that you have a pair of scores for each example, not a single one, and that the second element is the probability of the positive class.
Now label the test data using the standard .predict() method.
Finally, compare with the predictions you obtained earlier. Are they identical?
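To make the "pair of scores" concrete, here is a minimal sketch on a hand-written array. The numbers are hypothetical and only mimic the shape that .predict_proba() returns for a binary problem; they do not come from the course data.

```python
import numpy as np

# Hypothetical .predict_proba() output for four test examples (made-up values):
# column 0 is the probability of class 0, column 1 of the positive class 1.
scores = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
    [0.4, 0.6],
])

# Each row is a probability distribution over the two classes.
row_sums = scores.sum(axis=1)

# Thresholding the positive-class column at 0.5 ...
labels_threshold = (scores[:, 1] > 0.5).astype(int)

# ... selects the same class as taking the larger probability in each row.
# On an exact tie, argmax returns the first index, i.e. class 0.
labels_argmax = scores.argmax(axis=1)

print(labels_threshold)
print(labels_argmax)
```

With two classes, "positive-class probability above 0.5" and "most probable class" are the same rule, which is why a single default threshold can apply across probabilistic classifiers.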

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Score the test data using the given classifier
scores = clf.____(____)

# Get labels from the scores using the default threshold
preds = [s[____] > ____ for s in scores]

# Use the predict method to label the test data again
preds_default = clf.____(____)

# Compare the two sets of predictions
____(preds == preds_default)
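One possible way to fill in the blanks is sketched here as a self-contained script. Because clf and the train/test splits are pre-loaded only in the course environment, this sketch substitutes a synthetic dataset from make_classification and fits its own tree; the dataset, max_depth=3 and the random_state values are assumptions chosen for illustration, not the course's setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the pre-loaded course data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow tree (illustrative choice) so that leaf probabilities are not all 0 or 1.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Score the test data using the given classifier
scores = clf.predict_proba(X_test)

# Get labels from the scores using the default threshold of 0.5
preds = np.array([s[1] > 0.5 for s in scores])

# Use the predict method to label the test data again
preds_default = clf.predict(X_test)

# Compare the two sets of predictions (True compares equal to label 1)
identical = bool(np.all(preds == preds_default))
print(identical)
```

The comparison works here because the synthetic labels are 0 and 1, so a boolean prediction matches the integer label element by element; with other label encodings the thresholded values would need to be mapped to the class names in clf.classes_ first.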

In the previous chapters you established a solid foundation in supervised learning, including how to deploy models in production, but always assumed that a labeled dataset would be available for your analysis. In this chapter, you take on the challenge of modeling data with no labels, or with very few. This takes you on a journey into anomaly detection, a kind of unsupervised modeling, as well as distance-based learning, where beliefs about what constitutes similarity between two examples can be used in place of labels to help you achieve levels of accuracy comparable to a supervised workflow. Upon completing this chapter, you will know which tools to use to modify your workflow in order to overcome common real-world challenges.

Exercise 1: Anomaly detection
Exercise 2: A simple outlier
Exercise 3: LoF contamination
Exercise 4: Novelty detection
Exercise 5: A simple novelty
Exercise 6: Three novelty detectors
Exercise 7: Contamination revisited
Exercise 8: Distance-based learning
Exercise 9: Find the neighbor
Exercise 10: Not all metrics agree
Exercise 11: Unstructured data
Exercise 12: Restricted Levenshtein
Exercise 13: Bringing it all together
Exercise 14: Concluding remarks