Multivariate outlier detection
One hundred people living in the same area have filed a claim because their houses were damaged by hail during Sunday night's storm. The dataset hailinsurance
contains 100 observations and 2 variables. The first column contains the payment made by the insurance company to each customer, and the second column contains the most recent house price.
In this exercise, you're first going to use classical estimators on the dataset. You will then compare the results with those of robust estimators.
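To make the contrast between classical and robust estimators concrete, here is a minimal sketch on synthetic data. It does not use the hailinsurance values: the matrix `x`, the planted outliers, and the choice of `MASS::cov.rob` with `method = "mcd"` as the robust estimator are all assumptions made for illustration, and the course may use a different function.

```r
library(MASS)  # provides cov.rob(); ships with standard R installations

set.seed(1)
# Synthetic two-column data (hypothetical stand-in, NOT hailinsurance)
x <- cbind(claim = rnorm(100, mean = 50, sd = 5),
           price = rnorm(100, mean = 300, sd = 20))
# Plant five gross outliers in the first five rows
x[1:5, ] <- cbind(rnorm(5, mean = 120, sd = 2),
                  rnorm(5, mean = 150, sd = 5))

# Classical estimators: sample mean and sample covariance matrix
cl_center <- colMeans(x)
cl_cov <- cov(x)

# Robust estimators: minimum covariance determinant (MCD)
rob <- cov.rob(x, method = "mcd")
rob_center <- rob$center
rob_cov <- rob$cov

# Compare how far each location estimate is pulled by the outliers
print(rbind(classical = cl_center, robust = rob_center))
```

In this sketch the classical mean of the first column is dragged toward the planted outliers, while the MCD-based center stays close to the bulk of the data.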
This exercise is part of the course
Fraud Detection in R
Try this exercise by completing the sample code below.
# Create a scatterplot
plot(hailinsurance, xlab = "price house", ylab = "claim")
# Compute the sample mean and sample covariance matrix
clcenter <- colMeans(___)
clcov <- cov(___)
# Add 97.5% tolerance ellipsoid
rad <- sqrt(qchisq(___, ___))
ellipse(center = clcenter, shape = clcov, radius = rad, col = "blue", lty = 2)
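
The radius of a tolerance ellipsoid comes from a chi-squared quantile whose degrees of freedom equal the number of variables. Below is a minimal sketch of that idea on synthetic data, using base R only. The matrix `x`, the planted outlier, and the names `center`, `covmat`, `outliers`, and `boundary` are hypothetical; the scaffold above is left for you to complete on hailinsurance.

```r
set.seed(42)
# Synthetic bivariate data (hypothetical, NOT hailinsurance)
x <- cbind(rnorm(100), rnorm(100))
x[1, ] <- c(6, -6)  # plant one obvious outlier

# Classical estimators
center <- colMeans(x)
covmat <- cov(x)

# Radius of the 97.5% tolerance ellipsoid for p = ncol(x) variables
rad <- sqrt(qchisq(0.975, df = ncol(x)))

# mahalanobis() returns SQUARED distances, so take the square root
d <- sqrt(mahalanobis(x, center, covmat))

# Points outside the ellipsoid are flagged as potential outliers
outliers <- which(d > rad)

# Boundary of the ellipse: map a unit circle through the Cholesky
# factor of the covariance matrix, scale by rad, shift to the center
theta <- seq(0, 2 * pi, length.out = 200)
circle <- cbind(cos(theta), sin(theta))
boundary <- sweep(rad * circle %*% chol(covmat), 2, center, "+")
```

After a call such as `plot(x)`, the boundary can be drawn with `lines(boundary, col = "blue", lty = 2)`. Every boundary point lies at Mahalanobis distance `rad` from the center, so this should trace the same curve as an ellipse helper called with the same center, shape, and radius.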