
Multivariate outlier detection

One hundred people living in the same area have filed claims because their houses were damaged by hail from Sunday night's storm. The dataset hailinsurance contains 100 observations and 2 variables. The first column contains the payment made by the insurance company to each customer, and the second column contains the most recent house price.

In this exercise, you're first going to use classical estimators on the dataset. You will then compare the results with those of robust estimators.
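To make the classical-versus-robust comparison concrete, here is a minimal sketch on simulated data. It does not use hailinsurance: the matrix `x`, the planted cluster, and all variable names are hypothetical. The robust estimator is assumed to be the Minimum Covariance Determinant as provided by `MASS::cov.rob`, which may not be the estimator the course uses.

```r
# Hypothetical stand-in data; hailinsurance itself is not reproduced here.
library(MASS)  # provides mvrnorm() and cov.rob()

set.seed(1)
# 80 regular observations and a tight cluster of 20 planted outliers
regular  <- mvrnorm(80, mu = c(0, 0), Sigma = matrix(c(1, 0.8, 0.8, 1), 2))
outliers <- mvrnorm(20, mu = c(6, -6), Sigma = diag(0.01, 2))
x <- rbind(regular, outliers)
outlier_idx <- 81:100

# Classical estimates: sample mean and sample covariance matrix
cl_center <- colMeans(x)
cl_cov    <- cov(x)

# Robust estimates (assumption: MCD via MASS::cov.rob)
rob <- cov.rob(x, method = "mcd")

# Squared Mahalanobis distances under each pair of estimates
cl_d2  <- mahalanobis(x, cl_center, cl_cov)
rob_d2 <- mahalanobis(x, rob$center, rob$cov)

# 97.5% chi-squared cutoff with as many degrees of freedom as variables
cutoff <- qchisq(0.975, df = ncol(x))

# How many planted outliers does each approach flag?
flagged_classical <- sum(cl_d2[outlier_idx] > cutoff)
flagged_robust    <- sum(rob_d2[outlier_idx] > cutoff)
```

In a setup like this, the planted cluster pulls the sample mean toward itself and inflates the sample covariance, so the classical distances can understate how unusual those points are. The robust estimates are fitted to the bulk of the data and are much less affected.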

This exercise is part of the course

Fraud Detection in R


Hands-on interactive exercise

Try this exercise by completing this sample code.

# Create a scatterplot
plot(hailinsurance, xlab = "price house", ylab = "claim")

# Compute the sample mean and sample covariance matrix
clcenter <- colMeans(___)
clcov <- cov(___)

# Add 97.5% tolerance ellipsoid
# (ellipse() with these arguments is assumed to be car::ellipse; load it with library(car))
rad <- sqrt(qchisq(___, ___))
ellipse(center = clcenter, shape = clcov, radius = rad, col = "blue", lty = 2)
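For reference, the tolerance ellipsoid drawn by the last two lines can be written out as follows. This assumes the usual definition based on Mahalanobis distances and approximately normal data. The symbols $\hat{\mu}$ (estimated center), $\hat{\Sigma}$ (estimated covariance matrix), and $p$ (number of variables) are notation introduced here, not taken from the exercise.

```latex
\[
E_{0.975} = \left\{ x \in \mathbb{R}^{p} :
  (x - \hat{\mu})^{\top} \hat{\Sigma}^{-1} (x - \hat{\mu})
  \le \chi^{2}_{p;\,0.975} \right\},
\qquad
\text{radius} = \sqrt{\chi^{2}_{p;\,0.975}}
\]
```

Under that assumption, squared Mahalanobis distances approximately follow a chi-squared distribution with $p$ degrees of freedom, so about 97.5% of regular observations should fall inside the ellipsoid. Points outside it are candidates for outliers.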