1. Applying nonparametric tests in experimental analysis
We'll now explore the world of nonparametric tests, which are vital tools in situations where parametric test assumptions don't hold.
2. When to use nonparametric tests
Nonparametric tests come into play when data violates the usual assumptions of parametric tests. For example, they offer an alternative to transforming data so that normality assumptions hold.
They're ideal for ordinal data or distributions far from normality,
offering resilience against outliers and accommodating a wider range of data behaviors.
3. Exploring nonparametric methods
When data doesn't meet parametric assumptions, nonparametric methods offer a solution.
The Mann-Whitney U Test is our go-to for comparing two independent groups; it is the nonparametric alternative to the independent two-sample t-test.
When our experiment involves more than two groups with a numeric response, we turn to the Kruskal-Wallis Test, the nonparametric version of the one-way ANOVA test.
4. Visualizing nonparametric data
Visualizing nonparametric data effectively can reveal underlying patterns. Violin plots offer a comprehensive view of our data's distribution across multiple groups.
Let's compare MineralHardness for Igneous and Metamorphic rocks from our data. We begin by using the .isin() method to extract these two groups of data into a DataFrame called condensed_data.
Next, we use Seaborn's violinplot function on the two variables of interest.
This violin plot contrasts MineralHardness between metamorphic and igneous rocks. Notice that neither violin has the symmetric bell shape that normally distributed data would produce; both exhibit some skew. Metamorphic rocks show a greater hardness range and a lower median (marked by the white line in the center of each "violin") than igneous rocks, which display less variability and a higher median.
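The filtering and plotting steps described above might look like the following minimal sketch. The course data is not reproduced here, so mineral_rocks is filled with synthetic values, and the column name Rock_Type is an assumption; only MineralHardness, mineral_rocks, and condensed_data come from the narration. The resulting plot will therefore not match the one described.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for mineral_rocks (made-up values, for illustration only).
# "Rock_Type" is an assumed column name.
rng = np.random.default_rng(0)
mineral_rocks = pd.DataFrame({
    "Rock_Type": np.repeat(["Igneous", "Metamorphic", "Sedimentary"], 50),
    "MineralHardness": np.concatenate([
        rng.gamma(shape=6.0, scale=1.0, size=50),
        rng.gamma(shape=3.0, scale=1.5, size=50),
        rng.gamma(shape=2.0, scale=1.5, size=50),
    ]),
})

# Keep only the rows for the two rock types being compared.
condensed_data = mineral_rocks[
    mineral_rocks["Rock_Type"].isin(["Igneous", "Metamorphic"])
]

# One violin per rock type; in an interactive session, call plt.show() afterwards.
ax = sns.violinplot(x="Rock_Type", y="MineralHardness", data=condensed_data)
ax.set_title("MineralHardness: igneous vs. metamorphic")
```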
5. Visualizing nonparametric data
Boxen plots are an extended version of box plots that provide more information about the shape of the distribution.
We use Seaborn's boxenplot function to display the distribution of MineralHardness across three rock types: metamorphic, igneous, and sedimentary.
Sedimentary rocks show the smallest median hardness, along with a few outliers at the extremes. Metamorphic rocks show the most skew of the three rock types and have a median hardness between that of sedimentary and igneous. They also have a wider interquartile range, indicating considerable variability. Igneous rocks exhibit the highest median hardness and a narrower interquartile range, suggesting less variability.
6. Applying nonparametric tests - Mann-Whitney U
We perform the Mann-Whitney U test to compare the distributions of MineralHardness between igneous and sedimentary rocks using data from the mineral_rocks DataFrame. We select the hardness values corresponding to each rock type and apply the test to determine if there's a statistically significant difference between the two groups, which is commonly read as a difference in medians when the two distributions have similar shapes.
The test returns a p-value of 0.9724. With a p-value this high, we fail to reject the null hypothesis: the data provide no evidence of a difference in median mineral hardness between igneous and sedimentary rocks at any common significance level.
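One way to run this comparison is with SciPy's mannwhitneyu function, sketched below. The narration does not name the library, so using scipy.stats is an assumption, as is the Rock_Type column name. The data is synthetic, so the p-value will not equal the 0.9724 reported above.

```python
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

# Synthetic stand-in for mineral_rocks (made-up values, for illustration only).
# "Rock_Type" is an assumed column name.
rng = np.random.default_rng(0)
mineral_rocks = pd.DataFrame({
    "Rock_Type": np.repeat(["Igneous", "Metamorphic", "Sedimentary"], 50),
    "MineralHardness": np.concatenate([
        rng.gamma(shape=6.0, scale=1.0, size=50),
        rng.gamma(shape=3.0, scale=1.5, size=50),
        rng.gamma(shape=2.0, scale=1.5, size=50),
    ]),
})

# Select the hardness values for each of the two rock types.
is_igneous = mineral_rocks["Rock_Type"] == "Igneous"
is_sedimentary = mineral_rocks["Rock_Type"] == "Sedimentary"
igneous = mineral_rocks.loc[is_igneous, "MineralHardness"]
sedimentary = mineral_rocks.loc[is_sedimentary, "MineralHardness"]

# Two-sided test: the null hypothesis is that neither group tends to
# produce larger values than the other.
u_stat, p_value = mannwhitneyu(igneous, sedimentary, alternative="two-sided")
print(f"U = {u_stat:.1f}, p-value = {p_value:.4f}")
```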
7. Applying nonparametric tests - Kruskal-Wallis
We apply the Kruskal-Wallis test, a nonparametric method, to determine if there are statistically significant differences in mineral hardness distributions across igneous, sedimentary, and metamorphic rock types from the mineral_rocks dataset. It computes the p-value for the hypothesis that the medians of all groups are equal.
This test returns a p-value of 0.0630, which hints at a difference in medians but falls short of the conventional significance threshold of 0.05. Therefore, while mineral hardness may differ by rock type, the differences are not statistically significant at the 5% level.
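The three-group comparison could be sketched with SciPy's kruskal function, which accepts one sample per group. Again, the use of scipy.stats and the Rock_Type column name are assumptions, and the synthetic data means the p-value will differ from the 0.0630 reported above.

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal

# Synthetic stand-in for mineral_rocks (made-up values, for illustration only).
# "Rock_Type" is an assumed column name.
rng = np.random.default_rng(0)
mineral_rocks = pd.DataFrame({
    "Rock_Type": np.repeat(["Igneous", "Metamorphic", "Sedimentary"], 50),
    "MineralHardness": np.concatenate([
        rng.gamma(shape=6.0, scale=1.0, size=50),
        rng.gamma(shape=3.0, scale=1.5, size=50),
        rng.gamma(shape=2.0, scale=1.5, size=50),
    ]),
})

# Build one array of hardness values per rock type.
groups = [
    group["MineralHardness"].to_numpy()
    for _, group in mineral_rocks.groupby("Rock_Type")
]

# Kruskal-Wallis H-test across all groups at once.
h_stat, p_value = kruskal(*groups)
print(f"H = {h_stat:.3f}, p-value = {p_value:.4f}")

# Compare against the conventional 5% threshold.
alpha = 0.05
print("Significant at 5%" if p_value < alpha else "Not significant at 5%")
```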
8. Let's practice!
Time to test out your nonparametric knowledge!