ggskewboxplots: Enhanced Boxplots for Skewed Data in R
By: Mustafa Cavus
Potential Business Impact:
Finds real outliers in messy data more reliably.
Traditional boxplots are widely used to summarize and visualize the distribution of numerical data, yet they have significant limitations when applied to skewed or heavy-tailed distributions. They often misclassify outliers through swamping (flagging typical observations as outliers) or masking (failing to detect true outliers). This paper addresses these limitations by systematically evaluating several alternative boxplots designed to accommodate distributional asymmetry. We introduce ggskewboxplots, an R package that integrates multiple robust and skewness-aware boxplot variants in a unified, user-friendly framework for exploratory data analysis. Using extensive Monte Carlo simulations under controlled skewness and kurtosis, implemented via the mosaic approach based on the Skewed Exponential Power distribution, we assess the sensitivity and specificity of each method. The results indicate that classical Tukey-style boxplots are highly prone to swamping and masking, whereas robust skewness-adjusted variants, particularly those using quartile-based skewness measures or medcouple-based adjustments, perform substantially better. These findings offer practical guidance for selecting reliable boxplot methods in applied settings and show how ggskewboxplots makes distribution-aware visualizations accessible within the familiar ggplot2 workflow.
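To make the contrast between Tukey fences and medcouple-based adjustments concrete, here is a rough standalone sketch in Python. It is not the ggskewboxplots API, and the package itself is written in R. The sketch computes classical Tukey fences alongside the medcouple-adjusted fences of Hubert and Vandervieren (2008), which is one well-known medcouple-based variant. It is an assumption that ggskewboxplots uses exactly these exponential constants.

Several simplifications apply:
- The medcouple is computed naively in O(n^2) time.
- Quartiles follow R's default type 7 definition.
- The hypothetical helper sensitivity_specificity treats deliberately planted values as the only true outliers. This is a simplification of the paper's simulation design, and the Skewed Exponential Power mosaic is not reproduced.

In R, the robustbase package provides mc() for the medcouple and adjbox() for the adjusted boxplot.

```python
import math
import random
import statistics


def medcouple(x):
    """Naive O(n^2) medcouple (Brys, Hubert and Struyf, 2004)."""
    m = statistics.median(x)
    lower = [v for v in x if v <= m]  # values at or below the median
    upper = [v for v in x if v >= m]  # values at or above the median
    k = sum(1 for v in x if v == m)   # observations tied with the median
    h = []
    for xi in lower:
        for xj in upper:
            if xi == xj:
                continue  # both equal the median; handled by the tie kernel
            h.append(((xj - m) - (m - xi)) / (xj - xi))
    # Tie kernel for pairs of observations equal to the median:
    # sign(i + j - 1 - k) for i, j = 1..k.
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            s = i + j - 1 - k
            h.append((s > 0) - (s < 0))
    return statistics.median(h)


def quartiles(x):
    # method="inclusive" matches R's default quantile(type = 7).
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return q1, q3


def tukey_fences(x, coef=1.5):
    """Classical Tukey fences: Q1 - coef*IQR and Q3 + coef*IQR."""
    q1, q3 = quartiles(x)
    iqr = q3 - q1
    return q1 - coef * iqr, q3 + coef * iqr


def adjusted_fences(x, coef=1.5):
    """Medcouple-adjusted fences (Hubert and Vandervieren, 2008)."""
    q1, q3 = quartiles(x)
    iqr = q3 - q1
    mc = medcouple(x)
    if mc >= 0:  # right skew: widen the upper fence, tighten the lower one
        return (q1 - coef * math.exp(-4 * mc) * iqr,
                q3 + coef * math.exp(3 * mc) * iqr)
    # left skew: mirror image of the rule above
    return (q1 - coef * math.exp(-3 * mc) * iqr,
            q3 + coef * math.exp(4 * mc) * iqr)


def flag(x, fences):
    """Return True for every value outside the (lower, upper) fences."""
    lo, hi = fences
    return [v < lo or v > hi for v in x]


def sensitivity_specificity(flags, is_outlier):
    """Hypothetical helper: share of true outliers flagged, share of clean points kept."""
    tp = sum(f and t for f, t in zip(flags, is_outlier))
    tn = sum((not f) and (not t) for f, t in zip(flags, is_outlier))
    n_out = sum(is_outlier)
    n_in = len(is_outlier) - n_out
    sens = tp / n_out if n_out else float("nan")
    spec = tn / n_in if n_in else float("nan")
    return sens, spec


if __name__ == "__main__":
    rng = random.Random(2024)
    clean = [rng.lognormvariate(0.0, 1.0) for _ in range(300)]  # right-skewed draws
    planted = [60.0, 80.0, 100.0]  # hypothetical contamination
    x = clean + planted
    truth = [False] * len(clean) + [True] * len(planted)
    for name, fences in [("Tukey", tukey_fences(x)), ("adjusted", adjusted_fences(x))]:
        sens, spec = sensitivity_specificity(flag(x, fences), truth)
        print(f"{name:>8}: sensitivity={sens:.2f} specificity={spec:.3f}")
```

For right-skewed data the medcouple is positive, so exp(3 * MC) >= 1. The adjusted upper fence therefore moves outward, away from the long tail, which is how this rule reduces swamping of legitimate tail values. Meanwhile exp(-4 * MC) <= 1 pulls the lower fence inward, so the short side is checked more strictly.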