Score: 0

ggskewboxplots: Enhanced Boxplots for Skewed Data in R

Published: November 21, 2025 | arXiv ID: 2511.17091v1

By: Mustafa Cavus

Potential Business Impact:

Finds real outliers in messy data better.

Business Areas:
A/B Testing Data and Analytics

Traditional boxplots are widely used for summarizing and visualizing the distribution of numerical data, yet they exhibit significant limitations when applied to skewed or heavy-tailed distributions, often leading to misclassification of outliers through swamping -- flagging typical observations as outliers -- or masking -- failing to detect true outliers. This paper addresses these limitations by systematically evaluating several alternative boxplots specifically designed to accommodate distributional asymmetry. We introduce ggskewboxplots, an R package that integrates multiple robust and skewness-aware boxplot variants, providing a unified and user-friendly framework for exploratory data analysis. Using extensive Monte Carlo simulations under controlled skewness and kurtosis conditions, implemented via the mosaic approach based on the Skewed Exponential Power distribution, we assess the sensitivity and specificity of each method. Simulation results indicate that classical Tukey-style boxplots are highly prone to swamping and masking, whereas robust skewness-adjusted variants -- particularly those leveraging quartile-based skewness measures or medcouple-based adjustments -- achieve substantially better performance. These findings offer practical guidance for selecting reliable boxplot methods in applied settings and demonstrate how the ggskewboxplots package facilitates accessible, distribution-aware visualizations within the familiar ggplot2 workflow.

Country of Origin
🇹🇷 Turkey

Page Count
16 pages

Category
Statistics:
Methodology