penguins |>
ggplot(aes(x = flipper_len)) +
stat_ecdf()
36-315: Statistical Graphics and Visualization, Summer 2026
Smoothed densities are good for visualizing the shape of a distribution
There are two ways to produce smoothed densities:
Nonparametric (previous lecture)
Parametric (this lecture)
A distribution is a mathematical function \(f(x \mid \theta)\) where
Let \(f\) denote the distribution for its
Note:
Given \(X \sim N(\mu, \sigma^2)\)
PDF: \(\displaystyle{f}(x \mid \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp \left\{ -\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2 \right\}\) for \(x \in (- \infty, \infty)\)
Standard normal distribution: \(N(0, 1)\)
R example: normal distribution
dnorm(): normal density function
pnorm(): normal cumulative distribution function (fraction of values less than or equal to a given value)
qnorm(): normal quantile function (inverse of cumulative distribution)
rnorm(): generate normal random variables
Note: Replace “norm” with the abbreviation for another distribution (e.g., exp, unif, binom) and the same four functions apply
See this manual for more details
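A short sketch of the four functions in action may help. The variable names and the specific arguments below are illustrative choices, not part of the lecture. Note that the R functions take the standard deviation `sd`, not the variance \(\sigma^2\).

```r
# Density of N(0, 1) evaluated at x = 0
density_at_zero <- dnorm(0)            # 0.3989423, i.e. 1 / sqrt(2 * pi)

# CDF: P(X <= 1.96) for X ~ N(0, 1)
prob_below <- pnorm(1.96)              # 0.9750021

# Quantile function: the value q with P(X <= q) = 0.975
upper_quantile <- qnorm(0.975)         # 1.959964

# Random draws from N(10, 2^2); the second parameter is the SD, not the variance
set.seed(315)
draws <- rnorm(5, mean = 10, sd = 2)

# Same naming pattern for another distribution: exponential with rate 1
exp_prob_below <- pexp(1, rate = 1)    # 0.6321206, i.e. 1 - exp(-1)
```

Because `qnorm()` inverts `pnorm()`, `qnorm(pnorm(1.96))` returns 1.96 up to floating-point error.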
Instead of trying to estimate the whole \(f(x)\) non-parametrically, assume a particular distribution \(f(x)\) and estimate its parameters
For example, assume \(X_i \sim N(\mu, \sigma^2)\). Then use the observed data to estimate the parameters \[\hat{\mu} = \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2\]
The (plug-in) density estimate is \[\hat f(x) = \frac{1}{\hat\sigma\sqrt{2\pi}} \exp \left\{ -\frac{1}{2} \left(\frac{x-\hat\mu}{\hat\sigma}\right)^2 \right\}\]
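The plug-in estimate can be sketched in a few lines. The data here are simulated stand-ins (a hypothetical sample, not the penguins data), and `f_hat` is an illustrative name.

```r
# Hypothetical sample standing in for an observed variable
set.seed(315)
x <- rnorm(200, mean = 200, sd = 14)

# Estimate the parameters: sample mean and sample SD (sd() divides by n - 1)
mu_hat <- mean(x)
sigma_hat <- sd(x)

# Plug the estimates into the normal density
f_hat <- function(t) dnorm(t, mean = mu_hat, sd = sigma_hat)

# The same value written out from the formula, evaluated at t = 210
manual_value <- 1 / (sigma_hat * sqrt(2 * pi)) *
  exp(-0.5 * ((210 - mu_hat) / sigma_hat)^2)
```

In a plot, `f_hat` could be overlaid on a histogram of `x` (for example with `stat_function(fun = f_hat)` in ggplot2) to judge visually how well the assumed distribution fits.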
Compare the ECDF \(\hat{F}(x)\) to the CDF \(F(x)\) of a theoretical distribution
Null hypothesis: the observed data follow a particular theoretical distribution
Test statistic: \(\displaystyle \quad \max_x |\hat{F}(x) - F(x)|\)
If \(\hat{F}(x)\) is far away from \(F(x)\), reject the null hypothesis
Does flipper_len follow a normal distribution? (i.e., \(H_0:\) flipper_len \(\sim N(\mu, \sigma^2)\))
Conclusion: flipper_len is not normally distributed
See Appendix for code
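As a minimal sketch of the one-sample test, the statistic can be computed by hand and compared with `ks.test()`. The data are simulated (a hypothetical sample), and the names `D_manual` and `ks_result` are illustrative. Since the ECDF is a step function, the largest gap occurs at a data point, either just before or right at a jump. One caveat from the `ks.test()` documentation: the reference parameters are meant to be pre-specified, so a p-value computed with parameters estimated from the same data is only approximate.

```r
# Hypothetical sample standing in for an observed variable
set.seed(315)
x <- rnorm(150, mean = 200, sd = 14)
n <- length(x)

# Parameters of the reference normal distribution, estimated from the data
mu_hat <- mean(x)
sigma_hat <- sd(x)

# Theoretical CDF evaluated at the sorted observations
F_sorted <- pnorm(sort(x), mean = mu_hat, sd = sigma_hat)

# ECDF equals i/n at the i-th sorted point and (i - 1)/n just before it,
# so check the gap on both sides of every jump
i <- seq_len(n)
D_manual <- max(pmax(i / n - F_sorted, F_sorted - (i - 1) / n))

# Built-in one-sample KS test against the same normal distribution
ks_result <- ks.test(x, "pnorm", mean = mu_hat, sd = sigma_hat)
```

`ks_result$statistic` should agree with `D_manual`, and `ks_result$p.value` gives the p-value used to decide whether to reject the null hypothesis.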
Examples:
clinical trials with multiple treatments
assessing differences across race, gender, socioeconomic status, etc.
industrial experiments, A/B testing
Remember:
It’s useful to visualize and compare conditional distributions
But when are differences in a graphic statistically significant?
We need formal statistical inference (e.g., hypothesis tests)
Use a two-sample KS test to compare two empirical distributions \(\hat{F}_A(x)\) and \(\hat{F}_B(x)\)
Null hypothesis: two samples \(A\) and \(B\) follow the same distribution
Test statistic: \(\displaystyle \quad \max_x |\hat{F}_A(x) - \hat{F}_B(x)|\)
If \(\hat F_A\) and \(\hat F_B\) are far away from each other, reject the null hypothesis
spotify_songs <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv") |>
mutate(duration = duration_ms / 60000)
rap_duration <- spotify_songs |> filter(playlist_genre == "rap") |> pull(duration)
rock_duration <- spotify_songs |> filter(playlist_genre == "rock") |> pull(duration)
ks.test(rap_duration, y = rock_duration)
Asymptotic two-sample Kolmogorov-Smirnov test
data: rap_duration and rock_duration
D = 0.22386, p-value < 2.2e-16
alternative hypothesis: two-sided
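To see where the statistic D comes from, here is a sketch that computes \(\max_x |\hat{F}_A(x) - \hat{F}_B(x)|\) directly. It uses simulated samples rather than the Spotify data, and the object names are illustrative. Both ECDFs only change at observed values, so it is enough to evaluate the gap at the pooled data points.

```r
# Two hypothetical samples with different means
set.seed(315)
sample_a <- rnorm(120, mean = 0, sd = 1)
sample_b <- rnorm(150, mean = 0.5, sd = 1)

# Empirical CDFs of each sample
F_a <- ecdf(sample_a)
F_b <- ecdf(sample_b)

# Evaluate both ECDFs at every observed value and take the largest gap
pooled <- c(sample_a, sample_b)
D_manual <- max(abs(F_a(pooled) - F_b(pooled)))

# Built-in two-sample KS test
ks_result <- ks.test(sample_a, sample_b)
```

`D_manual` should match `ks_result$statistic`, the value reported as D in the output above.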
Any difference at all: two-sample KS test
Difference in means
Null hypothesis: \(H_0: \mu_1 = \mu_2 = \cdots = \mu_K\) (use t.test() or oneway.test())
Can assume the variances are all the same or differ
If \(H_0\) is rejected, can only conclude not all means are equal
Difference in variances
Null hypothesis: \(H_0: \sigma^2_1 = \sigma^2_2 = \cdots = \sigma^2_K\) (use bartlett.test())
If \(H_0\) is rejected, can only conclude not all variances are equal
Note: unlike the KS test, tests for differences in means and variances are sensitive to non-normality
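The tests for means and variances across \(K\) groups might be called as in the sketch below. The data frame `sim_data` and its three groups are simulated for illustration only; group C is given both a larger mean and a larger variance so that both nulls are false.

```r
# Hypothetical data: three groups of 50; group C differs in mean and spread
set.seed(315)
sim_data <- data.frame(
  group = rep(c("A", "B", "C"), each = 50),
  value = c(rnorm(50, mean = 0, sd = 1),
            rnorm(50, mean = 0, sd = 1),
            rnorm(50, mean = 3, sd = 3))
)

# H0: all group means are equal (var.equal = FALSE allows unequal variances)
means_test <- oneway.test(value ~ group, data = sim_data, var.equal = FALSE)

# H0: all group variances are equal
vars_test <- bartlett.test(value ~ group, data = sim_data)

means_test$p.value
vars_test$p.value
```

Rejecting either null only says that not all groups agree; it does not identify which group differs.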
edm latin pop r&b rap rock
6043 5155 5507 5431 5746 4951
Asymptotic two-sample Kolmogorov-Smirnov test
data: rap_duration and pop_duration
D = 0.14569, p-value < 2.2e-16
alternative hypothesis: two-sided
There is a statistically significant difference in duration between rap and pop songs (given large sample size)
edm latin pop r&b rap rock
100 100 100 100 100 100
Asymptotic two-sample Kolmogorov-Smirnov test
data: subset_rap_duration and subset_pop_duration
D = 0.16, p-value = 0.1545
alternative hypothesis: two-sided
There is NO statistically significant difference in duration between rap and pop songs
Using a two-sample \(t\)-test
Null hypothesis: the means of two groups (populations) are equal
Welch Two Sample t-test
data: subset_rap_duration and subset_pop_duration
t = 0.83091, df = 172.78, p-value = 0.4072
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1461026 0.3585440
sample estimates:
mean of x mean of y
3.694096 3.587875
Using a Bartlett test
Null hypothesis: all variances are equal
Bartlett test of homogeneity of variances
data: list(subset_rap_duration, subset_pop_duration)
Bartlett's K-squared = 15.54, df = 1, p-value = 8.08e-05
Rejects at \(\alpha = 0.05\) even with this smaller sample size!
Why did the KS test reveal no difference when the graphs are clearly different? Two possible reasons:
The sample size might be too small to detect a difference
The KS test is known to have low power
Definition: \(\quad \textsf{power} = P(p\text{-value} \leq \alpha \mid H_0 \text{ is false})\)
Things that affect statistical power:
Larger differences in the data \(\rightarrow\) more power
Smaller variance/error in differences \(\rightarrow\) more power
Larger sample size \(\rightarrow\) more power
More appropriate statistical test \(\rightarrow\) more power
Consider two samples \(\mathbf{X} = (X_1,\dots,X_n) \sim N(0, 1)\) and \(\mathbf{Y} = (Y_1,\dots,Y_n) \sim N(\delta, 1)\)
Use a \(t\)-test for difference in means between \(\mathbf{X}\) and \(\mathbf{Y}\)
Simulate \(\mathbf{X}\) and \(\mathbf{Y}\) 1000 times for some \(n\) and \(\delta > 0\)
Count the number of rejections
\[ \begin{aligned} \textsf{power} &= P(p\text{-value} \leq \alpha \mid H_0 \text{ is false}) \\ &= P(p\text{-value} \leq \alpha \mid \delta > 0) \\ &\approx \frac{\text{# rejections}}{1000} \end{aligned} \]
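The simulation steps above can be sketched as follows. The helper `estimate_power()` and the particular values of \(n\) and \(\delta\) are illustrative choices, not taken from the lecture.

```r
# Estimate the power of a two-sample t-test by simulation
# (hypothetical helper; n_sims and alpha default to the values in the text)
estimate_power <- function(n, delta, n_sims = 1000, alpha = 0.05) {
  rejections <- replicate(n_sims, {
    x <- rnorm(n, mean = 0, sd = 1)       # X ~ N(0, 1)
    y <- rnorm(n, mean = delta, sd = 1)   # Y ~ N(delta, 1)
    t.test(x, y)$p.value <= alpha         # TRUE if H0 is rejected
  })
  mean(rejections)                        # number of rejections / n_sims
}

set.seed(315)
power_small_n <- estimate_power(n = 20, delta = 0.5)
power_large_n <- estimate_power(n = 100, delta = 0.5)
```

Comparing `power_small_n` and `power_large_n` shows how the same effect \(\delta\) becomes easier to detect as the sample size grows.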
Consider two samples: \(\mathbf{X} = (X_1,\dots,X_n) \sim N(0, 1)\) and \(\mathbf{Y} = (Y_1,\dots,Y_n) \sim N(0, 1.5)\)
Consider three ways to test differences between \(\mathbf{X}\) and \(\mathbf{Y}\)
\(t\)-test
Bartlett test
KS test
Simulate \(\mathbf{X}\) and \(\mathbf{Y}\) 1000 times for sample sizes \(n = 10, 20, \dots, 1000\)
What do you think the power curves will look like for these methods?
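One way to check a guess is to run a scaled-down version of this simulation. The sketch below makes several assumptions: it reads 1.5 as the variance (following the \(N(\mu, \sigma^2)\) notation, so `sd = sqrt(1.5)`), it uses only two sample sizes and 200 simulations per size to keep the run short, and all function and object names are illustrative.

```r
set.seed(315)
alpha <- 0.05
n_sims <- 200   # fewer than the 1000 in the text, to keep this quick

# One simulated data set: p-values from the three tests
one_sim <- function(n) {
  x <- rnorm(n, mean = 0, sd = 1)
  y <- rnorm(n, mean = 0, sd = sqrt(1.5))   # assumes 1.5 is the variance
  c(t_test   = t.test(x, y)$p.value,
    bartlett = bartlett.test(list(x, y))$p.value,
    ks       = ks.test(x, y)$p.value)
}

# Rejection rate of each test at a given sample size
power_at <- function(n) {
  p_values <- replicate(n_sims, one_sim(n))   # 3 x n_sims matrix
  rowMeans(p_values <= alpha)
}

# Rows are tests, columns are sample sizes
power_table <- sapply(c(`n = 50` = 50, `n = 200` = 200), power_at)
power_table
```

Extending the vector of sample sizes to `seq(10, 1000, by = 10)` and plotting each row against \(n\) would give the full power curves.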
Graphics should be paired with statistical analyses to determine whether they display a true effect or just noise
Even if there is a true effect, there may be limited power to detect it (some effects are easier to detect than others)
Remember: Power is the probability of rejecting when the null is false
Things that increase statistical power:
Increase sample size
Reduce variance/error
Increase differences/effects
Choose appropriate tests!
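library(tidyverse)
# estimate the normal parameters from the data (assumed here to be the sample mean and SD)
fl_mean <- mean(penguins$flipper_len, na.rm = TRUE)
fl_sd <- sd(penguins$flipper_len, na.rm = TRUE)
# create the ECDF function
fl_ecdf <- ecdf(penguins$flipper_len)
# compute absolute value of the differences between ECDF and theoretical (normal distribution)
abs_ecdf_diff <- abs(fl_ecdf(penguins$flipper_len) - pnorm(penguins$flipper_len, mean = fl_mean, sd = fl_sd))
# find the flipper length at which the difference is largest
max_fl_diff <- penguins$flipper_len[which.max(abs_ecdf_diff)]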
penguins |>
ggplot(aes(x = flipper_len)) +
stat_ecdf(color = "darkblue") +
# display the theoretical normal CDF
stat_function(fun = pnorm, args = list(mean = fl_mean, sd = fl_sd), color = "black", linetype = "dashed") +
# display KS test line
geom_vline(xintercept = max_fl_diff, color = "red") +
# add text with the test results
annotate(geom = "text", x = 215, y = 0.25, label = "KS test stat = 0.12428\np-value = 5.163e-05") +
labs(x = "Flipper length (mm)", y = "Fn(x)")