library(tidyverse)
theme_set(theme_light())
penguins |>
  ggplot(aes(y = species)) +
  geom_bar()
36-315: Statistical Graphics and Visualization, Summer 2026
Not only messing up orinithology, but also making a mockery of R tutorials. #Rstats
— brucy (@realbrucy.bsky.social) 10:31 AM · May 14, 2026
Marginal distribution: probability that a categorical variable \(X\) (e.g., species) takes each particular category \(x\) (Adelie, Chinstrap, Gentoo)
Proportion/percent bar charts display class probabilities, e.g., \(P(\texttt{species} = \texttt{Adelie})\)
Compute proportions “by hand” with count() and mutate()
Use the pipe operator |> to perform multiple operations
Use geom_col(), since we want the bar length to represent values in the data
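A minimal sketch of these three steps, assuming the `penguins` data frame used above is available (it ships with R 4.5 or later; on older versions the palmerpenguins package provides it). The names `species_props` and `prop` are illustrative choices, not part of the original slides:

```r
library(tidyverse)

# Compute the proportion of each species "by hand"
species_props <- penguins |>
  count(species) |>              # one row per species, with its count n
  mutate(prop = n / sum(n))      # proportion = count / total

# geom_col() draws bars whose length is the value stored in prop
species_props |>
  ggplot(aes(x = prop, y = species)) +
  geom_col()
```

In contrast, geom_bar() counts the rows itself, so it is the natural choice only when the raw data (not a summary) is passed to ggplot().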
Estimate \(P(\texttt{species} = C_j)\) with \(\hat p_j\) for each category \(C_j\) (\(\hat p_\texttt{Adelie}\), \(\hat p_\texttt{Chinstrap}\), \(\hat p_\texttt{Gentoo}\))
Quantify uncertainty for \(\displaystyle \hat p_j = \frac{n_j}{n}\) with the standard error \[\textsf{se}(\hat{p}_j) = \sqrt{\frac{\hat{p}_j(1 - \hat{p}_j)}{n}}\]
Compute the \((1 - \alpha)\)-level confidence interval \(\hat{p}_j \pm z_{1 - \alpha / 2} \cdot \textsf{se}(\hat{p}_j)\)
Good rule-of-thumb: construct 95% confidence interval using \(\hat{p}_j \pm 2 \cdot \textsf{se}(\hat{p}_j)\)
Order the bars by proportion
penguins |>
  count(species) |>
  mutate(prop = n / sum(n),
         se = sqrt(prop * (1 - prop) / sum(n)),
         lower = prop - 2 * se,
         upper = prop + 2 * se,
         species = fct_reorder(species, prop)) |>
  ggplot(aes(x = prop, y = species)) +
  geom_col() +
  geom_errorbar(aes(xmin = lower, xmax = upper),
                color = "blue", width = 0.2, linewidth = 1)
Compute the \(p\)-value
The \(p\)-value is the probability of observing a test statistic at least as extreme as the observed statistic, under the assumption that the null hypothesis is true
Is the test statistic “unusual” compared to what we would expect under the null?
Decide whether to reject the null hypothesis
Compare \(p\)-value to the target error rate (or significance level) \(\alpha\)
Typically choose \(\alpha = 0.05\) (the origins of 0.05)
In other words, if \(H_0\) is true and we test at \(\alpha = 0.05\), there is a 5% chance that we reject it anyway, which is a false positive (also known as a Type I error)
Null hypothesis: \(H_0\): \(p_1 = p_2 = \cdots = p_K\)
Test statistic: \(\displaystyle \chi^2 = \sum_{j=1}^K \frac{(O_j - E_j)^2}{E_j}\), where
\(O_j\): observed counts in category \(j\)
\(E_j\) : expected counts under \(H_0\)
(each category is equally likely to occur with probability \(p_1 = p_2 = \cdots = p_K = 1/K\), so \(E_j = n/K\))
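A minimal sketch of this test for the species counts, assuming `penguins` is available as in the code above. The reference distribution (chi-squared with \(K - 1\) degrees of freedom) is the standard one for this test and is an addition here, not something stated on the slide:

```r
# Observed counts per species (assumes `penguins` is available, as above)
observed <- table(penguins$species)

n <- sum(observed)
K <- length(observed)
expected <- rep(n / K, K)        # E_j = n / K under H0

# Test statistic, then the p-value from a chi-squared distribution with K - 1 df
chi_sq <- sum((observed - expected)^2 / expected)
p_value <- pchisq(chi_sq, df = K - 1, lower.tail = FALSE)

# Built-in equivalent: for a 1D table the default null is equal probabilities
chisq.test(observed)
```

The manual statistic and p-value should match what chisq.test() reports.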
Reminder: Anscombe’s Quartet
Statistical inference is the same, but the graphics are very different
The opposite can be true!
Graphics can be the same, but statistical inference is very different
Simply add \(p\)-values (or other info) to graph via text
Add confidence intervals to the graph
Need to remember what each CI is for
The CIs on previous slides are for each \(\hat{p}_j\) marginally, NOT jointly
Have to be careful with multiple testing
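One possible way to attach a \(p\)-value to a graph as text, sketched with labs(subtitle = ...); annotate("text", ...) would work as well. It assumes `penguins` is available as above, and the subtitle wording is only an illustration:

```r
library(tidyverse)

# p-value from the chi-squared test of equal species proportions
p_value <- chisq.test(table(penguins$species))$p.value

penguins |>
  ggplot(aes(y = species)) +
  geom_bar() +
  # report the test result as text so it travels with the graph
  labs(subtitle = paste("Chi-squared test of equal proportions: p =",
                        signif(p_value, 2)))
```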
Comparing overlap between two CIs is NOT exactly the same as directly testing for a significant difference
What we really want is a CI for the difference \(\textsf{CI}(\hat{p}_1 - \hat{p}_2)\), rather than \(\textsf{CI}(\hat{p}_1)\) and \(\textsf{CI}(\hat{p}_2)\) separately
If \(\textsf{CI}(\hat{p}_1)\) and \(\textsf{CI}(\hat{p}_2)\) do not overlap, then \(0 \notin\) \(\textsf{CI}(\hat{p}_1 - \hat{p}_2)\)
However, \(\textsf{CI}(\hat{p}_1)\) and \(\textsf{CI}(\hat{p}_2)\) overlapping does not necessarily imply that \(0 \in\) \(\textsf{CI}(\hat{p}_1 - \hat{p}_2)\)
Roughly speaking:
If CIs do not overlap: evidence of a significant difference
If CIs overlap slightly: ambiguous
If CIs overlap substantially: likely no significant difference
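A rough sketch of a CI for the difference between two species proportions, here Adelie vs Gentoo (an arbitrary pair chosen for illustration). It relies on an assumption not stated in the slides: both proportions come from the same sample of size \(n\), so under a multinomial model \(\textsf{Cov}(\hat p_1, \hat p_2) = -p_1 p_2 / n\). It also assumes `penguins` is available as above:

```r
library(tidyverse)

props <- penguins |>
  count(species) |>
  mutate(prop = n / sum(n))

n  <- sum(props$n)
p1 <- props$prop[props$species == "Adelie"]
p2 <- props$prop[props$species == "Gentoo"]

# Same-sample proportions are negatively correlated, so
# Var(p1_hat - p2_hat) = [p1 (1 - p1) + p2 (1 - p2) + 2 p1 p2] / n
se_diff <- sqrt((p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n)

# Rule-of-thumb 95% CI for the difference p1 - p2
diff_ci <- c(lower = (p1 - p2) - 2 * se_diff,
             upper = (p1 - p2) + 2 * se_diff)
diff_ci
```

If 0 falls outside `diff_ci`, that is direct evidence of a difference between the two proportions, which is a sharper check than eyeballing whether the two separate CIs overlap.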
If we compare more than two CIs simultaneously, we must account for multiple testing
In those bar plots, when we determine whether CIs overlap, we make 3 comparisons:
A vs B
A vs C
B vs C
Making multiple comparisons increases the probability of a Type I error beyond 5%
Type I error: rejecting \(H_0\) when \(H_0\) is true
Example: concluding that A and B differ because their CIs don’t overlap, although \(H_0: p_A = p_B\) is true
If we are only interested in comparing A vs B, then just construct a 95% CI for the difference between A and B, which controls the error rate at 5%
However, if we perform several comparisons simultaneously, the overall probability of making at least one Type I error becomes greater than 5%.
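To see how quickly the error rate grows, here is a back-of-the-envelope calculation. It assumes the \(K\) comparisons are independent, which is a simplification (pairwise comparisons on the same data are not), so treat the number as a rough guide:

```r
alpha <- 0.05   # per-comparison Type I error rate
K <- 3          # number of comparisons: A vs B, A vs C, B vs C

# P(at least one false positive) if the K tests were independent
fwer <- 1 - (1 - alpha)^K
fwer
#> [1] 0.142625
```

So with three comparisons the chance of at least one false positive is roughly 14%, not 5%.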
Basic idea:
Multiple testing corrections make hypothesis tests more conservative (e.g., make \(p\)-values larger)
Equivalently, they produce wider CIs
Goal: control the overall Type I error rate (across all comparisons) at \(\leq 5\%\)
Bonferroni correction: simple, easy to implement, but most conservative
Normally, we reject \(H_0\) when \(p\)-value \(\leq 0.05\)
If making \(K\) comparisons, the Bonferroni correction rejects only if \(\displaystyle p\text{-value} \leq \frac{0.05}{K}\)
Equivalently, instead of plotting 95% CIs, we plot \(\displaystyle\left(1 - \frac{0.05}{K}\right) \times 100 \%\) CIs
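A sketch of Bonferroni-adjusted intervals for the species bar chart, reusing the earlier CI code with a wider multiplier. It assumes \(K = 3\) pairwise comparisons and that `penguins` is available as above; `z_bonf` is an illustrative name:

```r
library(tidyverse)

alpha <- 0.05
K <- 3                                  # number of pairwise comparisons
z_bonf <- qnorm(1 - (alpha / K) / 2)    # replaces the usual multiplier of about 2

penguins |>
  count(species) |>
  mutate(prop = n / sum(n),
         se = sqrt(prop * (1 - prop) / sum(n)),
         lower = prop - z_bonf * se,    # (1 - alpha / K) x 100% CI
         upper = prop + z_bonf * se,
         species = fct_reorder(species, prop)) |>
  ggplot(aes(x = prop, y = species)) +
  geom_col() +
  geom_errorbar(aes(xmin = lower, xmax = upper),
                color = "blue", width = 0.2, linewidth = 1)
```

For adjusting \(p\)-values rather than intervals, base R's p.adjust(p, method = "bonferroni") multiplies each \(p\)-value by the number of tests (capped at 1), which is equivalent to comparing the raw \(p\)-values with \(0.05 / K\).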
Graphics for 1D categorical data (e.g., bar charts) show the empirical distribution of the categorical variable (\(\hat{p}_1, \dots, \hat{p}_K\))
The chi-squared test is a common test for 1D categorical data, testing \(H_0 : p_1 = \cdots = p_K\)
However, from this (global) test alone, we can’t tell which probabilities differ
We can compute individual confidence intervals for each of \(\hat{p}_1, \dots, \hat{p}_K\)
Allows for easy visualization
But can be complicated, especially with respect to multiple testing
Graphs with the same trends can display very different statistical significance (largely due to sample size)