summary(penguins$flipper_len)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
172.0 190.0 197.0 200.9 213.0 231.0 2
sd(penguins$flipper_len, na.rm = TRUE)
[1] 14.06171
36-315: Statistical Graphics and Visualization, Summer 2026
Discrete: countable, with clear gaps between possible values (e.g. whole numbers only)
Continuous: can take any value within some interval
Center: mean, median, number and location of modes
Spread: range, variance, standard deviation, IQR, etc.
Shape: symmetry, skew, kurtosis (tail heaviness, often loosely described as “peakedness”)
Compute various statistics in R with summary(), mean(), median(), quantile(), range(), sd(), var(), etc.
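As a rough illustration of the functions listed above, the sketch below computes center and spread summaries on simulated data. The vector `flipper_sim` and its parameters are assumptions made for this example; it is not the penguins data.

```r
# Simulated stand-in for a continuous variable such as flipper length (mm).
# These values are made up for illustration; they are NOT the penguins data.
set.seed(315)
flipper_sim <- round(rnorm(300, mean = 200, sd = 14))
flipper_sim[c(5, 50)] <- NA  # add missing values, as real data often has

# Center
center <- c(mean   = mean(flipper_sim, na.rm = TRUE),
            median = median(flipper_sim, na.rm = TRUE))

# Spread
spread <- c(sd  = sd(flipper_sim, na.rm = TRUE),
            var = var(flipper_sim, na.rm = TRUE),
            iqr = IQR(flipper_sim, na.rm = TRUE))
value_range <- range(flipper_sim, na.rm = TRUE)
quartiles   <- quantile(flipper_sim, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)

center
spread
value_range
quartiles
```

Without `na.rm = TRUE`, most of these functions return `NA` as soon as a single value is missing.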
Pros of box plots:
Displays outliers, percentiles, spread, skew
Useful for side-by-side comparison
Cons:
Does not display the full distribution shape
Does not display modes
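A minimal sketch of a side-by-side box plot in ggplot2, assuming a simulated data frame `sim` with hypothetical columns `group` and `value` (neither comes from the original material):

```r
library(ggplot2)

# Simulated stand-in data; group labels and values are hypothetical
set.seed(315)
sim <- data.frame(
  group = rep(c("A", "B", "C"), each = 100),
  value = c(rnorm(100, 190, 6), rnorm(100, 196, 7), rnorm(100, 217, 6))
)

# One box per group makes centers and spreads easy to compare
box_plot <- ggplot(sim, aes(x = group, y = value)) +
  geom_boxplot()
box_plot

# The summary values ggplot2 computed for each box
box_stats <- layer_data(box_plot)[, c("ymin", "lower", "middle", "upper", "ymax")]
box_stats
```

Each box is drawn from only these few numbers per group, which is why the full shape of the distribution and its modes cannot be read off the plot.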
Another example of same stats, different graphs
Three clearly different distributions of data, but all result in the exact same box plot!
Probability that continuous variable \(X\) takes a particular value is 0
e.g. \(P(\text{flipper\_len} = 200) = 0\) (why?)
Split observed data into bins
Count number of observations in each bin
Need to choose the number of bins, adjust with:
bins: number of bins (default is 30)
binwidth: width of bins (overrides bins); various rules of thumb exist for choosing it
breaks: vector of bin boundaries (overrides both bins and binwidth)
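The binning steps and the three arguments above can be sketched as follows. The data frame `sim` is simulated for this example, and the specific values (20 bins, width 5, edges every 10 units) are arbitrary choices rather than recommendations.

```r
library(ggplot2)

# Simulated stand-in for a continuous variable (not the penguins data)
set.seed(315)
sim <- data.frame(flipper_len = rnorm(300, mean = 200, sd = 14))

# Manual version of "split into bins, then count":
# bin edges every 10 units, chosen to cover the whole data range
bin_edges <- seq(floor(min(sim$flipper_len) / 10) * 10,
                 ceiling(max(sim$flipper_len) / 10) * 10,
                 by = 10)
bin_counts <- table(cut(sim$flipper_len, breaks = bin_edges))
bin_counts

base_plot <- ggplot(sim, aes(x = flipper_len))

hist_bins     <- base_plot + geom_histogram(bins = 20)          # number of bins
hist_binwidth <- base_plot + geom_histogram(binwidth = 5)       # width of each bin
hist_breaks   <- base_plot + geom_histogram(breaks = bin_edges) # explicit boundaries

hist_bins
hist_binwidth
hist_breaks
```

Calling `geom_histogram()` with none of these arguments falls back to 30 bins and prints a message suggesting a better value be picked.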
A binwidth that is too narrow shows too much detail
A binwidth that is too wide hides detail
Try several values; the ggplot2 default is NOT guaranteed to be an optimal choice
Histograms approximate the PDF with bins, treating all points within a bin as equally likely
PDF is the derivative of the cumulative distribution function (CDF)
Check out this great interactive tutorial
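A small numerical check of the two statements about the PDF, using simulated normal data (the sample, the grid, and the step size `eps` are assumptions for this sketch). A histogram on the density scale divides each count by \(n \times\) binwidth so that the bar areas sum to 1, and differentiating a CDF numerically recovers the PDF.

```r
set.seed(315)
x <- rnorm(500, mean = 200, sd = 14)  # simulated data

# Histogram on the density scale: density = count / (n * binwidth)
h <- hist(x, breaks = 20, plot = FALSE)
bin_width    <- diff(h$breaks)
dens_manual  <- h$counts / (length(x) * bin_width)
total_area   <- sum(h$density * bin_width)

all.equal(dens_manual, h$density)  # same values as hist() reports
total_area                         # bar areas sum to 1

# PDF as the derivative of the CDF: central difference of pnorm() vs dnorm()
grid <- seq(160, 240, by = 1)
eps  <- 1e-4
pdf_numeric <- (pnorm(grid + eps, 200, 14) - pnorm(grid - eps, 200, 14)) / (2 * eps)
max_gap <- max(abs(pdf_numeric - dnorm(grid, 200, 14)))
max_gap
```

The same difference `pnorm(b) - pnorm(a)` gives the probability of an interval, and it shrinks to 0 as the interval collapses to a single point.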
Goal: estimate the PDF \(f(x)\) for all possible values (assuming it is smooth)
The kernel density estimator (KDE) is \(\displaystyle \hat{f}(x) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h} K_h(x - x_i)\)
\(n\): sample size
\(x\): new point to estimate \(f(x)\) (does NOT have to be in the dataset!)
\(h\): bandwidth, analogous to histogram binwidth; the \(1/h\) factor ensures \(\hat{f}(x)\) integrates to 1
\(x_i\): \(i\)th observation in the dataset
\(K_h(x - x_i)\): kernel function, creates weight given distance of \(i\)th observation from new point
as \(|x - x_i| \rightarrow \infty\) then \(K_h(x - x_i) \rightarrow 0\)
(i.e. the further apart the \(i\)th observation is from \(x\), the smaller the weight)
as bandwidth \(h\) increases, weights are more evenly spread out
\(K_h(x - x_i)\) is large when \(x_i\) is close to \(x\)
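The estimator above can be sketched by hand. This assumes \(K_h(u) = K(u/h)\) with \(K\) the standard normal density (one reading of the notation, not necessarily the only one), and uses simulated data. The helper `kde_gauss()` is hypothetical and written only for this example; `bw.nrd0()` and `density()` are base R.

```r
set.seed(315)
x_obs <- rnorm(200, mean = 200, sd = 14)  # simulated observations x_i

# f_hat(x) = (1/n) * sum_i (1/h) * K((x - x_i) / h), with K = dnorm
# (assumes K_h(u) = K(u / h); kde_gauss is a hypothetical helper)
kde_gauss <- function(x, x_obs, h) {
  sapply(x, function(x0) mean(dnorm((x0 - x_obs) / h) / h))
}

h    <- bw.nrd0(x_obs)  # a common rule-of-thumb bandwidth
grid <- seq(min(x_obs) - 4 * h, max(x_obs) + 4 * h, length.out = 512)

f_hat <- kde_gauss(grid, x_obs, h)

# Riemann-sum check that the estimate integrates to roughly 1
area <- sum(f_hat) * (grid[2] - grid[1])
area

# Compare with base R's density() on the same grid and bandwidth
d <- density(x_obs, bw = h, kernel = "gaussian",
             from = min(grid), to = max(grid), n = 512)
max_diff <- max(abs(f_hat - d$y))
max_diff
```

The grid points are not observations, which illustrates that \(\hat{f}(x)\) can be evaluated at any \(x\). `density()` uses a faster binned approximation, so small numerical differences from the direct sum are expected.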
The Gaussian (normal) kernel is typically used, but there are many other choices
See help(geom_density) for the default bandwidth
Modify the bandwidth using the adjust argument (value to multiply default bandwidth by)
In KDE, the bandwidth parameter is analogous to the binwidth in histograms
Too small bandwidth: density estimate can become overly peaky, main trends in the data may be obscured
Too large bandwidth: features in the distribution of the data may disappear
Aim for a value that is “just right”: smooth enough to suppress noise, but not so smooth that real features disappear
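A minimal sketch of the effect of `adjust`, assuming a simulated two-group sample so that there is a feature (two modes) for oversmoothing to hide. The multipliers 0.25 and 3 are arbitrary illustrations.

```r
library(ggplot2)

# Simulated bimodal data (not the penguins data)
set.seed(315)
sim <- data.frame(flipper_len = c(rnorm(200, 190, 6), rnorm(100, 217, 6)))

base_plot <- ggplot(sim, aes(x = flipper_len))

dens_default <- base_plot + geom_density()               # default bandwidth
dens_narrow  <- base_plot + geom_density(adjust = 0.25)  # 1/4 of default: peaky
dens_wide    <- base_plot + geom_density(adjust = 3)     # 3x default: oversmoothed

dens_narrow
dens_default
dens_wide

# Heavier smoothing flattens the curve, so its highest point drops
peak_narrow <- max(layer_data(dens_narrow)$y)
peak_wide   <- max(layer_data(dens_wide)$y)
c(narrow = peak_narrow, wide = peak_wide)
```

Plotting the three versions next to each other is a quick way to judge which bandwidth keeps the main features without chasing noise.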
The choice of the kernel can affect the shape of the density curve.
A Gaussian kernel typically gives density estimates that look bell-shaped (ish)
A rectangular kernel can generate the appearance of steps in the density curve
Kernel choice matters less with more data points
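To compare kernels, `geom_density()` accepts a `kernel` argument that is passed on to `density()`. The sketch below overlays a Gaussian and a rectangular kernel on a deliberately small simulated sample, where the difference is easiest to see; the sample size and colors are arbitrary choices.

```r
library(ggplot2)

# Small simulated sample, so kernel differences are visible
set.seed(315)
sim <- data.frame(flipper_len = rnorm(40, mean = 200, sd = 14))

kernel_plot <- ggplot(sim, aes(x = flipper_len)) +
  geom_density(kernel = "gaussian", color = "black") +
  geom_density(kernel = "rectangular", color = "red")
kernel_plot

# Estimated curves for each layer (layer 1: Gaussian, layer 2: rectangular)
gauss_curve <- layer_data(kernel_plot, 1)
rect_curve  <- layer_data(kernel_plot, 2)
```

Re-running this with a much larger sample (for example 4000 instead of 40 points) should bring the two curves closer together.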