Visualizing Uncertainty

Author

Emorie D Beck

Code

library(RColorBrewer)
library(knitr)
library(kableExtra)
library(plyr)
library(broom)
library(modelr)
library(lme4)
library(broom.mixed)
library(tidyverse)
library(ungeviz)
library(ggdist)
library(tidybayes)
library(distributional)
library(gganimate)

Visualizing Uncertainty

Why is visualizing uncertainty important?
- Point estimates are over-emphasized and interval estimates are unemphasized (or ignored)
- Most people misperceive both (1) common uncertainty visualizations and (2) most common uncertainty metrics
- In other words, people make errors about error
- Probability is hard, and most aren’t taught about probability distributions (and more)

Theories of Visualizing Uncertainty

Why do people misperceive uncertainty, and how can we mitigate it?

Quick Side Note: Custom Themes

We have a lot to cover today, so I’m going to skip over some of the usual “how to start with the basics and make it aesthetically pleasing”
Instead, we’ll create a custom these that captures some of our usual additions
This will save us both time and text!
I highly recommend doing this in your own work

Code

my_theme <- function(){
  theme_classic() + 
  theme(
    legend.position = "bottom"
    , legend.title = element_text(face = "bold", size = rel(1))
    , legend.text = element_text(face = "italic", size = rel(1))
    , axis.text = element_text(face = "bold", size = rel(1.1), color = "black")
    , axis.title = element_text(face = "bold", size = rel(1.2))
    , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5)
    , plot.subtitle = element_text(face = "italic", size = rel(1.2), hjust = .5)
    , strip.text = element_text(face = "bold", size = rel(1.1), color = "white")
    , strip.background = element_rect(fill = "black")
    )
}

Error Bars

First, let’s examine the usual ways that we show uncertainty around point estimates (e.g., means, model parameter estimates, etc.) using interval estimates (+/- 1 SE/D, confidence interval)

Code

pomp <- function(x) (x - min(x, na.rm = T))/(max(x, na.rm = T) - min(x, na.rm = T))*10
gsoep %>%
  filter(year == 2005) %>%
  filter(SID %in% sample(unique(.$SID, 500))) %>%
  mutate(SRhealth = pomp(SRhealth)) %>%
  group_by(SID, age) %>%
  mutate(health = rowMeans(cbind(SRhealth, satHealth))) %>%
  ungroup() %>%
  select(SID, age, health) %>%
  drop_na()

Code

gsoep_desc <- gsoep %>%
  filter(year == 2005 & age < 30) %>%
  filter(SID %in% sample(unique(.$SID), 500)) %>%
  mutate(SRhealth = pomp(SRhealth)) %>%
  group_by(SID, age) %>%
  mutate(health = rowMeans(cbind(SRhealth, satHealth), na.rm = T)) %>%
  ungroup() %>%
  select(SID, age, health) %>%
  drop_na() %>%
  mutate(
    mean = mean(health)
    , sd = sd(health)
    , se = sd/sqrt(n())
    , ci99 = se*2.576
    , ci95 = se*1.96
    , ci80 = se*1.282
    )
gsoep_desc

Code

gsoep_desc <- gsoep_desc %>% 
  select(mean:ci80, health, SID) %>%
  pivot_longer(
    cols = c(-mean, -SID)
    , names_to = "measure"
    , values_to = "value"
    ) %>%
  mutate(SID = ifelse(measure == "health" | row_number() %in% 1:5, SID, NA)) %>%
  drop_na() %>%
  mutate(measure = factor(measure, rev(c("health", "sd", "se", "ci99", "ci95", "ci80"))))
gsoep_desc

Code

gsoep_desc %>%
  ggplot(aes(y = measure, x = mean)) +
    geom_point(size = 3, color = "darkorange3") + 
    geom_jitter(
      data = gsoep_desc %>% filter(measure == "health")
      , aes(x = value), alpha = .5, height = .3, width = 0
    ) + 
    geom_errorbar(
      data =  gsoep_desc %>% filter(measure != "health")
      , aes(xmin = mean - value, xmax = mean + value)
      , width = .1
      ) + 
    geom_point(size = 3, color = "darkorange3") + 
    my_theme()

So What Do We Do?

Lots of things, not all of which we have time for today. See also:
- https://janhove.github.io/visualise_uncertainty/
- https://wilkelab.org/SDS375/slides/visualizing-uncertainty.html#1
- https://wilkelab.org/ungeviz/
- #TeamBayes: http://mjskay.github.io/tidybayes/
- #TeamFrequentist: https://mjskay.github.io/ggdist/

Outline for Today:

Proportions / Probability
- icon array
Point Estimates
- half-eye
- gradient interval
- quantile dotplot
- raincloud
Animated (sometimes)
- hypothetical outcome plots
- ensemble display

Proportions / Probability

We already covered proportions and probability, but this one deserves being highlighted itself
How much of our sample was unmarried?

Code

gsoep %>%
  filter(year == 2012 & !is.na(marital)) %>%
  mutate(marital = ifelse(marital == 4, "Never Married", "Married")) %>%
  group_by(marital) %>%
  tally() %>%
  ungroup() %>%
  mutate(prop = round(n/sum(n)*100))

We have to trick ggplot2 into making this figure with a grid

Code

tibble(
  value = c(rep(1, 76), rep(2,24))
  , x = rep(1:10, each = 10)
  , y = rep(1:10, times = 10)
  )

Code

tibble(
  value = c(rep(1, 76), rep(2,24))
  , y = rep(1:10, each = 10)
  , x = rep(1:10, times = 10)
  ) %>%
  ggplot(aes(x = x, y = y, color = factor(value))) +
    geom_point(shape = "square", size = 5) + 
    theme_void() + 
    theme(legend.position = "none")

Let’s clean it up:

Code

tibble(
  value = c(rep(1, 76), rep(2,24))
  , y = rep(1:10, each = 10)
  , x = rep(1:10, times = 10)
  ) %>%
  ggplot(aes(x = x, y = y, color = factor(value))) +
    geom_point(shape = "square", size = 8) + 
    scale_color_manual(values = c("lightgrey", "darkblue")) + 
    labs(title = "24% Remained Unmarried in 2012") + 
    theme_void() + 
    theme(legend.position = "none"
          , plot.title = element_text(hjust = .5, face = "bold"))

Point Estimates

Most often, we want to visualize either point esimates or other visualizations of models
We touched on this a couple of weeks ago when we talked about broom and broom.mixed
I mentioned then that one of the challenges comes from where the interval estimate comes from, which includes:
- (Frequentist) Standard Errors
- (Frequentist) Confidence Intervals
- Bootstrapped / Profile Confidence Intervals
- Prediction Intervals
- (Bayesian) Credible Intervals
- (Bayesian) Posterior Distributions
I’ll stay out of Bayes for now :(

Code

gsoep_ex <- gsoep %>%
  filter(year == "2000") %>%
  select(SID, age, marital, gender) %>%
  inner_join(
    gsoep %>%
      filter(year == "2015") %>%
      select(SID, SRhealth)
  ) %>%
  mutate(marital = factor(
    marital
    , 1:4
    , c("Married", "Separated", "Widowed", "Never Married")
    ), age = age/10
    , gender = factor(gender, c(1,2), c("Male", "Female"))) %>%
  drop_na()
gsoep_ex

Point Estimates From Model Predictions

Marginal Means from Model Predictions

Point Estimates and Marginal Means

stat_halfeye(): Visual Boundaries
stat_eye(): Visual Boundaries
stat_gradientinterval(): Visual Semiotics
stat_dots(): Frequency Framing
stat_dotsinterval(): Frequency Framing
stat_halfeye()+ stat_dots(): Visual Boundaries + Frequency Framing

Core syntax

The benefit of ggdist is that it allows you to use essentially identical syntax to produce lots of different kinds of plots
All we have to do is swap out the geom

Code

m1 <- lm(SRhealth ~ marital + age, data = gsoep_ex)
tidy1 <- tidy(m1)

tidy1 %>%
  filter(term != "(Intercept)") %>%
  ggplot(aes(y = term)) + 
    stat_halfeye(
        aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
    ) +
  my_theme()

`stat_halfeye()`

We can pull the predictions from model terms or marginal means

Model Terms:

Code

tidy1 %>%
  filter(term != "(Intercept)") %>%
  ggplot(aes(y = term)) + 
    stat_halfeye(
        aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
    ) +
  my_theme()

Marginal Means:

Code

gsoep_ex %>%
  data_grid(marital) %>%
  mutate(age = mean(gsoep_ex$age)) %>%
  augment(m1, newdata = ., se_fit = T) %>%
  ggplot(aes(y = marital)) + 
    stat_halfeye(
        aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), 
    ) +
  my_theme()

Let’s do a little hack and create our whole plots except the geom, so that we can build them with less syntax:

Code

p1 <- tidy1 %>%
  filter(term != "(Intercept)") %>%
  ggplot(aes(y = term)) + 
  labs(
    x = "Parameter Estimate"
    , y = NULL
    , title = "Model Estimates"
    , caption = "Outcome: Self-Rated Health"
    ) +
  my_theme()

Code

p2 <- gsoep_ex %>%
  data_grid(marital) %>%
  mutate(age = mean(gsoep_ex$age)) %>%
  augment(m1, newdata = ., se_fit = T) %>%
  ggplot(aes(y = marital)) + 
  labs(
    x = "Model Predicted Self-Rated Health"
    , title = "Marginal Means"
    , y = NULL
    ) +
  my_theme()

We can pull the predictions from model terms or marginal means

Model Terms:

Code

p1 +
  stat_halfeye(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
  ) + 
  labs(subtitle = "stat_halfeye()")

Marginal Means:

Code

p2 + 
  stat_halfeye(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), 
  ) + 
  labs(subtitle = "stat_halfeye()")

`stat_eye()`

Model Terms:

Code

p1 +
  stat_eye(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
  ) + 
  labs(subtitle = "stat_eye()")

Marginal Means:

Code

p2 + 
  stat_eye(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), 
  ) + 
  labs(subtitle = "stat_eye()")

`stat_gradientinterval()`

Model Terms:

Code

p1 +
  stat_gradientinterval(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
  ) + 
  labs(subtitle = "stat_gradientinterval()")

Marginal Means:

Code

p2 + 
  stat_gradientinterval(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), 
  ) + 
  labs(subtitle = "stat_gradientinterval()")

`stat_dots()`

Model Terms:

Code

p1 +
  stat_dots(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
  ) + 
  labs(subtitle = "stat_dots()")

Marginal Means:

Code

p2 + 
  stat_dots(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), 
  ) + 
  labs(subtitle = "stat_dots()")

You can also change the number of dots:

Model Terms:

Code

p1 +
  stat_dots(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
      , quantiles = 50
  ) + 
  labs(subtitle = "stat_dots()")

Marginal Means:

Code

p2 + 
  stat_dots(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) 
      , quantiles = 50
  ) + 
  labs(subtitle = "stat_dots()")

There are also three different layouts

layout = "bin":

Code

p2 + 
  stat_dots(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) 
      , quantiles = 50
      , layout = "bin"
  ) + 
  labs(subtitle = "stat_dots()")

layout = "weave":

Code

p2 + 
  stat_dots(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) 
      , quantiles = 50
      , layout = "weave"
  ) + 
  labs(subtitle = "stat_dots()")

layout = "swarm":

Code

p2 + 
  stat_dots(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) 
      , quantiles = 50
      , layout = "swarm"
  ) + 
  labs(subtitle = "stat_dots()")

`stat_dotsinterval()`

Model Terms:

Code

p1 +
  stat_dotsinterval(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
      , quantiles = 50
  ) + 
  labs(subtitle = "stat_dotsinterval()")

Marginal Means:

Code

p2 + 
  stat_dotsinterval(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) 
      , quantiles = 50
  ) + 
  labs(subtitle = "stat_dotsinterval()")

You can apply many of the same arguments as “regular” stat_dots()

Model Terms:

Code

p1 +
  stat_dotsinterval(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
      , quantiles = 50
      , layout = "swarm"
  ) + 
  labs(subtitle = "stat_dotsinterval()")

Marginal Means:

Code

p2 + 
  stat_dotsinterval(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) 
      , quantiles = 50
      , layout = "swarm"
  ) + 
  labs(subtitle = "stat_dotsinterval()")

`stat_halfeye()`+ `stat_dots()`

Model Terms:

Code

p1 +
  stat_dots(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
      , quantiles = 50
      , side = "bottomleft"
      , layout = "swarm"
  ) + 
  stat_halfeye(
    aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error))
  ) + 
  labs(subtitle = "`stat_halfeye()`+ `stat_dots()")

Marginal Means:

Code

p2 + 
  stat_dots(
      aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) 
      , quantiles = 50
      , side = "bottomleft"
      , layout = "swarm"
  ) + 
  stat_halfeye(
    aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) 
  ) + 
  labs(subtitle = "`stat_halfeye()`+ `stat_dots()")

Simple Slopes

Let’s say we want to know if married and unmarried people differ in their self-rated health as a function of their age:

Code

gsoep_ex2 <- gsoep_ex %>%
  filter(marital %in% c("Married", "Never Married"))
gsoep_ex2

Code

m2 <- lm(SRhealth ~ age + marital + age:marital, data = gsoep_ex2)
tidy(m2)

We can plot this using an extension of geom_ribbon(), stat_lineribbon()

Code

gsoep_ex2 %>%
  group_by(marital) %>%
  data_grid(age= seq_range(age, n = 101)) %>%
  ungroup() %>%
  augment(m2, newdata = ., se_fit = T) %>%
  ggplot(aes(x = age*10, fill = ordered(marital), color = ordered(marital))) +
    stat_lineribbon(
      aes(ydist = dist_student_t(df = df.residual(m2), mu = .fitted, sigma = .se.fit)),
      alpha = 1/4) + 
    scale_fill_brewer(palette = "Set2") +
    scale_color_brewer(palette = "Dark2") +
    labs(x = "Age (Years)", y = "Predicted Self-Rated Health"
         , fill = "Marital Status", color = "Marital Status") + 
    my_theme()

I promised I wouldn’t go Bayes, but I never promised I wouldn’t bootstrap!
Knowing how to get bootstrapped confidence and/or prediction intervals is important, especially if you work with any sort of multilevel / hierarchical models
Getting bootstrapped interval estimates is easy to get point estimates, but it’s a little harder if we want to get it around prediction lines
- (i.e. there’s no great built in funcitons in R that get it for you, in my opinion)

But first, let’s get some longitudinal data that let’s us look at the interaction between marital status and changes in self-rated health within a person as they age:

Code

set.seed(5)
gsoep_ex3 <- gsoep %>%
  select(SID, age, marital, gender, SRhealth) %>%
  filter(marital %in% c(1,4)) %>%
  group_by(SID) %>%
  mutate(marital = min(marital, na.rm = T)) %>%
  ungroup() %>%
  mutate(marital = factor(
    marital
    , c(1,4)
    , c("Married", "Never Married")
    ), age = age/10
    , gender = factor(gender, c(1,2), c("Male", "Female"))) %>%
  drop_na()

gsoep_ex3 <- gsoep_ex3 %>%
  filter(SID %in% sample(unique(gsoep_ex3$SID), 2000))

The critical term is the interaction between the two:

Code

m3 <- lmer(SRhealth ~ age + marital + age:marital + (age | SID), data = gsoep_ex3)
tidy(m3)

Changes in health differ across marital groups

But how?
- Interactions can often be tricky to unpack by point estimates alone
- So we may want to plot separate trajectories for married and unmarried people

Code

predIntlme4 <- function(m, mod_frame, ref){
  ## get bootstrapped estimates
  b <- bootMer(
    m
    , FUN = function(x) lme4:::predict.merMod(
      x
      , newdata = mod_frame
      , re.form = ref
      )
    , nsim = 100 # do not use 100 in practice, please
    , parallel = "multicore"
    , ncpus = 16
    )
  
  ## get long form bootstrapped draws
  b_df <- bind_cols(
    mod_frame
    , t(b$t) %>%
    data.frame()
  ) %>%
    pivot_longer(
       cols = c(-age, -marital)
      , names_to = "boot"
      , values_to = "pred"
    )
  return(list(boot = b, b_df = b_df))
}

pred_fx_fun <- function(m){
  mod_frame <- crossing(
    age = seq(min(m@frame$age), max(m@frame$age), .1)
    , marital = levels(m@frame$marital)
  )
  boot <- predIntlme4(m = m, mod_frame = mod_frame, ref = NA)
}

boot3 <- pred_fx_fun(m3)
b_df3 <- boot3$b_df
b3 <- boot3$boot

`geom_line()`

Code

b_df3 %>%
  ggplot(aes(x = age, y = pred)) + 
    geom_line(
      aes(color = marital, group = interaction(marital, boot))
      , alpha = .2, size = .25
      ) + 
    my_theme()

`geom_lineribbon()`: summarized

Code

b_df3 %>%
  group_by(marital, age) %>%
  median_qi(pred) %>%
  ggplot(aes(x = age, y = pred, ymin = .lower, ymax = .upper)) +
  geom_lineribbon(aes(fill = marital), size = .9) + 
  scale_fill_brewer(palette = "Set2") +
  my_theme()

`geom_lineribbon()` bands: summarized

Code

b_df3 %>%
  group_by(marital, age) %>%
  median_qi(pred, .width = c(.50, .80, .95)) %>%
  ggplot(aes(x = age, y = pred, ymin = .lower, ymax = .upper)) +
  geom_lineribbon(size = .9) + 
  scale_fill_brewer() +
  facet_grid(~marital) + 
  my_theme()

`stat_lineribbon()` bands: samples

Code

b_df3 %>%
  ggplot(aes(x = age, y = pred, fill = marital)) + 
  stat_lineribbon(alpha = .25) + 
  my_theme()

We can also use a new aesthetic: fill_ramp:

Code

b_df3 %>%
  ggplot(aes(x = age, y = pred, fill = marital)) + 
  stat_lineribbon(aes(fill_ramp = stat(level))) +
  my_theme()

`geom_lineribbon()` gradient: samples

Code

b_df3 %>%
  ggplot(aes(x = age, y = pred, fill = marital)) + 
  stat_lineribbon(alpha = .25, .width = ppoints(25)) + 
  scale_fill_brewer(palette = "Set2") +
  my_theme()

Let’s clean it up:

Code

ms <- b_df3 %>% filter(age == max(age)) %>% group_by(marital) %>% summarize(m = mean(pred))

b_df3 %>%
  ggplot(aes(x = age*10, y = pred, fill = marital, fill_ramp = stat(.width))) + 
  stat_lineribbon(alpha = .25, .width = ppoints(25)) +
  scale_x_continuous(limits = c(15,100), breaks = seq(15,90,15)) + 
  scale_fill_manual(values = c("grey", "darkorange")) + 
  annotate("text", label = "Married", x = max(b_df3$age)*10+1, y = ms$m[1], hjust = 0) + 
  annotate("text", label = "Never\nMarried", x = max(b_df3$age)*10+1, y = ms$m[2], hjust = 0) + 
  labs(
    x = "Age (Years)"
    , y = "Predicted Self Rated Health\n(Bootstrapped Interval Estimates)"
    , fill = NULL
    , title = "Self-Rated Health Declines More Rapidly for Unmarried People"
    ) + 
  guides(fill = "none") + 
  my_theme()

Animated Uncertainty

Ensemble Displays

Ensemble displays are an alternative to putting hard boundaries around an interval estimate
Remember that hard boundaries make people interpret categorical differences even when the underlying distribution is continuous
We’ve already seen this:

But the challenge with visualizing uncertainty is between inference and understanding
We need to leverage a knowledge of perception and cognitive processes to help us leverage strengths and overcome weaknesses
Animating visualizations can help us nudge people to process was they see and update their uncertainty estimates over time

Here’s a quick example:

Let’s break this down:

Code

b_df3 %>%
  group_by(marital, age) %>%
  median_qi(pred)

Now, let’s plot the ribbon:

Code

b_df3 %>%
  group_by(marital, age) %>%
  median_qi(pred) %>%
  ggplot(aes(x = age, y = pred)) +
    geom_ribbon(
      aes(fill = marital, ymin = .lower, ymax = .upper)
      , size = .9, alpha = .5
      ) + 
    scale_fill_brewer(palette = "Set2") +
    facet_grid(~marital) + 
    my_theme() + 
    theme(legend.position = "none")

And add the lines (all of them, it will be ugly):

Code

b_df3 %>%
  group_by(marital, age) %>%
  median_qi(pred) %>%
  ggplot(aes(x = age, y = pred)) +
    geom_ribbon(
      aes(fill = marital, ymin = .lower, ymax = .upper)
      , size = .9, alpha = .5
      ) + 
    scale_fill_brewer(palette = "Set2") +
    geom_line(
      data = b_df3
      , aes(group = interaction(marital, boot))
      , size = 1
      ) +
    facet_grid(~marital) + 
    my_theme() + 
    theme(legend.position = "none")

And finally, use transition_states() to animate it

Code

b_df3 %>%
  group_by(marital, age) %>%
  median_qi(pred) %>%
  ggplot(aes(x = age, y = pred)) +
    geom_ribbon(
      aes(fill = marital, ymin = .lower, ymax = .upper)
      , size = .9, alpha = .5
      ) + 
    scale_fill_brewer(palette = "Set2") +
    geom_line(
      data = b_df3
      , aes(group = interaction(marital, boot))
      , size = 1
      ) +
    facet_grid(~marital) + 
    my_theme() + 
    theme(legend.position = "none") + 
    transition_states(boot, 1, 1)

Hypothetical Outcome Plots (HOPs)

Similarly, hypothetical outcome plots let us see plausible mean estimates among raw data
Here’s self-rated health (1-5) across married and unmarried individuals:

Code

gsoep_ex3 %>%
  ggplot(aes(y = marital, x = SRhealth)) + 
    geom_jitter(aes(color = marital), alpha = .5) + 
    my_theme()

Using the ungeviz package, we can then use the geom_vpline() function to sample from the data across groups and plot the mean from different samples:

Code

gsoep_ex3 %>%
  ggplot(aes(y = marital, x = SRhealth)) + 
    geom_jitter(aes(color = marital), alpha = .25) + 
    geom_vpline(
      data = sampler(25, group = marital)
      , height = 0.6
      , color = "#D55E00"
      ) +
    scale_color_manual(values = c("seagreen2", "darkorange")) + 
    my_theme()

And finally, we can animate those samples the transition_states() function from the gganimate package again:

Code

gsoep_ex3 %>%
  ggplot(aes(y = marital, x = SRhealth)) + 
    geom_jitter(aes(color = marital), alpha = .5) + 
    geom_vpline(
      data = sampler(25, group = marital)
      , height = 0.6
      , color = "#D55E00"
      ) +
    scale_color_manual(values = c("seagreen2", "darkorange")) + 
    my_theme() + 
    transition_states(.draw, 1, 3)

--- title: "Visualizing Uncertainty" author: "Emorie D Beck" format: html: code-tools: true code-copy: true code-line-numbers: true code-link: true theme: united highlight-style: tango df-print: paged code-fold: show toc: true toc-float: true self-contained: true # height: 900 footer: "PSC 290 - Data Visualization" logo: "https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/01-week1-intro/02-code/02-images/ucdavis_logo_blue.png" editor_options: chunk_output_type: console --- ```{r, echo = F} knitr::opts_chunk$set(echo = TRUE, warning = F, message = F, error = F, out.width = "90%") options(knitr.kable.NA = '') ``` ```{r, echo = T} #| code-line-numbers: "6,10,11,12,13,14" library(RColorBrewer) library(knitr) library(kableExtra) library(plyr) library(broom) library(modelr) library(lme4) library(broom.mixed) library(tidyverse) library(ungeviz) library(ggdist) library(tidybayes) library(distributional) library(gganimate) ``` # Visualizing Uncertainty ## Visualizing Uncertainty * Why is visualizing uncertainty important? + Point estimates are over-emphasized and interval estimates are unemphasized (or ignored) + Most people <font color="red">*misperceive*</font> both (1) common uncertainty visualizations and (2) most common uncertainty metrics + In other words, people make errors about error + Probability is hard, and most aren't taught about probability distributions (and more) ## Theories of Visualizing Uncertainty Why do people misperceive uncertainty, and how can we mitigate it? ::::{.fragment} ```{r, echo = F, fig.align='center', out.width="70%"} knitr::include_graphics("https://github.com/emoriebeck/psc290-data-viz-2022/raw/main/06-week6-uncertainty/02-images/uncert-theories.png") ``` :::: ## Quick Side Note: Custom Themes * We have a lot to cover today, so I'm going to skip over some of the usual "how to start with the basics and make it aesthetically pleasing" * Instead, we'll create a custom these that captures some of our usual additions * This will save us both time and text! * I highly recommend doing this in your own work ```{r} my_theme <- function(){ theme_classic() + theme( legend.position = "bottom" , legend.title = element_text(face = "bold", size = rel(1)) , legend.text = element_text(face = "italic", size = rel(1)) , axis.text = element_text(face = "bold", size = rel(1.1), color = "black") , axis.title = element_text(face = "bold", size = rel(1.2)) , plot.title = element_text(face = "bold", size = rel(1.2), hjust = .5) , plot.subtitle = element_text(face = "italic", size = rel(1.2), hjust = .5) , strip.text = element_text(face = "bold", size = rel(1.1), color = "white") , strip.background = element_rect(fill = "black") ) } ``` ## Error Bars * First, let's examine the usual ways that we show uncertainty around point estimates (e.g., means, model parameter estimates, etc.) using interval estimates (+/- 1 SE/D, confidence interval) ```{r, echo=F} load(url("https://github.com/emoriebeck/psc290-data-viz-2022/blob/main/05-week5-time-series/01-data/gsoep.RData?raw=true")) gsoep <- gsoep %>% haven::zap_labels() ``` ```{r} pomp <- function(x) (x - min(x, na.rm = T))/(max(x, na.rm = T) - min(x, na.rm = T))*10 gsoep %>% filter(year == 2005) %>% filter(SID %in% sample(unique(.$SID, 500))) %>% mutate(SRhealth = pomp(SRhealth)) %>% group_by(SID, age) %>% mutate(health = rowMeans(cbind(SRhealth, satHealth))) %>% ungroup() %>% select(SID, age, health) %>% drop_na() ``` ```{r} gsoep_desc <- gsoep %>% filter(year == 2005 & age < 30) %>% filter(SID %in% sample(unique(.$SID), 500)) %>% mutate(SRhealth = pomp(SRhealth)) %>% group_by(SID, age) %>% mutate(health = rowMeans(cbind(SRhealth, satHealth), na.rm = T)) %>% ungroup() %>% select(SID, age, health) %>% drop_na() %>% mutate( mean = mean(health) , sd = sd(health) , se = sd/sqrt(n()) , ci99 = se*2.576 , ci95 = se*1.96 , ci80 = se*1.282 ) gsoep_desc ``` ```{r} gsoep_desc <- gsoep_desc %>% select(mean:ci80, health, SID) %>% pivot_longer( cols = c(-mean, -SID) , names_to = "measure" , values_to = "value" ) %>% mutate(SID = ifelse(measure == "health" | row_number() %in% 1:5, SID, NA)) %>% drop_na() %>% mutate(measure = factor(measure, rev(c("health", "sd", "se", "ci99", "ci95", "ci80")))) gsoep_desc ``` ```{r, fig.dim=c(10,3)} gsoep_desc %>% ggplot(aes(y = measure, x = mean)) + geom_point(size = 3, color = "darkorange3") + geom_jitter( data = gsoep_desc %>% filter(measure == "health") , aes(x = value), alpha = .5, height = .3, width = 0 ) + geom_errorbar( data = gsoep_desc %>% filter(measure != "health") , aes(xmin = mean - value, xmax = mean + value) , width = .1 ) + geom_point(size = 3, color = "darkorange3") + my_theme() ``` ## So What Do We Do? * Lots of things, not all of which we have time for today. See also: + [https://janhove.github.io/visualise_uncertainty/](https://janhove.github.io/visualise_uncertainty/) + [https://wilkelab.org/SDS375/slides/visualizing-uncertainty.html#1](https://wilkelab.org/SDS375/slides/visualizing-uncertainty.html#1) + [https://wilkelab.org/ungeviz/](https://wilkelab.org/ungeviz/) + #TeamBayes: [http://mjskay.github.io/tidybayes/](http://mjskay.github.io/tidybayes/) + #TeamFrequentist: [https://mjskay.github.io/ggdist/](https://mjskay.github.io/ggdist/) ## Outline for Today: * Proportions / Probability + icon array * Point Estimates + half-eye + gradient interval + quantile dotplot + raincloud * Animated (sometimes) + hypothetical outcome plots + ensemble display # Proportions / Probability * We already covered proportions and probability, but this one deserves being highlighted itself * How much of our sample was unmarried? ```{r} gsoep %>% filter(year == 2012 & !is.na(marital)) %>% mutate(marital = ifelse(marital == 4, "Never Married", "Married")) %>% group_by(marital) %>% tally() %>% ungroup() %>% mutate(prop = round(n/sum(n)*100)) ``` * We have to trick `ggplot2` into making this figure with a grid ```{r} tibble( value = c(rep(1, 76), rep(2,24)) , x = rep(1:10, each = 10) , y = rep(1:10, times = 10) ) ``` ```{r} #| code-line-numbers: "6-9" tibble( value = c(rep(1, 76), rep(2,24)) , y = rep(1:10, each = 10) , x = rep(1:10, times = 10) ) %>% ggplot(aes(x = x, y = y, color = factor(value))) + geom_point(shape = "square", size = 5) + theme_void() + theme(legend.position = "none") ``` Let's clean it up: ```{r fig.dim=c(4,4.2)} #| code-line-numbers: "7-9,12" tibble( value = c(rep(1, 76), rep(2,24)) , y = rep(1:10, each = 10) , x = rep(1:10, times = 10) ) %>% ggplot(aes(x = x, y = y, color = factor(value))) + geom_point(shape = "square", size = 8) + scale_color_manual(values = c("lightgrey", "darkblue")) + labs(title = "24% Remained Unmarried in 2012") + theme_void() + theme(legend.position = "none" , plot.title = element_text(hjust = .5, face = "bold")) ``` # Point Estimates * Most often, we want to visualize either point esimates or other visualizations of models * We touched on this a couple of weeks ago when we talked about `broom` and `broom.mixed` * I mentioned then that one of the challenges comes from where the interval estimate comes from, which includes: + (Frequentist) Standard Errors + (Frequentist) Confidence Intervals + Bootstrapped / Profile Confidence Intervals + Prediction Intervals + (Bayesian) Credible Intervals + (Bayesian) Posterior Distributions * I'll stay out of Bayes for now :( ```{r} gsoep_ex <- gsoep %>% filter(year == "2000") %>% select(SID, age, marital, gender) %>% inner_join( gsoep %>% filter(year == "2015") %>% select(SID, SRhealth) ) %>% mutate(marital = factor( marital , 1:4 , c("Married", "Separated", "Widowed", "Never Married") ), age = age/10 , gender = factor(gender, c(1,2), c("Male", "Female"))) %>% drop_na() gsoep_ex ``` ## Point Estimates From Model Predictions ```{r, echo = F, fig.dim=c(12, 6), fig.align='center'} m1 <- lm(SRhealth ~ marital + age, data = gsoep_ex) tidy1 <- tidy(m1) p1 <- tidy1 %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term)) + stat_halfeye( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)), scale = .5 ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_halfeye()" ) + my_theme() p2 <-tidy1 %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term)) + stat_gradientinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) , scale = .75 ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_gradientinterval()" ) + my_theme() p3 <-tidy1 %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term)) + stat_dotsinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) , quantiles = 50 ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_dotsinterval()" ) + my_theme() p4 <-tidy1 %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term)) + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) , quantiles = 50 ) + labs( x = "Parameter Estimates()" , y = NULL , title = "stat_dots()" ) + my_theme() p5 <-tidy1 %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term)) + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) , side = "bottomleft", layout = "swarm" ) + stat_histinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_halfeye() + stat_dots()" ) + my_theme() p6 <-tidy1 %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term)) + stat_eye( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_eye()" ) + my_theme() cowplot::plot_grid(p1, p2, p3, p6, p5, p4, nrow = 2) ``` ## Marginal Means from Model Predictions ```{r, echo = F, fig.dim=c(12, 6), fig.align='center'} m1 <- lm(SRhealth ~ marital + age, data = gsoep_ex) tidy1 <- tidy(m1) p1 <- gsoep_ex %>% data_grid(marital) %>% mutate(age = mean(gsoep_ex$age)) %>% augment(m1, newdata = ., se_fit = T) %>% ggplot(aes(y = marital)) + stat_halfeye( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), scale = .5 ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_halfeye()" ) + my_theme() p2 <- gsoep_ex %>% data_grid(marital) %>% mutate(age = mean(gsoep_ex$age)) %>% augment(m1, newdata = ., se_fit = T) %>% ggplot(aes(y = marital)) + stat_gradientinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , scale = .75 ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_gradientinterval()" ) + my_theme() p3 <- gsoep_ex %>% data_grid(marital) %>% mutate(age = mean(gsoep_ex$age)) %>% augment(m1, newdata = ., se_fit = T) %>% ggplot(aes(y = marital)) + stat_dotsinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , quantiles = 50 ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_dotsinterval()" ) + my_theme() p4 <- gsoep_ex %>% data_grid(marital) %>% mutate(age = mean(gsoep_ex$age)) %>% augment(m1, newdata = ., se_fit = T) %>% ggplot(aes(y = marital)) + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , quantiles = 50 ) + labs( x = "Parameter Estimates()" , y = NULL , title = "stat_dots()" ) + my_theme() p5 <- gsoep_ex %>% data_grid(marital) %>% mutate(age = mean(gsoep_ex$age)) %>% augment(m1, newdata = ., se_fit = T) %>% ggplot(aes(y = marital)) + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , side = "bottomleft", layout = "swarm" ) + stat_histinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_halfeye() + stat_dots()" ) + my_theme() p6 <- gsoep_ex %>% data_grid(marital) %>% mutate(age = mean(gsoep_ex$age)) %>% augment(m1, newdata = ., se_fit = T) %>% ggplot(aes(y = marital)) + stat_eye( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) ) + labs( x = "Parameter Estimates" , y = NULL , title = "stat_eye()" ) + my_theme() cowplot::plot_grid(p1, p2, p3, p6, p5, p4, nrow = 2) ``` ## Point Estimates and Marginal Means * `stat_halfeye()`: Visual Boundaries * `stat_eye()`: Visual Boundaries * `stat_gradientinterval()`: Visual Semiotics * `stat_dots()`: Frequency Framing * `stat_dotsinterval()`: Frequency Framing * `stat_halfeye()`+ `stat_dots()`: Visual Boundaries + Frequency Framing ### Core syntax * The benefit of ggdist is that it allows you to use essentially identical syntax to produce lots of different kinds of plots * All we have to do is swap out the `geom` ```{r} m1 <- lm(SRhealth ~ marital + age, data = gsoep_ex) tidy1 <- tidy(m1) tidy1 %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term)) + stat_halfeye( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) ) + my_theme() ``` ### `stat_halfeye()` We can pull the predictions from model terms or marginal means Model Terms: ```{r} #| code-line-numbers: "4" tidy1 %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term)) + stat_halfeye( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) ) + my_theme() ``` Marginal Means: ```{r} #| code-line-numbers: "6-8" gsoep_ex %>% data_grid(marital) %>% mutate(age = mean(gsoep_ex$age)) %>% augment(m1, newdata = ., se_fit = T) %>% ggplot(aes(y = marital)) + stat_halfeye( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), ) + my_theme() ``` Let's do a little hack and create our whole plots **except** the `geom`, so that we can build them with less syntax: ```{r} p1 <- tidy1 %>% filter(term != "(Intercept)") %>% ggplot(aes(y = term)) + labs( x = "Parameter Estimate" , y = NULL , title = "Model Estimates" , caption = "Outcome: Self-Rated Health" ) + my_theme() ``` ```{r} p2 <- gsoep_ex %>% data_grid(marital) %>% mutate(age = mean(gsoep_ex$age)) %>% augment(m1, newdata = ., se_fit = T) %>% ggplot(aes(y = marital)) + labs( x = "Model Predicted Self-Rated Health" , title = "Marginal Means" , y = NULL ) + my_theme() ``` We can pull the predictions from model terms or marginal means Model Terms: ```{r} p1 + stat_halfeye( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) ) + labs(subtitle = "stat_halfeye()") ``` Marginal Means: ```{r} p2 + stat_halfeye( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), ) + labs(subtitle = "stat_halfeye()") ``` ### `stat_eye()` Model Terms: ```{r} p1 + stat_eye( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) ) + labs(subtitle = "stat_eye()") ``` Marginal Means: ```{r} p2 + stat_eye( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), ) + labs(subtitle = "stat_eye()") ``` ### `stat_gradientinterval()` Model Terms: ```{r} p1 + stat_gradientinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) ) + labs(subtitle = "stat_gradientinterval()") ``` Marginal Means: ```{r} p2 + stat_gradientinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), ) + labs(subtitle = "stat_gradientinterval()") ``` ### `stat_dots()` Model Terms: ```{r} p1 + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) ) + labs(subtitle = "stat_dots()") ``` Marginal Means: ```{r} p2 + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)), ) + labs(subtitle = "stat_dots()") ``` You can also change the number of dots: Model Terms: ```{r} p1 + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) , quantiles = 50 ) + labs(subtitle = "stat_dots()") ``` Marginal Means: ```{r} p2 + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , quantiles = 50 ) + labs(subtitle = "stat_dots()") ``` There are also three different layouts `layout = "bin"`: ```{r} p2 + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , quantiles = 50 , layout = "bin" ) + labs(subtitle = "stat_dots()") ``` `layout = "weave"`: ```{r} p2 + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , quantiles = 50 , layout = "weave" ) + labs(subtitle = "stat_dots()") ``` `layout = "swarm"`: ```{r} p2 + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , quantiles = 50 , layout = "swarm" ) + labs(subtitle = "stat_dots()") ``` ### `stat_dotsinterval()` Model Terms: ```{r} p1 + stat_dotsinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) , quantiles = 50 ) + labs(subtitle = "stat_dotsinterval()") ``` Marginal Means: ```{r} p2 + stat_dotsinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , quantiles = 50 ) + labs(subtitle = "stat_dotsinterval()") ``` You can apply many of the same arguments as "regular" `stat_dots()` Model Terms: ```{r} p1 + stat_dotsinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) , quantiles = 50 , layout = "swarm" ) + labs(subtitle = "stat_dotsinterval()") ``` Marginal Means: ```{r} p2 + stat_dotsinterval( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , quantiles = 50 , layout = "swarm" ) + labs(subtitle = "stat_dotsinterval()") ``` ### `stat_halfeye()`+ `stat_dots()` Model Terms: ```{r} p1 + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) , quantiles = 50 , side = "bottomleft" , layout = "swarm" ) + stat_halfeye( aes(xdist = dist_student_t(df = df.residual(m1), mu = estimate, sigma = std.error)) ) + labs(subtitle = "`stat_halfeye()`+ `stat_dots()") ``` Marginal Means: ```{r} p2 + stat_dots( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) , quantiles = 50 , side = "bottomleft" , layout = "swarm" ) + stat_halfeye( aes(xdist = dist_student_t(df = df.residual(m1), mu = .fitted, sigma = .se.fit)) ) + labs(subtitle = "`stat_halfeye()`+ `stat_dots()") ``` # Simple Slopes Let's say we want to know if married and unmarried people differ in their self-rated health as a function of their age: ```{r} gsoep_ex2 <- gsoep_ex %>% filter(marital %in% c("Married", "Never Married")) gsoep_ex2 m2 <- lm(SRhealth ~ age + marital + age:marital, data = gsoep_ex2) tidy(m2) ``` We can plot this using an extension of `geom_ribbon()`, `stat_lineribbon()` ```{r} gsoep_ex2 %>% group_by(marital) %>% data_grid(age= seq_range(age, n = 101)) %>% ungroup() %>% augment(m2, newdata = ., se_fit = T) %>% ggplot(aes(x = age*10, fill = ordered(marital), color = ordered(marital))) + stat_lineribbon( aes(ydist = dist_student_t(df = df.residual(m2), mu = .fitted, sigma = .se.fit)), alpha = 1/4) + scale_fill_brewer(palette = "Set2") + scale_color_brewer(palette = "Dark2") + labs(x = "Age (Years)", y = "Predicted Self-Rated Health" , fill = "Marital Status", color = "Marital Status") + my_theme() ``` * I promised I wouldn't go Bayes, but I never promised I wouldn't bootstrap! * Knowing how to get bootstrapped confidence and/or prediction intervals is important, especially if you work with any sort of multilevel / hierarchical models * Getting bootstrapped interval estimates is easy to get point estimates, but it's a little harder if we want to get it around prediction lines + (i.e. there's no great built in funcitons in `R` that get it for you, in my opinion) But first, let's get some longitudinal data that let's us look at the interaction between marital status and changes in self-rated health within a person as they age: ```{r} set.seed(5) gsoep_ex3 <- gsoep %>% select(SID, age, marital, gender, SRhealth) %>% filter(marital %in% c(1,4)) %>% group_by(SID) %>% mutate(marital = min(marital, na.rm = T)) %>% ungroup() %>% mutate(marital = factor( marital , c(1,4) , c("Married", "Never Married") ), age = age/10 , gender = factor(gender, c(1,2), c("Male", "Female"))) %>% drop_na() gsoep_ex3 <- gsoep_ex3 %>% filter(SID %in% sample(unique(gsoep_ex3$SID), 2000)) ``` The critical term is the interaction between the two: ```{r} m3 <- lmer(SRhealth ~ age + marital + age:marital + (age | SID), data = gsoep_ex3) tidy(m3) ``` Changes in health differ across marital groups * But how? + Interactions can often be tricky to unpack by point estimates alone + So we may want to plot separate trajectories for married and unmarried people ```{r, cache = T} predIntlme4 <- function(m, mod_frame, ref){ ## get bootstrapped estimates b <- bootMer( m , FUN = function(x) lme4:::predict.merMod( x , newdata = mod_frame , re.form = ref ) , nsim = 100 # do not use 100 in practice, please , parallel = "multicore" , ncpus = 16 ) ## get long form bootstrapped draws b_df <- bind_cols( mod_frame , t(b$t) %>% data.frame() ) %>% pivot_longer( cols = c(-age, -marital) , names_to = "boot" , values_to = "pred" ) return(list(boot = b, b_df = b_df)) } pred_fx_fun <- function(m){ mod_frame <- crossing( age = seq(min(m@frame$age), max(m@frame$age), .1) , marital = levels(m@frame$marital) ) boot <- predIntlme4(m = m, mod_frame = mod_frame, ref = NA) } boot3 <- pred_fx_fun(m3) b_df3 <- boot3$b_df b3 <- boot3$boot ``` ## `geom_line()` ```{r} b_df3 %>% ggplot(aes(x = age, y = pred)) + geom_line( aes(color = marital, group = interaction(marital, boot)) , alpha = .2, size = .25 ) + my_theme() ``` ## `geom_lineribbon()`: summarized ```{r} b_df3 %>% group_by(marital, age) %>% median_qi(pred) %>% ggplot(aes(x = age, y = pred, ymin = .lower, ymax = .upper)) + geom_lineribbon(aes(fill = marital), size = .9) + scale_fill_brewer(palette = "Set2") + my_theme() ``` ## `geom_lineribbon()` bands: summarized ```{r} b_df3 %>% group_by(marital, age) %>% median_qi(pred, .width = c(.50, .80, .95)) %>% ggplot(aes(x = age, y = pred, ymin = .lower, ymax = .upper)) + geom_lineribbon(size = .9) + scale_fill_brewer() + facet_grid(~marital) + my_theme() ``` ## `stat_lineribbon()` bands: samples ```{r} b_df3 %>% ggplot(aes(x = age, y = pred, fill = marital)) + stat_lineribbon(alpha = .25) + my_theme() ``` We can also use a new `aesthetic`: `fill_ramp`: ```{r} b_df3 %>% ggplot(aes(x = age, y = pred, fill = marital)) + stat_lineribbon(aes(fill_ramp = stat(level))) + my_theme() ``` ## `geom_lineribbon()` gradient: samples ```{r} b_df3 %>% ggplot(aes(x = age, y = pred, fill = marital)) + stat_lineribbon(alpha = .25, .width = ppoints(25)) + scale_fill_brewer(palette = "Set2") + my_theme() ``` Let's clean it up: ```{r} ms <- b_df3 %>% filter(age == max(age)) %>% group_by(marital) %>% summarize(m = mean(pred)) b_df3 %>% ggplot(aes(x = age*10, y = pred, fill = marital, fill_ramp = stat(.width))) + stat_lineribbon(alpha = .25, .width = ppoints(25)) + scale_x_continuous(limits = c(15,100), breaks = seq(15,90,15)) + scale_fill_manual(values = c("grey", "darkorange")) + annotate("text", label = "Married", x = max(b_df3$age)*10+1, y = ms$m[1], hjust = 0) + annotate("text", label = "Never\nMarried", x = max(b_df3$age)*10+1, y = ms$m[2], hjust = 0) + labs( x = "Age (Years)" , y = "Predicted Self Rated Health\n(Bootstrapped Interval Estimates)" , fill = NULL , title = "Self-Rated Health Declines More Rapidly for Unmarried People" ) + guides(fill = "none") + my_theme() ``` # Animated Uncertainty ## Ensemble Displays * Ensemble displays are an alternative to putting hard boundaries around an interval estimate * Remember that hard boundaries make people interpret categorical differences even when the underlying distribution is continuous * We've already seen this: ```{r, echo = F} b_df3 %>% ggplot(aes(x = age, y = pred)) + geom_line( aes(color = marital, group = interaction(marital, boot)) , alpha = .2, size = .5 ) + my_theme() ``` * But the challenge with visualizing uncertainty is between inference and understanding * We need to leverage a knowledge of perception and cognitive processes to help us leverage strengths and overcome weaknesses * Animating visualizations can help us nudge people to process was they see and update their uncertainty estimates over time Here's a quick example: ```{r, echo = F} b_df3 %>% group_by(marital, age) %>% median_qi(pred) %>% ggplot(aes(x = age, y = pred)) + geom_ribbon( aes(fill = marital, ymin = .lower, ymax = .upper) , size = .9, alpha = .5 ) + scale_fill_brewer(palette = "Set2") + geom_line( data = b_df3 , aes(group = interaction(marital, boot)) , size = 1 ) + facet_grid(~marital) + my_theme() + theme(legend.position = "none") + transition_states(boot, 1, 1) ``` Let's break this down: ```{r} b_df3 %>% group_by(marital, age) %>% median_qi(pred) ``` Now, let's plot the ribbon: ```{r} b_df3 %>% group_by(marital, age) %>% median_qi(pred) %>% ggplot(aes(x = age, y = pred)) + geom_ribbon( aes(fill = marital, ymin = .lower, ymax = .upper) , size = .9, alpha = .5 ) + scale_fill_brewer(palette = "Set2") + facet_grid(~marital) + my_theme() + theme(legend.position = "none") ``` And add the lines (all of them, it will be ugly): ```{r} b_df3 %>% group_by(marital, age) %>% median_qi(pred) %>% ggplot(aes(x = age, y = pred)) + geom_ribbon( aes(fill = marital, ymin = .lower, ymax = .upper) , size = .9, alpha = .5 ) + scale_fill_brewer(palette = "Set2") + geom_line( data = b_df3 , aes(group = interaction(marital, boot)) , size = 1 ) + facet_grid(~marital) + my_theme() + theme(legend.position = "none") ``` And finally, use `transition_states()` to animate it ```{r, eval = F} b_df3 %>% group_by(marital, age) %>% median_qi(pred) %>% ggplot(aes(x = age, y = pred)) + geom_ribbon( aes(fill = marital, ymin = .lower, ymax = .upper) , size = .9, alpha = .5 ) + scale_fill_brewer(palette = "Set2") + geom_line( data = b_df3 , aes(group = interaction(marital, boot)) , size = 1 ) + facet_grid(~marital) + my_theme() + theme(legend.position = "none") + transition_states(boot, 1, 1) ``` ## Hypothetical Outcome Plots (HOPs) * Similarly, hypothetical outcome plots let us see plausible mean estimates among raw data * Here's self-rated health (1-5) across married and unmarried individuals: ```{r} gsoep_ex3 %>% ggplot(aes(y = marital, x = SRhealth)) + geom_jitter(aes(color = marital), alpha = .5) + my_theme() ``` * Using the `ungeviz` package, we can then use the `geom_vpline()` function to sample from the data across groups and plot the mean from different samples: ```{r} gsoep_ex3 %>% ggplot(aes(y = marital, x = SRhealth)) + geom_jitter(aes(color = marital), alpha = .25) + geom_vpline( data = sampler(25, group = marital) , height = 0.6 , color = "#D55E00" ) + scale_color_manual(values = c("seagreen2", "darkorange")) + my_theme() ``` And finally, we can animate those samples the `transition_states()` function from the `gganimate` package again: ```{r} gsoep_ex3 %>% ggplot(aes(y = marital, x = SRhealth)) + geom_jitter(aes(color = marital), alpha = .5) + geom_vpline( data = sampler(25, group = marital) , height = 0.6 , color = "#D55E00" ) + scale_color_manual(values = c("seagreen2", "darkorange")) + my_theme() + transition_states(.draw, 1, 3) ```

Visualizing Uncertainty