Advanced Modeling Topics

This section of notes will use the following packages.

library(modelr)
library(broom)
library(forcats)
library(stringr)
library(tidyverse)

Interactions

Interactions are an important modeling concept that can greatly increase model fit, prediction accuracy, and explained variance. Interactions can be difficult to interpret, however, we will explore them in more detail here with particular attention to graphical displays of interactions and also exploring the design matrix for how interactions are included in the model fitting procedure.

We will use the heights data from the modelr package to explore interactions. The primary interactions that we will explore are between two (or more) categorical predictors and also the interaction between a categorical predictor and a continuous predictor. Interpretations are similar between two continuous predictors as well.

Here are the first few rows of the data:

heights
## # A tibble: 7,006 × 8
##    income height weight   age marital  sex    education  afqt
##     <int>  <dbl>  <int> <int> <fct>    <fct>      <int> <dbl>
##  1  19000     60    155    53 married  female        13  6.84
##  2  35000     70    156    51 married  female        10 49.4 
##  3 105000     65    195    52 married  male          16 99.4 
##  4  40000     63    197    54 married  female        14 44.0 
##  5  75000     66    190    49 married  male          14 59.7 
##  6 102000     68    200    49 divorced female        18 98.8 
##  7      0     74    225    48 married  male          16 82.3 
##  8  70000     64    160    54 divorced female        12 50.3 
##  9  60000     69    162    55 divorced male          12 89.7 
## 10 150000     69    194    54 divorced male          13 96.0 
## # … with 6,996 more rows

Suppose we were interested in exploring the relationship between sex and afqt (armed forces qualifications test, in percentiles) and if this relationship is moderated by marital status. First, it may be useful to get a baseline to see the relationship between sex and afqt.

afqt_sex <- lm(afqt ~ sex, data = heights)
summary(afqt_sex)
## 
## Call:
## lm(formula = afqt ~ sex, data = heights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -41.876 -26.034  -4.429  24.046  59.406 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  41.8760     0.5093  82.221   <2e-16 ***
## sexfemale    -1.2824     0.7074  -1.813   0.0699 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.03 on 6742 degrees of freedom
##   (262 observations deleted due to missingness)
## Multiple R-squared:  0.0004872,  Adjusted R-squared:  0.000339 
## F-statistic: 3.287 on 1 and 6742 DF,  p-value: 0.06989

Notice that there is a small effect, which is not significant if using an alpha value of 0.05. Also, notice the extremely small r-square value here, this is actually a good finding, we would hope there would be no statistical differences between males and females on this qualifications test. Now lets start adding in marital status. We can do this as follows (Note, I have combined separated and widowed into a single category due to relatively small sample sizes):

heights <- heights %>%
  mutate(
    marital_comb = fct_recode(marital,
          'Other' = 'separated',
          'Other' = 'widowed'
    )
  )
afqt_sex_marital <- lm(afqt ~ sex + marital_comb, data = heights)
summary(afqt_sex_marital)
## 
## Call:
## lm(formula = afqt ~ sex + marital_comb, data = heights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.957 -23.174  -4.386  22.389  74.002 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           31.7982     0.9093  34.970  < 2e-16 ***
## sexfemale             -0.6819     0.6869  -0.993 0.320829    
## marital_combmarried   16.1587     0.9727  16.611  < 2e-16 ***
## marital_combOther     -5.5953     1.5186  -3.685 0.000231 ***
## marital_combdivorced   6.2811     1.1232   5.592 2.33e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28.02 on 6739 degrees of freedom
##   (262 observations deleted due to missingness)
## Multiple R-squared:  0.069,  Adjusted R-squared:  0.06845 
## F-statistic: 124.9 on 4 and 6739 DF,  p-value: < 2.2e-16

This model only contains what are often referred to as main effects. Namely, these are only the additive effects of sex and marital variables. To get a sense as to what the design matrix looks like, we can use model_matrix.

model_matrix(heights, afqt ~ sex + marital_comb)
## # A tibble: 6,744 × 5
##    `(Intercept)` sexfemale marital_combmarried marital_combOth… marital_combdiv…
##            <dbl>     <dbl>               <dbl>            <dbl>            <dbl>
##  1             1         1                   1                0                0
##  2             1         1                   1                0                0
##  3             1         0                   1                0                0
##  4             1         1                   1                0                0
##  5             1         0                   1                0                0
##  6             1         1                   0                0                1
##  7             1         0                   1                0                0
##  8             1         1                   0                0                1
##  9             1         0                   0                0                1
## 10             1         0                   0                0                1
## # … with 6,734 more rows

To add the interaction between the two variables (multiplicative effects), we can add one additional term to the lm function call.

interact_mod <- lm(afqt ~ sex + marital_comb + sex:marital_comb, data = heights)
summary(interact_mod)
## 
## Call:
## lm(formula = afqt ~ sex + marital_comb + sex:marital_comb, data = heights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.981 -22.978  -4.336  22.488  75.275 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     31.4205     1.1536  27.237  < 2e-16 ***
## sexfemale                        0.1564     1.7184   0.091    0.928    
## marital_combmarried             16.5603     1.3273  12.477  < 2e-16 ***
## marital_combOther               -2.5872     2.4677  -1.048    0.294    
## marital_combdivorced             6.2796     1.5814   3.971 7.24e-05 ***
## sexfemale:marital_combmarried   -0.8856     1.9516  -0.454    0.650    
## sexfemale:marital_combOther     -4.7414     3.1645  -1.498    0.134    
## sexfemale:marital_combdivorced  -0.1494     2.2535  -0.066    0.947    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28.02 on 6736 degrees of freedom
##   (262 observations deleted due to missingness)
## Multiple R-squared:  0.06936,    Adjusted R-squared:  0.0684 
## F-statistic: 71.72 on 7 and 6736 DF,  p-value: < 2.2e-16

There is also an anova function that gives more traditional anova and sum of squares information.

anova(interact_mod)
## Analysis of Variance Table
## 
## Response: afqt
##                    Df  Sum Sq Mean Sq  F value  Pr(>F)    
## sex                 1    2769    2769   3.5267 0.06043 .  
## marital_comb        3  389378  129793 165.3055 < 2e-16 ***
## sex:marital_comb    3    2058     686   0.8738 0.45380    
## Residuals        6736 5288896     785                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this model, there are not actually any significant results for the interaction, but lets explore the design matrix to see exactly what is happening.

model_matrix(heights, afqt ~ sex + marital_comb + sex:marital_comb)
## # A tibble: 6,744 × 8
##    `(Intercept)` sexfemale marital_combmarried marital_combOth… marital_combdiv…
##            <dbl>     <dbl>               <dbl>            <dbl>            <dbl>
##  1             1         1                   1                0                0
##  2             1         1                   1                0                0
##  3             1         0                   1                0                0
##  4             1         1                   1                0                0
##  5             1         0                   1                0                0
##  6             1         1                   0                0                1
##  7             1         0                   1                0                0
##  8             1         1                   0                0                1
##  9             1         0                   0                0                1
## 10             1         0                   0                0                1
## # … with 6,734 more rows, and 3 more variables:
## #   `sexfemale:marital_combmarried` <dbl>, `sexfemale:marital_combOther` <dbl>,
## #   `sexfemale:marital_combdivorced` <dbl>

This model specifically adds columns that literally are multiplications of other columns in the design matrix. This is why interactions are often depicted with the symbol “x”, e.g. marital x sex. R uses : as interactions. The interaction model can also be specified in an alternate more compact formula:

summary(lm(afqt ~ sex * marital_comb, data = heights))
## 
## Call:
## lm(formula = afqt ~ sex * marital_comb, data = heights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.981 -22.978  -4.336  22.488  75.275 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     31.4205     1.1536  27.237  < 2e-16 ***
## sexfemale                        0.1564     1.7184   0.091    0.928    
## marital_combmarried             16.5603     1.3273  12.477  < 2e-16 ***
## marital_combOther               -2.5872     2.4677  -1.048    0.294    
## marital_combdivorced             6.2796     1.5814   3.971 7.24e-05 ***
## sexfemale:marital_combmarried   -0.8856     1.9516  -0.454    0.650    
## sexfemale:marital_combOther     -4.7414     3.1645  -1.498    0.134    
## sexfemale:marital_combdivorced  -0.1494     2.2535  -0.066    0.947    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28.02 on 6736 degrees of freedom
##   (262 observations deleted due to missingness)
## Multiple R-squared:  0.06936,    Adjusted R-squared:  0.0684 
## F-statistic: 71.72 on 7 and 6736 DF,  p-value: < 2.2e-16

Exercises

  1. Using the gss_cat data from the forcats package, fit a model that predicts age with marital status, partyid, and the interaction between the two.
  2. How well does this model fit?
  3. How is the intercept interpreted here?
  4. Do the results change when collapsing the partyid variable into the following three categories:
    • Republican
    • Democrat
    • Other

Visualize Model Results

When exploring model results, visualizing the model results is often more useful than looking at a table of coefficients. In addition, if the model is simply attempting to predict means (ANOVA), plotting often simply involves computing means for the different categories. This section will also explore how best to visualize interactions.

If we simply would like to show the main effects from the model above predicting Armed Forces Qualifiactions Test Score with marital status and sex, we could calculate the means of these groups.

marital_means <- heights %>%
  group_by(marital_comb) %>%
  summarise(avg_afqt = mean(afqt, na.rm = TRUE),
            sd_afqt = sd(afqt, na.rm = TRUE), 
            n = n()) %>%
  mutate(se_mean = sd_afqt/sqrt(n), 
         group = 'Marital', 
         levels = marital_comb) %>%
  ungroup() %>%
  dplyr::select(-marital_comb)
sex_means <- heights %>%
  group_by(sex) %>%
  summarise(avg_afqt = mean(afqt, na.rm = TRUE),
            sd_afqt = sd(afqt, na.rm = TRUE), 
            n = n()) %>%
  mutate(se_mean = sd_afqt/sqrt(n),
         group = 'Sex', 
         levels = sex) %>%
  ungroup() %>%
  dplyr::select(-sex)
comb_means <- bind_rows(marital_means, sex_means)
comb_means
## # A tibble: 6 × 6
##   avg_afqt sd_afqt     n se_mean group   levels  
##      <dbl>   <dbl> <int>   <dbl> <chr>   <fct>   
## 1     31.5    27.7  1124   0.828 Marital single  
## 2     47.6    29.1  3806   0.472 Marital married 
## 3     25.7    24.1   527   1.05  Marital Other   
## 4     37.7    26.6  1549   0.676 Marital divorced
## 5     41.9    29.8  3402   0.511 Sex     male    
## 6     40.6    28.3  3604   0.472 Sex     female

The effects can now be shown in a figure.

ggplot(comb_means, aes(avg_afqt, fct_reorder(levels, avg_afqt))) + 
  geom_point(size = 3) + 
  facet_grid(group ~ ., scales = 'free', space = 'free') + 
  ylab("Groups") + 
  xlab("Average Armed Forces Qualification Score") + 
  theme_bw()

Since we computed standard errors, we could also add error bars using geom_errorbarh.

ggplot(comb_means, aes(avg_afqt, fct_reorder(levels, avg_afqt))) + 
  geom_point(size = 3) + 
  geom_errorbarh(aes(xmin = avg_afqt - se_mean*2, xmax = avg_afqt + se_mean*2),
                 height = 0) + 
  facet_grid(group ~ ., scales = 'free', space = 'free') +
  ylab("Groups") + 
  xlab("Average Armed Forces Qualification Score") + 
  theme_bw()

Plotting the interaction happens in a similar fashion. Namely, we will now calculate means by using both variables in a single group_by statement.

int_means <- heights %>%
  group_by(marital_comb, sex) %>%
  summarise(avg_afqt = mean(afqt, na.rm = TRUE),
            sd_afqt = sd(afqt, na.rm = TRUE), 
            n = n()) %>%
  mutate(se_mean = sd_afqt/sqrt(n))
## `summarise()` has grouped output by 'marital_comb'. You can override using the
## `.groups` argument.
int_means
## # A tibble: 8 × 6
## # Groups:   marital_comb [4]
##   marital_comb sex    avg_afqt sd_afqt     n se_mean
##   <fct>        <fct>     <dbl>   <dbl> <int>   <dbl>
## 1 single       male       31.4    28.2   624   1.13 
## 2 single       female     31.6    27.3   500   1.22 
## 3 married      male       48.0    29.8  1905   0.683
## 4 married      female     47.3    28.5  1901   0.653
## 5 Other        male       28.8    25.7   177   1.93 
## 6 Other        female     24.2    23.1   350   1.24 
## 7 divorced     male       37.7    27.7   696   1.05 
## 8 divorced     female     37.7    25.7   853   0.879

These means should now match the last model from the previous lecture. We can now plot these directly.

ggplot(int_means, aes(avg_afqt, fct_reorder(marital_comb, avg_afqt), 
                      shape = sex, linetype = sex, group = sex)) + 
  geom_point(size = 3) + 
  geom_line(size = 1) + 
  ylab("Groups") + 
  xlab("Average Armed Forces Qualification Score") + 
  theme_bw()

Standard error bars can be shown here, but may complicate the figure too much.

ggplot(int_means, aes(avg_afqt, fct_reorder(marital_comb, avg_afqt), 
                      shape = sex, linetype = sex, group = sex)) + 
  geom_point(size = 3) + 
  geom_line(size = 1) + 
  geom_errorbarh(aes(xmin = avg_afqt - se_mean * 2, xmax = avg_afqt + se_mean * 2), 
                 height = 0) +
  ylab("Groups") + 
  xlab("Average Armed Forces Qualification Score") + 
  theme_bw()

Exercises

  1. Using the gss_cat data from the forcats package, fit a model that predicts age with marital status, partyid, and the interaction between the two.
  2. Create a figure that explores the interaction between the two variables.
  3. Add a third variable to the model, race. Include this as a main effect, plus interacting with the other variables.

Interaction between continuous and categorical predictors

Using again the heights data, suppose we wished to again predict the Armed Forces Qualifications Test score (afqt) with marital status and height. This can be done with the linear model.

summary(lm(afqt ~ marital_comb + height, data = heights))
## 
## Call:
## lm(formula = afqt ~ marital_comb + height, data = heights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.150 -22.517  -4.248  22.097  78.355 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -28.99932    5.67144  -5.113 3.25e-07 ***
## marital_combmarried   16.13531    0.96385  16.741  < 2e-16 ***
## marital_combOther     -4.63978    1.50160  -3.090  0.00201 ** 
## marital_combdivorced   6.75490    1.11278   6.070 1.35e-09 ***
## height                 0.89847    0.08329  10.787  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.78 on 6739 degrees of freedom
##   (262 observations deleted due to missingness)
## Multiple R-squared:  0.08467,    Adjusted R-squared:  0.08413 
## F-statistic: 155.8 on 4 and 6739 DF,  p-value: < 2.2e-16

You’ll notice here that the intercept is negative. Why is this negative? You can most easily see this by looking at a figure of the data. Note, here I am simply showing the dependent and the height variable ignoring the marital status, however, this will have a slightly different slope than the one above (Try it).

ggplot(heights, aes(x = height, y = afqt)) + 
  geom_jitter(size = 2) + 
  geom_abline(intercept = -26.02, slope = 1.002, size = 1, color = 'blue') + 
  coord_cartesian(xlim = c(0, 90), ylim = c(-30, 105)) +
  ylab("Average Armed Forces Qualification Score") + 
  xlab("Height") +
  theme_bw()
## Warning: Removed 262 rows containing missing values (geom_point).

It may be better to mean center the height variable.

summary(lm(afqt ~ marital_comb + I(height - mean(heights$height)), data = heights))
## 
## Call:
## lm(formula = afqt ~ marital_comb + I(height - mean(heights$height)), 
##     data = heights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.150 -22.517  -4.248  22.097  78.355 
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      31.29186    0.84798   36.90  < 2e-16 ***
## marital_combmarried              16.13531    0.96385   16.74  < 2e-16 ***
## marital_combOther                -4.63978    1.50160   -3.09  0.00201 ** 
## marital_combdivorced              6.75490    1.11278    6.07 1.35e-09 ***
## I(height - mean(heights$height))  0.89847    0.08329   10.79  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.78 on 6739 degrees of freedom
##   (262 observations deleted due to missingness)
## Multiple R-squared:  0.08467,    Adjusted R-squared:  0.08413 
## F-statistic: 155.8 on 4 and 6739 DF,  p-value: < 2.2e-16

Notice that the mean effects do not change, but rather just the location of the intercept. This is a traditional analysis of covariance model.

Interactions are simple to add now, they follow the same syntax as the categorical predictors from above. For example, if we wanted to add an interaction between height and marital status, we could do this as follows.

summary(lm(afqt ~ marital_comb + I(height - mean(heights$height)) + 
             marital_comb:I(height - mean(heights$height)), 
           data = heights))
## 
## Call:
## lm(formula = afqt ~ marital_comb + I(height - mean(heights$height)) + 
##     marital_comb:I(height - mean(heights$height)), data = heights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.768 -22.664  -4.183  21.972  78.762 
## 
## Coefficients:
##                                                       Estimate Std. Error
## (Intercept)                                           31.32214    0.84911
## marital_combmarried                                   16.08587    0.96531
## marital_combOther                                     -4.58940    1.53070
## marital_combdivorced                                   6.66304    1.11478
## I(height - mean(heights$height))                       0.76180    0.20923
## marital_combmarried:I(height - mean(heights$height))   0.22912    0.23745
## marital_combOther:I(height - mean(heights$height))     0.21641    0.37137
## marital_combdivorced:I(height - mean(heights$height)) -0.02469    0.27510
##                                                       t value Pr(>|t|)    
## (Intercept)                                            36.888  < 2e-16 ***
## marital_combmarried                                    16.664  < 2e-16 ***
## marital_combOther                                      -2.998 0.002725 ** 
## marital_combdivorced                                    5.977 2.39e-09 ***
## I(height - mean(heights$height))                        3.641 0.000274 ***
## marital_combmarried:I(height - mean(heights$height))    0.965 0.334624    
## marital_combOther:I(height - mean(heights$height))      0.583 0.560092    
## marital_combdivorced:I(height - mean(heights$height))  -0.090 0.928481    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.79 on 6736 degrees of freedom
##   (262 observations deleted due to missingness)
## Multiple R-squared:  0.08494,    Adjusted R-squared:  0.08399 
## F-statistic: 89.32 on 7 and 6736 DF,  p-value: < 2.2e-16

or

summary(lm(afqt ~ marital_comb * I(height - mean(heights$height)), data = heights))
## 
## Call:
## lm(formula = afqt ~ marital_comb * I(height - mean(heights$height)), 
##     data = heights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -53.768 -22.664  -4.183  21.972  78.762 
## 
## Coefficients:
##                                                       Estimate Std. Error
## (Intercept)                                           31.32214    0.84911
## marital_combmarried                                   16.08587    0.96531
## marital_combOther                                     -4.58940    1.53070
## marital_combdivorced                                   6.66304    1.11478
## I(height - mean(heights$height))                       0.76180    0.20923
## marital_combmarried:I(height - mean(heights$height))   0.22912    0.23745
## marital_combOther:I(height - mean(heights$height))     0.21641    0.37137
## marital_combdivorced:I(height - mean(heights$height)) -0.02469    0.27510
##                                                       t value Pr(>|t|)    
## (Intercept)                                            36.888  < 2e-16 ***
## marital_combmarried                                    16.664  < 2e-16 ***
## marital_combOther                                      -2.998 0.002725 ** 
## marital_combdivorced                                    5.977 2.39e-09 ***
## I(height - mean(heights$height))                        3.641 0.000274 ***
## marital_combmarried:I(height - mean(heights$height))    0.965 0.334624    
## marital_combOther:I(height - mean(heights$height))      0.583 0.560092    
## marital_combdivorced:I(height - mean(heights$height))  -0.090 0.928481    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 27.79 on 6736 degrees of freedom
##   (262 observations deleted due to missingness)
## Multiple R-squared:  0.08494,    Adjusted R-squared:  0.08399 
## F-statistic: 89.32 on 7 and 6736 DF,  p-value: < 2.2e-16

What do these coefficients mean however? This is best explained with a picture.

model_summary <- augment(lm(afqt ~ marital_comb * I(height - mean(heights$height)), 
                            data = heights))
model_summary <- rename(model_summary, 'height_center' = `I(height - mean(heights$height))`)
model_summary
## # A tibble: 6,744 × 9
##    .rownames  afqt marital_comb height_center .fitted     .hat .sigma    .cooksd
##    <chr>     <dbl> <fct>             <I<dbl>>   <dbl>    <dbl>  <dbl>      <dbl>
##  1 1          6.84 married             -7.10     40.4 0.00115    27.8    2.09e-4
##  2 2         49.4  married              2.90     50.3 0.000390   27.8    4.39e-8
##  3 3         99.4  married             -2.10     45.3 0.000360   27.8    1.70e-4
##  4 4         44.0  married             -4.10     43.3 0.000576   27.8    4.33e-8
##  5 5         59.7  married             -1.10     46.3 0.000301   27.8    8.70e-6
##  6 6         98.8  divorced             0.896    38.6 0.000737   27.8    4.33e-4
##  7 7         82.3  married              6.90     54.2 0.00100    27.8    1.28e-4
##  8 8         50.3  divorced            -3.10     35.7 0.000976   27.8    3.37e-5
##  9 9         89.7  divorced             1.90     39.4 0.000884   27.8    3.63e-4
## 10 10        96.0  divorced             1.90     39.4 0.000884   27.8    4.59e-4
## # … with 6,734 more rows, and 1 more variable: .std.resid <dbl>
ggplot(model_summary, aes(height_center, afqt)) + 
  geom_jitter(alpha = .1) + 
  geom_line(aes(x = height_center, y = .fitted, color = marital_comb),
            size = 2) + 
  ylab("Average Armed Forces Qualification Score") + 
  xlab("Height") +
  theme_bw()

The figure for the traditional ANCOVA model initially explored would look like:

model_summary <- augment(lm(afqt ~ marital_comb + I(height - mean(heights$height)), 
                            data = heights))
model_summary <- rename(model_summary, 'height_center' = `I(height - mean(heights$height))`)
model_summary
## # A tibble: 6,744 × 9
##    .rownames  afqt marital_comb height_center .fitted     .hat .sigma    .cooksd
##    <chr>     <dbl> <fct>             <I<dbl>>   <dbl>    <dbl>  <dbl>      <dbl>
##  1 1          6.84 married             -7.10     41.0 0.000753   27.8    2.29e-4
##  2 2         49.4  married              2.90     50.0 0.000337   27.8    2.99e-8
##  3 3         99.4  married             -2.10     45.5 0.000320   27.8    2.41e-4
##  4 4         44.0  married             -4.10     43.7 0.000439   27.8    9.09e-9
##  5 5         59.7  married             -1.10     46.4 0.000288   27.8    1.31e-5
##  6 6         98.8  divorced             0.896    38.9 0.000684   27.8    6.38e-4
##  7 7         82.3  married              6.90     53.6 0.000674   27.8    1.44e-4
##  8 8         50.3  divorced            -3.10     35.3 0.000736   27.8    4.31e-5
##  9 9         89.7  divorced             1.90     39.7 0.000716   27.8    4.63e-4
## 10 10        96.0  divorced             1.90     39.7 0.000716   27.8    5.88e-4
## # … with 6,734 more rows, and 1 more variable: .std.resid <dbl>
ggplot(model_summary, aes(height_center, afqt)) + 
  geom_jitter(alpha = .1) + 
  geom_line(aes(x = height_center, y = .fitted, color = marital_comb),
            size = 2) + 
  ylab("Average Armed Forces Qualification Score") + 
  xlab("Height") +
  theme_bw()

Exercises

  1. Using the gss_cat data from the forcats package, fit a model that predicts age with marital status, tvhours, and the interaction between the two.
  2. Interpret the parameter estimates from this model? Is there evidence that the interaction is adding to the model?
  3. Create a figure that explores the interaction between the two variables.
  4. Fit another model that predicts age with marital status, party affiliation, and tvhours (main effects) as well as the interaction between marital status and tvhours and party afilliation and tvhours (two second order interactions).
  5. Interpret the effects for this model.
Previous
Next