Getting started
Let’s make a simple linear model first.
library(partialling.out)
library(tinytable)
library(tinyplot)
library(palmerpenguins)
library(fwlplot)
model <- lm(bill_length_mm ~ bill_depth_mm + species, data = penguins)
summary(model)
#>
#> Call:
#> lm(formula = bill_length_mm ~ bill_depth_mm + species, data = penguins)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -8.0300 -1.5828 0.0733 1.6925 10.0313
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 13.2164 2.2475 5.88 9.83e-09 ***
#> bill_depth_mm 1.3940 0.1220 11.43 < 2e-16 ***
#> speciesChinstrap 9.9390 0.3678 27.02 < 2e-16 ***
#> speciesGentoo 13.4033 0.5118 26.19 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.518 on 338 degrees of freedom
#> (2 observations deleted due to missingness)
#> Multiple R-squared: 0.7892, Adjusted R-squared: 0.7874
#> F-statistic: 421.9 on 3 and 338 DF, p-value: < 2.2e-16
Using the partialling_out
function, you can get the
residualised variable of interest (bill_length_mm
) and of
the first explanatory variable (bill_depth_mm
), i.e. it
would return the residuals of the following two regressions.
modely <- lm(bill_length_mm ~ species, data = penguins)
modelx <- lm(bill_depth_mm ~ species, data = penguins)
res <- partialling_out(model, data = penguins)
tt(head(res)) |>
format_tt(digits = 2) |>
style_tt(align = "c")
res_bill_length_mm | res_bill_depth_mm |
---|---|
0.31 | 0.35 |
0.71 | -0.95 |
1.51 | -0.35 |
-2.09 | 0.95 |
0.51 | 2.25 |
0.11 | -0.55 |
Accordingly, the coefficient of res_bill_depth_mm
in the
model lm(res_bill_length_mm ~ res_bill_depth_mm)
will be
the same of the coefficient of bill_depth_mm
in the
original model.
resmodel <- lm(res_bill_length_mm ~ res_bill_depth_mm, data = res)
print(c(model$coefficients[2], resmodel$coefficients[2]))
#> bill_depth_mm res_bill_depth_mm
#> 1.394011 1.394011
If both
is set to FALSE
, the function will
return the actual Y values and the residualised X values.
tt(head(partialling_out(model, penguins, both = FALSE))) |>
format_tt(digits = 2) |>
style_tt(align = "c")
bill_length_mm | res_bill_depth_mm |
---|---|
39 | 0.35 |
40 | -0.95 |
40 | -0.35 |
37 | 0.95 |
39 | 2.25 |
39 | -0.55 |
Weighted models
If weights
are specified in partialling_out
they will be used to weight the underlying partial models. The weights
will also be returned as a column in the result data.frame. If the
original model is weighted but the partial models aren’t, the function
will throw a warning (also if the original model is not weighted but
partial models are).
model <- lm(
bill_length_mm ~ bill_depth_mm + species,
data = penguins,
weights = penguins$body_mass_g
)
res <- partialling_out(model, data = penguins, weights = penguins$body_mass_g)
tt(head(res)) |>
format_tt(digits = 2) |>
style_tt(align = "c")
res_bill_length_mm | res_bill_depth_mm | weights |
---|---|---|
0.129 | 0.27 | 3750 |
0.529 | -1.03 | 3800 |
1.329 | -0.43 | 3250 |
-2.271 | 0.87 | 3450 |
0.329 | 2.17 | 3650 |
-0.071 | -0.63 | 3625 |
Note that, for the FWL theorem to hold, if the partial models are weighted, the linear model between the two residualised variables must be weighted as well.
unweighted_model <- lm(res_bill_length_mm ~ res_bill_depth_mm, data = res)
weighted_model <- lm(
res_bill_length_mm ~ res_bill_depth_mm,
weights = res$weights,
data = res
)
data.frame(
"original model" = model$coefficients[2],
"unweighted model" = unweighted_model$coefficients[2],
"weighted_model" = weighted_model$coefficients[2]
) |>
tt() |>
format_tt(digits = 4) |>
style_tt(align = "c")
original.model | unweighted.model | weighted_model |
---|---|---|
1.442 | 1.394 | 1.442 |
Fixed effects models
As stated, the model will also work with feols
or
felm
models
library(fixest)
model_fixest <- feols(bill_length_mm ~ bill_depth_mm | species, data = penguins)
res_fixest <- partialling_out(model_fixest, data = penguins)
tt(head(res_fixest)) |>
format_tt(digits = 2) |>
style_tt(align = "c")
res_bill_length_mm | res_bill_depth_mm |
---|---|
0.31 | 0.35 |
0.71 | -0.95 |
1.51 | -0.35 |
-2.09 | 0.95 |
0.51 | 2.25 |
0.11 | -0.55 |
library(lfe)
model_lfe <- felm(bill_length_mm ~ bill_depth_mm | species, data = penguins)
res_lfe <- partialling_out(model_lfe, data = penguins)
tt(head(res_lfe)) |>
format_tt(digits = 2) |>
style_tt(align = "c")
res_bill_length_mm | res_bill_depth_mm |
---|---|
0.31 | 0.35 |
0.71 | -0.95 |
1.51 | -0.35 |
-2.09 | 0.95 |
0.51 | 2.25 |
0.11 | -0.55 |
Interaction terms and varying slopes
So far, interaction terms with :
, *
, or
^
(in feols
) as well as varying slopes in
feols
with [
or [[
are supported
only for variables other than the main explanatory variable, i.e.
feols(bill_depth_mm ~ bill_length_mm | species^sex, data = penguins) |>
partialling_out()
is supported, but
feols(bill_depth_mm ~ i(bill_length_mm, species) | sex, data = penguins) |>
partialling_out()
will yield an error. Therefore, if your main explanatory variable is an interaction, it should be built outside the model if you intend to partial it out afterwards.
AsIs and polynomial terms
So far, AsIs expressions, and polynomial terms in the formula are not yet supported, so these two expressions will yield an error.
partialling_out(lm(bill_depth_mm ~ I(bill_length_mm^2) + species,
data = penguins))
partialling_out(lm(bill_depth_mm ~ poly(bill_length_mm, 2) + species,
data = penguins[!is.na(penguins$bill_length_mm), ]))
Plotting the results
Results can then be displayed in a scatterplot either regular or binned.
tinytheme("tufte")
par(mfrow = c(2, 1))
tinyplot(res_fixest$res_bill_length_mm ~ res_fixest$res_bill_depth_mm)
tinyplot_add(
res_fixest$res_bill_length_mm ~ res_fixest$res_bill_depth_mm,
type = "lm"
)
res_fixest$qnt <- cut(
res_fixest$res_bill_depth_mm,
breaks = quantile(res_fixest$res_bill_depth_mm, probs = seq(0, 1, .02)),
include.lowest = TRUE
)
res_qnt <- aggregate(
cbind(res_bill_depth_mm, res_bill_length_mm) ~ qnt,
data = res_fixest,
FUN = mean
)
tinyplot(res_qnt$res_bill_length_mm ~ res_qnt$res_bill_depth_mm)
tinyplot_add(
res_fixest$res_bill_length_mm ~ res_fixest$res_bill_depth_mm,
type = "lm"
)
Plotting all residuals is equivalent to the following approach in
fwlplot
fwlplot(bill_length_mm ~ bill_depth_mm | species, data = penguins)
Adding other parameters to the model
Any parameters that could be passed to lm()
,
feols()
, or felm()
, can be passed to
partialling_out()
.
model_fixest <- feols(
bill_length_mm ~ bill_depth_mm | species + island,
data = penguins,
cluster = ~species
)
tt(head(partialling_out(model_fixest, data = penguins, cluster = ~species))) |>
format_tt(digits = 2) |>
style_tt(align = "c")
res_bill_length_mm | res_bill_depth_mm |
---|---|
0.149 | 0.27 |
0.549 | -1.03 |
1.349 | -0.43 |
-2.251 | 0.87 |
0.349 | 2.17 |
-0.051 | -0.63 |
Acknowledgements
To the authors of the fwlplot package, Kyle Butts and Grant McDermott, which has provided inspiration and ideas for this project. To my colleague Andreu Arenas-Jal for his insight and guiding.