Skip to contents

Setup

We start by loading a few helpful libraries.

# Load libraries
library(tidyverse, warn.conflicts = FALSE) # Data wrangling/ manipulation
library(arsenal)   # Package with good table creation functions
library(tidycoRe)  # Internal package for creating tables
library(flextable) # For formatting tables for presentation
library(broom)     # For cleaning model results
library(glue)      # For concatenating strings

Linear regression table

Using the demo dataset included in the tidycoRe package, we’re going to build a linear regression to predict BMI with the following predictors: age, gender, race, ethnicity, smoking status, drinking status, history of cancer, history of CVD, history of hypertension, history of hyperlipidemia, and history of diabetes.

lm2 <- lm(
  bmi ~ age + female + white + hispanic + shx_smoke + shx_drink + hx_cancer + 
        hx_cvd + hx_htn + hx_hld + hx_diabetes, 
  data = demo_data
  )

And here’s a summary of the results:

summary(lm2)
#> 
#> Call:
#> lm(formula = bmi ~ age + female + white + hispanic + shx_smoke + 
#>     shx_drink + hx_cancer + hx_cvd + hx_htn + hx_hld + hx_diabetes, 
#>     data = demo_data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -15.2699  -4.6741  -0.8572   3.7478  22.7053 
#> 
#> Coefficients:
#>                              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                  35.69377    2.66691  13.384  < 2e-16 ***
#> age                          -0.11628    0.04650  -2.501 0.013424 *  
#> femaleFemale                  0.03388    1.10011   0.031 0.975473    
#> whiteWhite                    3.61611    1.39484   2.592 0.010428 *  
#> hispanicYes                   3.72449    2.80140   1.330 0.185609    
#> shx_smokeFormer smoker       -0.21779    1.18552  -0.184 0.854478    
#> shx_smokeCurrent smoker      -0.52670    1.76314  -0.299 0.765544    
#> shx_drink<1 drink per week   -1.00385    1.50030  -0.669 0.504413    
#> shx_drink1-3 drinks per week -4.14258    1.65234  -2.507 0.013190 *  
#> shx_drink4-6 drinks per week -3.05412    1.77738  -1.718 0.087709 .  
#> shx_drink1-2 drinks per day  -5.03727    1.97711  -2.548 0.011801 *  
#> shx_drink>2 drinks per day    0.67882    3.06979   0.221 0.825279    
#> hx_cancerYes                 -2.07835    4.61225  -0.451 0.652888    
#> hx_cvdYes                    -0.26151    1.81825  -0.144 0.885824    
#> hx_htnYes                     4.19313    1.22987   3.409 0.000828 ***
#> hx_hldYes                    -1.07512    1.43113  -0.751 0.453632    
#> hx_diabetesYes                1.86206    1.65515   1.125 0.262302    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 6.954 on 157 degrees of freedom
#>   (26 observations deleted due to missingness)
#> Multiple R-squared:  0.2112, Adjusted R-squared:  0.1308 
#> F-statistic: 2.627 on 16 and 157 DF,  p-value: 0.001143

If we plan on generating multiple models using the same formula, we can save the variables as vectors and call on them using the “formulize” function from the “arsenal” package. We can see that these results are the same as the the previous model.

LHS <- "bmi"
RHS <- c("age", 
         "female", 
         "white", 
         "hispanic", 
         "shx_smoke", 
         "shx_drink", 
         "hx_cancer", 
         "hx_cvd", 
         "hx_htn", 
         "hx_hld", 
         "hx_diabetes")

lm2_0 <- lm(
  formulize(LHS, RHS),
  data = demo_data
)

summary(lm2_0)
#> 
#> Call:
#> lm(formula = formulize(LHS, RHS), data = demo_data)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -15.2699  -4.6741  -0.8572   3.7478  22.7053 
#> 
#> Coefficients:
#>                              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)                  35.69377    2.66691  13.384  < 2e-16 ***
#> age                          -0.11628    0.04650  -2.501 0.013424 *  
#> femaleFemale                  0.03388    1.10011   0.031 0.975473    
#> whiteWhite                    3.61611    1.39484   2.592 0.010428 *  
#> hispanicYes                   3.72449    2.80140   1.330 0.185609    
#> shx_smokeFormer smoker       -0.21779    1.18552  -0.184 0.854478    
#> shx_smokeCurrent smoker      -0.52670    1.76314  -0.299 0.765544    
#> shx_drink<1 drink per week   -1.00385    1.50030  -0.669 0.504413    
#> shx_drink1-3 drinks per week -4.14258    1.65234  -2.507 0.013190 *  
#> shx_drink4-6 drinks per week -3.05412    1.77738  -1.718 0.087709 .  
#> shx_drink1-2 drinks per day  -5.03727    1.97711  -2.548 0.011801 *  
#> shx_drink>2 drinks per day    0.67882    3.06979   0.221 0.825279    
#> hx_cancerYes                 -2.07835    4.61225  -0.451 0.652888    
#> hx_cvdYes                    -0.26151    1.81825  -0.144 0.885824    
#> hx_htnYes                     4.19313    1.22987   3.409 0.000828 ***
#> hx_hldYes                    -1.07512    1.43113  -0.751 0.453632    
#> hx_diabetesYes                1.86206    1.65515   1.125 0.262302    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 6.954 on 157 degrees of freedom
#>   (26 observations deleted due to missingness)
#> Multiple R-squared:  0.2112, Adjusted R-squared:  0.1308 
#> F-statistic: 2.627 on 16 and 157 DF,  p-value: 0.001143

We generally want to report the coefficients and corresponding CIs and p-values from our regression. We can extract these using the “tidy” function from the “broom” package.

tidy(lm2, conf.int = TRUE)
#> # A tibble: 17 × 7
#>    term                         estim…¹ std.e…² stati…³  p.value conf.…⁴ conf.…⁵
#>    <chr>                          <dbl>   <dbl>   <dbl>    <dbl>   <dbl>   <dbl>
#>  1 (Intercept)                  35.7     2.67   13.4    9.66e-28  30.4   41.0   
#>  2 age                          -0.116   0.0465 -2.50   1.34e- 2  -0.208 -0.0244
#>  3 femaleFemale                  0.0339  1.10    0.0308 9.75e- 1  -2.14   2.21  
#>  4 whiteWhite                    3.62    1.39    2.59   1.04e- 2   0.861  6.37  
#>  5 hispanicYes                   3.72    2.80    1.33   1.86e- 1  -1.81   9.26  
#>  6 shx_smokeFormer smoker       -0.218   1.19   -0.184  8.54e- 1  -2.56   2.12  
#>  7 shx_smokeCurrent smoker      -0.527   1.76   -0.299  7.66e- 1  -4.01   2.96  
#>  8 shx_drink<1 drink per week   -1.00    1.50   -0.669  5.04e- 1  -3.97   1.96  
#>  9 shx_drink1-3 drinks per week -4.14    1.65   -2.51   1.32e- 2  -7.41  -0.879 
#> 10 shx_drink4-6 drinks per week -3.05    1.78   -1.72   8.77e- 2  -6.56   0.457 
#> 11 shx_drink1-2 drinks per day  -5.04    1.98   -2.55   1.18e- 2  -8.94  -1.13  
#> 12 shx_drink>2 drinks per day    0.679   3.07    0.221  8.25e- 1  -5.38   6.74  
#> 13 hx_cancerYes                 -2.08    4.61   -0.451  6.53e- 1 -11.2    7.03  
#> 14 hx_cvdYes                    -0.262   1.82   -0.144  8.86e- 1  -3.85   3.33  
#> 15 hx_htnYes                     4.19    1.23    3.41   8.28e- 4   1.76   6.62  
#> 16 hx_hldYes                    -1.08    1.43   -0.751  4.54e- 1  -3.90   1.75  
#> 17 hx_diabetesYes                1.86    1.66    1.13   2.62e- 1  -1.41   5.13  
#> # … with abbreviated variable names ¹​estimate, ²​std.error, ³​statistic,
#> #   ⁴​conf.low, ⁵​conf.high

Since tidy returns a data frame object, we can clean this up using dplyr functions as we normally would.

lm2_tidy  <- tidy(lm2, conf.int = TRUE) %>% 
  mutate(
    p.value = format.pval(p.value, digits = 2, eps = 0.001), # format p-value column
    across(where(is.numeric), ~roundx(.x, 2)), # Round numeric columns to 2 digits
    b_ci = glue("{estimate} ({conf.low}, {conf.high})") # Concatenate rounded coefficient plus lower and upper CI bounds
    ) %>% 
  select(term, b_ci, p.value) # Select columns to keep

lm2_tidy
#> # A tibble: 17 × 3
#>    term                         b_ci                  p.value
#>    <chr>                        <glue>                <chr>  
#>  1 (Intercept)                  35.69 ( 30.43, 40.96) <0.001 
#>  2 age                          -0.12 ( -0.21, -0.02) 0.013  
#>  3 femaleFemale                  0.03 ( -2.14,  2.21) 0.975  
#>  4 whiteWhite                    3.62 (  0.86,  6.37) 0.010  
#>  5 hispanicYes                   3.72 ( -1.81,  9.26) 0.186  
#>  6 shx_smokeFormer smoker       -0.22 ( -2.56,  2.12) 0.854  
#>  7 shx_smokeCurrent smoker      -0.53 ( -4.01,  2.96) 0.766  
#>  8 shx_drink<1 drink per week   -1.00 ( -3.97,  1.96) 0.504  
#>  9 shx_drink1-3 drinks per week -4.14 ( -7.41, -0.88) 0.013  
#> 10 shx_drink4-6 drinks per week -3.05 ( -6.56,  0.46) 0.088  
#> 11 shx_drink1-2 drinks per day  -5.04 ( -8.94, -1.13) 0.012  
#> 12 shx_drink>2 drinks per day    0.68 ( -5.38,  6.74) 0.825  
#> 13 hx_cancerYes                 -2.08 (-11.19,  7.03) 0.653  
#> 14 hx_cvdYes                    -0.26 ( -3.85,  3.33) 0.886  
#> 15 hx_htnYes                     4.19 (  1.76,  6.62) <0.001 
#> 16 hx_hldYes                    -1.08 ( -3.90,  1.75) 0.454  
#> 17 hx_diabetesYes                1.86 ( -1.41,  5.13) 0.262

If we want goodness of fit statistics, we can use the “glance” function from the broom package.

lm2_glance  <- glance(lm2) %>% 
  mutate(
    across(where(is.numeric), ~roundx(.x, 2))
    )

lm2_glance
#> # A tibble: 1 × 12
#>   r.squ…¹ adj.r…² sigma stati…³ p.value df    logLik AIC   BIC   devia…⁴ df.re…⁵
#>   <chr>   <chr>   <chr> <chr>   <chr>   <chr> <chr>  <chr> <chr> <chr>   <chr>  
#> 1 0.21    0.13    6.95  2.63    0.00    16.00 -575.… 1186… 1243… 7592.88 157.00 
#> # … with 1 more variable: nobs <chr>, and abbreviated variable names
#> #   ¹​r.squared, ²​adj.r.squared, ³​statistic, ⁴​deviance, ⁵​df.residual

These results are still in plain-text in our console window. Let’s use the tidycoRe function, format_basictable, to convert this data frame to a flextable object, apply some default formats, and print.

After converting to a flextable object, we can further tweak its appearance by applying functions from the flextable package. As an example, let’s rename the columns.

format_basictable(lm2_tidy) %>% 
  set_header_labels(
    term = "Parameter",
    b_ci = "\U03B2 (95% CI)",
    p.value = "p"
  )

We can use the same process for the goodness of fit table.

format_basictable(lm2_glance)

And change the column names with flextable::set_header_labels and add outer borders.

format_basictable(lm2_glance, borders = TRUE) %>% 
  set_header_labels(
    r.squared     = "R\U00B2",
    adj.r.squared = "Adjusted R\U00B2",
    sigma         = "Standard error residuals",
    statistic     = "Test statistic",
    p.value       = "p",
    df            = "Degrees of Freedom",
    logLik        = "Log-likelihood",
    deviance      = "Deviance",
    df.residual   = "Degrees of Freedom residuals",
    nobs          = "# observations"
  )