How to use tidycoRe for CorEvitas table and report formatting
01_table_formatting.Rmd
Welcome to tidycoRe!
A formatting package to create standardized CorEvitas R query projects, tables, and reports.
Get started
To get started, load the package into your working session with library(tidycoRe).
If you have not yet installed it from our private GitHub repo, please see the README on the package home page.
Create standard-format summary tables
tidycoRe comes equipped with functions to format arsenal tableby summary objects and regular data frames into standardized CorEvitas-style flextable objects for Word, HTML, and PDF documents.
Tableby objects are an important part of tidycoRe and additional documentation can be found here: https://cran.r-project.org/web/packages/arsenal/vignettes/tableby.html.
The formatted tableby object
The format_tableby function converts tableby objects into data frames. The format_tableby_flextable function styles the data frames created by format_tableby into summary flextables that can be exported via RMarkdown or Quarto to html, pdf, docx, pptx, and other document formats.
Features include:
- rounding numeric values to 1 decimal place
- condensing rows where numeric variables are summarized by N, if specified (one of the most important benefits)
- summarizing categorical variables using the summary statistics specified in the tableby command
- applying borders, if specified
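The rounding feature can be illustrated in base R. This is only a sketch: fmt_mean_sd is a hypothetical helper written for this example, not a tidycoRe function, and the package's internal implementation may differ. It reproduces the one-decimal "mean (SD)" strings that appear in the formatted table later in this article.

```r
# Hypothetical helper (not part of tidycoRe): format a mean and SD
# as "mean (sd)" with a fixed number of decimal places.
fmt_mean_sd <- function(mean, sd, digits = 1) {
  paste0(
    formatC(mean, format = "f", digits = digits),
    " (",
    formatC(sd, format = "f", digits = digits),
    ")"
  )
}

# Unformatted tableby values for age in the Male column
fmt_mean_sd(51.236, 12.971)
#> [1] "51.2 (13.0)"
```

Using formatC with format = "f" keeps trailing zeros, so 12.971 is shown as 13.0 rather than 13.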
First we will create the tableby object.
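# Summarize demographics and insurance status by sex
demo_tableby <- arsenal::tableby(
  female ~
    age +
    white +
    hispanic +
    insurance_private +
    insurance_medicare +
    insurance_medicaid +
    insurance_none,
  cat.stats = c("countpct"),    # categorical variables: count (percent)
  numeric.stats = c("meansd"),  # numeric variables: mean (SD)
  test = TRUE,                  # include p values
  data = demo_data
)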
Unformatted, the tableby object looks like this:
| | Male (N=110) | Female (N=90) | Total (N=200) | p value |
|---|---|---|---|---|
| Age (years) | | | | 0.721 |
| Mean (SD) | 51.236 (12.971) | 50.556 (13.869) | 50.930 (13.353) | |
| White | | | | 0.064 |
| Other race | 29 (26.4%) | 14 (15.6%) | 43 (21.5%) | |
| White | 81 (73.6%) | 76 (84.4%) | 157 (78.5%) | |
| Hispanic Ethnicity | | | | 0.300 |
| No | 106 (97.2%) | 83 (94.3%) | 189 (95.9%) | |
| Yes | 3 (2.8%) | 5 (5.7%) | 8 (4.1%) | |
| Private Insurance | | | | 0.874 |
| Not reported | 23 (20.9%) | 18 (20.0%) | 41 (20.5%) | |
| Yes | 87 (79.1%) | 72 (80.0%) | 159 (79.5%) | |
| Medicare Insurance | | | | 0.767 |
| Not reported | 91 (82.7%) | 73 (81.1%) | 164 (82.0%) | |
| Yes | 19 (17.3%) | 17 (18.9%) | 36 (18.0%) | |
| Medicaid Insurance | | | | 0.917 |
| Not reported | 101 (91.8%) | 83 (92.2%) | 184 (92.0%) | |
| Yes | 9 (8.2%) | 7 (7.8%) | 16 (8.0%) | |
| No Insurance | | | | 0.839 |
| Not reported | 108 (98.2%) | 88 (97.8%) | 196 (98.0%) | |
| Yes | 2 (1.8%) | 2 (2.2%) | 4 (2.0%) | |
Compare that with the output of format_tableby() piped into format_tableby_flextable():
format_tableby(
  tableby_data = demo_tableby,
  condense = TRUE
) %>%
  format_tableby_flextable()
| | Male (N=110) | Female (N=90) | Total (N=200) | p value |
|---|---|---|---|---|
| Age (years) | 51.2 (13.0) | 50.6 (13.9) | 50.9 (13.4) | 0.721 |
| White | | | | 0.064 |
| Other race | 29 (26.4%) | 14 (15.6%) | 43 (21.5%) | |
| White | 81 (73.6%) | 76 (84.4%) | 157 (78.5%) | |
| Hispanic Ethnicity | 3 (2.8%) | 5 (5.7%) | 8 (4.1%) | 0.300 |
| Private Insurance | 87 (79.1%) | 72 (80.0%) | 159 (79.5%) | 0.874 |
| Medicare Insurance | 19 (17.3%) | 17 (18.9%) | 36 (18.0%) | 0.767 |
| Medicaid Insurance | 9 (8.2%) | 7 (7.8%) | 16 (8.0%) | 0.917 |
| No Insurance | 2 (1.8%) | 2 (2.2%) | 4 (2.0%) | 0.839 |
Important notes about condense = TRUE
- If more than one summary statistic is specified for categorical data, the categorical rows will not be condensed.
- A variable's rows are condensed only if the variable has exactly two levels, where one level is ‘Yes’ and the other is one of ‘No’, ‘Not reported’, ‘Unknown’, or ‘Unk’.
- For numeric variables, the first numeric statistic specified is placed on the variable row.
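The two-level rule for categorical variables can be sketched in base R. This is an illustration under stated assumptions rather than tidycoRe's actual code: is_condensable is a hypothetical helper, and it inspects the observed non-missing values rather than the factor levels.

```r
# Hypothetical helper (not part of tidycoRe): would a categorical
# variable qualify for condensing under the rule described above?
is_condensable <- function(x) {
  other_levels <- c("No", "Not reported", "Unknown", "Unk")
  # Distinct observed values, ignoring missing values
  lv <- unique(as.character(x[!is.na(x)]))
  # Exactly two values, one is "Yes", the other is from the allowed list
  length(lv) == 2 &&
    "Yes" %in% lv &&
    setdiff(lv, "Yes") %in% other_levels
}

is_condensable(c("Yes", "Not reported", "Yes"))  # TRUE
is_condensable(c("White", "Other race"))         # FALSE: no "Yes" level
is_condensable(c("Yes", "No", "Unknown"))        # FALSE: three levels
```

This matches the tables above: the insurance and Hispanic Ethnicity variables collapse to a single row, while White keeps its two level rows.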
The formatted dataframe
The format_basictable function styles data frames into flextables suitable for HTML, PDF, or Word documents.
Features include:
- if specified, borders are applied
library(dplyr)  # provides select() and %>%

# Format the first 10 rows of selected demo_data columns
format_basictable(
  data = tidycoRe::demo_data %>%
    select(
      biohxgroup,
      age,
      female,
      white,
      hispanic,
      insurance_private,
      insurance_medicare,
      insurance_medicaid,
      insurance_none
    ) %>%
    head(10),
  borders = TRUE
)
| biohxgroup | age | female | white | hispanic | insurance_private | insurance_medicare | insurance_medicaid | insurance_none |
|---|---|---|---|---|---|---|---|---|
| Prior other Bio Non-Failure | 35 | Female | Other race | No | Yes | Not reported | Not reported | Not reported |
| Prior other Bio Non-Failure | 72 | Male | White | No | Not reported | Yes | Not reported | Not reported |
| Prior SEC Failure | 50 | Female | White | No | Yes | Not reported | Not reported | Not reported |
| Prior SEC Failure | 49 | Male | White | Yes | Yes | Not reported | Not reported | Not reported |
| Prior other Bio Failure | 61 | Female | White | No | Yes | Not reported | Not reported | Not reported |
| Prior SEC Failure | 50 | Male | Other race | No | Yes | Not reported | Not reported | Not reported |
| Prior other Bio Failure | 45 | Female | White | No | Yes | Not reported | Not reported | Not reported |
| Prior other Bio Failure | 66 | Male | White | No | Not reported | Yes | Not reported | Not reported |
| Prior other Bio Failure | 69 | Male | White | No | Not reported | Yes | Not reported | Not reported |
| Prior SEC Failure | 39 | Female | White | No | Yes | Not reported | Not reported | Not reported |
Tips on formatting flextables
In general, other than headers and footers, there is no easy way to add rows to a flextable object or to combine flextables. This needs to be done while the objects are still data frames, i.e., before calling format_basictable(), or after format_tableby() and before format_tableby_flextable().
You can pull the underlying data frame out of a flextable using the following code:
table_name$body$dataset
You can then manipulate the extracted data frame before reapplying the flextable formatting with the tidycoRe functions above.
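The extract-and-reapply workflow might look like the sketch below. It uses flextable::flextable() as a stand-in for the tidycoRe formatting functions so that it runs without the package; the small counts data frame is made up for illustration. In a real project the final step would be format_basictable() or format_tableby_flextable() instead.

```r
library(flextable)

# Illustrative data only; stands in for a table built with tidycoRe
counts <- data.frame(group = c("A", "B"), n = c(10, 12))
ft <- flextable(counts)

# Pull the underlying data frame back out of the flextable
df <- ft$body$dataset

# Add a total row while the object is a plain data frame
df <- rbind(df, data.frame(group = "Total", n = sum(df$n)))

# Reapply the flextable formatting (stand-in for the tidycoRe functions)
ft_new <- flextable(df)
```

Keeping all row manipulation in the data frame stage means the styling is applied once, to the final set of rows.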