Hi!
This vignette illustrates the functionality of mlth.data.frame
package. This name stands for multiheaded data.frame, a data.frame with grouped columns.
This package is a companion of TwinAnalysis
package. As such, it provides functionality for organizing and collecting the tables across the analysis script and sending them into an Excel file or on a html/markdown/LaTeX report.
To install the package:
# install.packages('devtools')
devtools::install_github('IvanVoronin/mlth.data.frame')
library(mlth.data.frame)
A mlth.data.frame looks like this:
A <- mlth.data.frame(Letters = list(cons = c('a', 'e', 'i'),
vow = c('b', 'c', 'd')),
Numbers = list(even = c(2, 4, 6),
odd = c(1, 3, 5)))
A
## Letters- Numbers-
## cons vow even odd
## 1 a b 2 1
## 2 e c 4 3
## 3 i d 6 5
This mlth.data.frame has 4 columns and 3 rows. Use ncol()
, nrow()
and dim()
to get the size of the table:
The columns of mlth.data.frame can be arranged into groups, in this case Letters and Numbers. You can select groups and variables by refering to them by name or index number:
A$Letters
## cons vow
## 1 a b
## 2 e c
## 3 i d
A[['Numbers']]
## even odd
## 1 2 1
## 2 4 3
## 3 6 5
# Mind the difference between `[` and `[[`
A['Numbers']
## Numbers-
## even odd
## 1 2 1
## 2 4 3
## 3 6 5
If you select a single column, it will return vector, otherwise - mlth.data.frame.
A$Letters$cons
## [1] a e i
## Levels: a e i
# But:
A$Letters['cons']
## cons
## 1 a
## 2 e
## 3 i
class(A$Letters['cons'])
## [1] "mlth.data.frame" "list"
Any mlth.data.frame is a list, so all appropriate functions can be applied to them.
names(A)
## [1] "Letters" "Numbers"
lapply(A, names)
## $Letters
## [1] "cons" "vow"
##
## $Numbers
## [1] "even" "odd"
rapply(A, class)
## Letters.cons Letters.vow Numbers.even Numbers.odd
## "factor" "factor" "numeric" "numeric"
I will use the in-built dataset mtcars
to demonstrate the functionality of mlth.data.frame for the output. I want to compare the cars with 4, 6 and 8 cylinders (cyl
) by mileage (mpg
), horsepower (hp
) and weight. This is one of many possible ways to do so and get the result in mlth.data.frame.
data(mtcars)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mtcars$cyl <- factor(mtcars$cyl,
levels = c(4, 6, 8),
labels = paste(c(4, 6, 8), 'cylinders'))
var_names <- c('mpg', 'hp', 'wt')
Tab <- mtcars[, var_names] %>%
by(mtcars$cyl, function(x) {
lapply(x, function(y) {
y <- na.omit(y)
data.frame(n = length(y),
M = mean(y),
SD = sd(y))
}) %>% do.call('rbind', .)
}) %>% do.call('mlth.data.frame', .) %>%
{ row.names(.) <- var_names; . }
Tab
## 4 cylinders------------ 6 cylinders------------ 8 cylinders-------------
## n M SD n M SD n M SD
## mpg 11 26.663636 4.5098277 7 19.742857 1.4535670 14 15.100000 2.5600481
## hp 11 82.636364 20.9345300 7 122.285714 24.2604911 14 209.214286 50.9768855
## wt 11 2.285727 0.5695637 7 3.117143 0.3563455 14 3.999214 0.7594047
Now let’s append the results from ANOVA.
Tab <- mtcars[, var_names] %>%
lapply(function(x) {
LM <- summary(lm(x ~ mtcars$cyl))
ftest <- LM$fstatistic
data.frame(F = ftest['value'],
df1 = ftest['numdf'],
df2 = ftest['dendf'],
p = pf(ftest['value'],
ftest['numdf'],
ftest['dendf'],
lower.tail = FALSE),
R2 = LM$r.squared)
}) %>% do.call('rbind', .) %>%
as.mlth.data.frame %>%
cbind(Tab, .)
Tab
## 4 cylinders------------ 6 cylinders------------ 8 cylinders-------------
## n M SD n M SD n M SD F df1 df2 p R2
## mpg 11 26.663636 4.5098277 7 19.742857 1.4535670 14 15.100000 2.5600481 39.69752 2 29 4.978919e-09 0.7324601
## hp 11 82.636364 20.9345300 7 122.285714 24.2604911 14 209.214286 50.9768855 36.17687 2 29 1.318541e-08 0.7138734
## wt 11 2.285727 0.5695637 7 3.117143 0.3563455 14 3.999214 0.7594047 22.91139 2 29 1.074683e-06 0.6124175
To send this table to an Excel output, we have to register it. Then, at the end of the script, we will use write.xlsx.output()
to write all the tables to a single Excel file. The table will be appended to the global OUTPUT
list. We can point out the name of Excel spreadsheet in the file (name
), table caption (caption
) and table footnote (note
). Details in ?register_output
. The table doesn’t have to be mlth.data.frame, the regular data.frame can also be recorded for the output.
register_output(Tab, name = 'by cylinders',
caption = 'Characteristics of the cars with different engines')
## 4 cylinders------------ 6 cylinders------------ 8 cylinders-------------
## n M SD n M SD n M SD F df1 df2 p R2
## mpg 11 26.663636 4.5098277 7 19.742857 1.4535670 14 15.100000 2.5600481 39.69752 2 29 4.978919e-09 0.7324601
## hp 11 82.636364 20.9345300 7 122.285714 24.2604911 14 209.214286 50.9768855 36.17687 2 29 1.318541e-08 0.7138734
## wt 11 2.285727 0.5695637 7 3.117143 0.3563455 14 3.999214 0.7594047 22.91139 2 29 1.074683e-06 0.6124175
# This is an example of adding a data.frame to the output
data.frame(A = 1:3, B = 4:6) %>%
register_output(name = 'lonely data.frame')
## A B
## 1 1 4
## 2 2 5
## 3 3 6
# It works
To forward the table to Rmarkdown document, use knitr
and kableExtra
( more about kableExtra). The latter package provides additional functionality for table formating, including layered headers. Here is an example of formating the table as html.
library(knitr)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
Tab %>% behead %>%
kable(digits = c(0, 2, 2, 0, 2, 2, 0, 2, 2, 3, 0, 0, 3, 3)) %>%
add_complex_header_above(Tab) %>%
kable_styling
n | M | SD | n | M | SD | n | M | SD | F | df1 | df2 | p | R2 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mpg | 11 | 26.66 | 4.51 | 7 | 19.74 | 1.45 | 14 | 15.10 | 2.56 | 39.698 | 2 | 29 | 0 | 0.732 |
hp | 11 | 82.64 | 20.93 | 7 | 122.29 | 24.26 | 14 | 209.21 | 50.98 | 36.177 | 2 | 29 | 0 | 0.714 |
wt | 11 | 2.29 | 0.57 | 7 | 3.12 | 0.36 | 14 | 4.00 | 0.76 | 22.911 | 2 | 29 | 0 | 0.612 |
At the end we write all the output into a file. Currently, openxlsx
package is used to write the tables into xlsx file. But you can define your own writer function based on any other package of your choice and pass it to write.xlsx.output()
. Details in ?write.xlsx.output
and ?xlsx.writer.openxlsx
.