Hi!

This vignette illustrates the functionality of mlth.data.frame package. This name stands for multiheaded data.frame, a data.frame with grouped columns.

This package is a companion of TwinAnalysis package. As such, it provides functionality for organizing and collecting the tables across the analysis script and sending them into an Excel file or on a html/markdown/LaTeX report.

To install the package:

# install.packages('devtools')
devtools::install_github('IvanVoronin/mlth.data.frame')

library(mlth.data.frame)

A mlth.data.frame looks like this:

A <- mlth.data.frame(Letters = list(cons = c('a', 'e', 'i'),
                                    vow = c('b', 'c', 'd')),
                     Numbers = list(even = c(2, 4, 6),
                                    odd = c(1, 3, 5)))

A
##   Letters- Numbers-
##   cons vow even odd
## 1 a    b      2   1
## 2 e    c      4   3
## 3 i    d      6   5

This mlth.data.frame has 4 columns and 3 rows. Use ncol(), nrow() and dim() to get the size of the table:

ncol(A)
## [1] 4

nrow(A)
## [1] 3

dim(A)
## [1] 3 4

The columns of mlth.data.frame can be arranged into groups, in this case Letters and Numbers. You can select groups and variables by refering to them by name or index number:

A$Letters
##   cons vow
## 1 a    b  
## 2 e    c  
## 3 i    d

A[['Numbers']]
##   even odd
## 1    2   1
## 2    4   3
## 3    6   5

# Mind the difference between `[` and `[[`
A['Numbers']
##   Numbers-
##   even odd
## 1    2   1
## 2    4   3
## 3    6   5

If you select a single column, it will return vector, otherwise - mlth.data.frame.

A$Letters$cons
## [1] a e i
## Levels: a e i

# But:
A$Letters['cons']
##   cons
## 1 a   
## 2 e   
## 3 i

class(A$Letters['cons'])
## [1] "mlth.data.frame" "list"

Any mlth.data.frame is a list, so all appropriate functions can be applied to them.

names(A)
## [1] "Letters" "Numbers"

lapply(A, names)
## $Letters
## [1] "cons" "vow" 
## 
## $Numbers
## [1] "even" "odd"

rapply(A, class)
## Letters.cons  Letters.vow Numbers.even  Numbers.odd 
##     "factor"     "factor"    "numeric"    "numeric"

I will use the in-built dataset mtcars to demonstrate the functionality of mlth.data.frame for the output. I want to compare the cars with 4, 6 and 8 cylinders (cyl) by mileage (mpg), horsepower (hp) and weight. This is one of many possible ways to do so and get the result in mlth.data.frame.

data(mtcars)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

mtcars$cyl <- factor(mtcars$cyl,
                     levels = c(4, 6, 8),
                     labels = paste(c(4, 6, 8), 'cylinders'))

var_names <- c('mpg', 'hp', 'wt')

Tab <- mtcars[, var_names] %>%
  by(mtcars$cyl, function(x) {
    lapply(x, function(y) {
      y <- na.omit(y)
      data.frame(n = length(y), 
           M = mean(y),
           SD = sd(y))
    }) %>% do.call('rbind', .)
  }) %>% do.call('mlth.data.frame', .) %>%
  { row.names(.) <- var_names; . }

Tab
##     4 cylinders------------ 6 cylinders------------ 8 cylinders-------------
##     n  M         SD         n M          SD         n  M          SD        
## mpg 11 26.663636  4.5098277 7  19.742857  1.4535670 14  15.100000  2.5600481
## hp  11 82.636364 20.9345300 7 122.285714 24.2604911 14 209.214286 50.9768855
## wt  11  2.285727  0.5695637 7   3.117143  0.3563455 14   3.999214  0.7594047

Now let’s append the results from ANOVA.

Tab <- mtcars[, var_names] %>%
  lapply(function(x) {
    LM <- summary(lm(x ~ mtcars$cyl))
    ftest <- LM$fstatistic
    data.frame(F = ftest['value'],
               df1 = ftest['numdf'],
               df2 = ftest['dendf'],
               p = pf(ftest['value'], 
                      ftest['numdf'], 
                      ftest['dendf'], 
                      lower.tail = FALSE),
               R2 = LM$r.squared)
  }) %>% do.call('rbind', .) %>%
  as.mlth.data.frame %>%
  cbind(Tab, .)

Tab
##     4 cylinders------------ 6 cylinders------------ 8 cylinders-------------                                        
##     n  M         SD         n M          SD         n  M          SD         F        df1 df2 p            R2       
## mpg 11 26.663636  4.5098277 7  19.742857  1.4535670 14  15.100000  2.5600481 39.69752   2  29 4.978919e-09 0.7324601
## hp  11 82.636364 20.9345300 7 122.285714 24.2604911 14 209.214286 50.9768855 36.17687   2  29 1.318541e-08 0.7138734
## wt  11  2.285727  0.5695637 7   3.117143  0.3563455 14   3.999214  0.7594047 22.91139   2  29 1.074683e-06 0.6124175

To send this table to an Excel output, we have to register it. Then, at the end of the script, we will use write.xlsx.output() to write all the tables to a single Excel file. The table will be appended to the global OUTPUT list. We can point out the name of Excel spreadsheet in the file (name), table caption (caption) and table footnote (note). Details in ?register_output. The table doesn’t have to be mlth.data.frame, the regular data.frame can also be recorded for the output.

register_output(Tab, name = 'by cylinders',
                caption = 'Characteristics of the cars with different engines')
##     4 cylinders------------ 6 cylinders------------ 8 cylinders-------------                                        
##     n  M         SD         n M          SD         n  M          SD         F        df1 df2 p            R2       
## mpg 11 26.663636  4.5098277 7  19.742857  1.4535670 14  15.100000  2.5600481 39.69752   2  29 4.978919e-09 0.7324601
## hp  11 82.636364 20.9345300 7 122.285714 24.2604911 14 209.214286 50.9768855 36.17687   2  29 1.318541e-08 0.7138734
## wt  11  2.285727  0.5695637 7   3.117143  0.3563455 14   3.999214  0.7594047 22.91139   2  29 1.074683e-06 0.6124175
# This is an example of adding a data.frame to the output
data.frame(A = 1:3, B = 4:6) %>%
  register_output(name = 'lonely data.frame')
##   A B
## 1 1 4
## 2 2 5
## 3 3 6
# It works

To forward the table to Rmarkdown document, use knitr and kableExtra ( more about kableExtra). The latter package provides additional functionality for table formating, including layered headers. Here is an example of formating the table as html.

library(knitr)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows

Tab %>% behead %>%
  kable(digits = c(0, 2, 2, 0, 2, 2, 0, 2, 2, 3, 0, 0, 3, 3))  %>% 
  add_complex_header_above(Tab) %>%
  kable_styling
4 cylinders
6 cylinders
8 cylinders
n M SD n M SD n M SD F df1 df2 p R2
mpg 11 26.66 4.51 7 19.74 1.45 14 15.10 2.56 39.698 2 29 0 0.732
hp 11 82.64 20.93 7 122.29 24.26 14 209.21 50.98 36.177 2 29 0 0.714
wt 11 2.29 0.57 7 3.12 0.36 14 4.00 0.76 22.911 2 29 0 0.612

At the end we write all the output into a file. Currently, openxlsx package is used to write the tables into xlsx file. But you can define your own writer function based on any other package of your choice and pass it to write.xlsx.output(). Details in ?write.xlsx.output and ?xlsx.writer.openxlsx.

write.xlsx.output('vignette_output.xlsx')
## Loading required package: openxlsx
## Note: zip::zip() is deprecated, please use zip::zipr() instead