Intro

This vignette illustrates the functionality of TwinAnalysis package for twin analysis (the statistical analysis of the data of twins in a twin study).

This package can:

Build and run the univariate and multivariate (correlated factors) ACE and ADE models.
Report the parameters and statistics from a twin model in the tables that are ready for a paper or report.
Build and run the cross-lag ACE and ADE models.
Compute descriptive statistics and twin correlations that often supplement the results from the twin analysis.

This package cannot:

Do the analysis of categorical outcomes.
Help to choose the model or fix the model that does not fit the data.
Clean or reshape the data. (However, this vignette provides an example of reshaping the data.)

This package builds on the functionality of OpenMx, a package for structural equation modeling. OpenMx is widely used for twin analysis due to its matrix algebra tools that allow flexible model building. This vignette only explains the concepts from OpenMx that are essential for the use of TwinAnalysis. However, it is advisable to learn the syntax of OpenMx for the fuller control of the models, particularly for the diagnostic of the (un)fitted models.

TwinAnalysis uses the mlth.data.frame package for the output tables. This package can also gather the tables across the script and write them into a single Excel file. The script for the installation of the package is provided below. For the details on mlth.data.frame refer to the package vignette).

This vignette uses the pipe operator (%>%) from the dplyr package. x %>% f is equivalent to f(x). For example, d %>% sapply(mean, na.rm = TRUE) is equivalent to sapply(d, mean, na.rm).

Installing the package

# install.packages('devtools')
library(devtools)
install_github('ivanvoronin/mlth.data.frame')
install_github('ivanvoronin/TwinAnalysis')

Loading packages and data

## Load the packages

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(mlth.data.frame)
library(OpenMx)
#> To take full advantage of multiple cores, use:
#>   mxOption(key='Number of Threads', value=parallel::detectCores()) #now
#>   Sys.setenv(OMP_NUM_THREADS=parallel::detectCores()) #before library(OpenMx)
library(TwinAnalysis)

For the purpose of illustration, I will use the twin dataset from OpenMx. This is an Australian twin data on height (ht1/2), weight (wt1/2) and BMI (bmi1/2). There are two cohorts of twins: younger (zyg %in% 1:5) and older (zyg %in% 6:10). Details in ?twinData.

## Load the data

data(twinData)

head(twinData)
#>   fam age zyg part wt1 wt2    ht1    ht2   htwt1   htwt2    bmi1    bmi2
#> 1   1  21   1    2  58  57 1.7000 1.7000 20.0692 19.7232 20.9943 20.8726
#> 2   2  24   1    2  54  53 1.6299 1.6299 20.3244 19.9481 21.0828 20.9519
#> 3   3  21   1    2  55  50 1.6499 1.6799 20.2020 17.7154 21.0405 20.1210
#> 4   4  21   1    2  66  76 1.5698 1.6499 26.7759 27.9155 23.0125 23.3043
#> 5   5  19   1    2  50  48 1.6099 1.6299 19.2894 18.0662 20.7169 20.2583
#> 6   6  26   1    2  60  60 1.5999 1.5698 23.4375 24.3418 22.0804 22.3454
#>    cohort zygosity age1 age2
#> 1 younger     MZFF   21   21
#> 2 younger     MZFF   24   24
#> 3 younger     MZFF   21   21
#> 4 younger     MZFF   21   21
#> 5 younger     MZFF   19   19
#> 6 younger     MZFF   26   26

Pre-processing twin data

Pre-processing of the data may include many steps, such as:

computing test scores,
excluding non-valid data values,
checking distributions,
removing outliers,
transforming variables,
performing factor analysis,

which are not covered in this vignette.

Twin data can be stored in a data tables in two formats:

wide: each row stands for one twin pair,
long: each row stands for a single twin, the twins are matched by family ID.

Most functions from TwinAnalysis operate with data in the wide format. It is assumed that the variables that differ between the twins (including phenotype variables) are named as X1 and X2. X1 is trait X in the first twin and X2 is trait X in the second twin. Mind that there is no separator between variable name and the suffix 1/2.

Below is an example of reshaping wide table twinData to long table longData and back.

# From wide to long
longData <- twinData %>%
  reshape(direction = 'long',
          varying = c('wt1', 'wt1', 'ht1', 'ht2',       # The variables that vary 
                      'htwt1', 'htwt2', 'bmi1', 'bmi2', # within a twin pair
                      'age1', 'age2'), 
          sep = '') #                 <- A separator between variable name and suffix

head(longData)
#>     fam age zyg part wt2  cohort zygosity time wt     ht    htwt     bmi id
#> 1.1   1  21   1    2  57 younger     MZFF    1 58 1.7000 20.0692 20.9943  1
#> 2.1   2  24   1    2  53 younger     MZFF    1 54 1.6299 20.3244 21.0828  2
#> 3.1   3  21   1    2  50 younger     MZFF    1 55 1.6499 20.2020 21.0405  3
#> 4.1   4  21   1    2  76 younger     MZFF    1 66 1.5698 26.7759 23.0125  4
#> 5.1   5  19   1    2  48 younger     MZFF    1 50 1.6099 19.2894 20.7169  5
#> 6.1   6  26   1    2  60 younger     MZFF    1 60 1.5999 23.4375 22.0804  6

# Back from long to wide
longData %>%
  reshape(direction = 'wide',
          v.names = c('wt', 'ht', 'htwt', # The variables that vary within a twin pair 
                      'bmi', 'age'),      
          idvar = 'fam',             # Unique identifier of twin pair
          timevar = 'time',          # Twin number (identifier of twin within pair)
          sep = '') %>%              # Optional: a separator, default '.'
  head
#>     fam zyg part wt2  cohort zygosity id wt1    ht1   htwt1    bmi1 age1    ht2
#> 1.1   1   1    2  58 younger     MZFF  1  58 1.7000 20.0692 20.9943   21 1.7000
#> 2.1   2   1    2  54 younger     MZFF  2  54 1.6299 20.3244 21.0828   24 1.6299
#> 3.1   3   1    2  55 younger     MZFF  3  55 1.6499 20.2020 21.0405   21 1.6799
#> 4.1   4   1    2  66 younger     MZFF  4  66 1.5698 26.7759 23.0125   21 1.6499
#> 5.1   5   1    2  50 younger     MZFF  5  50 1.6099 19.2894 20.7169   19 1.6299
#> 6.1   6   1    2  60 younger     MZFF  6  60 1.5999 23.4375 22.0804   26 1.5698
#>       htwt2    bmi2 age2
#> 1.1 19.7232 20.8726   21
#> 2.1 19.9481 20.9519   24
#> 3.1 17.7154 20.1210   21
#> 4.1 27.9155 23.3043   21
#> 5.1 18.0662 20.2583   19
#> 6.1 24.3418 22.3454   26

Zygosity is a key variable for twin analysis as it labels MZ and DZ twins. A twin pair with no zygosity is excluded from the twin analysis.

In the illustration data, the zygosity variable labels both zygosity and sex. I create the new zygosity and sex variables to use in further analysis.

twinData$zyg <- twinData$zygosity %>%
  as.character %>%
  sapply(switch,
         MZFF = 'MZ',
         MZMM = 'MZ',
         DZFF = 'DZss',     # Same-sex DZ pairs
         DZMM = 'DZss',
         DZOS = 'DZos') %>% # Opposite-sex DZ pairs
  factor(levels = c('MZ', 'DZss', 'DZos')) # This is to keep the order of levels

twinData$sex1 <- twinData$zygosity %>%
  as.character %>%
  sapply(switch,
         MZFF = 'Female',
         MZMM = 'Male',
         DZFF = 'Female',     
         DZMM = 'Male',
         DZOS = 'Female')    # First twin in opposite-sex pair is female

twinData$sex2 <- twinData$zygosity %>%
  as.character %>%
  sapply(switch,
         MZFF = 'Female',
         MZMM = 'Male',
         DZFF = 'Female',     
         DZMM = 'Male',
         DZOS = 'Male')    # Second twin in opposite-sex pair is male

# There are many alternative ways to do the same thing
# I also like to check whether the manipulations with the data
# did the right thing, like so:
with(twinData, table(zygosity, zyg))
#>         zyg
#> zygosity   MZ DZss DZos
#>     MZFF 1232    0    0
#>     MZMM  567    0    0
#>     DZFF    0  751    0
#>     DZMM    0  352    0
#>     DZOS    0    0  906
with(twinData, table(zygosity, sex1))
#>         sex1
#> zygosity Female Male
#>     MZFF   1232    0
#>     MZMM      0  567
#>     DZFF    751    0
#>     DZMM      0  352
#>     DZOS    906    0
with(twinData, table(zygosity, sex2))
#>         sex2
#> zygosity Female Male
#>     MZFF   1232    0
#>     MZMM      0  567
#>     DZFF    751    0
#>     DZMM      0  352
#>     DZOS      0  906

Finally, for this analysis I will select only the younger cohort of participants.

## It is usually a good idea to make a copy of an original
## data frame and make changes there
twinData2 <- twinData[twinData$cohort == 'younger', ]

Twin descriptive statistics

Descriptive statistics are important for twin analysis because they help us to make sure that the data is OK and that the assumptions of twin model hold. If any assumption doesn’t hold, the model can have poor fit meaning that it may not descrive well the reality that we study.

One mathematical assumption requires similar distributions of the phenotypic variables within twin pairs and between zygosity groups. Namely, the means and variances shouldn’t differ between twin 1 and twin 2 and between MZ and DZ twins. Another mathematical assumption constrains the similarities witnin MZ and DZ pairs: \(r_{MZ} > r_{DZ}\) and \(2r_{DZ} < r_{MZ}\) (ACE model) or \(r_{MZ}>2r_{DZ}\) and \(4r_{DZ} > r_{MZ}\) (ADE model). We also may want to compare twin similarity in DZss and DZos pairs if we want to combine them into the single group.

The TwinAnalysis package provides functionality to compute descriptive statistics (means, SDs, twin correlations).

get_univ_descriptives(twinData2, zyg = 'zyg',      
                      vars = c('ht', 'wt', 'bmi')) ## No suffix
#> $ht
#>      Twin1------------------ Twin2------------------ Cor--------------------------
#>      n   M        SD         n   M        SD         r         lowerCI   upperCI  
#> MZ   821 1.677719 0.09629763 825 1.676664 0.09555881 0.9420948 0.9337942 0.9493820
#> DZss 545 1.690366 0.09573311 543 1.685135 0.09574242 0.6978255 0.6515079 0.7389602
#> DZos 498 1.637742 0.07010653 497 1.780719 0.06566807 0.4281559 0.3526396 0.4981315
#> 
#> $wt
#>      Twin1----------------- Twin2----------------- Cor--------------------------
#>      n   M        SD        n   M        SD        r         lowerCI   upperCI  
#> MZ   820 60.91707 10.714998 819 60.79243 10.771732 0.8875201 0.8718395 0.9013832
#> DZss 536 62.96828 11.053520 542 62.60332 11.588262 0.5743910 0.5140242 0.6290962
#> DZos 488 57.14549  7.556314 493 71.36511  9.602917 0.2489379 0.1625160 0.3315695
#> 
#> $bmi
#>      Twin1----------------- Twin2----------------- Cor--------------------------
#>      n   M        SD        n   M        SD        r         lowerCI   upperCI  
#> MZ   811 21.43949 0.8357471 812 21.43161 0.8584708 0.7803316 0.7513856 0.8062802
#> DZss 530 21.57202 0.8805291 534 21.56730 0.9143063 0.3315864 0.2521765 0.4065603
#> DZos 482 21.36556 0.8252537 489 21.75473 0.7921427 0.2330841 0.1451316 0.3173818

## We can also compare DZss and DZos correlations to check if they differ
get_fisher_z(twinData2, zyg = 'zyg', 
             vars = c('ht', 'wt', 'bmi'), 
             compare = c('DZss', 'DZos'))
#>        r_DZss n_DZss    r_DZos n_DZos        z            p
#> ht  0.6978255    533 0.4281559    486 6.444726 1.158091e-10
#> wt  0.5743910    524 0.2489379    474 6.287600 3.224121e-10
#> bmi 0.3315864    512 0.2330841    464 1.666716 9.557098e-02

In this dataset there is noticeable difference between DZ twin 1 and DZ twin 2 in height and weight. This is a natural result of twin 1 being female and twin 2 being male in this group. The twin similarity is lower in DZos pairs comparing to DZss pairs, perhaps as the result of gender differences. Finally, \(r_{MZ} > 2r_{DZ}\) for BMI which means that non-additive genetic effects may contribute to this phenotype and ADE model may be appropriate.

For now, I ignore gender differences and exclude DZos pairs from the analysis by making a new zygosity variable.

The models only accept zygosity as a factor variable with labels 'MZ' and 'DZ'.

twinData2$zyg2 <- twinData2$zyg %>%
  as.character %>%
  sapply(switch, 
         MZ = 'MZ', 
         DZss = 'DZ', 
         DZos = NA) %>%
  factor(levels = c('MZ', 'DZ'))

twinData2 <- twinData2[!is.na(twinData2$zyg2), ]

get_univ_descriptives(twinData2, zyg = 'zyg2',
                      vars = c('ht', 'wt', 'bmi'))
#> $ht
#>    Twin1------------------ Twin2------------------ Cor--------------------------
#>    n   M        SD         n   M        SD         r         lowerCI   upperCI  
#> MZ 821 1.677719 0.09629763 825 1.676664 0.09555881 0.9420948 0.9337942 0.9493820
#> DZ 545 1.690366 0.09573311 543 1.685135 0.09574242 0.6978255 0.6515079 0.7389602
#> 
#> $wt
#>    Twin1---------------- Twin2---------------- Cor--------------------------
#>    n   M        SD       n   M        SD       r         lowerCI   upperCI  
#> MZ 820 60.91707 10.71500 819 60.79243 10.77173 0.8875201 0.8718395 0.9013832
#> DZ 536 62.96828 11.05352 542 62.60332 11.58826 0.5743910 0.5140242 0.6290962
#> 
#> $bmi
#>    Twin1----------------- Twin2----------------- Cor--------------------------
#>    n   M        SD        n   M        SD        r         lowerCI   upperCI  
#> MZ 811 21.43949 0.8357471 812 21.43161 0.8584708 0.7803316 0.7513856 0.8062802
#> DZ 530 21.57202 0.8805291 534 21.56730 0.9143063 0.3315864 0.2521765 0.4065603

Univariate twin model

A univariate ACE/ADE is the simplest form of twin model.

Let’s fit a univariate ACE model to BMI.

## Define the model
bmi_univ <- univariate_ace(twinData2, zyg = 'zyg2', ph = 'bmi')

## Fit MxModel
bmi_univ <- mxRun(bmi_univ)
#> Running ACE with 4 parameters

## OpenMx summary
summary(bmi_univ)
#> Summary of ACE 
#>  
#> free parameters:
#>            name matrix row col      Estimate  Std.Error A
#> 1 ACE.mean[1,1]   mean   1   1  2.149531e+01 0.02128696  
#> 2    ACE.a[1,1]      a bmi bmi  7.810045e-01 0.01645491  
#> 3    ACE.c[1,1]      c bmi bmi -3.778693e-09 0.10683600  
#> 4    ACE.e[1,1]      e bmi bmi -4.028416e-01 0.01022541  
#> 
#> Model Statistics: 
#>                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
#>        Model:              4                   2683              6076.504
#>    Saturated:             NA                     NA                    NA
#> Independence:             NA                     NA                    NA
#> Number of observations/statistics: 1399/2687
#> 
#> Information Criteria: 
#>       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
#> AIC:       710.5043               6084.504                 6084.533
#> BIC:    -13357.8410               6105.478                 6092.772
#> CFI: NA 
#> TLI: 1   (also known as NNFI) 
#> RMSEA:  0  [95% CI (NA, NA)]
#> Prob(RMSEA <= 0.05): NA
#> To get additional fit indices, see help(mxRefModels)
#> timestamp: 2021-01-12 03:33:29 
#> Wall clock time: 0.08024454 secs 
#> optimizer:  SLSQP 
#> OpenMx version number: 2.18.1 
#> Need help?  See help(mxSummary)

## Output tables
get_output_tables(bmi_univ)
#> $Variance_components
#>          A(%)         C(%)      E(%)
#> bmi 0.7898588 1.848952e-17 0.2101412
#> 
#> $Raw_variance
#>         Total        A            C         E
#> bmi 0.7722494 0.609968 1.427852e-17 0.1622814

univariate_ace() returns an MxModel object which means that you can do everything you usually do with OpenMx models: run parameter fitting (mxRun(), mxTryHard()), get model summary (summary()), add/delete matrices and algebra from a model (mxModel()), add constraints (mxConstraint()) etc.

bmi_univ
#> MxModel 'ACE' 
#> type : default 
#> $matrices : 'mean', 'a', 'c', and 'e' 
#> $algebras : 'expMeans', 'A', 'C', 'E', 'expCovMZ', 'expCovDZ', 'V', 'Raw_variance', and 'Variance_components' 
#> $constraints : NULL 
#> $intervals : NULL 
#> $latentVars : none
#> $manifestVars : none
#> $data : NULL
#> $submodels : 'MZ' and 'DZ' 
#> $expectation : NULL 
#> $fitfunction : MxFitFunctionMultigroup 
#> $compute : MxComputeSequence 
#> $independent : FALSE 
#> $options :  'output_tables' = 'Variance_components' and 'Raw_variance' 
#> $output : TRUE

bmi_univ$expCovMZ
#> mxAlgebra 'expCovMZ' 
#> $formula:  rbind(cbind(A + C + E, A + C), cbind(A + C, A + C + E)) 
#> $result:
#>           [,1]      [,2]
#> [1,] 0.7722494 0.6099680
#> [2,] 0.6099680 0.7722494
#> dimnames: NULL

mxEval(expCovDZ, bmi_univ)
#>           [,1]      [,2]
#> [1,] 0.7722494 0.3049840
#> [2,] 0.3049840 0.7722494

## Note that expected variances in MZ and DZ twins are exactly same, 
## this is why the assumptions on means and variances must be met in the data

In addition to the standard model structure, such as the parameter matrices, the algebra for expected covariance matrix, the observed data, unvariate_ace() (as well as the other functions from TwinAnalysis that define models) adds the algebra that computes indeces that we actually will use as main results. In particular, univariate_ace() returns the single otput table with percents of variability accounted for by A, C and E. The output tables are usually stored inside the model as MxAlgebra.

The function get_output_tables() returns the output tables from the model. These tables are recorded as an algebraic expression with model parametwes (MxAlgebra object) inside the model. For example, the Variance_components table reports the proportion of genetic and environmental contributions computed as genetic or environmental variability divided by the total variability of a trait.

To ensure that the model fits the data well, we compare it against two reference models: saturated model and independence model. In a saturated model, there are as many parameters as there are data points (in terms of variances, covariances and means), therefore this models has the best fit of all. An independence model assumes that all variables are independent. Theoretically, a good model should describe the data as well as its saturated model.

The function ref_models() runs and appends the reference models to a target model. By default, it uses mxRefModels() to define reference models and the user can provide custom reference models as well (see ?ref_models).

The function twin_ref_model() defines, runs and appends the reference models that follow the assumptions of the twin method (equal means, variances and co-variances across twins within pair and across zygosity groups).

The function fit_stats() returns a table with fit statistics.

## bmi_univ and bmi_univ2 will be the same model with different reference models
bmi_univ2 <- bmi_univ 

bmi_univ <- twin_ref_models(bmi_univ, run = TRUE)
#> Running Saturated with 4 parameters
#> Running Independence with 2 parameters
bmi_univ2 <- ref_models(bmi_univ2)
#> Running Saturated ACE with 10 parameters
#> Running Independence ACE with 8 parameters

fit_stats(bmi_univ)
#>   model      base ep minus2LL   df      AIC       BIC       CFI TLI RMSEA
#> 1   ACE Saturated  4 6076.504 2683 710.5043 -13357.84 0.9936757   1     0
#>     diffLL diffdf p
#> 1 5.056397      0 1
fit_stats(bmi_univ2)
#>   model      base ep minus2LL   df      AIC       BIC       CFI       TLI
#> 1   ACE Saturated  4 6076.504 2683 710.5043 -13357.84 0.9810852 0.9936951
#>        RMSEA   diffLL diffdf           p
#> 1 0.04231163 21.02756      6 0.001813812

We can check that the assumptions hold by comparing two saturated models. In this case, the constrained saturated model explains the data significantly worse that the unconstrained saturated model meaning that some assumptions might be violated.

mxCompare(get_ref_models(bmi_univ2)$Saturated,
          get_ref_models(bmi_univ)$Saturated)
#>            base comparison ep minus2LL   df      AIC   diffLL diffdf          p
#> 1 Saturated ACE       <NA> 10 6055.477 2677 701.4768       NA     NA         NA
#> 2 Saturated ACE  Saturated  4 6071.448 2683 705.4479 15.97117      6 0.01390957

When there are several phenotypic variables to fit a univariate twin model upon, it is convenient to iterate over the variables.

univ_models <- list()

for (v in c('ht', 'wt', 'bmi')) {
  model <- univariate_ace(twinData2, zyg = 'zyg2', ph = v)
  model <- mxRun(model, intervals = TRUE)   ## This time I evaluate CIs
  model <- ref_models(model)

  univ_models[[v]] <- model
}
#> Running ACE with 4 parameters
#> Running Saturated ACE with 10 parameters
#> Running Independence ACE with 8 parameters
#> Running ACE with 4 parameters
#> Running Saturated ACE with 10 parameters
#> Running Independence ACE with 8 parameters
#> Running ACE with 4 parameters
#> Running Saturated ACE with 10 parameters
#> Running Independence ACE with 8 parameters


## Estimates of genetic and environmental contributions
univ_models %>%
  lapply(get_output_tables) %>%
  lapply(`[[`, 'Variance_components') %>% ## same as lapply(function(x) x[['Variance_components']])
  do.call(rbind, .)
#>          A(%)         C(%)       E(%)
#> ht  0.4907220 4.508715e-01 0.05840653
#> wt  0.6682634 2.247723e-01 0.10696432
#> bmi 0.7898588 1.848952e-17 0.21014116

univ_models %>%
  lapply(fit_stats) %>%
  do.call(rbind, .)
#>     model      base ep  minus2LL   df         AIC         BIC       CFI
#> ht    ACE Saturated  4 -7178.660 2730 -12638.6598 -26953.4502 0.9990337
#> wt    ACE Saturated  4 19288.575 2713  13862.5752   -363.0755 0.9923133
#> bmi   ACE Saturated  4  6076.504 2683    710.5043 -13357.8410 0.9810852
#>           TLI      RMSEA    diffLL diffdf           p
#> ht  0.9996779 0.01561183  8.045864      6 0.234762431
#> wt  0.9974378 0.03640788 17.126530      6 0.008829306
#> bmi 0.9936951 0.04231163 21.027563      6 0.001813812

Bivariate twin model

A bivariate twin model is applied when the research goal is to find out whether the measured correlation between two traits is accounted for by genetic or environmental variability shared by these two traits. This kind of analysis relies on within-twin and cross-twin within-trait and cross-trait covariances. For example, ht1 <-> ht2 is a within-trait cross-twin correlation, ht1 <-> wt1 is a cross-trait within-twin correlation, ht1 <-> wt2 is a cross-trait cross-twin correlation. All these data will be used to find genetic and environmental contributions to height and veight and to estimate the overlap between the corresponding genetic and environmental variability.

twin_cors <- get_cross_trait_cors(twinData2, zyg = 'zyg2',
                                  vars = c('ht', 'wt'))
twin_cors
#> zyg: MZ
#>           ht1       wt1       ht2       wt2
#> ht1 1.0000000 0.7058832 0.9420948 0.6612499
#> wt1 0.7058832 1.0000000 0.6895533 0.8875201
#> ht2 0.9420948 0.6895533 1.0000000 0.6856569
#> wt2 0.6612499 0.8875201 0.6856569 1.0000000
#> ------------------------------------------------------------ 
#> zyg: DZ
#>           ht1       wt1       ht2       wt2
#> ht1 1.0000000 0.6543578 0.6978255 0.5389969
#> wt1 0.6543578 1.0000000 0.5053343 0.5743910
#> ht2 0.6978255 0.5053343 1.0000000 0.6598756
#> wt2 0.5389969 0.5743910 0.6598756 1.0000000

## Within-twin variances-covariances
## Each table is diagonally symmetrical
twin_cors %>% 
  lapply(`[`, c('ht1', 'wt1'), c('ht1', 'wt1'))
#> $MZ
#>           ht1       wt1
#> ht1 1.0000000 0.7058832
#> wt1 0.7058832 1.0000000
#> 
#> $DZ
#>           ht1       wt1
#> ht1 1.0000000 0.6543578
#> wt1 0.6543578 1.0000000
twin_cors %>% 
  lapply(`[`, c('ht2', 'wt2'), c('ht2', 'wt2'))
#> $MZ
#>           ht2       wt2
#> ht2 1.0000000 0.6856569
#> wt2 0.6856569 1.0000000
#> 
#> $DZ
#>           ht2       wt2
#> ht2 1.0000000 0.6598756
#> wt2 0.6598756 1.0000000

## Cross-twin within-trait covariances
## Second set of tables is transposed first set of tables
twin_cors %>% 
  lapply(`[`, c('ht1', 'wt1'), c('ht2', 'wt2'))
#> $MZ
#>           ht2       wt2
#> ht1 0.9420948 0.6612499
#> wt1 0.6895533 0.8875201
#> 
#> $DZ
#>           ht2       wt2
#> ht1 0.6978255 0.5389969
#> wt1 0.5053343 0.5743910
twin_cors %>% 
  lapply(`[`, c('ht2', 'wt2'), c('ht1', 'wt1'))
#> $MZ
#>           ht1       wt1
#> ht2 0.9420948 0.6895533
#> wt2 0.6612499 0.8875201
#> 
#> $DZ
#>           ht1       wt1
#> ht2 0.6978255 0.5053343
#> wt2 0.5389969 0.5743910

When we run a bivariate ACE on the illustration data, the optimizer fails to find optimal parameter values (status code RED). We can check model identification as OpenMx suggests.

biv_model <- multivariate_ace(twinData2, zyg = 'zyg2',
                              ph = c('ht', 'wt'))
biv_model <- mxRun(biv_model)
#> Running ACE with 11 parameters

mxCheckIdentification(biv_model)
#> Model is locally identified
#> $status
#> [1] TRUE
#> 
#> $jacobian
#>           ACE.mean[1,1] ACE.mean[1,2] ACE.a[1,1]  ACE.a[2,1]   ACE.a[2,2]
#> MZ.cov1_1             0             0 0.11369848 -0.06865706   0.00000000
#> MZ.cov2_1             0             0 0.00000000 -8.37604304  -0.03432853
#> MZ.cov3_1             0             0 0.11369848 -0.06865706   0.00000000
#> MZ.cov4_1             0             0 0.00000000 -8.37604304  -0.03432853
#> MZ.cov2_2             0             0 0.00000000  0.00000000 -16.75208607
#> MZ.cov3_2             0             0 0.00000000 -8.37604304  -0.03432853
#> MZ.cov4_2             0             0 0.00000000  0.00000000 -16.75208607
#> MZ.cov3_3             0             0 0.11369848 -0.06865706   0.00000000
#> MZ.cov4_3             0             0 0.00000000 -8.37604304  -0.03432853
#> MZ.cov4_4             0             0 0.00000000  0.00000000 -16.75208607
#> MZ.mean1              1             0 0.00000000  0.00000000   0.00000000
#> MZ.mean2              0             1 0.00000000  0.00000000   0.00000000
#> MZ.mean3              1             0 0.00000000  0.00000000   0.00000000
#> MZ.mean4              0             1 0.00000000  0.00000000   0.00000000
#> DZ.cov1_1             0             0 0.11369848 -0.06865706   0.00000000
#> DZ.cov2_1             0             0 0.00000000 -8.37604304  -0.03432853
#> DZ.cov3_1             0             0 0.05684924 -0.03432853   0.00000000
#> DZ.cov4_1             0             0 0.00000000 -4.18802152  -0.01716426
#> DZ.cov2_2             0             0 0.00000000  0.00000000 -16.75208607
#> DZ.cov3_2             0             0 0.00000000 -4.18802152  -0.01716426
#> DZ.cov4_2             0             0 0.00000000  0.00000000  -8.37604304
#> DZ.cov3_3             0             0 0.11369848 -0.06865706   0.00000000
#> DZ.cov4_3             0             0 0.00000000 -8.37604304  -0.03432853
#> DZ.cov4_4             0             0 0.00000000  0.00000000 -16.75208607
#> DZ.mean1              1             0 0.00000000  0.00000000   0.00000000
#> DZ.mean2              0             1 0.00000000  0.00000000   0.00000000
#> DZ.mean3              1             0 0.00000000  0.00000000   0.00000000
#> DZ.mean4              0             1 0.00000000  0.00000000   0.00000000
#>             ACE.c[1,1] ACE.c[2,1]   ACE.c[2,2] ACE.e[1,1]   ACE.e[2,1]
#> MZ.cov1_1 7.284176e-08  -0.129831   0.00000000 0.04534429 -0.009811196
#> MZ.cov2_1 0.000000e+00  -6.276453  -0.06491548 0.00000000 -3.660707725
#> MZ.cov3_1 7.284176e-08  -0.129831   0.00000000 0.00000000  0.000000000
#> MZ.cov4_1 0.000000e+00  -6.276453  -0.06491548 0.00000000  0.000000000
#> MZ.cov2_2 0.000000e+00   0.000000 -12.55290567 0.00000000  0.000000000
#> MZ.cov3_2 0.000000e+00  -6.276453  -0.06491548 0.00000000  0.000000000
#> MZ.cov4_2 0.000000e+00   0.000000 -12.55290567 0.00000000  0.000000000
#> MZ.cov3_3 7.284176e-08  -0.129831   0.00000000 0.04534429 -0.009811196
#> MZ.cov4_3 0.000000e+00  -6.276453  -0.06491548 0.00000000 -3.660707725
#> MZ.cov4_4 0.000000e+00   0.000000 -12.55290567 0.00000000  0.000000000
#> MZ.mean1  0.000000e+00   0.000000   0.00000000 0.00000000  0.000000000
#> MZ.mean2  0.000000e+00   0.000000   0.00000000 0.00000000  0.000000000
#> MZ.mean3  0.000000e+00   0.000000   0.00000000 0.00000000  0.000000000
#> MZ.mean4  0.000000e+00   0.000000   0.00000000 0.00000000  0.000000000
#> DZ.cov1_1 7.284176e-08  -0.129831   0.00000000 0.04534429 -0.009811196
#> DZ.cov2_1 0.000000e+00  -6.276453  -0.06491548 0.00000000 -3.660707725
#> DZ.cov3_1 7.284176e-08  -0.129831   0.00000000 0.00000000  0.000000000
#> DZ.cov4_1 0.000000e+00  -6.276453  -0.06491548 0.00000000  0.000000000
#> DZ.cov2_2 0.000000e+00   0.000000 -12.55290567 0.00000000  0.000000000
#> DZ.cov3_2 0.000000e+00  -6.276453  -0.06491548 0.00000000  0.000000000
#> DZ.cov4_2 0.000000e+00   0.000000 -12.55290567 0.00000000  0.000000000
#> DZ.cov3_3 7.284176e-08  -0.129831   0.00000000 0.04534429 -0.009811196
#> DZ.cov4_3 0.000000e+00  -6.276453  -0.06491548 0.00000000 -3.660707725
#> DZ.cov4_4 0.000000e+00   0.000000 -12.55290567 0.00000000  0.000000000
#> DZ.mean1  0.000000e+00   0.000000   0.00000000 0.00000000  0.000000000
#> DZ.mean2  0.000000e+00   0.000000   0.00000000 0.00000000  0.000000000
#> DZ.mean3  0.000000e+00   0.000000   0.00000000 0.00000000  0.000000000
#> DZ.mean4  0.000000e+00   0.000000   0.00000000 0.00000000  0.000000000
#>             ACE.e[2,2]
#> MZ.cov1_1  0.000000000
#> MZ.cov2_1 -0.004905598
#> MZ.cov3_1  0.000000000
#> MZ.cov4_1  0.000000000
#> MZ.cov2_2 -7.321415450
#> MZ.cov3_2  0.000000000
#> MZ.cov4_2  0.000000000
#> MZ.cov3_3  0.000000000
#> MZ.cov4_3 -0.004905598
#> MZ.cov4_4 -7.321415450
#> MZ.mean1   0.000000000
#> MZ.mean2   0.000000000
#> MZ.mean3   0.000000000
#> MZ.mean4   0.000000000
#> DZ.cov1_1  0.000000000
#> DZ.cov2_1 -0.004905598
#> DZ.cov3_1  0.000000000
#> DZ.cov4_1  0.000000000
#> DZ.cov2_2 -7.321415450
#> DZ.cov3_2  0.000000000
#> DZ.cov4_2  0.000000000
#> DZ.cov3_3  0.000000000
#> DZ.cov4_3 -0.004905598
#> DZ.cov4_4 -7.321415450
#> DZ.mean1   0.000000000
#> DZ.mean2   0.000000000
#> DZ.mean3   0.000000000
#> DZ.mean4   0.000000000
#> 
#> $non_identified_parameters
#> [1] "None"

get_output_tables(biv_model)
#> $Variance_components
#>         A(%)      C(%)       E(%)
#> ht 0.4813462 0.4599256 0.05872822
#> wt 0.5706103 0.3203984 0.10899132
#> 
#> $Covariation_components
#>                  Total            A            C            E      A(%)
#> ht <-> ht 9.162395e-03  0.004410284  0.004214019 5.380911e-04 0.4813462
#> ht <-> wt 7.129341e-01  0.287537235  0.407438942 1.795796e-02 0.4033153
#> wt <-> wt 1.229527e+02 70.158096949 39.393860199 1.340078e+01 0.5706103
#>                C(%)       E(%)
#> ht <-> ht 0.4599256 0.05872822
#> ht <-> wt 0.5714959 0.02518881
#> wt <-> wt 0.3203984 0.10899132
#> 
#> $All_correlations
#>                 r_A r_C       r_E     Total
#> ht <-> ht 1.0000000   1 1.0000000 1.0000000
#> ht <-> wt 0.5169183   1 0.2114774 0.6717007
#> wt <-> wt 1.0000000   1 1.0000000 1.0000000

It says that there are problems with the non-shared environmental parameters (ACE.e[1,1], ACE.e[2,1], ACE.e[2,2]). The matrix e is actually a Cholesky decomposition of environmental variability: E = e %*% t(e). I will eliminate non-shared environmental correlation between height and weight by constraining e[2, 1] == 0.

biv_model <- multivariate_ace(twinData2, zyg = 'zyg2',
                              ph = c('ht', 'wt'))
biv_model$e$values[2, 1] <- 0     ## This is not the only way to fix a parameter
biv_model$e$free[2, 1] <- FALSE

biv_model <- mxRun(biv_model)
#> Running ACE with 10 parameters
#> Warning: In model 'ACE' Optimizer returned a non-zero status code 6. The model
#> does not satisfy the first-order optimality conditions to the required accuracy,
#> and no improved point for the merit function could be found during the final
#> linesearch (Mx status RED)

## It works fine now

get_output_tables(biv_model)
#> $Variance_components
#>         A(%)      C(%)         E(%)
#> ht 0.2979551 0.7018932 0.0001517254
#> wt 0.1160863 0.1335608 0.7503528378
#> 
#> $Covariation_components
#>               Total         A          C            E      A(%)      C(%)
#> ht <-> ht  10.58548  3.153999   7.429879 1.606087e-03 0.2979551 0.7018932
#> ht <-> wt  28.99156  4.864991  24.126566 0.000000e+00 0.1678072 0.8321928
#> wt <-> wt 820.33246 95.229393 109.564278 6.155388e+02 0.1160863 0.1335608
#>                   E(%)
#> ht <-> ht 0.0001517254
#> ht <-> wt 0.0000000000
#> wt <-> wt 0.7503528378
#> 
#> $All_correlations
#>                 r_A       r_C r_E     Total
#> ht <-> ht 1.0000000 1.0000000   1 1.0000000
#> ht <-> wt 0.2807151 0.8456101   0 0.3111152
#> wt <-> wt 1.0000000 1.0000000   1 1.0000000

biv_model <- ref_models(biv_model)
#> Running Saturated ACE with 28 parameters
#> Running Independence ACE with 16 parameters
fit_stats(biv_model)
#> Warning in computeFitStatistics(likelihood, DoF, chi, chiDoF,
#> retval[["numObs"]], : Your model may be mis-specified (and fit worse than an
#> independence model), or you may be using the wrong independence model, see ?
#> mxRefModels
#>   model      base ep minus2LL   df      AIC       BIC       CFI       TLI
#> 1   ACE Saturated 10 28525.79 5441 17643.79 -10886.16 -2.748076 -1.498717
#>       RMSEA   diffLL diffdf p
#> 1 0.8347794 17566.24     18 0

Now it runs fine. The non-shared environmental correlation was sussessfully eliminated. The chi-square test shows significant difference between the model and the corresponding saturated model but CFI, TLI and RMSEA suggest good fit.

The output tables of a bivariate/multivatiate ACE model:

Variance_compinents: percent of total variance of each variable accounted for by genetic, shared environmental and non-shared environmental factors.
All_correlations: genetic and environmental correlations (rA, rC, rE), total (phenotypic) correlation (Total), percent of correlation accounted for by genetic, shared environmental and non-shared environmental factors (A(%), C(%), E(%)).
A_correlations, C_correlations, E_correlations, Total_correlations: genetic, environmental and total correlation matrices.

Notes:

The models with categorical outcomes: different fit function and fit statistics may be needed.
In multivariate models we can achieve genetic and environmental correlations with opposite sign. This means that one factor will contribute to phenotypic relationship positively and another negatively. In this case we have to treat percents with caution. I use absolute value of correlations to compute percents.
If the model has not been fit, get_output_tables() will throw an error.
Also, check the package umx that provides a lot of functionality for SEM and twin analysis.

Twin Analysis vignette

Ivan Voronin

2021-01-12