Intro
This vignette illustrates the functionality of TwinAnalysis
package for twin analysis (the statistical analysis of the data of twins in a twin study).
This package can:
Build and run the univariate and multivariate (correlated factors) ACE and ADE models.
Report the parameters and statistics from a twin model in the tables that are ready for a paper or report.
Build and run the cross-lag ACE and ADE models.
Compute descriptive statistics and twin correlations that often supplement the results from the twin analysis.
This package cannot:
Do the analysis of categorical outcomes.
Help to choose the model or fix the model that does not fit the data.
Clean or reshape the data. (However, this vignette provides an example of reshaping the data.)
This package builds on the functionality of OpenMx
, a package for structural equation modeling. OpenMx
is widely used for twin analysis due to its matrix algebra tools that allow flexible model building. This vignette only explains the concepts from OpenMx
that are essential for the use of TwinAnalysis
. However, it is advisable to learn the syntax of OpenMx
for the fuller control of the models, particularly for the diagnostic of the (un)fitted models.
TwinAnalysis
uses the mlth.data.frame
package for the output tables. This package can also gather the tables across the script and write them into a single Excel file. The script for the installation of the package is provided below. For the details on mlth.data.frame
refer to the package vignette).
This vignette uses the pipe operator (%>%
) from the dplyr
package. x %>% f
is equivalent to f(x)
. For example, d %>% sapply(mean, na.rm = TRUE)
is equivalent to sapply(d, mean, na.rm)
.
Installing the package
Loading packages and data
## Load the packages
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(mlth.data.frame)
library(OpenMx)
#> To take full advantage of multiple cores, use:
#> mxOption(key='Number of Threads', value=parallel::detectCores()) #now
#> Sys.setenv(OMP_NUM_THREADS=parallel::detectCores()) #before library(OpenMx)
library(TwinAnalysis)
For the purpose of illustration, I will use the twin dataset from OpenMx
. This is an Australian twin data on height (ht1/2
), weight (wt1/2
) and BMI (bmi1/2
). There are two cohorts of twins: younger (zyg %in% 1:5
) and older (zyg %in% 6:10
). Details in ?twinData
.
## Load the data
data(twinData)
head(twinData)
#> fam age zyg part wt1 wt2 ht1 ht2 htwt1 htwt2 bmi1 bmi2
#> 1 1 21 1 2 58 57 1.7000 1.7000 20.0692 19.7232 20.9943 20.8726
#> 2 2 24 1 2 54 53 1.6299 1.6299 20.3244 19.9481 21.0828 20.9519
#> 3 3 21 1 2 55 50 1.6499 1.6799 20.2020 17.7154 21.0405 20.1210
#> 4 4 21 1 2 66 76 1.5698 1.6499 26.7759 27.9155 23.0125 23.3043
#> 5 5 19 1 2 50 48 1.6099 1.6299 19.2894 18.0662 20.7169 20.2583
#> 6 6 26 1 2 60 60 1.5999 1.5698 23.4375 24.3418 22.0804 22.3454
#> cohort zygosity age1 age2
#> 1 younger MZFF 21 21
#> 2 younger MZFF 24 24
#> 3 younger MZFF 21 21
#> 4 younger MZFF 21 21
#> 5 younger MZFF 19 19
#> 6 younger MZFF 26 26
Pre-processing twin data
Pre-processing of the data may include many steps, such as:
- computing test scores,
- excluding non-valid data values,
- checking distributions,
- removing outliers,
- transforming variables,
- performing factor analysis,
which are not covered in this vignette.
Twin data can be stored in a data tables in two formats:
- wide: each row stands for one twin pair,
- long: each row stands for a single twin, the twins are matched by family ID.
Most functions from TwinAnalysis
operate with data in the wide format. It is assumed that the variables that differ between the twins (including phenotype variables) are named as X1
and X2
. X1
is trait X
in the first twin and X2
is trait X
in the second twin. Mind that there is no separator between variable name and the suffix 1/2
.
Below is an example of reshaping wide table twinData
to long table longData
and back.
# From wide to long
longData <- twinData %>%
reshape(direction = 'long',
varying = c('wt1', 'wt1', 'ht1', 'ht2', # The variables that vary
'htwt1', 'htwt2', 'bmi1', 'bmi2', # within a twin pair
'age1', 'age2'),
sep = '') # <- A separator between variable name and suffix
head(longData)
#> fam age zyg part wt2 cohort zygosity time wt ht htwt bmi id
#> 1.1 1 21 1 2 57 younger MZFF 1 58 1.7000 20.0692 20.9943 1
#> 2.1 2 24 1 2 53 younger MZFF 1 54 1.6299 20.3244 21.0828 2
#> 3.1 3 21 1 2 50 younger MZFF 1 55 1.6499 20.2020 21.0405 3
#> 4.1 4 21 1 2 76 younger MZFF 1 66 1.5698 26.7759 23.0125 4
#> 5.1 5 19 1 2 48 younger MZFF 1 50 1.6099 19.2894 20.7169 5
#> 6.1 6 26 1 2 60 younger MZFF 1 60 1.5999 23.4375 22.0804 6
# Back from long to wide
longData %>%
reshape(direction = 'wide',
v.names = c('wt', 'ht', 'htwt', # The variables that vary within a twin pair
'bmi', 'age'),
idvar = 'fam', # Unique identifier of twin pair
timevar = 'time', # Twin number (identifier of twin within pair)
sep = '') %>% # Optional: a separator, default '.'
head
#> fam zyg part wt2 cohort zygosity id wt1 ht1 htwt1 bmi1 age1 ht2
#> 1.1 1 1 2 58 younger MZFF 1 58 1.7000 20.0692 20.9943 21 1.7000
#> 2.1 2 1 2 54 younger MZFF 2 54 1.6299 20.3244 21.0828 24 1.6299
#> 3.1 3 1 2 55 younger MZFF 3 55 1.6499 20.2020 21.0405 21 1.6799
#> 4.1 4 1 2 66 younger MZFF 4 66 1.5698 26.7759 23.0125 21 1.6499
#> 5.1 5 1 2 50 younger MZFF 5 50 1.6099 19.2894 20.7169 19 1.6299
#> 6.1 6 1 2 60 younger MZFF 6 60 1.5999 23.4375 22.0804 26 1.5698
#> htwt2 bmi2 age2
#> 1.1 19.7232 20.8726 21
#> 2.1 19.9481 20.9519 24
#> 3.1 17.7154 20.1210 21
#> 4.1 27.9155 23.3043 21
#> 5.1 18.0662 20.2583 19
#> 6.1 24.3418 22.3454 26
Zygosity is a key variable for twin analysis as it labels MZ and DZ twins. A twin pair with no zygosity is excluded from the twin analysis.
In the illustration data, the zygosity
variable labels both zygosity and sex. I create the new zygosity and sex variables to use in further analysis.
twinData$zyg <- twinData$zygosity %>%
as.character %>%
sapply(switch,
MZFF = 'MZ',
MZMM = 'MZ',
DZFF = 'DZss', # Same-sex DZ pairs
DZMM = 'DZss',
DZOS = 'DZos') %>% # Opposite-sex DZ pairs
factor(levels = c('MZ', 'DZss', 'DZos')) # This is to keep the order of levels
twinData$sex1 <- twinData$zygosity %>%
as.character %>%
sapply(switch,
MZFF = 'Female',
MZMM = 'Male',
DZFF = 'Female',
DZMM = 'Male',
DZOS = 'Female') # First twin in opposite-sex pair is female
twinData$sex2 <- twinData$zygosity %>%
as.character %>%
sapply(switch,
MZFF = 'Female',
MZMM = 'Male',
DZFF = 'Female',
DZMM = 'Male',
DZOS = 'Male') # Second twin in opposite-sex pair is male
# There are many alternative ways to do the same thing
# I also like to check whether the manipulations with the data
# did the right thing, like so:
with(twinData, table(zygosity, zyg))
#> zyg
#> zygosity MZ DZss DZos
#> MZFF 1232 0 0
#> MZMM 567 0 0
#> DZFF 0 751 0
#> DZMM 0 352 0
#> DZOS 0 0 906
with(twinData, table(zygosity, sex1))
#> sex1
#> zygosity Female Male
#> MZFF 1232 0
#> MZMM 0 567
#> DZFF 751 0
#> DZMM 0 352
#> DZOS 906 0
with(twinData, table(zygosity, sex2))
#> sex2
#> zygosity Female Male
#> MZFF 1232 0
#> MZMM 0 567
#> DZFF 751 0
#> DZMM 0 352
#> DZOS 0 906
Finally, for this analysis I will select only the younger cohort of participants.
Twin descriptive statistics
Descriptive statistics are important for twin analysis because they help us to make sure that the data is OK and that the assumptions of twin model hold. If any assumption doesn’t hold, the model can have poor fit meaning that it may not descrive well the reality that we study.
One mathematical assumption requires similar distributions of the phenotypic variables within twin pairs and between zygosity groups. Namely, the means and variances shouldn’t differ between twin 1 and twin 2 and between MZ and DZ twins. Another mathematical assumption constrains the similarities witnin MZ and DZ pairs: \(r_{MZ} > r_{DZ}\) and \(2r_{DZ} < r_{MZ}\) (ACE model) or \(r_{MZ}>2r_{DZ}\) and \(4r_{DZ} > r_{MZ}\) (ADE model). We also may want to compare twin similarity in DZss and DZos pairs if we want to combine them into the single group.
The TwinAnalysis
package provides functionality to compute descriptive statistics (means, SDs, twin correlations).
get_univ_descriptives(twinData2, zyg = 'zyg',
vars = c('ht', 'wt', 'bmi')) ## No suffix
#> $ht
#> Twin1------------------ Twin2------------------ Cor--------------------------
#> n M SD n M SD r lowerCI upperCI
#> MZ 821 1.677719 0.09629763 825 1.676664 0.09555881 0.9420948 0.9337942 0.9493820
#> DZss 545 1.690366 0.09573311 543 1.685135 0.09574242 0.6978255 0.6515079 0.7389602
#> DZos 498 1.637742 0.07010653 497 1.780719 0.06566807 0.4281559 0.3526396 0.4981315
#>
#> $wt
#> Twin1----------------- Twin2----------------- Cor--------------------------
#> n M SD n M SD r lowerCI upperCI
#> MZ 820 60.91707 10.714998 819 60.79243 10.771732 0.8875201 0.8718395 0.9013832
#> DZss 536 62.96828 11.053520 542 62.60332 11.588262 0.5743910 0.5140242 0.6290962
#> DZos 488 57.14549 7.556314 493 71.36511 9.602917 0.2489379 0.1625160 0.3315695
#>
#> $bmi
#> Twin1----------------- Twin2----------------- Cor--------------------------
#> n M SD n M SD r lowerCI upperCI
#> MZ 811 21.43949 0.8357471 812 21.43161 0.8584708 0.7803316 0.7513856 0.8062802
#> DZss 530 21.57202 0.8805291 534 21.56730 0.9143063 0.3315864 0.2521765 0.4065603
#> DZos 482 21.36556 0.8252537 489 21.75473 0.7921427 0.2330841 0.1451316 0.3173818
## We can also compare DZss and DZos correlations to check if they differ
get_fisher_z(twinData2, zyg = 'zyg',
vars = c('ht', 'wt', 'bmi'),
compare = c('DZss', 'DZos'))
#> r_DZss n_DZss r_DZos n_DZos z p
#> ht 0.6978255 533 0.4281559 486 6.444726 1.158091e-10
#> wt 0.5743910 524 0.2489379 474 6.287600 3.224121e-10
#> bmi 0.3315864 512 0.2330841 464 1.666716 9.557098e-02
In this dataset there is noticeable difference between DZ twin 1 and DZ twin 2 in height and weight. This is a natural result of twin 1 being female and twin 2 being male in this group. The twin similarity is lower in DZos pairs comparing to DZss pairs, perhaps as the result of gender differences. Finally, \(r_{MZ} > 2r_{DZ}\) for BMI which means that non-additive genetic effects may contribute to this phenotype and ADE model may be appropriate.
For now, I ignore gender differences and exclude DZos pairs from the analysis by making a new zygosity variable.
The models only accept zygosity as a factor variable with labels 'MZ'
and 'DZ'
.
twinData2$zyg2 <- twinData2$zyg %>%
as.character %>%
sapply(switch,
MZ = 'MZ',
DZss = 'DZ',
DZos = NA) %>%
factor(levels = c('MZ', 'DZ'))
twinData2 <- twinData2[!is.na(twinData2$zyg2), ]
get_univ_descriptives(twinData2, zyg = 'zyg2',
vars = c('ht', 'wt', 'bmi'))
#> $ht
#> Twin1------------------ Twin2------------------ Cor--------------------------
#> n M SD n M SD r lowerCI upperCI
#> MZ 821 1.677719 0.09629763 825 1.676664 0.09555881 0.9420948 0.9337942 0.9493820
#> DZ 545 1.690366 0.09573311 543 1.685135 0.09574242 0.6978255 0.6515079 0.7389602
#>
#> $wt
#> Twin1---------------- Twin2---------------- Cor--------------------------
#> n M SD n M SD r lowerCI upperCI
#> MZ 820 60.91707 10.71500 819 60.79243 10.77173 0.8875201 0.8718395 0.9013832
#> DZ 536 62.96828 11.05352 542 62.60332 11.58826 0.5743910 0.5140242 0.6290962
#>
#> $bmi
#> Twin1----------------- Twin2----------------- Cor--------------------------
#> n M SD n M SD r lowerCI upperCI
#> MZ 811 21.43949 0.8357471 812 21.43161 0.8584708 0.7803316 0.7513856 0.8062802
#> DZ 530 21.57202 0.8805291 534 21.56730 0.9143063 0.3315864 0.2521765 0.4065603
Univariate twin model
A univariate ACE/ADE is the simplest form of twin model.
Let’s fit a univariate ACE model to BMI.
## Define the model
bmi_univ <- univariate_ace(twinData2, zyg = 'zyg2', ph = 'bmi')
## Fit MxModel
bmi_univ <- mxRun(bmi_univ)
#> Running ACE with 4 parameters
## OpenMx summary
summary(bmi_univ)
#> Summary of ACE
#>
#> free parameters:
#> name matrix row col Estimate Std.Error A
#> 1 ACE.mean[1,1] mean 1 1 2.149531e+01 0.02128696
#> 2 ACE.a[1,1] a bmi bmi 7.810045e-01 0.01645491
#> 3 ACE.c[1,1] c bmi bmi -3.778693e-09 0.10683600
#> 4 ACE.e[1,1] e bmi bmi -4.028416e-01 0.01022541
#>
#> Model Statistics:
#> | Parameters | Degrees of Freedom | Fit (-2lnL units)
#> Model: 4 2683 6076.504
#> Saturated: NA NA NA
#> Independence: NA NA NA
#> Number of observations/statistics: 1399/2687
#>
#> Information Criteria:
#> | df Penalty | Parameters Penalty | Sample-Size Adjusted
#> AIC: 710.5043 6084.504 6084.533
#> BIC: -13357.8410 6105.478 6092.772
#> CFI: NA
#> TLI: 1 (also known as NNFI)
#> RMSEA: 0 [95% CI (NA, NA)]
#> Prob(RMSEA <= 0.05): NA
#> To get additional fit indices, see help(mxRefModels)
#> timestamp: 2021-01-12 03:33:29
#> Wall clock time: 0.08024454 secs
#> optimizer: SLSQP
#> OpenMx version number: 2.18.1
#> Need help? See help(mxSummary)
## Output tables
get_output_tables(bmi_univ)
#> $Variance_components
#> A(%) C(%) E(%)
#> bmi 0.7898588 1.848952e-17 0.2101412
#>
#> $Raw_variance
#> Total A C E
#> bmi 0.7722494 0.609968 1.427852e-17 0.1622814
univariate_ace()
returns an MxModel object which means that you can do everything you usually do with OpenMx
models: run parameter fitting (mxRun()
, mxTryHard()
), get model summary (summary()
), add/delete matrices and algebra from a model (mxModel()
), add constraints (mxConstraint()
) etc.
bmi_univ
#> MxModel 'ACE'
#> type : default
#> $matrices : 'mean', 'a', 'c', and 'e'
#> $algebras : 'expMeans', 'A', 'C', 'E', 'expCovMZ', 'expCovDZ', 'V', 'Raw_variance', and 'Variance_components'
#> $constraints : NULL
#> $intervals : NULL
#> $latentVars : none
#> $manifestVars : none
#> $data : NULL
#> $submodels : 'MZ' and 'DZ'
#> $expectation : NULL
#> $fitfunction : MxFitFunctionMultigroup
#> $compute : MxComputeSequence
#> $independent : FALSE
#> $options : 'output_tables' = 'Variance_components' and 'Raw_variance'
#> $output : TRUE
bmi_univ$expCovMZ
#> mxAlgebra 'expCovMZ'
#> $formula: rbind(cbind(A + C + E, A + C), cbind(A + C, A + C + E))
#> $result:
#> [,1] [,2]
#> [1,] 0.7722494 0.6099680
#> [2,] 0.6099680 0.7722494
#> dimnames: NULL
mxEval(expCovDZ, bmi_univ)
#> [,1] [,2]
#> [1,] 0.7722494 0.3049840
#> [2,] 0.3049840 0.7722494
## Note that expected variances in MZ and DZ twins are exactly same,
## this is why the assumptions on means and variances must be met in the data
In addition to the standard model structure, such as the parameter matrices, the algebra for expected covariance matrix, the observed data, unvariate_ace()
(as well as the other functions from TwinAnalysis
that define models) adds the algebra that computes indeces that we actually will use as main results. In particular, univariate_ace()
returns the single otput table with percents of variability accounted for by A, C and E. The output tables are usually stored inside the model as MxAlgebra
.
The function get_output_tables()
returns the output tables from the model. These tables are recorded as an algebraic expression with model parametwes (MxAlgebra
object) inside the model. For example, the Variance_components
table reports the proportion of genetic and environmental contributions computed as genetic or environmental variability divided by the total variability of a trait.
To ensure that the model fits the data well, we compare it against two reference models: saturated model and independence model. In a saturated model, there are as many parameters as there are data points (in terms of variances, covariances and means), therefore this models has the best fit of all. An independence model assumes that all variables are independent. Theoretically, a good model should describe the data as well as its saturated model.
The function ref_models()
runs and appends the reference models to a target model. By default, it uses mxRefModels()
to define reference models and the user can provide custom reference models as well (see ?ref_models
).
The function twin_ref_model()
defines, runs and appends the reference models that follow the assumptions of the twin method (equal means, variances and co-variances across twins within pair and across zygosity groups).
The function fit_stats()
returns a table with fit statistics.
## bmi_univ and bmi_univ2 will be the same model with different reference models
bmi_univ2 <- bmi_univ
bmi_univ <- twin_ref_models(bmi_univ, run = TRUE)
#> Running Saturated with 4 parameters
#> Running Independence with 2 parameters
bmi_univ2 <- ref_models(bmi_univ2)
#> Running Saturated ACE with 10 parameters
#> Running Independence ACE with 8 parameters
fit_stats(bmi_univ)
#> model base ep minus2LL df AIC BIC CFI TLI RMSEA
#> 1 ACE Saturated 4 6076.504 2683 710.5043 -13357.84 0.9936757 1 0
#> diffLL diffdf p
#> 1 5.056397 0 1
fit_stats(bmi_univ2)
#> model base ep minus2LL df AIC BIC CFI TLI
#> 1 ACE Saturated 4 6076.504 2683 710.5043 -13357.84 0.9810852 0.9936951
#> RMSEA diffLL diffdf p
#> 1 0.04231163 21.02756 6 0.001813812
We can check that the assumptions hold by comparing two saturated models. In this case, the constrained saturated model explains the data significantly worse that the unconstrained saturated model meaning that some assumptions might be violated.
mxCompare(get_ref_models(bmi_univ2)$Saturated,
get_ref_models(bmi_univ)$Saturated)
#> base comparison ep minus2LL df AIC diffLL diffdf p
#> 1 Saturated ACE <NA> 10 6055.477 2677 701.4768 NA NA NA
#> 2 Saturated ACE Saturated 4 6071.448 2683 705.4479 15.97117 6 0.01390957
When there are several phenotypic variables to fit a univariate twin model upon, it is convenient to iterate over the variables.
univ_models <- list()
for (v in c('ht', 'wt', 'bmi')) {
model <- univariate_ace(twinData2, zyg = 'zyg2', ph = v)
model <- mxRun(model, intervals = TRUE) ## This time I evaluate CIs
model <- ref_models(model)
univ_models[[v]] <- model
}
#> Running ACE with 4 parameters
#> Running Saturated ACE with 10 parameters
#> Running Independence ACE with 8 parameters
#> Running ACE with 4 parameters
#> Running Saturated ACE with 10 parameters
#> Running Independence ACE with 8 parameters
#> Running ACE with 4 parameters
#> Running Saturated ACE with 10 parameters
#> Running Independence ACE with 8 parameters
## Estimates of genetic and environmental contributions
univ_models %>%
lapply(get_output_tables) %>%
lapply(`[[`, 'Variance_components') %>% ## same as lapply(function(x) x[['Variance_components']])
do.call(rbind, .)
#> A(%) C(%) E(%)
#> ht 0.4907220 4.508715e-01 0.05840653
#> wt 0.6682634 2.247723e-01 0.10696432
#> bmi 0.7898588 1.848952e-17 0.21014116
univ_models %>%
lapply(fit_stats) %>%
do.call(rbind, .)
#> model base ep minus2LL df AIC BIC CFI
#> ht ACE Saturated 4 -7178.660 2730 -12638.6598 -26953.4502 0.9990337
#> wt ACE Saturated 4 19288.575 2713 13862.5752 -363.0755 0.9923133
#> bmi ACE Saturated 4 6076.504 2683 710.5043 -13357.8410 0.9810852
#> TLI RMSEA diffLL diffdf p
#> ht 0.9996779 0.01561183 8.045864 6 0.234762431
#> wt 0.9974378 0.03640788 17.126530 6 0.008829306
#> bmi 0.9936951 0.04231163 21.027563 6 0.001813812
Bivariate twin model
A bivariate twin model is applied when the research goal is to find out whether the measured correlation between two traits is accounted for by genetic or environmental variability shared by these two traits. This kind of analysis relies on within-twin and cross-twin within-trait and cross-trait covariances. For example, ht1 <-> ht2
is a within-trait cross-twin correlation, ht1 <-> wt1
is a cross-trait within-twin correlation, ht1 <-> wt2
is a cross-trait cross-twin correlation. All these data will be used to find genetic and environmental contributions to height and veight and to estimate the overlap between the corresponding genetic and environmental variability.
twin_cors <- get_cross_trait_cors(twinData2, zyg = 'zyg2',
vars = c('ht', 'wt'))
twin_cors
#> zyg: MZ
#> ht1 wt1 ht2 wt2
#> ht1 1.0000000 0.7058832 0.9420948 0.6612499
#> wt1 0.7058832 1.0000000 0.6895533 0.8875201
#> ht2 0.9420948 0.6895533 1.0000000 0.6856569
#> wt2 0.6612499 0.8875201 0.6856569 1.0000000
#> ------------------------------------------------------------
#> zyg: DZ
#> ht1 wt1 ht2 wt2
#> ht1 1.0000000 0.6543578 0.6978255 0.5389969
#> wt1 0.6543578 1.0000000 0.5053343 0.5743910
#> ht2 0.6978255 0.5053343 1.0000000 0.6598756
#> wt2 0.5389969 0.5743910 0.6598756 1.0000000
## Within-twin variances-covariances
## Each table is diagonally symmetrical
twin_cors %>%
lapply(`[`, c('ht1', 'wt1'), c('ht1', 'wt1'))
#> $MZ
#> ht1 wt1
#> ht1 1.0000000 0.7058832
#> wt1 0.7058832 1.0000000
#>
#> $DZ
#> ht1 wt1
#> ht1 1.0000000 0.6543578
#> wt1 0.6543578 1.0000000
twin_cors %>%
lapply(`[`, c('ht2', 'wt2'), c('ht2', 'wt2'))
#> $MZ
#> ht2 wt2
#> ht2 1.0000000 0.6856569
#> wt2 0.6856569 1.0000000
#>
#> $DZ
#> ht2 wt2
#> ht2 1.0000000 0.6598756
#> wt2 0.6598756 1.0000000
## Cross-twin within-trait covariances
## Second set of tables is transposed first set of tables
twin_cors %>%
lapply(`[`, c('ht1', 'wt1'), c('ht2', 'wt2'))
#> $MZ
#> ht2 wt2
#> ht1 0.9420948 0.6612499
#> wt1 0.6895533 0.8875201
#>
#> $DZ
#> ht2 wt2
#> ht1 0.6978255 0.5389969
#> wt1 0.5053343 0.5743910
twin_cors %>%
lapply(`[`, c('ht2', 'wt2'), c('ht1', 'wt1'))
#> $MZ
#> ht1 wt1
#> ht2 0.9420948 0.6895533
#> wt2 0.6612499 0.8875201
#>
#> $DZ
#> ht1 wt1
#> ht2 0.6978255 0.5053343
#> wt2 0.5389969 0.5743910
When we run a bivariate ACE on the illustration data, the optimizer fails to find optimal parameter values (status code RED). We can check model identification as OpenMx suggests.
biv_model <- multivariate_ace(twinData2, zyg = 'zyg2',
ph = c('ht', 'wt'))
biv_model <- mxRun(biv_model)
#> Running ACE with 11 parameters
mxCheckIdentification(biv_model)
#> Model is locally identified
#> $status
#> [1] TRUE
#>
#> $jacobian
#> ACE.mean[1,1] ACE.mean[1,2] ACE.a[1,1] ACE.a[2,1] ACE.a[2,2]
#> MZ.cov1_1 0 0 0.11369848 -0.06865706 0.00000000
#> MZ.cov2_1 0 0 0.00000000 -8.37604304 -0.03432853
#> MZ.cov3_1 0 0 0.11369848 -0.06865706 0.00000000
#> MZ.cov4_1 0 0 0.00000000 -8.37604304 -0.03432853
#> MZ.cov2_2 0 0 0.00000000 0.00000000 -16.75208607
#> MZ.cov3_2 0 0 0.00000000 -8.37604304 -0.03432853
#> MZ.cov4_2 0 0 0.00000000 0.00000000 -16.75208607
#> MZ.cov3_3 0 0 0.11369848 -0.06865706 0.00000000
#> MZ.cov4_3 0 0 0.00000000 -8.37604304 -0.03432853
#> MZ.cov4_4 0 0 0.00000000 0.00000000 -16.75208607
#> MZ.mean1 1 0 0.00000000 0.00000000 0.00000000
#> MZ.mean2 0 1 0.00000000 0.00000000 0.00000000
#> MZ.mean3 1 0 0.00000000 0.00000000 0.00000000
#> MZ.mean4 0 1 0.00000000 0.00000000 0.00000000
#> DZ.cov1_1 0 0 0.11369848 -0.06865706 0.00000000
#> DZ.cov2_1 0 0 0.00000000 -8.37604304 -0.03432853
#> DZ.cov3_1 0 0 0.05684924 -0.03432853 0.00000000
#> DZ.cov4_1 0 0 0.00000000 -4.18802152 -0.01716426
#> DZ.cov2_2 0 0 0.00000000 0.00000000 -16.75208607
#> DZ.cov3_2 0 0 0.00000000 -4.18802152 -0.01716426
#> DZ.cov4_2 0 0 0.00000000 0.00000000 -8.37604304
#> DZ.cov3_3 0 0 0.11369848 -0.06865706 0.00000000
#> DZ.cov4_3 0 0 0.00000000 -8.37604304 -0.03432853
#> DZ.cov4_4 0 0 0.00000000 0.00000000 -16.75208607
#> DZ.mean1 1 0 0.00000000 0.00000000 0.00000000
#> DZ.mean2 0 1 0.00000000 0.00000000 0.00000000
#> DZ.mean3 1 0 0.00000000 0.00000000 0.00000000
#> DZ.mean4 0 1 0.00000000 0.00000000 0.00000000
#> ACE.c[1,1] ACE.c[2,1] ACE.c[2,2] ACE.e[1,1] ACE.e[2,1]
#> MZ.cov1_1 7.284176e-08 -0.129831 0.00000000 0.04534429 -0.009811196
#> MZ.cov2_1 0.000000e+00 -6.276453 -0.06491548 0.00000000 -3.660707725
#> MZ.cov3_1 7.284176e-08 -0.129831 0.00000000 0.00000000 0.000000000
#> MZ.cov4_1 0.000000e+00 -6.276453 -0.06491548 0.00000000 0.000000000
#> MZ.cov2_2 0.000000e+00 0.000000 -12.55290567 0.00000000 0.000000000
#> MZ.cov3_2 0.000000e+00 -6.276453 -0.06491548 0.00000000 0.000000000
#> MZ.cov4_2 0.000000e+00 0.000000 -12.55290567 0.00000000 0.000000000
#> MZ.cov3_3 7.284176e-08 -0.129831 0.00000000 0.04534429 -0.009811196
#> MZ.cov4_3 0.000000e+00 -6.276453 -0.06491548 0.00000000 -3.660707725
#> MZ.cov4_4 0.000000e+00 0.000000 -12.55290567 0.00000000 0.000000000
#> MZ.mean1 0.000000e+00 0.000000 0.00000000 0.00000000 0.000000000
#> MZ.mean2 0.000000e+00 0.000000 0.00000000 0.00000000 0.000000000
#> MZ.mean3 0.000000e+00 0.000000 0.00000000 0.00000000 0.000000000
#> MZ.mean4 0.000000e+00 0.000000 0.00000000 0.00000000 0.000000000
#> DZ.cov1_1 7.284176e-08 -0.129831 0.00000000 0.04534429 -0.009811196
#> DZ.cov2_1 0.000000e+00 -6.276453 -0.06491548 0.00000000 -3.660707725
#> DZ.cov3_1 7.284176e-08 -0.129831 0.00000000 0.00000000 0.000000000
#> DZ.cov4_1 0.000000e+00 -6.276453 -0.06491548 0.00000000 0.000000000
#> DZ.cov2_2 0.000000e+00 0.000000 -12.55290567 0.00000000 0.000000000
#> DZ.cov3_2 0.000000e+00 -6.276453 -0.06491548 0.00000000 0.000000000
#> DZ.cov4_2 0.000000e+00 0.000000 -12.55290567 0.00000000 0.000000000
#> DZ.cov3_3 7.284176e-08 -0.129831 0.00000000 0.04534429 -0.009811196
#> DZ.cov4_3 0.000000e+00 -6.276453 -0.06491548 0.00000000 -3.660707725
#> DZ.cov4_4 0.000000e+00 0.000000 -12.55290567 0.00000000 0.000000000
#> DZ.mean1 0.000000e+00 0.000000 0.00000000 0.00000000 0.000000000
#> DZ.mean2 0.000000e+00 0.000000 0.00000000 0.00000000 0.000000000
#> DZ.mean3 0.000000e+00 0.000000 0.00000000 0.00000000 0.000000000
#> DZ.mean4 0.000000e+00 0.000000 0.00000000 0.00000000 0.000000000
#> ACE.e[2,2]
#> MZ.cov1_1 0.000000000
#> MZ.cov2_1 -0.004905598
#> MZ.cov3_1 0.000000000
#> MZ.cov4_1 0.000000000
#> MZ.cov2_2 -7.321415450
#> MZ.cov3_2 0.000000000
#> MZ.cov4_2 0.000000000
#> MZ.cov3_3 0.000000000
#> MZ.cov4_3 -0.004905598
#> MZ.cov4_4 -7.321415450
#> MZ.mean1 0.000000000
#> MZ.mean2 0.000000000
#> MZ.mean3 0.000000000
#> MZ.mean4 0.000000000
#> DZ.cov1_1 0.000000000
#> DZ.cov2_1 -0.004905598
#> DZ.cov3_1 0.000000000
#> DZ.cov4_1 0.000000000
#> DZ.cov2_2 -7.321415450
#> DZ.cov3_2 0.000000000
#> DZ.cov4_2 0.000000000
#> DZ.cov3_3 0.000000000
#> DZ.cov4_3 -0.004905598
#> DZ.cov4_4 -7.321415450
#> DZ.mean1 0.000000000
#> DZ.mean2 0.000000000
#> DZ.mean3 0.000000000
#> DZ.mean4 0.000000000
#>
#> $non_identified_parameters
#> [1] "None"
get_output_tables(biv_model)
#> $Variance_components
#> A(%) C(%) E(%)
#> ht 0.4813462 0.4599256 0.05872822
#> wt 0.5706103 0.3203984 0.10899132
#>
#> $Covariation_components
#> Total A C E A(%)
#> ht <-> ht 9.162395e-03 0.004410284 0.004214019 5.380911e-04 0.4813462
#> ht <-> wt 7.129341e-01 0.287537235 0.407438942 1.795796e-02 0.4033153
#> wt <-> wt 1.229527e+02 70.158096949 39.393860199 1.340078e+01 0.5706103
#> C(%) E(%)
#> ht <-> ht 0.4599256 0.05872822
#> ht <-> wt 0.5714959 0.02518881
#> wt <-> wt 0.3203984 0.10899132
#>
#> $All_correlations
#> r_A r_C r_E Total
#> ht <-> ht 1.0000000 1 1.0000000 1.0000000
#> ht <-> wt 0.5169183 1 0.2114774 0.6717007
#> wt <-> wt 1.0000000 1 1.0000000 1.0000000
It says that there are problems with the non-shared environmental parameters (ACE.e[1,1]
, ACE.e[2,1]
, ACE.e[2,2]
). The matrix e
is actually a Cholesky decomposition of environmental variability: E = e %*% t(e)
. I will eliminate non-shared environmental correlation between height and weight by constraining e[2, 1] == 0
.
biv_model <- multivariate_ace(twinData2, zyg = 'zyg2',
ph = c('ht', 'wt'))
biv_model$e$values[2, 1] <- 0 ## This is not the only way to fix a parameter
biv_model$e$free[2, 1] <- FALSE
biv_model <- mxRun(biv_model)
#> Running ACE with 10 parameters
#> Warning: In model 'ACE' Optimizer returned a non-zero status code 6. The model
#> does not satisfy the first-order optimality conditions to the required accuracy,
#> and no improved point for the merit function could be found during the final
#> linesearch (Mx status RED)
## It works fine now
get_output_tables(biv_model)
#> $Variance_components
#> A(%) C(%) E(%)
#> ht 0.2979551 0.7018932 0.0001517254
#> wt 0.1160863 0.1335608 0.7503528378
#>
#> $Covariation_components
#> Total A C E A(%) C(%)
#> ht <-> ht 10.58548 3.153999 7.429879 1.606087e-03 0.2979551 0.7018932
#> ht <-> wt 28.99156 4.864991 24.126566 0.000000e+00 0.1678072 0.8321928
#> wt <-> wt 820.33246 95.229393 109.564278 6.155388e+02 0.1160863 0.1335608
#> E(%)
#> ht <-> ht 0.0001517254
#> ht <-> wt 0.0000000000
#> wt <-> wt 0.7503528378
#>
#> $All_correlations
#> r_A r_C r_E Total
#> ht <-> ht 1.0000000 1.0000000 1 1.0000000
#> ht <-> wt 0.2807151 0.8456101 0 0.3111152
#> wt <-> wt 1.0000000 1.0000000 1 1.0000000
biv_model <- ref_models(biv_model)
#> Running Saturated ACE with 28 parameters
#> Running Independence ACE with 16 parameters
fit_stats(biv_model)
#> Warning in computeFitStatistics(likelihood, DoF, chi, chiDoF,
#> retval[["numObs"]], : Your model may be mis-specified (and fit worse than an
#> independence model), or you may be using the wrong independence model, see ?
#> mxRefModels
#> model base ep minus2LL df AIC BIC CFI TLI
#> 1 ACE Saturated 10 28525.79 5441 17643.79 -10886.16 -2.748076 -1.498717
#> RMSEA diffLL diffdf p
#> 1 0.8347794 17566.24 18 0
Now it runs fine. The non-shared environmental correlation was sussessfully eliminated. The chi-square test shows significant difference between the model and the corresponding saturated model but CFI, TLI and RMSEA suggest good fit.
The output tables of a bivariate/multivatiate ACE model:
Variance_compinents
: percent of total variance of each variable accounted for by genetic, shared environmental and non-shared environmental factors.All_correlations
: genetic and environmental correlations (rA
,rC
,rE
), total (phenotypic) correlation (Total
), percent of correlation accounted for by genetic, shared environmental and non-shared environmental factors (A(%)
,C(%)
,E(%)
).A_correlations
,C_correlations
,E_correlations
,Total_correlations
: genetic, environmental and total correlation matrices.
Notes:
- The models with categorical outcomes: different fit function and fit statistics may be needed.
- In multivariate models we can achieve genetic and environmental correlations with opposite sign. This means that one factor will contribute to phenotypic relationship positively and another negatively. In this case we have to treat percents with caution. I use absolute value of correlations to compute percents.
- If the model has not been fit,
get_output_tables()
will throw an error. - Also, check the package
umx
that provides a lot of functionality for SEM and twin analysis.