Yan Holtz, Zhihong Zhu, Julanne Frater, Perry Bartlett, Jian Yang, John McGrath
This document report the application of LDSC to the vitamin D GWAS result. It aims to determine the heritability of the trait and the potential population stratification.
I lost a few hours of my life installing the LDSC software properly. Here are a few hints to do it faster next time. Important:
#Delta
cd /shares/compbio/Group-Wray/YanHoltz/SOFT
git clone https://github.com/bulik/ldsc.git
# Then install Anaconda. Warning: must be version 2 of python, not 3.
cd /shares/compbio/Group-Wray/YanHoltz/SOFT
wget https://repo.anaconda.com/archive/Anaconda2-5.1.0-Linux-x86_64.sh
#start installation in /shares/compbio/Group-Wray/YanHoltz/SOFT
# Then follow https://github.com/bulik/ldsc
# To have help:
/shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/ldsc.py -h
# I also download LD scrore provided by LDSC:
wget https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2
tar -xvjf eur_w_ld_chr.tar.bz2
I can use LD score regression
to study the the inflation in test statistics due to population structure.
For the Vitamin D GWAS, the intercept of the LD score regression is 1.0202 with a standard error of 0.0068. The ratio is 0.1604 with a standard error of 0.0538.
If I understood well, the intercept tends to increase with the sample size even if no population structure is present. It is thus advised to check the ratio instead. In both case, it looks like a slight population structure is observed in this GWAS, but probably negligible.
The intercept minus 2 standard errors is very close to 1:
0.1604 - 1.96*0.0538 = 1.006872
The SNP heritability as computed using LD score regression is 0.069 with a standard error of 0.017. This value is very close from the one provided by Xia Jiang et al (7.5%)
# Good folder
cd /shares/compbio/Group-Wray/YanHoltz/VITAMIND_XIA_ET_AL/4_LDSC/HERIT
# GWAS result at good format: snpid hg18chr bp a1 a2 or se pval info ngt CEUaf
cat /shares/compbio/Group-Wray/YanHoltz/DATA/GWAS/XiaEtAl_VitaminD/GWAS_vitaminD_XiaEtAL.ma | awk '{print $1, $2, $3, $7, $5}' > input.txt
# Munge. Be careful to call the python version which is IN ldsc.
/shares/compbio/Group-Wray/YanHoltz/SOFT/anaconda3/envs/ldsc/bin/python /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/munge_sumstats.py \
--sumstats input.txt \
--N 79000 \
--out tmp \
--merge-alleles /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/eur_w_ld_chr/w_hm3.snplist
# Heritability and the LD Score regressiuon intercept
/shares/compbio/Group-Wray/YanHoltz/SOFT/anaconda3/envs/ldsc/bin/python /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/ldsc.py \
--h2 tmp.sumstats.gz \
--ref-ld-chr /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/eur_w_ld_chr/ \
--w-ld-chr /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/eur_w_ld_chr/ \
--out output_h2
# Check result
more output_h2.log
A work by Yan Holtz
Yan.holtz.data@gmail.com