Yan Holtz, Zhihong Zhu, Julanne Frater, Perry Bartlett, Jian Yang, John McGrath

This document report the application of LDSC to the vitamin D GWAS result. It aims to determine the heritability of the trait and the potential population stratification.

Installing LDSC

I lost a few hours of my life installing the LDSC software properly. Here are a few hints to do it faster next time. Important:

Install anaconda with python version 2, not 3
When calling LDSC, I must use the python version stored in the lDSC folder, not the default one.

#Delta
cd /shares/compbio/Group-Wray/YanHoltz/SOFT
git clone https://github.com/bulik/ldsc.git

# Then install Anaconda. Warning: must be version 2 of python, not 3.
cd /shares/compbio/Group-Wray/YanHoltz/SOFT
wget https://repo.anaconda.com/archive/Anaconda2-5.1.0-Linux-x86_64.sh
#start installation in /shares/compbio/Group-Wray/YanHoltz/SOFT

# Then follow https://github.com/bulik/ldsc

# To have help:
/shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/ldsc.py -h

# I also download LD scrore provided by LDSC:
wget https://data.broadinstitute.org/alkesgroup/LDSCORE/eur_w_ld_chr.tar.bz2
tar -xvjf eur_w_ld_chr.tar.bz2

Heritability and LD Score Regression Intercept.

I can use LD score regression to study the the inflation in test statistics due to population structure.

For the Vitamin D GWAS, the intercept of the LD score regression is 1.0202 with a standard error of 0.0068. The ratio is 0.1604 with a standard error of 0.0538.

If I understood well, the intercept tends to increase with the sample size even if no population structure is present. It is thus advised to check the ratio instead. In both case, it looks like a slight population structure is observed in this GWAS, but probably negligible.

The intercept minus 2 standard errors is very close to 1:

0.1604 - 1.96*0.0538 = 1.006872

The SNP heritability as computed using LD score regression is 0.069 with a standard error of 0.017. This value is very close from the one provided by Xia Jiang et al (7.5%)

# Good folder
cd /shares/compbio/Group-Wray/YanHoltz/VITAMIND_XIA_ET_AL/4_LDSC/HERIT

# GWAS result at good format: snpid hg18chr bp a1 a2 or se pval info ngt CEUaf
cat /shares/compbio/Group-Wray/YanHoltz/DATA/GWAS/XiaEtAl_VitaminD/GWAS_vitaminD_XiaEtAL.ma | awk '{print $1, $2, $3, $7, $5}' > input.txt

# Munge. Be careful to call the python version which is IN ldsc.
/shares/compbio/Group-Wray/YanHoltz/SOFT/anaconda3/envs/ldsc/bin/python /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/munge_sumstats.py \
--sumstats input.txt \
--N 79000 \
--out tmp \
--merge-alleles /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/eur_w_ld_chr/w_hm3.snplist

# Heritability and the LD Score regressiuon intercept
/shares/compbio/Group-Wray/YanHoltz/SOFT/anaconda3/envs/ldsc/bin/python /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/ldsc.py \
--h2 tmp.sumstats.gz \
--ref-ld-chr /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/eur_w_ld_chr/ \
--w-ld-chr /shares/compbio/Group-Wray/YanHoltz/SOFT/ldsc/eur_w_ld_chr/ \
--out output_h2

# Check result
more output_h2.log

- Discussion

Heritability of Vitamin-D is 7%
No inflation of the LD score criteria -> no population stratification

A work by Yan Holtz

Yan.holtz.data@gmail.com

Using LD Score regression to assess heritability and population stratification

Installing LDSC

Heritability and LD Score Regression Intercept.

- Discussion