Yan Holtz, Zhihong Zhu, Julanne Frater, Perry Bartlett, Jian Yang, John McGrath
The previous part of this project run a GWAS on vitamin D blood concentration and found out 6 loci significantly associated. This part aims to annotate these regions in order to list putative underlying genes.
library(tidyverse)
library(qqman)
library(readr)
library(DT)
library(RColorBrewer)
library(ggrepel)
The Locus Zoom online tool allows to visualise the regions of interest and see the underlying genes. Just put the .linear
result of PLINK on it, and specify the SNP of interest.
Results for the 6 significant associations:
This step is computed using the MAGMA pipeline
that is run through the FUMA online platform. Here is what it does:
Here is the Manhattan plot showing the p-value of each gene. Genome wide significance (red dashed line in the plot) was defined at P = 0.05/17999 = 2.778e-6.
Number of significantly associated genes: 22. A list of these 22 genes can be found using the download button of FUMA and reading genes.txt
. This list of gene is going to be used for several of the following steps.
Note: Connection to the FUMA online platform: go to the website / Account: y.holtz@uq.edu.au / Pass: fumayanholtz.
Note: This step is computed with the SNP2GENE
utility of the FUMA platform.
The previous GWGAS step identified 22 genes potentially involved in Vitamin-D concentration. The Gene2Func
utility of the FUMA platform allows to check where these genes are expressed the most. It uses the GTEX dataset
that provides expression data for all genes in 53 different tissue types:
A few observation:
This part follows the previous one. If I understood well here is how it works:
A few observation:
The previous approach considered only the 22 significant genes detected using the GWGAS method. The MAGMA pipeline offers another approach that use the full distribution of SNP p-values to detect tissues enriched for vitamin D concentration. Here is the result:
Observation:
The MsigDB is a database that lists about 10,000 groups of gene involved in the same pathways. For instance, the pathway called go_favonoid_metabolic_process
is composed by 24 genes and contols the flavonoid metabolic process.
It is possible to test wether our 22 genes are often involved in some of these pathways. In our case, no significant pathway seems to be enriched. Here are 4 pathways that seem to have a little trend, but do not pass multiple testing correction.
The vitamin-D GWAS currently available show 6 independant loci
. These loci are composed by 1875 SNPs
. Each SNP has a position. Using the genome annotation, it is possible to count what are the functional annotations of SNPs that are the most frequent. FUMA allows to load a file called snps.txt
that present these results. Let’s load it and visualize the data.
# Packages
library(tidyverse)
library(hrbrthemes)
library(patchwork)
# read data
data <- read.table("0_DATA/snps.txt", header=T)
# Plot 1: functionnal annotation:
p1 <- data %>%
filter(!is.na(func)) %>%
filter(!func %in% c("upstream:downstream:upstream", "upstream:upstream:downstream" )) %>%
group_by(func) %>%
summarize(n=n()) %>%
arrange(n) %>%
mutate(func=factor(func, func)) %>%
ggplot(aes(x=func, y=n)) +
geom_bar(stat="identity", alpha=0.6) +
coord_flip() +
theme_ipsum() +
xlab("") +
ylab("") +
ggtitle("Functionnal Annotation") +
theme(plot.title = element_text(size=10))
# Plot 2: regulome DB score:
p2 <- data %>%
filter(!is.na(RDB)) %>%
group_by(RDB) %>%
summarize(n=n()) %>%
arrange(desc(RDB)) %>%
mutate(RDB=factor(RDB,RDB)) %>%
ggplot(aes(x=RDB, y=n)) +
geom_bar(stat="identity", alpha=0.6) +
coord_flip() +
theme_ipsum() +
xlab("") +
ylab("") +
ggtitle("RegulomeDB score") +
theme(plot.title = element_text(size=10))
# Plot 3: chromatin state:
p3 <- data %>%
group_by(minChrState) %>%
summarize(n=n()) %>%
arrange(desc(minChrState)) %>%
mutate(minChrState = factor(minChrState, minChrState)) %>%
ggplot(aes(x=as.factor(minChrState), y=n)) +
geom_bar(stat="identity", alpha=0.6) +
coord_flip() +
theme_ipsum() +
xlab("") +
ylab("") +
ggtitle("Chromatine state") +
theme(plot.title = element_text(size=10))
p1 + p2 + p3
Explanation:
note - I’m not sure why this result is interesting. Indeed, we would probably need to compare it with the proportion of each class in the complete genome. For instance, intergenic and intronic annotations are probably the most common ones in the genome, so it is not a surprise if I observe it so often, is it?
note - The tool used to assign each SNP to a class is called ANNOVAR.
The GWAS catalog is a catalog of thounsands of GWAS. FUMA allows to take all our significant SNPs and check if they have already been detected for another study. It appears that:
FUMA is a platform that can be used to annotate, prioritize, visualize and interpret GWAS results. It offers 2 main functions:
Note - Connection to the FUMA online platform: go to the website / Account: y.holtz@uq.edu.au / Pass: fumayanholtz.
Citation -
Input - as input it takes a GWAS result. Transfert the Plink results locally:
cd /Users/y.holtz/Desktop
scp y.holtz@delta.imb.uq.edu.au:/home/y.holtz/BLOOD_GWAS/1_GWAS/PLINK/result_GWAS_bloodpressure_plink.linear.gz .
Another possiblity suggested by John would be to use the IGV software. Download is possible here. It looks like this tool is more adapted for people working at the molecular level? Like using NGS and reads?
A work by Yan Holtz
Yan.holtz.data@gmail.com