Yan Holtz
yan1166@hotmail.com | www.r-graph-gallery.com
"Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects (e.g., points, lines or bars) contained in graphics." Wikipedia
Data visualization is part of the Data Science Process
"R is a free software environment for statistical computing and graphics." The R Project
1+1
## [1] 2
plot(1:10, 1:10)
Population, life expectancy and gross domestic product (GDP) per capita for 142 countries over 12 years (every five years from 1952 to 2007).
library(gapminder)
head(gapminder)
data=subset(gapminder, year==2007)
plot(data$lifeExp ~ data$gdpPercap)
Take a few minutes: what would you do to communicate this result? ...
plot(data$lifeExp ~ data$gdpPercap,
     xlab="GDP per capita", ylab="Life Expectancy",
     main="Features of countries in 2007")
plot(data$lifeExp ~ data$gdpPercap,
     xlab="GDP per capita", ylab="Life Expectancy",
     main="Features of countries in 2007",
     pch=20, cex=3)
plot(data$lifeExp ~ data$gdpPercap,
     xlab="GDP per capita", ylab="Life Expectancy",
     main="Features of countries in 2007",
     pch=20, cex=3, col="blue")
Color names
plot(lifeExp ~ gdpPercap, data=data,
pch=20, cex=4, col="forestgreen")
Get all the 657 possibilities with
colors()
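The vector returned by `colors()` can be searched like any other character vector. A minimal sketch, using base R only, that lists the built-in names containing "green" (the search pattern is just an illustration):

```r
# All built-in color names known to R
all_colors <- colors()
length(all_colors)

# Keep only the names that contain "green" (illustrative pattern)
greens <- grep("green", all_colors, value = TRUE)
head(greens)
```

Any of the returned names can then be passed to the `col` argument of `plot()`.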
Color number
plot(lifeExp ~ gdpPercap, data=data,
pch=20, cex=4, col=colors()[18])
RGB
plot(lifeExp ~ gdpPercap, data=data,
pch=20, cex=5, col=rgb(0.2,0.3,0.8,0.4))
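In the `rgb()` call above, the four arguments are red, green, blue and opacity (alpha), each between 0 and 1 by default. As a small sketch, the function simply returns a hexadecimal color string, which is why its result can be passed to `col`:

```r
# Red, green, blue and alpha, each on a 0-1 scale
transparent_blue <- rgb(0.2, 0.3, 0.8, 0.4)
transparent_blue  # a string of the form "#RRGGBBAA"

# The alpha argument is optional: without it the color is fully opaque
opaque_blue <- rgb(0.2, 0.3, 0.8)
opaque_blue       # a string of the form "#RRGGBB"
```

A low alpha is handy here because overlapping points stay visible through each other.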
R Color Brewer
library(RColorBrewer)
pal <- brewer.pal(5, "Set1")
pal
## [1] "#E41A1C" "#377EB8" "#4DAF4A" "#984EA3" "#FF7F00"
# assign a color to each continent:
my_colors=pal[as.numeric(data$continent)]
# use this vector as color for the plot
plot(lifeExp ~ gdpPercap, data=data, pch=20, cex=3, col=my_colors)
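The indexing trick above works because a factor is stored as integer codes, one per level. A minimal sketch with a toy factor and a made-up three-color palette (neither comes from the gapminder data):

```r
# Toy factor: levels are sorted alphabetically by default
continent <- factor(c("Asia", "Europe", "Africa", "Asia"))
levels(continent)       # "Africa" "Asia" "Europe"

# Each value is stored as the position of its level
codes <- as.numeric(continent)
codes                   # 2 3 1 2

# Hypothetical palette with one color per level
toy_pal <- c("red", "green", "blue")

# Indexing the palette by the codes gives one color per observation
toy_colors <- toy_pal[codes]
toy_colors              # "green" "blue" "red" "green"
```

The palette therefore needs at least as many colors as the factor has levels, which is why five colors are requested for the five continents.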
#plot
my_colors=pal[as.numeric(data$continent)]
plot(lifeExp ~ gdpPercap, data=data, pch=20, cex=3, col=my_colors)
#add legend
legend("bottomright", legend=levels(data$continent), col=pal, pch=20, bty="n", pt.cex=3, horiz = FALSE)
# Map the color:
library(RColorBrewer)
pal <- brewer.pal(5, "Set1")
my_colors=pal[as.numeric(data$continent)]
# Make the plot
par(mar=c(3,3,2,2)) # Margin
plot(data$lifeExp ~ data$gdpPercap,
     # titles
     xlab="Gross Domestic Product per capita", ylab="Life Expectancy",
     main="Features of countries in 2007",
     # color
     col=my_colors,
     # shapes
     pch=20, cex=3,
     # L-shaped box: left and bottom axes only
     bty="l")
# add legend
legend("bottomright", legend=levels(data$continent), col=pal, pch=20, bty="n", pt.cex=3, horiz = FALSE)
"Help and Inspiration concerning R graphics"
library(ggplot2)
ggplot(data,
aes(gdpPercap, lifeExp, size = pop, color = continent, frame = year)) +
geom_point()
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent, frame = year)) +
geom_point() +
xlim(0, 60000) +
facet_wrap(~year)
ggplot(data, aes(gdpPercap, lifeExp, size = pop, color = continent, frame = year)) +
geom_point() +
xlim(0, 60000) +
facet_wrap(~continent, nrow=3) +
theme(legend.position="none")
ggplot(gapminder, aes(x=continent, y=lifeExp, color=continent, fill=continent)) +
geom_boxplot(alpha=0.3) +
theme(legend.position="none")
ggplot(gapminder, aes(x=continent, y=lifeExp, color=continent, fill=continent)) +
geom_violin(alpha=0.3) +
theme(legend.position="none")
ggplot(gapminder, aes(x=continent, y=lifeExp, color=continent, fill=continent)) +
geom_boxplot(alpha=0.3) +
geom_jitter(color="grey", size=0.8) +
theme(legend.position="none")
library(dplyr)
gapminder %>%
select(continent, year, pop) %>%
group_by(year, continent) %>%
summarize(sum_pop = sum(as.numeric(pop))) %>%
ggplot( aes(fill=continent, y=sum_pop, x=year)) +
geom_bar(stat="identity") +
ylab("Population per continent")
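The `as.numeric()` call in the pipeline above is presumably there because population counts stored as integers can overflow when summed: R integers cannot exceed `.Machine$integer.max` (about 2.1 billion). A minimal sketch with two made-up population values:

```r
# Two hypothetical populations stored as integers
pop <- c(1500000000L, 1200000000L)

# The integer sum exceeds the integer range: R returns NA with a warning
int_sum <- suppressWarnings(sum(pop))
int_sum                 # NA

# Converting to double first gives the expected total
num_sum <- sum(as.numeric(pop))
num_sum                 # 2.7e+09
```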
library(dplyr)
gapminder %>%
filter(continent=="Asia") %>% filter(pop > 50000000) %>%
select(country, year, pop) %>%
group_by(year, country) %>%
ggplot( aes(x=year, y=pop, color=country, fill=country)) +
geom_area() +
facet_wrap(~country)+
theme(legend.position="none")
library(plotly)
# Basic ggplot2 chart
p=ggplot(data,
aes(gdpPercap, lifeExp, size = pop, color = continent, text=country)) +
geom_point()
# Made interactive with plotly
ggplotly(p)
If you know ggplot2, you know how to do interactive charts!
We need reproducibility, and R is the perfect tool for that.
---
title: "Analysing the Gapminder dataset"
author: "Yan Holtz"
date: '`r as.character(format(Sys.Date(), format="%d/%m/%Y"))`'
output:
  html_document:
    toc: yes
---
# 1- Introduction
Hi Robert, here is my reproducible analysis concerning the Gapminder dataset!
# 2- Get data
The data are included in the gapminder library
```{r}
library(gapminder)
head(gapminder)
```
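The `date` field in the document header is filled by inline R code, so the report always carries the date on which it was rendered. A small sketch of what that expression does, using a fixed, made-up date instead of `Sys.Date()` so the result is predictable:

```r
# A fixed example date (hypothetical), standing in for Sys.Date()
example_date <- as.Date("2017-03-05")

# Day/month/year, as in the header above
formatted <- format(example_date, format = "%d/%m/%Y")
formatted               # "05/03/2017"

# format() already returns a character string, so the extra
# as.character() in the header is harmless but not required
is.character(formatted) # TRUE
```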
ui.R
library(shiny)
library(plotly)
library(gapminder)

ui <- fluidPage(
  # Widget to choose the year
  selectInput(
    "year", "Select a year!",
    choices=unique(gapminder$year), selected=1952
  ),
  # Interactive plot
  plotlyOutput("plot")
)
server.R
library(shiny)
library(ggplot2)
library(plotly)
library(gapminder)

server <- function(input, output) {
  output$plot <- renderPlotly({
    # Select the data for the chosen year
    data=subset(gapminder, year==input$year)
    # Make the plot
    p=ggplot(data,
             aes(gdpPercap, lifeExp, size = pop,
                 color = continent, frame = year)) +
      geom_point()
    # Turn it into an interactive chart
    ggplotly(p)
  })
}
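One detail worth knowing in the server code above: as far as I know, `selectInput()` hands its value back as a character string, so `input$year` would be "1952" rather than 1952. The comparison inside `subset()` still works because R coerces both sides to a common type. A toy sketch, with a made-up data frame standing in for gapminder:

```r
# Toy data frame (hypothetical values) with an integer year column
toy <- data.frame(year = c(1952L, 1957L, 1952L),
                  lifeExp = c(40, 45, 60))

# What the widget is assumed to return: a string, not a number
selected_year <- "1952"

# The integer column is compared to the string after coercion
one_year <- subset(toy, year == selected_year)
nrow(one_year)          # 2

# An explicit conversion makes the intent clearer
same_rows <- subset(toy, year == as.integer(selected_year))
```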
Take Home message
- Use R!
- The Tidyverse is your friend
- Interactive charts are within easy reach!
- Make your analysis reproducible
- Explore new graphic methods
Yan Holtz
yan1166@hotmail.com
holtzyan.wordpress.com
Slides made with Slidify
and available at github.com/holtzy