Visualizing data - with R

IFREMER, Sète, April 2017

Yan Holtz
yan1166@hotmail.com | www.r-graph-gallery.com

Daily Meal

The Menu

What is DataViz?

Tools available for data visualization

Dataviz tools

About R


"R is a free software environment for statistical computing and graphics." The R Project

R Logo

1+1
## [1] 2
plot(1:10, 1:10)


Why R?


  • Ecosystem - Pipeline: import / clean / transform / analyse / calculate / model / visualize / report
  • Free and open source
  • Reproducibility:
    • Share the data analysis process, not just the final product
    • Validation of results by others
    • Re-run the analysis when data changes
    • Share, edit, remix...
  • > 10K packages (libraries)
  • Very active community
  • Strong graphic capabilities

R is groooooowing

Case study

Imagine: you have received an interesting dataset. You have to analyse it and report your findings to your colleagues.



Communicate result

The GapMinder Dataset

Population, life expectancy and gross domestic product (GDP) per capita for 142 countries and 12 years.

library(gapminder)
head(gapminder)
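
The dimensions quoted above can be checked directly. This is a minimal sketch, assuming the gapminder package is installed from CRAN; the variable names n_rows, n_countries and years are only illustrative.

```r
library(gapminder)

# The table holds one row per country and per year
n_rows      <- nrow(gapminder)
n_countries <- length(unique(gapminder$country))
years       <- sort(unique(gapminder$year))

n_rows          # number of observations
n_countries     # number of distinct countries
length(years)   # number of distinct years
range(years)    # first and last year covered
```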

The Gapminder Library

Let's start simple

data=subset(gapminder, year==2007)
plot(data$lifeExp ~ data$gdpPercap)


Let's improve it

Take a few minutes: what would you do to communicate this result? ...

  • Title
  • Axis names
  • Shape
  • Color
  • Legend
  • Add information
  • Interactivity
  • Animation
  • Reproducibility
  • Share it

Add title and axis names

plot(data$lifeExp ~ data$gdpPercap,
     xlab="GDP per capita", ylab="Life Expectancy",
     main="Features of countries in 2007")

Change shapes

plot(data$lifeExp ~ data$gdpPercap,
     xlab="GDP per capita", ylab="Life Expectancy",
     main="Features of countries in 2007",
     pch=20, cex=3)

Add colors

plot(data$lifeExp ~ data$gdpPercap,
     xlab="GDP per capita", ylab="Life Expectancy",
     main="Features of countries in 2007",
     pch=20, cex=3, col="blue")

About colors in R

  • Color names
  • Color number
  • RGB
  • R Color Brewer

Color names:

plot(lifeExp ~ gdpPercap, data=data,
     pch=20, cex=4, col="forestgreen")

Get all 657 built-in color names with:

colors()

About colors in R: color number

plot(lifeExp ~ gdpPercap, data=data,
     pch=20, cex=4, col=colors()[18])

About colors in R: RGB

# rgb(red, green, blue, alpha): each value between 0 and 1, alpha sets the opacity
plot(lifeExp ~ gdpPercap, data=data,
     pch=20, cex=5, col=rgb(0.2,0.3,0.8,0.4))

About colors in R: R Color Brewer

library(RColorBrewer)
pal <- brewer.pal(5, "Set1")
pal
## [1] "#E41A1C" "#377EB8" "#4DAF4A" "#984EA3" "#FF7F00"
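
The palette above is a plain vector of hex strings, and base R can convert between color names, RGB components and hex codes. The following is a small sketch using only base R (grDevices); the variable names are illustrative, and the hex code "#E41A1C" is copied from the output above.

```r
# A named color and its red / green / blue components (0-255 scale)
col2rgb("forestgreen")

# The same color rebuilt from its components, returned as a hex string
green_hex <- rgb(34, 139, 34, maxColorValue = 255)

# A hex string can be decomposed in the same way as a color name
pal_first <- "#E41A1C"                      # first color of the Set1 palette
red_part  <- col2rgb(pal_first)["red", 1]   # red component on the 0-255 scale

# Add transparency to any color: the result has two extra hex digits for alpha
transparent_green <- adjustcolor("forestgreen", alpha.f = 0.4)
```

Any of these strings can be passed to the col argument of plot(), exactly like a color name.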

RColorBrewer Palettes

Map a color to a variable

# attribute a color to each continent:
my_colors=pal[as.numeric(data$continent)]

# use this vector as color for the plot
plot(lifeExp ~ gdpPercap, data=data, pch=20, cex=3, col=my_colors)
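
Why this works: as.numeric() applied to a factor returns the position of each value among the factor levels, and that position is used to index the palette. Here is a tiny base R sketch on a hypothetical five-element factor (not the real dataset) to make the mechanism visible.

```r
# Hypothetical mini example: the continent of five countries
continent <- factor(c("Europe", "Africa", "Asia", "Africa", "Americas"))

# Levels are sorted alphabetically: Africa, Americas, Asia, Europe
levels(continent)

# One color per level (hex codes taken from the Set1 palette)
pal4 <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3")

# Level codes: Europe is the 4th level, Africa the 1st, and so on
codes <- as.numeric(continent)

# Indexing the palette with the codes gives one color per observation
my_colors <- pal4[codes]

# Lookup table showing which color each level receives
data.frame(level = levels(continent), color = pal4)
```

The palette needs at least as many colors as the factor has levels, otherwise some observations get NA as a color.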


Add a legend

#plot
my_colors=pal[as.numeric(data$continent)]
plot(lifeExp ~ gdpPercap, data=data, pch=20, cex=3, col=my_colors)

#add legend
legend("bottomright", legend=levels(data$continent), col=pal, pch=20, bty="n", pt.cex=3, horiz = F)


Finally

# Map the color:
library(RColorBrewer)
pal <- brewer.pal(5, "Set1")
my_colors=pal[as.numeric(data$continent)]

# Make the plot 
par(mar=c(4,4,2,2)) # Margins (bottom, left, top, right), wide enough to show the axis titles
plot(data$lifeExp ~ data$gdpPercap,

     # titles
     xlab="Gross Domestic Product per capita", ylab="Life Expectancy",
     main="Features of countries in 2007",

     # color
     col=my_colors,

     # shapes
     pch=20, cex=3,

     # L-shaped box (left and bottom axes only):
     bty="l")

#add legend
legend("bottomright", legend=levels(data$continent), col=pal, pch=20, bty="n", pt.cex=3, horiz = F)

Getting crazy?



Individual pages

The R graph gallery

Portfolio pages

The Menu

The Gallery needs you!



The Magic of ggplot2

library(ggplot2)
ggplot(data, 
  aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point()

About ggplot2

Learning ggplot2

About the tidyverse

"The tidyverse: a collection of R packages that share common philosophies and are designed to work together"

the Tidyverse

Tidy example

Faceting

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  xlim(0, 60000) +
  facet_wrap(~year)

Faceting (again)

ggplot(data, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point() +
  xlim(0, 60000) +
  facet_wrap(~continent, nrow=3) + 
  theme(legend.position="none")

Boxplot

ggplot(gapminder, aes(x=continent, y=lifeExp, color=continent, fill=continent)) + 
  geom_boxplot(alpha=0.3)  + 
  theme(legend.position="none")


Warning: always check distribution

ggplot(gapminder, aes(x=continent, y=lifeExp, color=continent, fill=continent)) + 
  geom_violin(alpha=0.3)  + 
  theme(legend.position="none")


Warning: always check distribution

ggplot(gapminder, aes(x=continent, y=lifeExp, color=continent, fill=continent)) + 
  geom_boxplot(alpha=0.3) + 
  geom_jitter(color="grey", size=0.8) + 
  theme(legend.position="none")


With Data preparation

library(dplyr)
gapminder  %>% 

    select(continent, year, pop) %>% 
    group_by(year, continent) %>% 
    summarize(sum_pop = sum(as.numeric(pop))) %>% 

    ggplot( aes(fill=continent, y=sum_pop, x=year)) + 
        geom_bar(stat="identity") + 
        ylab("Population per continent")
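
A note on sum(as.numeric(pop)) in the pipeline above: the conversion is presumably there because pop is stored as an integer, and integer sums overflow above roughly 2.1 billion. A minimal base R sketch with two hypothetical population values illustrates the problem.

```r
# Two hypothetical populations stored as integers
pop_int <- c(2000000000L, 2000000000L)

# Largest value an R integer can hold
.Machine$integer.max

# The integer sum exceeds that limit: R returns NA (normally with a warning)
overflowed <- suppressWarnings(sum(pop_int))

# Converting to double before summing gives the expected total
total <- sum(as.numeric(pop_int))
```

Without the conversion, the continent totals for recent years could silently turn into missing bars on the chart.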


With Data preparation

library(dplyr)
gapminder  %>% 
    filter(continent=="Asia") %>% filter(pop > 50000000) %>%
    select(country, year, pop) %>% 
    group_by(year, country) %>% 
    ggplot( aes(x=year, y=pop, color=country, fill=country)) + 
        geom_area() +
        facet_wrap(~country)+
        theme(legend.position="none")


What's next?




Diving into Interactive charts

  • Zoom in on a specific part
  • Get information when hovering
  • Make groups appear / disappear
  • Export directly
  • Pan along the axes
  • Play with your chart
  • Bring your dataviz to life!

HTML widgets

Plotly

  • "Plotly is the modern platform for agile business intelligence and data science"
  • https://plot.ly/
  • Also available in R as an HTML widget
  • Make a plot with plot_ly() or ggplotly()

library(plotly)
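
For comparison with ggplotly(), here is a hedged sketch of the native plot_ly() syntax on the 2007 subset. It assumes the plotly and gapminder packages are installed; the choice of hover text and marker mode is only one possible setup.

```r
library(plotly)
library(gapminder)

data <- subset(gapminder, year == 2007)

# Native plotly syntax: columns are referenced with a one-sided formula (~)
p <- plot_ly(
  data,
  x = ~gdpPercap, y = ~lifeExp,
  color = ~continent,        # one color per continent
  text = ~country,           # shown when hovering over a point
  type = "scatter", mode = "markers"
)

# Typing p in an interactive session displays the widget
```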

Apply plotly to the gapminder dataset

# Basic ggplot2 chart
p=ggplot(data, 
  aes(gdpPercap, lifeExp, size = pop, color = continent, text=country)) +
  geom_point()

# Made interactive with plotly
library(plotly)
ggplotly(p)

If you know ggplot2, you know how to do interactive charts!

Apply plotly to the gapminder dataset

Leaflet

D3network

D3heatmap

Communicate your result

  • Copy and paste in powerpoint? in an e-mail?
  • Make a figure for publication with handmade modification?
  • Are you sure you can provide exactly the same result as last time?

    Data and garbage

Why Most Published Research Findings Are False

Findings are false

PLOS Medicine

We need reproducibility, and R is the perfect tool for that.
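
As a first step away from copy and paste, a figure can be written to a file entirely by code, so that re-running the script regenerates exactly the same output. This is a minimal base R sketch; the file name figure1.pdf and the built-in cars dataset are placeholders chosen only to keep the example self-contained.

```r
# Hypothetical output file, written to a temporary folder
out_file <- file.path(tempdir(), "figure1.pdf")

pdf(out_file, width = 6, height = 4)   # open a PDF graphics device
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     main = "Figure generated entirely by code",
     pch = 20)
invisible(dev.off())                   # close the device and write the file
```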

Introducing R Markdown


Basic RMD

  • Turn your analysis into reports
  • Fully reproducible
  • Weave together narrative text and code
  • Many output formats: PDF, HTML, websites...

http://rmarkdown.rstudio.com/
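
A report can also be built from the R console rather than with the Knit button. The sketch below writes a tiny, hypothetical R Markdown file to a temporary folder and renders it with rmarkdown::render(); it assumes the rmarkdown package and pandoc are available, and skips the rendering step otherwise.

```r
# Three backticks, built programmatically to delimit the R chunk
fence <- strrep("`", 3)

# Write a tiny R Markdown file (hypothetical content) to a temporary folder
rmd_file <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: \"A minimal report\"",
  "output: html_document",
  "---",
  "",
  "Some narrative text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_file)

# Render to HTML only if rmarkdown and pandoc are available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_file, quiet = TRUE)
}
```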

Header

---
title: "Analysing the Gapminder dataset"
author: "Yan Holtz"
date: '`r as.character(format(Sys.Date(), format="%d/%m/%Y"))`'
output:
  html_document:
    toc: yes
---

Title & text

# 1- Introduction
Hi Robert, here is my reproducible analysis concerning the Gapminder dataset!

R code !

# 2- Get data
The data are included in the gapminder library
\```{r}
library(gapminder)
head(gapminder)
\```

Basic HTML output

Pimp my RMD

Introduction to shiny applications


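ui.R

library(shiny)
library(plotly)     # provides plotlyOutput()
library(gapminder)  # provides the gapminder data frame

ui <- fluidPage(

  # Widget to choose the year
  selectInput(
    "year", "Select a year!", 
    choices=unique(gapminder$year), selected=1952
    ),

  # Interactive plot
  plotlyOutput("plot")

)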

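server.R

library(shiny)
library(ggplot2)
library(plotly)     # provides renderPlotly() and ggplotly()
library(gapminder)

server <- function(input, output) {
    output$plot <- renderPlotly({

        # Select data (input$year arrives as a character string)
        data=subset(gapminder, year==as.integer(input$year))

        # Make the plot
        p=ggplot(data, 
            aes(gdpPercap, lifeExp, size = pop, 
                color = continent)) +
            geom_point()
        ggplotly(p)
    })  
}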
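
The UI and the server can also live together in a single app.R file and be combined with shinyApp(). The sketch below is a deliberately simpler, hypothetical app (built-in iris data, base plot) so that it only depends on the shiny package; it is not the gapminder app shown above.

```r
library(shiny)

# User interface: one drop-down menu and one plot
ui <- fluidPage(
  selectInput("species", "Select a species",
              choices = levels(iris$Species)),
  plotOutput("plot")
)

# Server logic: redraw the plot whenever the selected species changes
server <- function(input, output) {
  output$plot <- renderPlot({
    d <- subset(iris, Species == input$species)
    plot(Sepal.Length ~ Sepal.Width, data = d, pch = 20, cex = 2)
  })
}

# Combine both parts into an app object
app <- shinyApp(ui = ui, server = server)

# runApp(app)   # uncomment to launch the app in a browser
```

With separate ui.R and server.R files in one folder, the equivalent is to call runApp() on that folder.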

Going further with Shiny

TI Demo

Be open-minded!

Other Viz

From Dataviz to DataArt