Comparing crypto prices between exchanges

Are there differences? How big? How often? on which exchange and currency?


Prices of 5 cryto currencies have been harvested for 2 weeks on 5 different exchanges every few seconds as described here. About 800,000 data points are available, allowing to compare prices between exchanges at a high resolution.

Here is an overview of the Etherum price on the period on the 5 exchanges:

# Library

# Load result
Ticker$last <- as.numeric(Ticker$last)
#Ticker= head(Ticker, 100000)

# Plot
p <- Ticker %>%
  filter(symbol=="ETHEUR") %>%
  filter(last>400) %>%    # delete a weird value
  ggplot( aes(x=time, y=last, color=platform, group=platform)) +
    geom_line( size=0.8, size=0.4) +
    scale_color_viridis(discrete=TRUE, name="Exchange") +
    ylab("Etherum price") +
    xlab("Day (February)") +
    theme_ipsum() +
    scale_x_datetime(date_breaks = "1 day", date_labels = "%d", minor_breaks=NULL) +
    ggtitle("Etherum price evolution for 4 exchanges") +
    theme( plot.title = element_text(size=12, color="grey"))

# Save the plot for talk
ggsave(p, file="IMG/etherum_evolution.png", width=7, height=4)

There is a very high correlation between exchange prices. But if we zoom on the first 1,000 data points, a few discrepancies appear:

# Plot
p <- Ticker %>%
  filter(symbol=="ETHEUR") %>%
  filter(last>400) %>%    # delete a weird value
  head(1000) %>%
  ggplot( aes(x=time, y=last, color=platform, group=platform)) +
    geom_line( size=0.8) +
    scale_color_viridis(discrete=TRUE, name="Exchange") +
    ylab("Etherum price") +
    theme_ipsum() +
    ggtitle("zoom on the 1000 first data points") +
    theme( plot.title = element_text(size=12, color="grey")) 

# Save the plot for talk
ggsave(p, file="IMG/etherum_evolution_zoom.png", width=7, height=4)


As you can see on the graphic above, there are discrepencies between prices of exchanges. Basically, the price is not exactly the same on Bitstamp and on Cex at a given timestamp. The correlation is very strong tough.

This is what interest us to perform arbitrage. But can we quantify these differences? The following graphic displays the price difference (in %) between Kraken and Bitstamp at each time stamp:

get_last_difference <- function(plat1, plat2, currency){
  # First I calculate the differnce of the 'last' price between both exchanges:
  diff <- Ticker %>%
    filter(symbol==currency) %>%
    filter(platform %in% c(plat1, plat2)) %>%
    select(time, platform, last, symbol) %>%
    spread(platform, last) %>%
    mutate(diff=.[[4]] - .[[3]], diff_perc=(.[[4]] - .[[3]]) / .[[4]] *100 ) %>%
  # Then I plot the result
plot_last_difference <- function(diff, plat1, plat2){
  p <-  diff %>%
        sample_frac(0.1) %>%    
        ggplot( aes(x=time, y=diff_perc, group=symbol, fill=symbol)) +
          geom_area(fill="#69b3a2") +
          theme_ipsum() +
            plot.title = element_text(size=12, color="grey")
          ) +
          ylab("Price difference (%)") +
          xlab("Day (February)") +
          ggtitle(paste0("Etherum price difference between ", plat1, " and ", plat2)) +
          scale_x_datetime(date_breaks = "1 day", date_labels = "%d", minor_breaks=NULL) +

# Run the function for Kraken and bitstamp
diff <- get_last_difference("Kraken", "Bitstamp", "ETHEUR")
p <- plot_last_difference(diff, "Kraken", "Bitstamp")

ggsave(p, file="IMG/price_diff_eth_kraken_bitstamp.png", width=7, height=4)

# Compute a few summary statistics
lengthPeriod <- max(diff$time) - min(diff$time) 
numTimeStamp <- nrow(diff)
numOver1 <- diff %>% filter(diff_perc > 1 ) %>% nrow()
numUnder1 <- diff %>% filter(diff_perc < -1 ) %>% nrow()
numTot <- numOver1 + numUnder1
freqOver1 <- numTot / as.numeric(lengthPeriod) %>% round(0)

The chart aboves represents a period of 14 days where 42087 prices have been recovered. In total, 2293 situations with Bitstamp being at least 1% more expensive than Kraken have been observed, and 644 reciprocally. This is a total of 2937 times with differences over 1%, which is an average of 210 per day.

Important note: this chart represents the price differences of the last transaction done. However, note that we buy crypto at the ask price, and sell it at the bid price. Differences are thus smaller than that in reality.

Where is the best deal?

Now that we know there are a few differences between exchanges, we need to know what pair of exchange have the most frequent discrepencies. Moreover, we need to figure out if some currencies have more discrepencies than others.

The next graphic quantifies the number of differences observed per day (Y axis) for a specific threshold (X axis). Four lines are provided, one per currency that exists in both Kraken and Bitstamp. For instance, it appears that differences over 1% happen about 2OO times a day for bitcoin, and about 300 times a day for litecoin (XRP) and Bitcoincash. It is also interesting to note that differences over ~1.7% basically never happen.

# A function that calculates the difference between the ASK and the BID of 2 platforms for every currency at each time stamp
find_askbid_difference <- function(plat1, plat2){
  diff <- Ticker %>% 
    filter(platform %in% c(plat1, plat2)) %>%
    select(time, platform, symbol, ask, bid) %>%
    mutate(ask=as.numeric(ask), bid=as.numeric(bid)) %>%
    gather(temp, value, -time, -platform, -symbol) %>%
    mutate(platform=gsub(plat1,"plat1", platform)) %>%
    mutate(platform=gsub(plat2,"plat2", platform)) %>%
    unite(temp1, platform, temp, sep="_") %>%
    spread( key=temp1, value=value) %>%
    ) %>%
    rowwise() %>%
    mutate( diff_perc=max(diff1, diff2) ) %>%

# A function that plot these differences
plot_askbid_difference <- function(diff){
  p <-  diff %>%
          sample_frac(0.1) %>%
          ggplot(aes(x=time, y=diff_perc, group=symbol, fill=symbol)) +
            geom_area() +
            facet_wrap(~symbol, nrow=5) +
            theme(legend.position="none") +

# A function that counts the number of significant differences for several threshold
find_signif_diff <- function(diff){
  for( i in seq(0.7,4,0.2)){
    df <-  diff %>%
      group_by(symbol) %>%
      filter(diff_perc > i) %>%
      summarise( nb_over_thres = n() ) %>%
      mutate( thres = i) %>%
      arrange( nb_over_thres )
    nbSignifDiff <- rbind( nbSignifDiff, df)

# A function that plot this number of diff -> plot it PER DAY
plot_signif_diff <- function(nbSignifDiff){
  p <- nbSignifDiff %>%
    filter(symbol=="BCHEUR") %>%
    mutate(nb_over_thres = nb_over_thres / time_length(lengthPeriod, unit="day")) %>%
    mutate(mytext=paste0("Currency: ", gsub("EUR", "",symbol), "\n", round(nb_over_thres,0)," differences per day observed over ",thres, " %")) %>%
    mutate(symbol = gsub("EUR", "", symbol)) %>%
    ggplot( aes(x=thres, y=nb_over_thres, group=symbol, color=symbol, text=mytext)) +
      geom_line() +
      ylab("Number of differences / day") +
      xlab("Difference in %") +
      ggtitle("Number of price differences between Kraken and Bitstamp") +
      theme_ipsum() +
      theme(plot.title = element_text(size=12, color="grey")) +
      scale_color_viridis(discrete=TRUE, name="Currency")
    #ggplotly(p, tooltip="text")

# Apply that to a first example:
diff <- find_askbid_difference("Kraken", "Bitstamp")
nbSignifDiff <- find_signif_diff(diff)
p <- plot_signif_diff(nbSignifDiff)

It is possible to build exactly the same graphic for each pair of exchange, giving the following graphic. Note that some currencies are not available on every exchange, that’s why most of the pair below give results only for bitcoin that is the most widely used currency.

# find all the pairs of platform
list_platforms <- Ticker$platform %>% unique() %>% as.character()
list_pairs <- combn(list_platforms, 2)

# find the number of significant differences for each pair
for(i in c(1:ncol(list_pairs))){
  diff <- find_askbid_difference(list_pairs[1,i], list_pairs[2,i])
  nbSignifDiff <- find_signif_diff(diff)
  nbSignifDiff$pair <- paste( list_pairs[1,i], list_pairs[2,i], sep="-")
  bilan <- rbind(bilan, nbSignifDiff)

# plot it
p <- bilan %>%
  mutate(nb_over_thres = nb_over_thres / time_length(lengthPeriod, unit="day")) %>%
  mutate(mytext=paste0("Currency: ", gsub("EUR", "",symbol), "\n", nb_over_thres," differences observed over ",thres, " %")) %>%
  ggplot( aes(x=thres, y=nb_over_thres, group=symbol, color=symbol)) +
      geom_line() +
      ylab("Number of significant differences per day") +
      xlab("Difference threshold (%)") +
      theme_ipsum() +
      scale_color_viridis(discrete=TRUE, name="") +
      facet_wrap(~pair, ncol=5) +
        panel.spacing = unit(0.3, "lines"),
        strip.text.x = element_text(size = 8)
ggplotly(p, tooltip="text")
#ggsave(p, file="IMG/difference_counts_step3.png", width=7, height=4)

A few observation are of interest:


Next step

The next step is to understand what arbitrage is in depth, and study its potential limitations.

< How data was harvested What is arbitrage >


A work by Yan Holtz for data-to-viz.com