Harvesting cryptocurrency prices

Recovering 1M data points on 5 exchanges and 5 currencies

Public exchange API

Cryptocurrencies are bought and sold on exchanges, which work a bit like banks, but for crypto. The price at which a token is traded depends on supply and demand, so it changes constantly, every couple of seconds.

It is possible to recover this price using the public API of an exchange. Let’s say you want to know the Bitcoin price on the Kraken exchange. You can get it in your browser by typing this URL:

https://api.kraken.com/0/public/Ticker?pair=BTCEUR

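It gives you several pieces of information, three of which are of interest: the ask price (field a), the bid price (field b) and the price of the last trade (field c).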

Doing it programmatically

It is entirely possible to do the same thing programmatically. This is handy since it lets you recover the prices automatically every couple of seconds.

Here is an example using the R programming language to get the price of Bitcoin on Kraken and show it in a clean table:

# Packages
library(RCurl)     # provides getURLContent()
library(jsonlite)  # provides fromJSON(); assumed here, the original package line is missing

# Recover the information
address <- "https://api.kraken.com/0/public/Ticker?pair=BTCEUR"
ticker <- getURLContent(address)

# Make the format more readable: keep the first (and only) pair in the result
tmp <- fromJSON(ticker)$result[[1]]
result <- data.frame(ask=tmp$a[1], bid=tmp$b[1], last=tmp$c[1], open=tmp$o, low=tmp$l[1], high=tmp$h[1], volume=tmp$v[1], volumeQuote=NA, timestamp=NA)

# Show result
result

ask bid last open low high volume
5697 5697 5697 5686 5671 5702 56
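
The parsing step above can also be wrapped in a small function, which makes it easier to reuse and to check without calling the API. The sketch below is only an illustration: the name parse_kraken_ticker is hypothetical, jsonlite is assumed as the JSON parser, and the sample response contains made-up values shaped like the Kraken Ticker output.

```r
library(jsonlite)

# Hypothetical helper (not part of the original functions): turn the raw JSON
# text of a Kraken Ticker response into a one-row data frame of numeric values.
parse_kraken_ticker <- function(json_text) {
  parsed <- fromJSON(json_text)
  # Kraken reports problems in the "error" field of the response
  if (length(parsed$error) > 0) {
    stop("Kraken API error: ", paste(parsed$error, collapse = "; "))
  }
  tmp <- parsed$result[[1]]  # first (and only) pair in the result
  data.frame(
    ask    = as.numeric(tmp$a[1]),
    bid    = as.numeric(tmp$b[1]),
    last   = as.numeric(tmp$c[1]),
    open   = as.numeric(tmp$o),
    low    = as.numeric(tmp$l[1]),
    high   = as.numeric(tmp$h[1]),
    volume = as.numeric(tmp$v[1])
  )
}

# Made-up sample response, for illustration only
sample_json <- '{"error":[],"result":{"XXBTZEUR":{
  "a":["5697.1","1","1.000"], "b":["5697.0","2","2.000"],
  "c":["5697.0","0.01"],      "v":["56.2","80.1"],
  "l":["5671.0","5660.0"],    "h":["5702.0","5710.0"],
  "o":"5686.0"}}}'

parsed_row <- parse_kraken_ticker(sample_json)
print(parsed_row)
```

Converting the values with as.numeric() at this stage avoids having to do it later, since the API returns prices as text.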

Code resource

I’ve written a set of functions that get the price of many different currencies on the 5 main exchanges. They are easy to use. For instance, type the code below in R:

# Source functions that are stored on github  

# Use it: price of Bitcoin on Bitstamp
get_bitstamp(Sys.time(), "BTCEUR")

Harvesting (a lot of) data

I’ve harvested crypto prices:

This was easily done using an infinite loop that calls the functions described above. The exact script used for this work is available here. In total, 800,000 data points were recovered.
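
For readers who want a feel for what such a loop looks like, here is a minimal sketch. It is not the script linked above: the names harvest and fake_fetch are hypothetical, and the demo uses a stub instead of a real API call so that it runs offline and stops after a few iterations.

```r
# Hypothetical harvesting loop. `fetch_price` is any function that returns a
# one-row data frame, for instance a wrapper around one of the get_* functions.
harvest <- function(fetch_price, n_iter = Inf, pause = 2) {
  rows <- list()
  i <- 0
  while (i < n_iter) {  # n_iter = Inf gives an infinite loop
    i <- i + 1
    # A failed request (timeout, API error) should not stop the harvest
    row <- tryCatch(fetch_price(), error = function(e) NULL)
    if (!is.null(row)) {
      rows[[length(rows) + 1]] <- row
    }
    Sys.sleep(pause)  # wait a couple of seconds between two requests
  }
  do.call(rbind, rows)
}

# Demo with a stub that fails on its second call (made-up prices)
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  if (calls == 2) stop("simulated network error")
  data.frame(time = Sys.time(), last = 5697 + calls)
}

harvested <- harvest(fake_fetch, n_iter = 3, pause = 0)
print(harvested)
```

With n_iter = Inf the function never returns, so a long-running version would more likely append each row to a file than keep everything in memory.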

The resulting dataset is available on GitHub in a compressed format. You can easily read it in R by doing:

# Load the data

# Have a look at the first lines

As a teaser, here is the evolution of the Ethereum price on Bitstamp over this period:

# Libraries
library(dplyr)       # %>% and filter()
library(ggplot2)
library(hrbrthemes)  # theme_ipsum()

# Load the data

# Make the plot
Ticker %>%
  filter( symbol == "ETHEUR" ) %>%
  filter(platform == "Bitstamp") %>%
  ggplot( aes(x=time, y=as.numeric(last))) +
    geom_ribbon(aes(ymin=450, ymax=as.numeric(last)),  fill="#69b3a2", color="transparent", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ggtitle("Evolution of Ethereum price on the period") +
    ylab("Ethereum price (EUR)") +
    theme_ipsum() +
    theme(plot.title = element_text(size=12))

Next step

The next step takes this dataset and quantifies the price differences between platforms. If the differences are big enough, there may be a chance to perform arbitrage.
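
To make the idea concrete, here is a rough sketch of how the gap between platforms could be measured for one currency at one moment. The prices are made up for illustration, "Exchange C" is a placeholder name, and this is not the analysis of the next step itself.

```r
# Made-up last prices for one symbol at one point in time (illustration only)
prices <- data.frame(
  platform = c("Kraken", "Bitstamp", "Exchange C"),
  last     = c(5702, 5697, 5690)
)

# Where the token is cheapest and where it is most expensive
cheapest <- prices[which.min(prices$last), ]
dearest  <- prices[which.max(prices$last), ]

# Relative gap, in percent of the cheapest price
spread_pct <- 100 * (dearest$last - cheapest$last) / cheapest$last

cat("Buy on", as.character(cheapest$platform),
    "and sell on", as.character(dearest$platform),
    ": gap of", round(spread_pct, 3), "%\n")
```

A gap only matters for arbitrage if it is larger than the fees paid for trading and moving funds, which this sketch does not take into account.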

Quantifying differences


A work by Yan Holtz for data-to-viz.com