Blockchain-based coins like Bitcoin and blockchain-based platforms such as Ethereum are becoming more and more important. In this blog post, I will showcase my CryptoHist repository for scraping historical cryptocurrency data. That data will be the input for the next blog post, in which we will analyze it! For more information about blockchain, definitely read this article.
Disclaimer: Use the repository at your own risk!
A blockchain is in essence a distributed database. That makes it a good fit for things like currency, because every transaction can be tracked. It is becoming more and more popular and the range of applications keeps growing. It would be interesting to study how cryptocurrencies behave over time, and to do so we need a dataset with the history of cryptocoins. That is why I created CryptoHist. So far I have only added CoinMarketCap as a source, since it is easy to use and contains the most relevant information. Feel free to contribute and add more sources! Give the repository a star if you like it!
Under the hood
The code consists of a Downloader object which downloads URLs and writes a cache file to disk. It also caches the data in memory, so if you request a URL for the second time in the same script it doesn’t have to download it again. This functionality is key for fetching all the data without requesting the same pages over and over.
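To make the caching idea concrete, here is a minimal sketch of a downloader with a memory cache in front of a disk cache. It is not the actual CryptoHist code: the class name `CachingDownloader`, the `fetch` parameter and the hashed file names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict
from urllib.request import urlopen


def fetch_over_http(url: str) -> str:
    """Download a URL and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


class CachingDownloader:
    """Hypothetical downloader: memory cache first, then disk, then network."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], str] = fetch_over_http):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch
        self.memory: Dict[str, str] = {}

    def _cache_path(self, url: str) -> Path:
        # Hash the URL so that it becomes a safe file name (an assumption,
        # the real repository may name its cache files differently).
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.html"

    def get(self, url: str) -> str:
        # 1. Same script, same URL: serve from memory.
        if url in self.memory:
            return self.memory[url]
        path = self._cache_path(url)
        if path.exists():
            # 2. Downloaded in an earlier run: serve from disk.
            content = path.read_text(encoding="utf-8")
        else:
            # 3. Never seen before: download and write the cache file.
            content = self.fetch(url)
            path.write_text(content, encoding="utf-8")
        self.memory[url] = content
        return content
```

With something like this, `CachingDownloader("cache").get(url)` only hits the network the first time a URL is requested. Passing the `fetch` function in as a parameter is a design choice of this sketch: it makes it easy to try the class without an internet connection.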
The CoinMarketCap scraper is in fact nothing more than a class doing two things:
- Download all currency information from an index page. On CoinMarketCap, such an index page can be found here. This index is then used to fetch the detail pages of the currencies.
- Fetch the table of price and market information from each detail page and put it into a Pandas DataFrame.
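The second step can be sketched roughly as follows. This is not the actual CryptoHist code: the names `TableParser`, `parse_history_table` and `to_dataframe` are hypothetical, the real CoinMarketCap markup may look different, and the sample table contains made-up numbers rather than real prices.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class TableParser(HTMLParser):
    """Collects the cell text of every <tr> in the HTML it is fed."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_history_table(html: str) -> List[Dict[str, str]]:
    """Turn a table into one dict per row, keyed by the header cells."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def to_dataframe(records: List[Dict[str, str]]):
    # Imported here so that the parsing part also works without pandas.
    import pandas as pd
    return pd.DataFrame.from_records(records)


# Made-up sample page, only meant to show the shape of the data.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Close</th><th>Volume</th></tr>
  <tr><td>2017-01-02</td><td>11.0</td><td>200</td></tr>
  <tr><td>2017-01-01</td><td>10.5</td><td>100</td></tr>
</table>
"""

records = parse_history_table(SAMPLE_HTML)
```

Calling `to_dataframe(records)` then gives a DataFrame with one row per day. All values are still strings at that point, so a real scraper would convert the dates and numbers before analysis.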
The data can then be accessed by using one of the methods described in the README of the repository.
CryptoHist is a cryptocoin history scraper written in Python. Feel free to contribute to the repository! If you have any questions or suggestions, please mention them in the comment section.