nnfee
This is an attempt at estimating bitcoin fees with neural networks.
Tensorflow has some great tutorials. I started from the iris problem and tried to model my data accordingly.
Collection
Blocks and transactions
Data was collected between 2017-03 and 2018-02 (including these months). This included each bitcoin block and each transaction within those blocks.
Total space used: 115GB. Each block was saved as a gzipped json. A slim version of the block was also saved, containing only the relevant information. This allowed faster searching.
Smartbit provides an API which allows returning all the blocks, transactions, when the transaction was first seen, and when it was confirmed.
Mempool
Historical data was collected from https://jochen-hoenicke.de/queue/. This included:
- timestamps
- number of transactions in the mempool
- total amount of BTC to be payed as fees
- size of the mempool in MB
Preparation
Training data was created from:
- fee per byte
- how many MB the mempool had at the time of confirmation
- how many transactions were in the mempool at the time of confirmation
- how fast the transaction was confirmed
The time it took to confirm a transaction was normalized. For example, all transactions which took 15 minutes or less were assigned the number 0. All transactions which took less than 30 minutes but more than 15 minutes were assigned the number 1. Etc.
The numbers assigned are:
- 0: 15 minutes
- 1: 30 minutes
- 2: 1 hour
- 3: 4 hours
- 4: 12 hours
- 5: 1 day
- 6: 3 days
- 7: +3 days
Model
The training was split in 12 parts, each part containing the transactions which happened within a certain month. 5% of the transactions from each month were randomly selected to verify the neural network and were not used for training.
Using this data, the neural network achieved 71.8% accuracy (which is better than random - hehe).
The model is a 14.3GB folder. Compressed, it is 1.8GB. Serves me right for using so much training data. Guess I need to torrent it.
The source code, including docker stuff is available here. That is all you need to replicate this experiment (it might take longer than you expect).
Conclusion
I am not convinced all this training data is needed. Perhaps training a few neural networks depending on the state of the mempool is a better approach. For example, we could have spam mode and smooth mode. Luckily, 2017 had a lot of variance which allowed training this neural network.
Also, if I used the wrong type of neural network, please tell me. I need to learn more about neural network layer configuration and their different types.
Eye ball attempts at estimating fees sort of indicate this kinda works (what a great sentence). Hosting this as an API would make testing it much easier. If there is interest, I will do it.
Listening to bitcoin transactions and checking when they confirm might be a better approach than loading 1 year of data. Training could be performed (I suspect) with the last 10 days of bitcoin transactions and it might perform better. We’ll see.