Multidimensional LSTM Networks to Predict Bitcoin Price, Jakob Aungiers
This article builds on the work from my last one on LSTM Neural Networks for Time Series Prediction. If you haven't read that, I would highly recommend checking it out to get to grips with the basics of LSTM neural networks from a simple, non-mathematical angle.
Considering the recent resurgence of buzz around the ridiculous Bitcoin bubble, I thought I would theme this article topically around predicting the price and momentum of Bitcoin using a multidimensional LSTM neural network that doesn't just look at the price, but also at the volumes traded of BTC and the currency (in this case USD), and creates a multivariate sequential machine learning model out of it.
As anyone who's been on a date with me knows, I find small talk boring, so let's just jump right into it!
The first thing we will need is the data. Fortunately, Kaggle has a fun dataset of minute-by-minute historical Bitcoin data which includes 7 factors. Perfect!
We will however need to normalise this dataset before feeding it into our network of LSTMs. We will do this as per the previous article, where we take a sliding window of size N across the data and re-base the data in each window to be returns from 0, i.e. n_i = (p_i / p_0) - 1.
Now, this being a multidimensional approach, we are going to apply this sliding window treatment across all of our dimensions. Normally this would be a pain in the butt. Fortunately, the Python Pandas library comes to the rescue! We can represent each window as a Pandas dataframe and then perform the normalisation operation across the entire dataframe (i.e. across all columns).
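As a rough sketch of what this looks like in practice (the window length and column names here are made up for illustration, not taken from the article's actual code):

```python
import numpy as np
import pandas as pd

def normalise_windows(df, window_len=5):
    """Slide a window across the dataframe and normalise every column to
    returns relative to the window's first row: n_i = p_i / p_0 - 1."""
    windows = []
    for i in range(len(df) - window_len + 1):
        window = df.iloc[i:i + window_len]
        # dividing by the first row broadcasts across all columns at once
        windows.append((window / window.iloc[0] - 1).values)
    return np.array(windows)

# toy two-column frame standing in for the Kaggle minute data
df = pd.DataFrame({"Close": [100.0, 110.0, 121.0, 133.1, 146.41],
                   "Volume": [10.0, 20.0, 10.0, 40.0, 10.0]})
wins = normalise_windows(df, window_len=3)
```

Dividing the whole dataframe by its first row is exactly the "normalise across all columns at once" trick: Pandas aligns the row's values to each column, so no per-dimension loop is needed.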
The other thing you will notice with this dataset is that, especially at the beginning, the data is not very clean. There's a lot of NaN values floating around in various columns which would not make our model particularly happy. We'll take a lazy approach to fixing this: when we create a window, we'll check if any value in the window is a NaN. If it is, we throw away the window and move on to the next one.
While we're here, let's make these functions into a self-contained class called ETL (extract, transform, load) and save it as etl.py; that way we can call this entire data loader as a library.
Here is the core code from our clean_data() function:
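The original listing did not survive in this copy of the article, but the core idea can be reconstructed from the description above (this body is my sketch, not the author's actual code):

```python
import numpy as np
import pandas as pd

def clean_data(df, window_len=3):
    """Slide a window over the frame, discard any window containing a NaN,
    and normalise the survivors to returns from the window's first row."""
    clean_windows = []
    for i in range(len(df) - window_len + 1):
        window = df.iloc[i:i + window_len]
        if window.isnull().values.any():
            continue  # lazy fix: skip windows polluted by NaNs
        clean_windows.append((window / window.iloc[0] - 1).values)
    return np.array(clean_windows)

# the first row is NaN, as at the start of the real dataset
df = pd.DataFrame({"Close": [np.nan, 100.0, 110.0, 121.0, 133.1]})
wins = clean_data(df, window_len=3)
```

Any window overlapping the NaN row is dropped, so only the fully clean windows make it through to the model.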
Once we have done this, we simply make sure our LSTM model accepts sequences of M dimensions, where M = the number of dimensions of our data, and we're done!
The Trouble With Loading In-Memory
Or so you would think, but life is rarely ever that easy. See, the first time I attempted to do this, my machine ground to a halt and gave me a memory error. The issue, you see, comes from the fact that the Bitcoin dataset, being a minute-by-minute dataset, is fairly large. When normalised, it is around 1 million data windows. And loading all of these 1 million windows into Keras to train on at once makes for a pretty bad time.
So how would I train on this data without adding an extra 100GB of RAM to my machine? Furthermore, if this data grew to 100x the size, adding more RAM wouldn't exactly be feasible. Well, this is where the Keras fit_generator() function comes in pretty damn handy!
In a nutshell, a generator iterates over data of unknown (and potentially infinite) length, only passing out the next piece each time it is called. Now, if you have half a brain, I'm sure you can see where this comes in useful: if we can train the model on a small chunk of windows at a time, then throw away those windows once we are done with them to be replaced by the next set, we can train the model with low memory utilisation. Perfect! Technically speaking, if you made the windows small enough you could even train this model on your IoT toaster machine if you really wanted to!
What we need to do, then, is create a generator that builds a batch of windows to pass to the Keras fit_generator() function. Easy: we just extend the core clean_data() code to yield (return in a generative way) batches of windows:
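Again the original listing is missing here, so the following is a minimal sketch of the idea under the same assumptions as before (names and sizes are illustrative):

```python
import numpy as np

def clean_window_generator(data, window_len=4, batch_size=8):
    """Yield batches of normalised windows one at a time, instead of
    building all of them in memory up front."""
    batch = []
    for i in range(len(data) - window_len + 1):
        window = data[i:i + window_len]
        if np.isnan(window).any():
            continue
        batch.append(window / window[0] - 1)
        if len(batch) == batch_size:
            yield np.array(batch)
            batch = []
    if batch:  # flush the final, possibly smaller batch
        yield np.array(batch)

prices = np.arange(1.0, 21.0)  # 20 fake closing prices
batches = list(clean_window_generator(prices, window_len=4, batch_size=8))
```

Note that Keras' fit_generator() expects a generator that yields forever; in practice you would wrap the loop above in `while True` so it restarts from the top of the dataset each epoch.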
No-One Likes Re-Runs
Now, the other issue I found here was that the clean_data() generator I created was taking on average 3-4 seconds to create each "batch" of windows. On its own this was acceptable, as it took around 15-20 minutes to get through the training data batches. However, if I wanted to tweak the model and re-run it, it would take an awfully long time to re-train it again.
What can we do? Well, how about pre-normalising the data, then saving the normalised numpy arrays of windows to a file, hopefully one that preserves the structure and is super-fast to access?
HDF5 to the rescue! Through the use of the h5py library, we can easily save the clean and normalised data windows as a list of numpy arrays that takes a fraction of a second of IO time to access. So let's make a function that does exactly that and call it create_clean_datafile():
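A minimal sketch of such a function, assuming h5py and an x/y dataset layout that the article doesn't spell out (splitting each window into the input steps and the next-step target is my reading of the setup):

```python
import h5py
import numpy as np

def create_clean_datafile(windows, filename, x_len):
    """Save pre-normalised windows to an HDF5 file, split into network
    inputs (the first x_len steps) and targets (the step after), so
    later runs can skip the slow normalisation pass entirely."""
    windows = np.asarray(windows)
    with h5py.File(filename, "w") as hf:
        hf.create_dataset("x", data=windows[:, :x_len])
        hf.create_dataset("y", data=windows[:, x_len, 0])  # next close return

# toy data: 10 windows of 6 steps x 2 dimensions
toy = np.random.rand(10, 6, 2)
create_clean_datafile(toy, "clean_data.h5", x_len=5)
with h5py.File("clean_data.h5", "r") as hf:
    x_shape, y_shape = hf["x"].shape, hf["y"].shape
```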
Now we can just create a new generator function generate_clean_data() to open the HDF5 file and spit out those same normalised batches at lightning-fast speed into the Keras fit_generator() function!
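Sketched under the same assumed file layout as above (again my reconstruction, not the author's listing):

```python
import h5py
import numpy as np

def generate_clean_data(filename, batch_size=4):
    """Stream (x, y) batches straight out of the HDF5 file, so training
    never needs the whole dataset in memory."""
    with h5py.File(filename, "r") as hf:
        n = hf["x"].shape[0]
        i = 0
        while i < n:
            # h5py slicing reads only this batch from disk
            yield hf["x"][i:i + batch_size], hf["y"][i:i + batch_size]
            i += batch_size

# write a small demo file to read back (setup only, not part of the loader)
with h5py.File("demo.h5", "w") as hf:
    hf.create_dataset("x", data=np.zeros((10, 5, 2)))
    hf.create_dataset("y", data=np.zeros(10))

batches = list(generate_clean_data("demo.h5", batch_size=4))
```

As with the earlier generator, a version destined for fit_generator() would loop indefinitely (`while True` around the read loop).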
Looking at the data, however, we don't want to add unnecessary noise with some of the dimensions. What I have done is created an argument to the create_clean_datafile() function that takes in factors (columns) to filter. With this, I've narrowed my data file down to a 4-dimensional time series consisting of Open, Close, Volume (BTC) and Volume (Currency). This will cut down on the time it takes to train the network as well. Winning!
The data is then fed into the network, which has one input LSTM layer that takes in data of shape [training_rows, sequence_size, dimensions] (the samples, timesteps, features order that Keras expects), a second hidden LSTM layer, and a fully connected output layer with a tanh activation for spitting out the next predicted normalised return.
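A sketch of that architecture in Keras (the layer widths here are my guesses, as the article doesn't state them; the original used the standalone Keras of its era, while this uses the tensorflow.keras packaging):

```python
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential

def build_model(seq_len, n_dims):
    """Two stacked LSTM layers plus a tanh-activated dense output that
    predicts the next normalised return."""
    model = Sequential([
        # input LSTM layer: each sample is (seq_len, n_dims)
        LSTM(100, input_shape=(seq_len, n_dims), return_sequences=True),
        LSTM(100),                    # hidden LSTM layer
        Dense(1, activation="tanh"),  # next-step normalised return
    ])
    model.compile(loss="mse", optimizer="adam")
    return model

model = build_model(seq_len=50, n_dims=4)  # 4 dims: Open, Close, 2x Volume
```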
Training is done by calculating the steps_per_epoch based on our number of epochs and our train/test split, as specified in our configs JSON file.
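One plausible reading of that calculation, with made-up config values since the configs file isn't shown (here one full pass over the training windows is spread across all the epochs):

```python
# hypothetical contents of the configs JSON file
config = {"batch_size": 100, "epochs": 2,
          "train_split": 0.8, "n_windows": 1_000_000}

def steps_per_epoch(cfg):
    """Generator batches per epoch: training windows divided across
    the batch size and the number of epochs."""
    n_train = int(cfg["n_windows"] * cfg["train_split"])
    return n_train // (cfg["batch_size"] * cfg["epochs"])

steps = steps_per_epoch(config)
# model.fit_generator(generate_clean_data(...), steps_per_epoch=steps,
#                     epochs=config["epochs"])
```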
Testing is done in a similar way, using the same generator as for training and utilising the Keras predict_generator() function. The only extra thing we need to add when predicting on our test set is a generator function that iterates the underlying generator and splits out the x and y outputs. This is because the Keras predict_generator() function only takes the x inputs and wouldn't know what to do with a tuple of x and y values. However, we still want the y values (the true data), so we store them in a separate list, as we want to plot against them to visualise our results compared to the true data. We then do the same, but rather than predicting on a step-by-step basis, we initialise a window of size 50 with the first prediction, and then keep sliding the window along the new predictions, taking them as true data, so we slowly start predicting on the predictions and hence are forecasting the next 50 steps forward.
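The x/y splitting wrapper can be sketched like this (the function name and the stand-in batch format are mine, for illustration):

```python
def strip_y(xy_generator):
    """Wrap an (x, y) batch generator so a predict call only sees x,
    while the true y values are collected for plotting afterwards."""
    true_ys = []
    def x_only():
        for x, y in xy_generator:
            true_ys.extend(y)  # keep the true data for the charts
            yield x
    return x_only(), true_ys

# stand-in batches of (x, y) tuples
data = [([1, 2], [3]), ([4, 5], [6])]
x_gen, true_ys = strip_y(iter(data))
xs = list(x_gen)  # what predict_generator() would consume
```

Note that true_ys only fills up as the wrapped generator is consumed, so it is complete exactly when prediction finishes.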
Finally, we save the test set predictions and the test set true y values to an HDF5 file again, so we can easily access them in the future without re-running everything, should the model turn out to be useful. We then plot the results on two matplotlib charts: one showing the daily 1-step-ahead predictions, the other showing the 50-steps-ahead predictions.
Hello Sweet Bitcoin Profit
We then go for the forecasting of the Bitcoin price! As per my last article, we will attempt to do two types of forecasts:
The first will be predicting on a point-by-point basis; that is, predicting the t+1 point, then shifting the window of true data and predicting the next point along. Repeat. Here are the results of the point-by-point predictions:
The second forecast type is a t+n multiple-steps-ahead forecast, where we populate the shifting window with predictions initialised from a window of true data and plot N steps ahead. The results look like this:
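The sliding forecast loop behind that chart can be sketched as follows (predict_fn stands in for the trained model's predict call; the toy mean-of-window predictor is purely illustrative):

```python
import numpy as np

def predict_sequence(predict_fn, init_window, steps=50):
    """Seed with a window of true data, then repeatedly predict one step
    and slide the window along over our own predictions."""
    window = np.array(init_window, dtype=float)
    preds = []
    for _ in range(steps):
        p = predict_fn(window)             # stand-in for model.predict(...)
        preds.append(p)
        window = np.append(window[1:], p)  # treat the prediction as truth
    return preds

# toy predictor: the next value is the mean of the current window
preds = predict_sequence(lambda w: w.mean(), [1.0, 2.0, 3.0], steps=3)
```

Because each prediction is fed back in as if it were true data, errors compound; this is exactly why the multi-step forecast drifts in a way the 1-step-ahead forecast does not.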
What can we see? Well, we can see that when predicting 1 step ahead it's doing a fairly reasonable job. Periodically it's off, but in general it follows the true data quite well. However, the predictions do appear far more volatile than the true data. Without doing more tests, it's hard to ascertain why this might be and whether a model re-parameterisation would fix it.
When predicting the trend, however, this model starts to fall on its face a bit. The trend doesn't seem particularly accurate and is erratic at times. However! What is interesting is that the size of the predicted trend line does seem to correlate with the size of the price moves (volatility).
I am going to use this section to take off my AI hat and put on my investment manager hat to explain a few key truths.
The main thing one should realise is that predicting returns is a pretty futile exercise. I mean, sure, it's the holy grail of forecasting to be able to predict returns, and whilst some top-end hedge funds do attempt to do just that by finding fresh alpha indicators, in truth it's a pretty hard thing to do due to the massive swathes of external influences that push an asset price. In real terms, it's comparable to attempting to predict the next step of a random walk.
However, all is not lost and our exercise isn't entirely pointless. See, whilst with limited time series data, even with multiple dimensions, it's hard to predict returns, what we can see, especially from the second chart, is that there is an avenue here for predicting volatility. And not just volatility: we could also expand this to predict market environments, allowing us to know what type of market environment we are currently in.
Why would this be useful? Well, a lot of different strategies (which I won't go into here) each work well in different market environments. A momentum strategy might work well in a low-vol, strongly trending environment, whilst an arbitrage strategy might be more successful at producing high returns in a high-vol environment. We can see that knowing our current market environment, and predicting future market environments, is key to allocating the correct strategy to the market at any given time. Whilst this is more of a general investment approach for traditional markets, the same would apply to the Bitcoin market.
So, as you can see, predicting longer-term Bitcoin prices is currently (as with all types of stock markets) pretty hard, and nobody can claim to do so from just the technical time-series data, because there are a lot more factors that go into the price changes. Another issue worth touching on with the use of LSTM neural networks on a dataset like this is that we are treating the entire time series as a stationary time series. That is to say, the properties of the time series are assumed unchanged throughout time. This is unfortunately not true, as the factors that influence price changes also vary over time, so assuming that a property or pattern the network finds in the past remains true for the present day is a naive approach that doesn't necessarily hold.
There is work that can be done to help with this non-stationarity issue; the leading-edge research here currently concentrates on using Bayesian methods alongside LSTMs to overcome the problem of time series non-stationarity.
But that’s far out of scope for this brief article. I may make another postbode ter the future detailing the implementation of this. Check back soon!
In the meantime, feel free to browse the full code for this project on my GitHub page: Multidimensional-LSTM-BitCoin-Time-Series