Australian (ASX) Stock Market Forum

Designing a Neural Network, need your help...

Hi Nick,

You said 1 input. Are you using feedback loops?

Hi Dave,

I am just at the beginning, learning and trying to figure out the proper neural network architecture for data forecasting. I have a lot to read. I came across a very valuable (in my opinion) document that describes an apparently successful approach to forecasting Forex - http://arxiv.org/ftp/cond-mat/papers/0304/0304469.pdf

It provides methods to analyse whether data is forecastable and to estimate the quality of prediction, and these can be used not only with neural networks but with any kind of system - for example, any of the indicators. If we implemented various strategies (Ichimoku, MA crossover, etc.) for a given sample, we could identify which of these indicators are worthwhile and which are not worth bothering with.
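As a rough illustration of what I mean, here is a Python sketch (the CSV file name and the MA windows are placeholders I made up) that measures how often a simple MA-crossover signal correctly anticipates the next bar's direction:

```python
# Sketch: hit rate of a moving-average crossover signal on the next bar.
import pandas as pd

# Hypothetical input: a CSV with a 'Close' column, e.g. daily EUR/USD closes.
close = pd.read_csv("eurusd_daily.csv")["Close"]

fast = close.rolling(10).mean()
slow = close.rolling(30).mean()
signal = (fast > slow).astype(int).diff()   # +1 = bullish cross, -1 = bearish cross
next_move = close.shift(-1) - close         # the next bar's price change

crosses = signal[signal != 0].dropna()
hits = ((crosses > 0) & (next_move[crosses.index] > 0)) | \
       ((crosses < 0) & (next_move[crosses.index] < 0))
print(f"{len(crosses)} crossovers, hit rate = {hits.mean():.1%}")
```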

I will probably purchase one of Howard's books in the future; it should provide even more valuable information.
I already installed Python on one of my computers, but I have to focus in one direction, as I tend to look everywhere and not finish anything.

So far I am trying to understand and get comfortable with some C# sample code for neural networks - the code is of excellent quality, but I will have to modify it for my needs. I've been a beginner programmer for the last 20 years :)

I am uncertain whether using just one or two inputs would be sufficient for forecasting. At the moment (without enough reading), I tend to believe that for pattern recognition it would be better to provide input data from the last n periods (for example, have 10 inputs receive smoothed data from the last 10 days, plus some extra inputs). I changed my mind from using one input based on this: if we consider a sine wave with values between -1 and +1, even if I tell you that the current value is 0.5, you would be unable to tell me whether the next value will be higher or lower than 0.5. Whereas if you have the last n values and plot them (provided the data is sampled correctly), it would be trivial to answer the question. If I read more, I might change my theory again :)
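Here is a quick numerical check of that argument (my own toy example in Python; the sampling step and window length are arbitrary):

```python
# A single sample near 0.5 says nothing about the next direction of a sine
# wave, but a window of the last n samples almost always does.
import numpy as np

t = np.arange(0, 200, 0.5)
wave = np.sin(t)

# Single input: find all points where the value is ~0.5 and see what follows.
idx = np.where(np.abs(wave[:-1] - 0.5) < 0.05)[0]
ups = np.sum(wave[idx + 1] > wave[idx])
print(f"value ~0.5: next is higher {ups} times, lower {len(idx) - ups} times")

# Window of n inputs: the direction of the last step inside the window
# almost always matches the direction of the step that follows it.
n = 10
windows = np.array([wave[i:i + n] for i in range(len(wave) - n)])
slope = windows[:, -1] - windows[:, -2]   # trend at the end of each window
nxt = wave[n:] - wave[n - 1:-1]           # the step after each window
print("window predicts direction:", np.mean(np.sign(slope) == np.sign(nxt)))
```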

To answer your question, in the first stage I will probably build a backpropagation network with 100 inputs and one hidden layer (not sure how many hidden neurons, probably between 2 and 100) and see if it is capable of learning the data, maybe for 1000 values. With an ANN, there is a problem with choosing the number of neurons in the hidden layer: too few and it may never learn, too many and it will overfit the data. Also the learning rate and momentum need to be chosen carefully: too small and it will take forever to learn, too large and it will never learn. At least, this is what I know at the moment.
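For instance, a minimal sketch of that kind of network in Python with scikit-learn (the toy data, hidden layer width and learning parameters are just starting guesses; my real implementation will be the C# code):

```python
# Minimal sketch of a 100-input, one-hidden-layer backprop network.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 100))      # stand-in for 1000 samples of 100 inputs
y = X[:, :5].sum(axis=1)              # a toy target the network can actually learn

net = MLPRegressor(hidden_layer_sizes=(20,),  # one hidden layer; try 2..100 units
                   solver="sgd",
                   learning_rate_init=0.01,   # too small: forever; too large: diverges
                   momentum=0.9,
                   max_iter=2000)
net.fit(X, y)
print("in-sample R^2:", net.score(X, y))
```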

Later, I may try to implement the network as described in the document above (an Elman-Jordan architecture with two hidden layers, each of 100 neurons). Then I could laugh at how much better my own network is...

Anyway, ANNs and artificial intelligence are such a vast area of research, I guess I will need an entire month to become an expert in it...

Cheers,
Nick
 
Hi Nick --

It is possible to use only one input data stream for a neural network. But there will probably be several derivatives created from it, so the input layer to the NN will have several input nodes.

For example, begin by using closing price as the input stream. Since all variables must be transformation invariant, price itself is not a good input. So create several derivative indicators that are transformation invariant. Try RSI(2), RSI(3), Z-Score(5), ROC(1), etc. There should be as many zero crossings per time period (say, a month) in each of the inputs to the NN as there are changes in state of the target variable over that same time period.
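For illustration, here is one way those indicators might be computed in Python with pandas (a sketch, not from Howard's post; the formulas are common textbook versions and exact definitions vary between platforms):

```python
import pandas as pd

def rsi(close: pd.Series, n: int) -> pd.Series:
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(n).mean()
    loss = (-delta.clip(upper=0)).rolling(n).mean()
    return 100 - 100 / (1 + gain / loss)

def zscore(close: pd.Series, n: int) -> pd.Series:
    return (close - close.rolling(n).mean()) / close.rolling(n).std()

def roc(close: pd.Series, n: int) -> pd.Series:
    return 100 * close.pct_change(n)

# close = pd.read_csv("prices.csv")["Close"]   # hypothetical price file
# features = pd.DataFrame({"rsi2": rsi(close, 2), "rsi3": rsi(close, 3),
#                          "z5": zscore(close, 5), "roc1": roc(close, 1)})
```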

Since a neural network works best when the inputs are all in the range of 0 to 1, normalize everything using a sliding window. There will be, say, 10 inputs to the NN. Each row of the input must be independent of all other rows. So, in order to compare values or use changes, the prior value or change must be included as its own variable -- for example, RSI(2) today and RSI(2)ChangeFromYesterday. 10 inputs is not many, but look at what happens next.
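A sketch of sliding-window normalization (the window length is a placeholder; the point is that each value is scaled by the min/max of trailing bars only, so no future information leaks into the inputs):

```python
import pandas as pd

def window_normalize(s: pd.Series, window: int = 100) -> pd.Series:
    lo = s.rolling(window).min()
    hi = s.rolling(window).max()
    return (s - lo) / (hi - lo)

# Keeping rows independent: a change becomes its own column, e.g.
# df["rsi2"]     = window_normalize(rsi(close, 2))
# df["rsi2_chg"] = window_normalize(rsi(close, 2).diff())  # RSI(2)ChangeFromYesterday
```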

There are proofs that a single hidden layer is sufficient. Try two, if you wish. Not three or more.

The number of terms in the equation represented by the NN is the product of all the nodes. With 10 inputs, 5 in each of two hidden layers, and 1 output, there are 10 x 5 x 5 x 1 = 250 terms. If the period of stationarity is determined to be one year, 252 trading days, the learning process will be fitting a 250-term equation to 252 data points. This is all in-sample and the fit will be really good. Test by predicting the next three months -- 60 trading days -- as an out-of-sample test. Expect results to be really bad. (If they are good, double check everything -- you are probably fooling yourself and made a mistake somewhere.) If they are bad, either reduce the number of nodes or increase the number of data points. Reducing the number of nodes means the equation will be less complex -- fewer inputs and/or fewer hidden nodes. Deciding what to keep and what to give up is difficult and will require experimentation. Increasing the number of data points is good provided that the relationships being learned are consistent throughout -- this is the concept of stationarity. More data is bad when the additional data is different.
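A sketch of that fit-then-test regime in Python (the X and y here are synthetic stand-ins for the 10 normalized inputs and the target; in practice they come from the feature construction above):

```python
# Train on one year (252 bars), test on the next three months (60 bars).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(312, 10))   # 252 in-sample + 60 out-of-sample rows
y = rng.normal(size=312)          # stand-in target

# 10 inputs x 5 x 5 hidden x 1 output = 250 terms fitted to 252 points
net = MLPRegressor(hidden_layer_sizes=(5, 5), max_iter=5000, random_state=0)
net.fit(X[:252], y[:252])
print("in-sample R^2    :", net.score(X[:252], y[:252]))
print("out-of-sample R^2:", net.score(X[252:], y[252:]))  # a large gap = overfitting
```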

Keep us posted.


Best regards,
Howard
 
There are proofs that a single hidden layer is sufficient. Try two, if you wish. Not three or more.
I would agree with a single layer - 2nd & subsequent layers usually make a trivial difference to the results and take an order of magnitude longer to learn.

Visit the forums at kaggle.com for more ideas - IMO a small community of v. high quality people solving real world problems.

Also consider getting a (free) AWS account & running your python scripts there 24/7.
 

Hi Howard,

First of all, I am really grateful that somebody with your experience is guiding us. I will need to read some more about the matters you explained. But please let me know if I understand this correctly: neural networks are all about pattern recognition. You say that one can use only one variable and a few derivatives of that variable. Let's say I use Close and RSI(2), RSI(3), Z-Score(5), ROC(1). Since these variables are based on data from the last 5 days at most (let's make this assumption, even if it may not be correct), does it mean that the pattern recognition is limited to data from the last 5 days? Does it mean that if I have a very smart ANN, I can supply just the close values of the last five days to its five inputs and, being so smart, it will remove the noise by itself, find the best data, create its own internal indicators and provide better results?

If this is the case, would this approach work as an alternative to an ANN? I have 4 input variables. I normalize each of them to the 0...1 interval, split that interval into 3 ranges, and build a large matrix: for every new data set I store the pattern pattern[0..2][0..2][0..2][0..2], increment the number of appearances of this particular pattern, and record how many times the price increased following this pattern and how many times it decreased. Then I choose only the patterns that appear most often and have the largest bias between wins and losses. There is an obvious roughness to this method compared to an ANN, but the advantage would be that learning would be much faster.
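In code, I imagine it would look something like this (a rough Python sketch of the idea; the function names and the ranking rule are just my first guess):

```python
# Pattern-counting alternative: 4 inputs already normalized to 0..1, each
# discretized into 3 ranges; count up/down outcomes per pattern, then keep
# the frequent patterns with the strongest bias.
from collections import defaultdict
import numpy as np

def learn_patterns(inputs: np.ndarray, next_move: np.ndarray) -> dict:
    """inputs: (n_samples, 4) array in [0, 1]; next_move: +1/-1 direction."""
    stats = defaultdict(lambda: [0, 0])             # pattern -> [ups, downs]
    bins = np.minimum((inputs * 3).astype(int), 2)  # map [0, 1] to bins 0, 1, 2
    for pattern, move in zip(map(tuple, bins), next_move):
        stats[pattern][0 if move > 0 else 1] += 1
    return stats

def best_patterns(stats: dict, min_count: int = 30) -> list:
    # keep frequent patterns, ranked by the size of their win/loss bias
    frequent = {p: (u, d) for p, (u, d) in stats.items() if u + d >= min_count}
    return sorted(frequent, key=lambda p: abs(frequent[p][0] - frequent[p][1]),
                  reverse=True)
```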

By the way, what is the expected learning time for a neural network like you described above? Is it minutes, hours, days, weeks?

Best regards,
Nick
 

Hi Keith,

Thank you very much for the invitation. I will join that forum as well. It is quite amazing that Amazon offers computing power for running user scripts. My computers sleep for most of the day, so at the moment I could put them to work, if I have some ideas.

Cheers,
Nick
 
But please let me know if I understand this correctly: neural networks are all about pattern recognition.
Yes. And the best pattern recognition system by far is the human brain. However, as demonstrated by some of the TA threads, it can suffer from overfitting. :eek:

You say that one can use only one variable and a few derivatives of that variable. Let's say I use Close and RSI(2), RSI(3), Z-Score(5), ROC(1). Since these variables are based on data from the last 5 days at most (let's make this assumption, even if it may not be correct), does it mean that the pattern recognition is limited to data from the last 5 days? Does it mean that if I have a very smart ANN, I can supply just the close values of the last five days to its five inputs and, being so smart, it will remove the noise by itself, find the best data, create its own internal indicators and provide better results?

If this is the case, would this approach work as an alternative to an ANN? I have 4 input variables. I normalize each of them to the 0...1 interval, split that interval into 3 ranges, and build a large matrix: for every new data set I store the pattern pattern[0..2][0..2][0..2][0..2], increment the number of appearances of this particular pattern, and record how many times the price increased following this pattern and how many times it decreased. Then I choose only the patterns that appear most often and have the largest bias between wins and losses. There is an obvious roughness to this method compared to an ANN, but the advantage would be that learning would be much faster.
So you are essentially the brains behind the NN.

At its simplest, a NN is a list of nodes, each with a weighting that it may or may not pass to an output. If one of the inputs is the weather in Timbuktu yesterday, you'd hope that node was weighted at 0. For inputs that aren't completely random (e.g. yesterday's price move), there should be a non-zero weighting. It will take a NN minutes to work out that Timbuktu isn't relevant to tomorrow's price, whereas you would know it instantly.
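A toy version of the point (a plain linear fit rather than a full NN, with made-up data) shows the effect:

```python
# One informative input ("yesterday's move") and one pure-noise input
# ("Timbuktu weather"); the fitted weight on the noise input lands near zero.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
move = rng.normal(size=n)             # informative input
timbuktu = rng.normal(size=n)         # irrelevant input
y = 0.8 * move + 0.1 * rng.normal(size=n)

X = np.column_stack([move, timbuktu])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("weights:", w)                  # roughly [0.8, ~0.0]
```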

Your NN should find there are some fairly obvious patterns that work a little over 50% of the time. Your own brain should be able to spot them too. Whether they are tradeable or not is a different question.

The clever bit of NNs is the back propagation (deciding on each node's weightings). As you propose above, human brains are far better at it for small data sets.

By the way, what is the expected learning time for a neural network like you described above? Is it minutes, hours, days, weeks?
As long as a piece of string... it depends on how many layers, how many nodes, how much data, how much processing power, and your stopping conditions. When developing a NN, aim for minutes, so you can iterate to the next dud idea faster. When you've got something vaguely reasonable, throw lots of data at it & let it train for a couple of days?

And as mentioned by others, it is absolutely essential to try it with out of sample data to see how it actually performs.
 
Yes. And the best pattern recognition system by far is the human brain. However, as demonstrated by some of the TA threads, it can suffer from overfitting. :eek:

Hmm, my intent in using the ANN was to find a better alternative to my brain. Without trying to hurt its feelings, my brain is not capable of winning the Forex game. It is pretty lazy and predisposed to gambling...

The good part is that software is now making good progress. For example, some software based on artificial intelligence has managed to attain super-human capabilities in character recognition. Of course, there still needs to be a good brain behind good software...

Also, I was reading years ago about an ANN system developed for playing backgammon. The system played millions of games against itself and became, according to expert players, a top player itself. People even managed to learn from it which of certain alternative moves were better.

I was reading an article that said about 21% of Forex players with accounts under $1500 are winning, while about forty-something percent of Forex players with accounts above $5000 are winning. That means there is hope. And the obvious way to become a more successful player is to add more money to my shrinking real account :)

Cheers,
Nick
 
Hi Nick --

Please re-read my posts #5 and #9 in this thread.

Forecasting the direction of price change one bar ahead (or something equivalent as a target for the learning) is a very difficult problem. The markets are nearly efficient, the signal to noise ratio is very low, the data is nonstationary, neural networks are prone to overfitting to the in-sample data, etc.

Become very familiar with modeling and simulation techniques. Begin with modeling stationary data (such as the Iris data), then progress to time series, then to financial time series. Each is more difficult than the previous. Whenever you are reading about or working with any technique, pay very close attention to how validation will be done.

The most value the modeler (that is you) can add is clever transformations of the raw input data to produce a highly predictive model. Expect to spend 80 percent of your time in this task.

Begin by reading a lot. The internet for starters.

Enroll in artificial intelligence / pattern recognition classes on Coursera or one of the other online educational programs. They are (mostly) free and at university undergraduate and graduate level. If necessary, review the math needed -- mostly linear algebra, some statistics.

Learn to use one of the machine learning libraries that has a neural network component. I recommend the Python language with the Scikit-Learn library. Also free.
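For example, a minimal Scikit-Learn session on the Iris data mentioned above might look like this (a sketch, not a recipe; the network size and split are arbitrary):

```python
# Load the (stationary) Iris data, hold out a test set, fit a small
# neural network, and check held-out accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```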

Keep in mind that the problem you are working on is modeling, simulating, and forecasting financial time series. Financial time series analysis is an order of magnitude more difficult than analysis of stationary data.

Expect to spend on the order of 5,000 to 10,000 hours in an attempt to develop a usable solution. Do not be too disappointed if you never find one that you have confidence in using for real money trading, even after all that time and effort.

Best,
Howard
 


Hi Howard,

Did you just say: the road is long and hard; it will take years of effort and study, and in the end you might end up with a useless solution?

You don't sugarcoat things, do you?

Beats university any day. :D
 
Expect to spend on the order of 5,000 to 10,000 hours in an attempt to develop a usable solution. Do not be too disappointed if you never find one that you have confidence in using for real money trading, even after all that time and effort.

Well, to be honest, I like to dream that I will find an automated method that will pour money into my pockets, slowly but surely. If that does not happen, at least I will enjoy some of the side benefits of the journey - learning to program better, reading about interesting things, feeling I am part of a major common goal (similar to the alchemists' dream of making gold), etc. It surely beats watching TV six hours a day...
Of course, there are probably better ways to invest many thousands of hours to reap some benefits.

Thanks for bringing me back to earth :)

By the way, the Iris you mention - is it a database for recognition of eye irises?
If that's the case: I read about some databases for character recognition, such as MNIST (http://yann.lecun.com/exdb/mnist/) - would the data in it also be stationary? Is there any advantage in using Iris instead of MNIST?
Where can I find details about Iris (apart from looking into a mirror :) )?

Kind regards,
Nick
 
By the way, the Iris you mention - is it a database for recognition of eye irises?
If that's the case: I read about some databases for character recognition, such as MNIST (http://yann.lecun.com/exdb/mnist/) - would the data in it also be stationary? Is there any advantage in using Iris instead of MNIST?
Where can I find details about Iris (apart from looking into a mirror :) )?
The Iris dataset is just 4 numeric attributes for a sample of 150 iris flowers.
 
Hi Nick --

Keith is correct about the Iris dataset. It is one of the datasets often used to demonstrate machine learning techniques.
https://archive.ics.uci.edu/ml/datasets/Iris

If you have not already, you might download and read the free chapters here:
http://www.quantitativetechnicalanalysis.com/book.html

Here are links to some of the many machine learning courses available free online:
https://www.coursera.org/course/ml
https://work.caltech.edu/telecourse.html
http://ocw.mit.edu/courses/electric...ence/6-034-artificial-intelligence-fall-2010/

Note that none of these courses go beyond analysis of stationary data. None discuss time series.

Read about the developments already made in applications of machine learning to trading (among many other fields):
http://www.amazon.com/Rise-Robots-T...1432569128&sr=8-1&keywords=rise+of+the+robots

Best regards,
Howard
 

Hi Howard,

I enrolled in a Coursera class. I just installed MATLAB. There is also a package called Octave, which can be used in a similar manner, and it's free. I may try it later as well.

I checked the book you suggested; I am very interested in reading it.

I came across a great video that explains how backpropagation works and why we choose the sigmoid function (or similar) instead of a step function: https://www.youtube.com/watch?v=q0pm3BrIUFo

Thanks a lot for the advice.

Best regards,
Nick
 
Hi Nick --

Good decisions!

You wrote "I came across a great video that explains how backpropagation works and why we choose the sigmoid function (or similar) instead of a step function".

One way to think of a sigmoid transfer function is that it is a step function with curved transitions from the linear central section to the upper and lower limits. Sigmoid is a class of functions. One that is particularly valuable in developing trading systems is "softmax."
http://en.wikipedia.org/wiki/Softmax_function
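A small numerical illustration of the step / sigmoid / softmax relationship (my own sketch in Python):

```python
# Sigmoid is a smoothed step (differentiable, which backpropagation needs);
# softmax maps a vector of raw outputs to probabilities.
import numpy as np

def step(x):    return np.where(x > 0, 1.0, 0.0)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - np.max(z))          # subtract max for numerical stability
    return e / e.sum()

x = np.linspace(-6, 6, 5)
print(step(x))                          # hard 0/1 jump; gradient is zero elsewhere
print(sigmoid(x))                       # smooth transition with useful gradients
print(softmax(np.array([2.0, 1.0, 0.1])))  # sums to 1: class probabilities
```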

Best regards,
Howard
 