Greetings --
There is a fundamental difference between the approach the developer takes with traditional trading system development and the approach taken with machine learning development.
When using a traditional trading system development platform to build a formula-based trading system, the indicators and rules are written and interpreted first; these generate buy and sell signals, resulting in trades, which are then analyzed. In short -- examine an indicator, see what happens later.
When using machine learning to build trading systems, a set of data that includes both indicator values (predictor variables) and buy and sell signals (target variables) is created, with the buy and sell signals formed so that the resulting trades are the ones the developer would like to take. All of this data is passed to the machine learning / pattern identification toolkit, where a set of rules or equations is formed that relates the indicator data to the signals. In short -- desirable trades first, then see what happened earlier.
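As one illustration of forming the target variable from desirable trades, here is a minimal sketch in Python. It labels a day Buy when the forward return over a hypothetical 5-day horizon exceeds an assumed 1% threshold; the horizon, threshold, and column name are illustrative assumptions, not recommendations.

    import pandas as pd

    def label_signals(prices, horizon=5, threshold=0.01):
        """Label each day Buy or Sell based on the forward return."""
        forward_return = prices.shift(-horizon) / prices - 1.0
        labels = pd.Series("Sell", index=prices.index)
        labels[forward_return > threshold] = "Buy"
        return labels.iloc[:-horizon]  # final rows have no future data to judge

    # Example: signals = label_signals(data["Close"])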
A machine learning toolkit, such as scikit-learn, has a large number of model templates that can be used to fit the indicator data to the signals. One of those models is the decision tree. Others include the support vector machine, linear regression, neural networks, and many more. Traditional trading system development platforms (almost always) use the decision tree template for the model. Decision trees are tree-like sequences of if-then-else statements guiding the computation to "leaves" that have values of Buy or Sell. They have the advantage of being easy to understand. As evidenced by the profitability of traditional systems, decision trees can produce excellent trading systems. One of their weaknesses is a tendency to overfit during training / optimization, fitting the in-sample data closely while performing poorly out-of-sample.
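For concreteness, a minimal scikit-learn sketch of fitting a decision tree to indicator data follows. The file name "indicators.csv" and the column names RSI, ATR, and Signal are assumptions for illustration only.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    data = pd.read_csv("indicators.csv")   # hypothetical file of precomputed indicators
    X = data[["RSI", "ATR"]]               # predictor variables (assumed columns)
    y = data["Signal"]                     # target variable holding Buy / Sell labels

    # max_depth caps the tree's complexity -- one guard against overfitting
    model = DecisionTreeClassifier(max_depth=4, random_state=42)
    model.fit(X, y)
    print(model.predict(X.head()))         # predicted signals for the first rows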
The best validation for traditional platforms is walk forward. The out-of-sample test data consists of more recent values of the same data streams used to develop the system. The indicators and rules used are those ranked "best" by objective function value among all alternatives tested over the in-sample period.
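A sketch of the walk-forward mechanics, assuming illustrative window lengths of 500 in-sample rows and 100 out-of-sample rows:

    import numpy as np

    def walk_forward_splits(n_rows, in_sample=500, out_of_sample=100):
        """Yield (train, test) index arrays for each walk-forward step."""
        start = 0
        while start + in_sample + out_of_sample <= n_rows:
            train = np.arange(start, start + in_sample)
            test = np.arange(start + in_sample, start + in_sample + out_of_sample)
            yield train, test
            start += out_of_sample   # slide forward by one out-of-sample period

    # Example: for train, test in walk_forward_splits(len(data)): fit on train, score on test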
Monte Carlo techniques are typically not used in model development when using a traditional platform. They may be used to guide a non-exhaustive search for the best solution, but only to shorten that search by avoiding an exhaustive evaluation of every alternative.
Walk forward validation is also used for systems developed using machine learning, and in precisely the same way. There are three data sets used for machine learning development. In addition to the out-of-sample validation data, during the learning process, and before the validation step, the in-sample data is divided into two sets -- training and testing. The training set is extensively examined to determine the formulas or patterns. The testing set is used to guide the training. It is common to use random selection of data elements to divide the whole set of learning data into training and testing; that can be thought of as a Monte Carlo operation. There can also be guidance of the search using non-exhaustive methods, similar to those used for the traditional platform.
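A sketch of that random division, using scikit-learn's train_test_split on the in-sample data; the 70/30 ratio and the variable names X_insample and y_insample are assumptions:

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X_insample, y_insample,
        test_size=0.3,      # 30% reserved to guide the training (assumed ratio)
        random_state=42)    # fixed seed makes the Monte Carlo draw repeatable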
--------------
The most important use of Monte Carlo analysis -- whether the system has been developed using a traditional platform or machine learning -- is estimation of risk and profit potential. This occurs after the system has been validated. It is a separate step that uses the out-of-sample trades as its input data and produces risk and profit metrics as its output.
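A sketch of that separate step, assuming oos_trades holds per-trade profits from the validated out-of-sample period; the bootstrap resampling shown is one common approach, and the run counts are illustrative:

    import numpy as np

    def monte_carlo_metrics(oos_trades, n_runs=10000, trades_per_run=100, seed=42):
        """Bootstrap out-of-sample trade profits into profit and drawdown distributions."""
        rng = np.random.default_rng(seed)
        trades = np.asarray(oos_trades)
        profits = np.empty(n_runs)
        drawdowns = np.empty(n_runs)
        for i in range(n_runs):
            sample = rng.choice(trades, size=trades_per_run, replace=True)
            equity = np.cumsum(sample)
            profits[i] = equity[-1]
            # drawdown: largest drop from the running peak of the equity curve
            drawdowns[i] = np.max(np.maximum.accumulate(equity) - equity)
        return profits, drawdowns

    # Example: profits, drawdowns = monte_carlo_metrics(oos_trade_profits)
    #          np.percentile(drawdowns, 95) estimates the 95th percentile drawdown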
-------------
It is true that data points passed to the machine learning process must be self-contained and independent of each other. That means that historical data the developer thinks might be valuable as a pattern variable, such as yesterday's RSI value, must be explicitly computed, stored, and included as a variable. (Monte Carlo is not used in this operation.)
A traditional system might appear to have just a few variables -- RSI lookback length, buy level, sell level -- while the equivalent data file for the machine learning process probably has more: RSI today, RSI yesterday, and so on. These are called predictor variables. Traditional development implicitly uses previous data through built-in functions that refer to yesterday's value, such as Cross. It can do this because the time series data continues to be available during model development.
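A sketch of making those lagged values explicit, assuming a DataFrame with a precomputed RSI column; pandas shift() supplies yesterday's value so each row becomes self-contained:

    import pandas as pd

    def add_lags(df, column="RSI", n_lags=2):
        """Add columns such as RSI_lag1 holding prior-day values of an indicator."""
        out = df.copy()
        for lag in range(1, n_lags + 1):
            out[column + "_lag" + str(lag)] = out[column].shift(lag)
        return out.dropna()   # drop the leading rows that lack full history

    # Each row now carries today's and yesterday's RSI as separate predictor columns.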
It is tempting to compute and include a large number of predictor variables, hoping that the machine learning routine can sort out which are important. That is usually an unreasonable expectation. A smaller number of predictor variables (perhaps a maximum of ten or so) should be determined either by the developer using his or her experience (his or her "domain knowledge" -- in the jargon), or by using a subset chosen by a search.
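A sketch of the search-based alternative, using scikit-learn's recursive feature elimination to reduce an assumed larger candidate set of predictors down to ten; X and y are the predictor DataFrame and signal labels as before:

    from sklearn.feature_selection import RFE
    from sklearn.tree import DecisionTreeClassifier

    # X is assumed to hold many candidate predictor columns; y the Buy/Sell labels
    selector = RFE(DecisionTreeClassifier(max_depth=4, random_state=42),
                   n_features_to_select=10)
    selector.fit(X, y)
    chosen = X.columns[selector.support_]   # names of the ten surviving predictors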
---------------
Best regards,
Howard