Hi Craft --
I began the thread with the simple intent of pointing out techniques that some of our competition in trading is using. The ensuing discussion went well for many postings. Every civil question addressed to me, or answerable by me, was answered. Then it began both going considerably off-topic and involving personal attacks. The purpose I had was complete.
I had no idea the first post would be interpreted as "my way, nothing else will work." My apologies. How should that material have been presented?
I am open to additional civil on-topic discussion. What more, related to the first post of the thread, should be discussed?
Best, Howard
As I write in my "Foundations" book, the days of chart reading, long term holding, and simple trading algorithms are over. The business of trading is changing with astonishing speed. It is now about applied mathematics, machine learning, Bayesian statistics. Traders without skills in math, programming, statistical analysis, and scientifically developed trading techniques are at a severe disadvantage. Stephen and his colleagues will "eat the lunch" of unprepared traders.
This is obviously the part of the first post most of us had problems with. I've bolded the parts where the wording made it pretty clear to us you are trying to state facts, not opinion. Facts with nothing backing them: I've already posted two indexes (one from Fundseeder, the other from Barclays) of accumulation charts of real-world traders where discretionary outperformed.
Most of the fireworks then started when Tech started hammering away that it's gonna take over trading -- use it or become obsolete ("fossil" was the exact word he used).
What I do clearly makes me a trader by your definition (as is nearly everybody), so I take some interest in this thread, which I may otherwise have left alone.
1. Trading is buying and selling financial instruments with the intent of selling at a higher price than buying. Investing is buying and never selling. My comments apply to trading.
To my point.
People can post systems traded, paper traded, or test results only, and you'll have a great number of other people who won't be happy with one condition or another.
But I agree results are great to see -- but they are the results of the user, not you.
See I've done it myself!!!
You talk about the scientific method of learning then validating. The more standard definition would be hypothesizing and then attempting to invalidate.
How do your assertions in this thread fit with the traditional definition of the scientific method?
Do you believe an edge in trading rests with finding some non-linear relationship on a combination of obvious and obtuse data? That's what ML does best.
DeepState, the problem that Joe Average Trader is going to face is that they are likely to apply ML to standard backtested systems, or variations not far from the standard decision-tree approach, on single instruments or at most a portfolio of EOD or minute-bar data. Then all ML is going to do is hyper-optimise it, thereby rendering the result useless in forward test.
The reason JP Morgan has many thousands of Python developers is the breadth of data their systems run on. Large data is what ML is all about. For example, American futures exchanges generate over 100 billion order messages each day, and the stock markets billions more, day in, day out -- just crazy numbers. With 1.7 trillion under management, there is value for them in that amount of data. I cannot imagine a trader having the resources to data mine exchange-wide tick and quote data to build an ML system from. At the other end, I'm yet to see a "simplish" system that has been improved by ML. It doesn't find hidden patterns in small data sets. It overfits them.
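To make the small-data point concrete, here is a minimal sketch (simulated data, scikit-learn assumed, every number here is made up): an unconstrained decision tree handed a small set of noisy bars scores near-perfectly in-sample and near coin-flip on the bars that follow.

```python
# Sketch only: an unconstrained learner memorising a small, noisy data set.
# The "indicators" and "direction" are simulated noise, not real market data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_bars, split = 500, 400
X = rng.normal(size=(n_bars, 10))        # ten made-up indicator values per bar
y = rng.integers(0, 2, size=n_bars)      # next-bar direction: pure coin flips

model = DecisionTreeClassifier()         # no depth limit, so it is free to memorise
model.fit(X[:split], y[:split])

print("in-sample accuracy   :", round(model.score(X[:split], y[:split]), 2))   # ~1.0
print("forward-test accuracy:", round(model.score(X[split:], y[split:]), 2))   # ~0.5
```

The in-sample equity curve from something like this can look spectacular; the forward test gives it all back.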
That's a goodie.
Just some observations:
In the past, someone used their fingers to add 2 + 8 to arrive at ten. Your PC can do around 10^9 times that now, or more.
In the past, a theory was created when someone saw something interesting in their natural setting and put forward an idea. Some could be tested, some were inherently unfalsifiable.
Now, your PC together with a decent financial engineering package and a decent data source, bolted together with some machine learning methods, will produce a large number of theories which fit the data to a level that you specify. Large being arbitrarily big....but big.
Computational power has increased the speed at which we can perform calculations. Machine learning and data mining/data science have increased the speed at which we generate theories which 'fit' the data or some other supervision criterion.
Do you believe an edge in trading rests with finding some non-linear relationship on a combination of obvious and obtuse data? That's what ML does best. A normal fundie/trader has access to heaps of data so, just for this purpose, let's assume a level playing field except for ML algos which can find heaps of relationships that the other traders would not be able to generate quite as quickly.
Is there an edge to be gained for having a special calculator which trawls over the data you already have to find relationships you would not have been able to find in a reasonable time frame? If so, how significant might it be?
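As a rough illustration of how quickly such "theories" can be generated (simulated returns, NumPy only, every rule below is random by construction, so none has any predictive content):

```python
# Sketch: generate thousands of random long/flat rules over simulated returns
# and keep the best in-sample performer. The winner looks like an edge purely
# through selection.
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(0.0, 0.01, size=1000)       # simulated daily returns, zero true edge

best_sharpe = -np.inf
for _ in range(5000):                            # 5000 machine-generated "theories"
    signal = rng.integers(0, 2, size=returns.size)   # a random long/flat rule
    pnl = signal * returns
    sharpe = pnl.mean() / pnl.std() * np.sqrt(252)
    best_sharpe = max(best_sharpe, sharpe)

print("best in-sample Sharpe among 5000 random rules:", round(best_sharpe, 2))
# Typically comfortably above 1.0 -- an impressive-looking "theory" with no
# predictive content at all, found in a couple of seconds.
```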
No amount of sophistication is going to allay the fact that all of your knowledge is about the past and all your decisions are about the future.
– Ian Wilson (former GE executive)
I think it comes down to the insight being used. Does the idea produce an edge for reasons you can understand and which can be expected to prevail in the future? If so, I don't care much how you got it. Whether you came up with it in the bath or ran a CART over it, if it makes sense, it's valid to use (theory, empirical, sniff test etc etc).
I can believe this. I once watched an interview with James Simons from Renaissance Technologies and he mentioned how much data they process every day. Truly mind boggling.
Craft
Why does data science have to be looking for predictive ability?
As Howard has said, all he cares about is that there is movement
If you have a look at the way he trades and he spelled it out
It makes sense
If you could profit from movement regardless of it being long or short
Wouldn't that be optimal?
Greetings --
The short answer is that the model overfit the in-sample data during learning and does not fit the out-of-sample data.
Some explanation might be helpful.
The system development process is classical learning using the scientific method. I do not understand why, but the trading community has been super slow to recognize that systems to predict stock direction (trades) are similar in almost all respects to systems such as those that predict loan default. It is critical that the data processed for prediction has the same distribution, with respect to the signal being identified, as the data processed for model fitting.
To develop a system that will predict whether a borrower will repay a loan, the lender gathers data that hopefully has some predictive value from a large group of customers, some of whom repaid and others of whom defaulted. If conditions for the period the data represents are relatively constant, the distribution of any randomly chosen subgroups will be the same as the distribution of the entire group and, importantly, of future customers. With respect to loan repayment, the data is stationary. The future resembles the past.
The data scientists develop the model that goes with the data by selecting a random subsample of the data and fitting the rules to it. This is the "training" data, and fitting is the learning process. This is "supervised" learning -- each data point has values for the predictor variables (income, job history, etc.) and the loan repayment -- the target -- is known. The fitting process is a mathematical process of finding the best solution to a set of simultaneous equations, aX = Y, where X is a large array of values for the indicator variables and Y is a column array of values for the target. The model is the array a -- the coefficients of the solution. There will be a solution -- there will be an a array -- whether the model has predictive capability or not.
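A minimal sketch of that fitting step (NumPy only, simulated loan data, written in the conventional least-squares orientation rather than the aX = Y shorthand above): a coefficient array comes back even when the target is pure noise.

```python
# Sketch: a least-squares solve always returns a coefficient array "a",
# even when the target has no relationship to the predictors.
import numpy as np

rng = np.random.default_rng(2)
n_customers, n_predictors = 200, 20
X = np.column_stack([np.ones(n_customers),
                     rng.normal(size=(n_customers, n_predictors))])   # intercept + predictors
y = rng.integers(0, 2, size=n_customers).astype(float)                # repaid / defaulted -- random here

a, *_ = np.linalg.lstsq(X, y, rcond=None)    # solve X a ~= y in the least-squares sense
residual = y - X @ a
r_squared = 1 - residual.var() / y.var()

print("coefficient array shape:", a.shape)              # (21,) -- a "model" exists regardless
print("in-sample R-squared    :", round(r_squared, 2))  # around 0.1, purely from fitting noise
```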
The model -- the rules -- may have identified one or more important features in the data that are consistently associated with probability of repayment. But the developer cannot tell by looking at the learning process. He or she has reserved a separate subsample that is not used at all in learning -- call it the "test" data. As a one-time test, the model is applied to the test data. The value of the target is known, but is only used for later reference. A predicted value for each test data point is computed using the a array. Comparison between the known target values and predicted target values lets the person building the model know whether the model learned general features or just fit the particular data it was given -- including all the randomness.
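A sketch of the reserve-and-test step (scikit-learn assumed, simulated loan data in which repayment genuinely depends on income and job history):

```python
# Sketch: reserve a test subsample, fit on the training subsample only, and
# compare the two scores. Agreement well above chance is the evidence that
# general features were learned rather than the randomness of the training data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 2000
income = rng.normal(size=n)
job_years = rng.normal(size=n)
unrelated = rng.normal(size=n)                       # a predictor with no real connection

repaid = (income + job_years + rng.normal(size=n)) > 0   # repayment really depends on the first two

X = np.column_stack([income, job_years, unrelated])
X_train, X_test, y_train, y_test = train_test_split(X, repaid, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("training accuracy:", round(model.score(X_train, y_train), 2))
print("test accuracy    :", round(model.score(X_test, y_test), 2))
```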
The scientific method insists on two phases -- learning and testing. Without independent testing, nothing can be said about the model or its predictions.
The trading system development profession has largely ignored the scientific method. Independent, one-time testing using data that follows the training data and has not been used in development is seldom done. Trading system developers see the equity curve from the training portion and assume, completely without justification, that future results will be similar.
That may be true, but in order for it to be true, two conditions must hold.
1. The future must resemble the past. That is, the data used for learning and the data used for testing / trading must have the same distribution (with respect to the signal). This is stationarity.
2. The model must learn real, predictive signals rather than simply fit to the randomness of the data. This is learning.
When building models for stationary data such as loan repayment, some model algorithms produce in-sample results that can be used to give estimates of out-of-sample performance, while other model algorithms always overfit in-sample and have no value for estimating out-of-sample performance. Out-of-sample testing is always required before estimating future performance.
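A small illustration of that difference (simulated data with no signal at all, scikit-learn assumed): on identical noisy data, one algorithm's in-sample score is merely optimistic, while the other's is always near-perfect and therefore says nothing about the future.

```python
# Sketch: compare in-sample accuracy of two algorithms fit to the same noise.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 5))
y = rng.integers(0, 2, size=300)         # no real signal at all

for model in (LogisticRegression(), DecisionTreeClassifier()):
    model.fit(X, y)
    print(type(model).__name__, "in-sample accuracy:", round(model.score(X, y), 2))
# LogisticRegression     -> a little above 0.5
# DecisionTreeClassifier -> ~1.0, whatever the data
```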
Back to trading.
A system that works in-sample has fit a set of equations to a set of data. Whether there is true learning or not, there is always a solution -- a set of trades and an associated equity curve. Most are bad and are discarded. We test so many combinations that eventually one fits and the result looks good. It might be simply an overfit solution or it might be a representation of a truly predictive system. We cannot tell anything about the future performance without testing never-seen-before future data. When results of this out-of-sample test are poor, it is because one or both of the two conditions do not hold. Either the data is not stationary beyond the time period of the learning data, or the model has not learned.
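A sketch of the stationarity half of that statement (simulated data, scikit-learn assumed): the model genuinely learns a relationship, but the relationship changes after the learning period, so the never-seen-before data exposes it.

```python
# Sketch: real learning in-sample, but the signal inverts after the learning
# period, so the one-time forward test fails badly.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n, split = 2000, 1500
x = rng.normal(size=(n, 1))

direction = np.empty(n, dtype=int)
direction[:split] = (x[:split, 0] + 0.3 * rng.normal(size=split)) > 0        # regime 1: signal works
direction[split:] = (-x[split:, 0] + 0.3 * rng.normal(size=n - split)) > 0   # regime 2: signal inverts

model = LogisticRegression().fit(x[:split], direction[:split])
print("in-sample accuracy    :", round(model.score(x[:split], direction[:split]), 2))  # high -- real learning
print("out-of-sample accuracy:", round(model.score(x[split:], direction[split:]), 2))  # worse than a coin flip
```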
I hope this helps.
Thanks for listening, Howard
I don't think he has proven his case in that paragraph from his first post. Nor have I disproved his opinion, but I am mighty happy to continue using qualitative judgment of fundamentals as a basis for investing/trading over longer-term holding periods, in spite of his prediction. And if I or anybody else succeeds using alternative methods to what that paragraph advocates, then on a traditional scientific basis his hypothesis is disproved.
No amount of sophistication is going to allay the fact that all of your knowledge is about the past and all your decisions are about the future.
– Ian Wilson (former GE executive)
You are correct that there is a great deal of variability in tomorrow's price. Your approach to trading is probably different than mine. I do not need to know tomorrow's specific price. Up or down is sufficient for many models. Greater than or less than is sufficient for others.
Hi Minwa --
Let me understand. You are willing to be the counterparty for any trades I want to make for two weeks?
If I Buy ES or SPY at today's close, you will Sell. My gain (or loss) is your loss (or gain)?
My trades are typically:
A. Market-on-close issued within a few minutes of the close. They may hold for a day, or they may be closed out in the after hours session at profit limits. They may be for futures, ETFs, or options on either.
B. Limit orders to be filled intra-day at what my algorithms estimate to be extreme prices, with exits MOC or limit.
I will pass on your offer. But thank you very much.
Best, Howard
I don't know -- it made sense to me.
Up or down is sufficient for many models. Greater than or less than is sufficient for others.
I have absolutely no issue with how Howard advocates monitoring a system -- I think it's sound advice. I do have an issue with that paragraph from his first post.
I don't think he has proven his case on that paragraph. Nor have I disproved his opinion, but I am mighty happy to continue using qualitative judgment of fundamentals in spite of his prediction, and if I or anybody else succeeds using alternative methods to what he advocates, then on a traditional scientific basis his hypothesis is disproved.
If you cannot see the need for prediction here, then I cannot fathom how a duck comprehends things.
- The chance of an aspiring trader discovering a magical edge using advanced computational techniques is perhaps no better than that of him/her discovering an edge by any other method.
In fact I would go one step further and say any easily backtestable strategy that you can think of has been done to death by hedge funds already. The ease of access of global markets allows the best/fastest guy in the room to steal everyone's lunch around the globe.
P.S. wouldn't it be hilarious if all Ren Tech did all these years was to play simple breakout/breakdown trades across the entire stock universe... rather than go deep they just go super wide
Your response re: JP Morgan suggests that there is some belief that ML does have benefits, even though it might be restricted primarily to those with the true savvy??
This! Be it charting/quantopia/stocktwits/Kosec report/amibroker, it's the same old story.
Your logic flies in the face of all Fundamental analysis as well.
If retail can spot an undervalued stock, so too can the bigger players -- and they were doing it before computers, as was everyone else.
What has been highlighted is that edges are harder to define and find -- and keeping any edge for any length of time is harder again.