# Strategy Performance using Van Tharp System Quality Number (SQN)



## PolarBear (14 February 2011)

Hi all, 

I've recently read Van Tharp's book, *The Definitive Guide to Position Sizing*,
and I think it's a great read on the topic of how best to select position sizes for your trades. 

I have a question, however, about the System Quality Number Van Tharp uses to judge whether a strategy is good or not.  Basically it is a creative way of working out whether a strategy will perform well long term, based on a sample of trades.  The premise behind it is that it relates everything back to how much you are risking on each trade, in terms of "R" multiples (where "R" is the amount risked).  You determine your average profit in R over all trades, and also the standard deviation of those trade results.  These are then used, along with the number of trades, to work out your SQN:

SQN = (average trade result in R multiples / standard deviation of the R multiples) * sqrt(number of trades)

From Van Tharp, if your SQN > 1.7 you have a system that "statistically" should generate profits. 
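For concreteness, the calculation can be sketched in Python (a hypothetical helper, not from Tharp's book; the `cap` argument reflects his suggestion of limiting the trade count to 100):

```python
import math

def sqn(r_multiples, cap=None):
    """System Quality Number for a list of per-trade R multiples.

    cap: optional ceiling on the trade count used inside sqrt(n),
    per Van Tharp's suggested cap of 100; None uses the full sample.
    """
    n = len(r_multiples)
    mean = sum(r_multiples) / n
    # population standard deviation of the R multiples
    sd = math.sqrt(sum((r - mean) ** 2 for r in r_multiples) / n)
    n_eff = min(n, cap) if cap is not None else n
    return (mean / sd) * math.sqrt(n_eff)
```

For example, `sqn([2.0, -1.0, 3.0, -1.0, 0.5])` works out to about 0.98.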

I have thought about this topic for quite some time, and really like how he ties statistics into it, as I have tried to do in the past.

In his book he notes a problem with the SQN, though, for samples of more than 100 trades: the larger sample inflates the SQN result.  One way he suggests coping with this is to simply set the number of trades to 100 whenever it's greater than 100. 

My problem is that I have a strategy, tested over a number of different FOREX pairs and over 3 years of data, and I think the equity curve is excellent.
There are about 3000 trades, which is an excellent sample size.

When I plug the numbers into the SQN formula, though (limiting to 100 trades), the result is no good.  If I use all 3000 trades, the result is TOO good. 

My question overall is: what do I do in this situation, when I have 3000 trades and want to work out what the SQN is?

My numbers work out to be:

| avg R | std dev | # trades | SQN  |
|-------|---------|----------|------|
| 0.26  | 2.57    | 100      | 1.01 |
| 0.26  | 2.57    | 3149     | 5.69 |
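Plugging the posted figures straight into the formula reproduces both rows (a quick sanity check; the small gap between 5.68 here and the posted 5.69 comes from rounding in the quoted mean and standard deviation):

```python
import math

mean_r, sd_r = 0.26, 2.57          # average R and std dev of R, as posted
for n in (100, 3149):
    print(n, round((mean_r / sd_r) * math.sqrt(n), 2))
```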

I've attached the equity curve also. 

any ideas appreciated.
thanks.


----------



## tech/a (14 February 2011)

I'd take samples of 100 trades over, say, 20 sets and average it all out.
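A sketch of this batching idea, using simulated R multiples drawn to match the mean and standard deviation posted above (hypothetical data, purely to illustrate the effect):

```python
import math
import random

def sqn(rs):
    """SQN of a list of R multiples: (mean / std dev) * sqrt(n)."""
    n = len(rs)
    m = sum(rs) / n
    sd = math.sqrt(sum((r - m) ** 2 for r in rs) / n)
    return (m / sd) * math.sqrt(n)

random.seed(1)
# simulated trades matching the mean / std dev quoted in the thread
trades = [random.gauss(0.26, 2.57) for _ in range(3000)]

batches = [trades[i:i + 100] for i in range(0, len(trades), 100)]
avg_batch_sqn = sum(sqn(b) for b in batches) / len(batches)
pooled_sqn = sqn(trades)
# the per-batch average stays near the 100-trade figure (~1.0), while
# the pooled value grows with sqrt(3000)
```

This makes the discrepancy in the thread concrete: averaging per-batch SQNs removes the sqrt(n) growth that the pooled calculation keeps.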


----------



## howardbandy (14 February 2011)

Hi PolarBear --

Search ASF using my name.  I have made several posts about Dr Tharp's book and his use of SQN.

I like Dr Tharp's work and recommend his books.  We have corresponded, and he has mentioned me on Page 251 of Definitive Guide. 

The System Quality Number is the same as the statistical t-test metric.  The t-test is useful in determining whether a system is working or broken.  

Unfortunately, Dr Tharp gives some poor advice about the use of statistics and about modeling and simulation in general.  If you have more than 100 trades, by all means use the data from all of them, but do not use a value of 100 for N.  When that is done, the metric calculated is no longer a t-score and it cannot be interpreted in any consistent way.

The t-test metric can be computed using any number of data points, from 2 (the minimum needed to calculate a standard deviation) to thousands or more.  In general, t-test values greater than about 2.0 (the specific value can be found in a table in any statistics book; it depends on N, but not much) indicate that the results are probably (at the 5% level) different from random.  The t-test metric is the ratio of the mean to the standard deviation, multiplied by the square root of N.  In trading systems, the standard deviation is usually 2 to 4 times the mean.  

Say the mean is 1% and the standard deviation is 2% for your results of 16 trades (N = 16).  

Then t-score = (1/2) * sqrt(16), which is 2.0.  This tells you that the mean is probably greater than 0.0.  
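The arithmetic above, spelled out with the same numbers:

```python
import math

mean, sd, n = 0.01, 0.02, 16       # 1% mean, 2% std dev, 16 trades
t = (mean / sd) * math.sqrt(n)
print(t)  # 2.0
```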

I have been a little fast and loose with the statistical explanation here.  Apologies to purists.  A more complete description of techniques for statistical analysis of trading systems, including how to tell whether a system is working or broken and what to do about that, is a large part of my next book, "Modeling Trading System Performance."  It should be out by about May 1, 2011. 

Also, be very careful to compute the SQN using only the out-of-sample trading results.  In-sample results have no predictive value.  Using in-sample results will overestimate expected performance and underestimate risk of ruin.

Thanks for listening,
Howard


----------



## PolarBear (15 February 2011)

Hi Howard, 

thanks for the detailed reply. 

From what you are saying, then, would I be correct in assuming that my t-score for this set of trades is 5.69?
This seems a very good outcome. 

An alternative, based on the other reply above, was to consider sets of 100 trades.
If I work out the SQN across each of these sets of 100 trades, then average them, I end up with an average SQN of only 0.9.

Clearly something is amiss here, and I don't feel confident enough in my understanding of the above to say whether or not, statistically, the system I have will yield a positive result into the future. 

again, any help much appreciated

thanks


----------



## howardbandy (16 February 2011)

Hi PolarBear --

The SQN can be calculated using any set of data -- actual trades, out-of-sample trades, in-sample trades, or hypothetical trades.  

A score of 3.0 on a set of 20 actual trades would be outstanding.  5.69 is beyond anything that can be expected in real trading.  Be sure you are not fooling yourself.

Thanks,
Howard


----------



## AlterEgo (16 February 2011)

howardbandy said:


> If you have more than 100 trades, by all means use the data from all of them, but *do not use a value of 100 for N*.  When that is done, the metric calculated is no longer a t-score and it cannot be interpreted in any consistent way.




Hi Howard,
So what number *should* you use for N with a large sample size, if you're saying that 100 is too large?



howardbandy said:


> A score of 3.0 on a set of 20 actual trades would be outstanding.




Maybe I'm on the right track with my latest trading system then. I've been trading it for 2 months, with 36 trades (real trades) so far. T-test works out to 3.08 by my calculations. Do these figures look alright?

Mean: $214.23
Standard Deviation: $417.21
No. Trades: 36
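Those figures check out against the quick formula from earlier in the thread:

```python
import math

mean, sd, n = 214.23, 417.21, 36   # per-trade mean, std dev, trade count
t = (mean / sd) * math.sqrt(n)
print(round(t, 2))  # 3.08
```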


----------



## howardbandy (17 February 2011)

Hi AlterEgo --

Your calculation is correct.  A t-score of 3.08 shows significance at the 0.0025 level, which suggests that your results are unlikely to have come from a distribution with a mean of 0.0 and a standard deviation of 417.

N is the number of data points used in the calculation.  However many data points there are, use that.  The t-test table has no upper limit, so there is no need to artificially limit the value of N used in the calculation.

Thanks,
Howard


----------



## AlterEgo (18 February 2011)

howardbandy said:


> Hi AlterEgo --
> 
> Your calculation is correct.  A t-score of 3.08 shows significance at the level of 0.0025.  Which suggests that your results are unlikely to have come from a distribution with a mean of 0.0 and a standard deviation of 417.
> 
> ...




Thanks Howard. I obviously misunderstood what you said earlier.

I'm puzzled, though (as I think the OP is too), that as you increase N to very high numbers (hundreds or even thousands) the t-score becomes a very large number indeed! Please correct me if I'm wrong, but as I see it, as you do more and more trades the mean and the standard deviation should not change all that much, while N will increase quite a lot. This would make the t-score work out to quite a large number. E.g. if I could maintain a similar mean and standard deviation to the above over, say, 200 trades (and I don't see why those values would alter to a great degree, though maybe I'm wrong in thinking that), the t-score would then calculate out to 7.26. But as you just said above, values that high are practically impossible in real trading, so how does that work?

thanks


----------



## AlterEgo (18 February 2011)

Continuing on from my post above: when I look at my back-test trade results and use a smaller sample size of, say, 50-100 trades, I get a similar t-score to what I've got in real trading. But if I include a larger number of trades in the calculation (closer to 1000), the t-score becomes way too high (> 8), which as you said is not possible in real trading. So is it really realistic to include very large numbers of trades in the t-test calculation, since the t-score seems to be so highly influenced by the value of N?

thanks


----------



## AlterEgo (18 February 2011)

Hi again Howard,

So just thinking this through, assuming that it's unlikely for my t-score to get much larger than 3 or so in real trading, I should therefore expect my mean to decrease, and/or my standard deviation to increase the more trades that I do. Would this be correct?


----------



## howardbandy (18 February 2011)

Hi AlterEgo --

The value of the t statistic comparing two means, under the assumption that the standard deviations of the two data sets are equal, is calculated as:

n = number of observed data points.

numerator = (mean of the n observed data points - mean of baseline or hypothesized data) 
denominator = standard deviation of the n observed data points

compute the quotient, then multiply that result by the square root of n (use n-1 if n is small).  The result is the t statistic. 

If you have a baseline or hypothesized set of data and you want to test whether the mean of the observed data is different than the mean of the baseline data, put that mean into the numerator.  Otherwise, to test to see if the observed results are different than random, put 0.0 in for the mean of the baseline data.  Then the calculation for the t statistic becomes:

t = (mean / stdev) * sqrt(n).
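The calculation above can be sketched as a small function (stdlib only; `t_statistic` and `approx_one_tailed_p` are hypothetical helper names, not from any of the books mentioned). For small n you would still consult a t table, but for n above about 30 the standard normal is a close stand-in for the t distribution:

```python
import math
from statistics import NormalDist

def t_statistic(data, baseline_mean=0.0):
    """t = (observed mean - baseline mean) / sample std dev * sqrt(n)."""
    n = len(data)
    m = sum(data) / n
    # sample standard deviation (n - 1 in the denominator)
    sd = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    return (m - baseline_mean) / sd * math.sqrt(n)

def approx_one_tailed_p(t):
    """Normal approximation to the one-tailed p-value; good for n > ~30."""
    return 1.0 - NormalDist().cdf(t)
```

For example, `approx_one_tailed_p(2.0)` is about 0.023, consistent with the rule of thumb that t near 2.0 corresponds to roughly the 5% level (two-tailed, about 0.046).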

Using whatever value of t came out of that calculation, go to a table that gives "the critical values for the t statistic".  It will have rows and columns.  Usually there is a row for each value of n, starting at 1 and going up to infinity.  At about row 30, it will change from counting by 1 to counting by 5.  Somewhere around the row for n = 200, the next and final row will be for infinity.  You will see that the values in the columns change quite quickly for small values of n, but not very much for large values.  Actually, you need to subtract 1 from the number of data points to get the n that is used in the table -- it is called the number of "degrees of freedom."  This matters for small n, but becomes insignificant for n greater than about 30.

The columns each contain the critical level of the t statistic for some "level of confidence."  Depending on whether you are testing for "different than" or "greater than", you will use a "two-tailed" or "one-tailed" test.  The quick and dirty rule of thumb I mentioned was to look for the t statistic to be greater than about 2.0.  You will see the values in the columns are around 2.0 for level of confidence of around 0.05.  The numbers decrease as n increases.

Look for the entry in the table that represents the test you want to establish a confidence limit on.  Go down to the row corresponding to n, across to the column for the number of tails and level of confidence.  Compare the t statistic you computed to that number.  If your computed number is greater, you have that level of confidence that the means are different.  The formal language can be confusing, so I'll say "different."  Purists will immediately complain and I agree with them.

The point is, given a ratio of mean to standard deviation for some n, you can compute a t statistic.  If you have that same ratio, but greater n, you can still compute a t statistic and it will be a bigger number.  The bigger the t statistic, the higher the level of confidence.  

As the level of confidence increases, you can think of that as being more certain that the mean you measured is different than either 0.0 or the baseline mean, whichever you used.

There are formulas for putting confidence limits on the mean of your observed data.  To do this, you can use either the confidence level you just looked up or a table of the "standard normal distribution."  But that is for another posting.

Back to your question ----
If you do a small number of trades, say 10, and have a mean-to-standard-deviation ratio of 0.50, then continue on and do a large number, say 300, and still have the same ratio of 0.50:
The t statistic will be 1.50 for the 10 data points: 0.50 * sqrt(9).  Nine rather than 10 because one data point was "used up" in estimating the mean that was used to calculate the standard deviation.
The t statistic will be 8.64 for the 300 data points: 0.50 * sqrt(299).  

When we go to the t distribution table, using the row for n = 9, 1.50 is significant around the 80% level.  Usually not a high enough level to trust.  So we do not conclude that the mean is different than 0.0.  That is, we could have observed the 10 data points coming from a random distribution with a mean of 0.0 in about 1 out of every 5 series of selecting 10 data points from a normal distribution with a mean of 0.0 and the same standard deviation that was calculated.

When we go to the t distribution table using row = infinite, 8.64 is significant at the highest level shown in the table.  We can conclude that it is extremely unlikely that the 300 data points came from a normal distribution with a mean of 0.0.

For a given ratio of mean to standard deviation, increasing the number of data points increases our confidence in the measurement of the mean of those data points.
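Howard's two cases, run through the same formula (note the n - 1 inside the square root; the large-sample value comes out around 8.65, the same figure as his 8.64 up to rounding):

```python
import math

ratio = 0.50                           # mean / standard deviation
t_small = ratio * math.sqrt(10 - 1)    # 10 trades  -> 1.50
t_large = ratio * math.sqrt(300 - 1)   # 300 trades -> about 8.65
```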

If you really are getting numbers that high, great.  My caution is that this is extremely rare.  If the data used came from truly out-of-sample results, you have a great system and you will soon own all of Melbourne.  If you want to know how to trade this system to maximize your wealth without going bankrupt, look for my new book, Modeling Trading System Performance -- everything you need to know is explained there in detail.  Or call me.

But a t statistic that high is usually the result of an error (such as a future leak in a trading system) or repeated testing / modifying until the logic has been fit to the data and the results are no longer out-of-sample.  If this is the case, you will have a good idea in about 10 more trades, and know with fair certainty in 20 more -- details also in MTSP.

I have probably exceeded my quota for the day.

I hope this helps,
Thanks,
Howard


----------



## AlterEgo (18 February 2011)

Thanks for the explanation, Howard.

Yes, you’re quite correct - the value of >8 was using the entire data set, i.e. including the in-sample data, just to illustrate what I was saying about the effect of the size of N. Don’t worry, I’m under no illusions that I’ll get a value as high as that in the real world! The 3.08 is from true out-of-sample data though – i.e. real trades.

My back-test results have been very encouraging, but I know I can never be absolutely certain of what results I’ll actually get in the real world unless I trade it for real, which is what I’m doing now. At the moment, all I can say is that after 36 real trades the results are, so far, in line with what I’ve seen in back-testing. Whether that remains the case after many more trades remains to be seen. I've just got another 6 trades in the last couple of days - all are in profit at this point in time, but haven't exited yet. We'll see how they go.

I won’t own Melbourne any time soon though – the rate of return isn’t high enough for that!!! A high t-score doesn’t necessarily mean a high % return.

thanks


----------



## R35 (8 March 2011)

PolarBear said:


> Hi all,
> 
> I've recently read the book by Van Tharp - Definitive_Guide_to_Position_Sizing
> and i think it's a great read on the topic of determining how to best select position sizes for your trades.
> ...




AlterEgo,

A good way to compare models when your trading volume is high is to use the number of trades per year instead of a fixed number like 10 - that way you can compare apples with apples from a system perspective and not penalize a system for its frequency...

Frequency of trading is probably more valuable than expectancy (assuming it's above 0 and stays above 0).


----------



## cipherscribe (26 January 2014)

I know this is a super old thread, and apologies for the resurrection. I found it when I googled criticism of the SQN. I am just reading through Van Tharp's work on money management now, and was wondering about another issue - one that reminded me of the criticism of the Sharpe ratio that brought about the Sortino ratio.

Van Tharp himself describes how the SQN went down when he added a few high *positive* R multiples, and discusses why it makes sense that this happens.

However, in the real world, deviations of a positive nature should not produce poorer system statistics. The ability of a system to produce a 10R reward does not necessarily imply that it's possible to have a *drawdown* of 10R.

Is there anything that can be done to improve the SQN to reduce the negative effect of large wins on the ratio? Or does Tharp talk about this further in the book? Or should I just go and grab Bandy's book?

Cheers!

Adrian


----------

