The Use of Statistics in Making Investment Decisions

Disclaimer: This article contains information that was factual and accurate as of the original published date listed on the article. Investors may find some or all of the content of this article beneficial but should be aware that some or all of the information may no longer be accurate. The information and/or data in this article should be verified prior to relying on it when making investment decisions. If you have any questions regarding the information contained in this article please call IFA at 888-643-3133.


Throughout the IFA Website, there are many assertions regarding the expected returns of different asset classes, and there are many warnings against attributing outperformance to active managers based on a few years of hot returns. One could easily make the mistake of thinking that IFA is merely expressing opinions without any mathematical support. In fact, however, nothing could be further from the truth. IFA has always relied on the scientific method of statistical analysis.

Perhaps the single most important application of statistics lies in the realm of hypothesis testing. Virtually all of the experimental and observational scientific studies (across all fields) take the approach of proposing a "null hypothesis" and then either rejecting or failing to reject the null hypothesis, based on the probability of making the recorded observations, assuming that the null hypothesis is actually true.

The specialty within economics known as "asset pricing theory" lends itself quite readily to hypothesis testing because for every asset (or asset class), there is usually a set of historical returns that can be tested. A common test would propose the null hypothesis that the expected return of the asset class is no different from the risk-free rate of return. Given a series of historical returns, we can calculate a parameter known as the "t-statistic", which gives us a quantitative indicator of the probability of observing these returns under the assumption of the null hypothesis. A higher absolute value of the t-statistic indicates a lower probability of the occurrence of the observed returns, and vice versa. The t-statistic is commonly used to construct a 95% confidence interval around the observed mean, and if this confidence interval does not contain the value assumed in the null hypothesis, then we can reject the null hypothesis with a 95% level of confidence. Generally, a 95% confidence level is associated with a t-statistic of approximately 2.

The value of the t-statistic depends on three separate parameters: the number of observations (N), the average of the observations, and the standard deviation of the observations. The t-statistic is directly proportional to both the square root of the number of observations and the average of the observations. It is inversely proportional to the standard deviation, so the more volatile the asset class, the less likely it is that we will be able to draw firm conclusions.

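The relationship described above can be sketched in a few lines of Python. This is a minimal illustration, not IFA's own calculator: the function name `t_statistic` and the sample returns are hypothetical, and the use of the sample standard deviation (dividing by N - 1) is an assumption, since the article does not say which convention it uses.

```python
import math
import statistics


def t_statistic(excess_returns):
    """t-statistic for the null hypothesis that the mean excess return is zero.

    t = sqrt(N) * mean / standard deviation
    """
    n = len(excess_returns)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(excess_returns)
    # Assumption: sample standard deviation (N - 1 in the denominator).
    stdev = statistics.stdev(excess_returns)
    return math.sqrt(n) * mean / stdev


# Hypothetical annual excess returns in percent (not taken from the article).
sample = [4.0, -2.0, 6.0, 1.0, 3.0]
print(round(t_statistic(sample), 2))
```

Doubling the average doubles the result, doubling the standard deviation halves it, and quadrupling the number of observations (with the same average and standard deviation) doubles it.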
To see exactly how the calculation of a t-statistic operates, we will use the returns for the last five calendar years of the top three mutual funds used in 401k plans (according to )
The three funds to be evaluated are:

  1. AGTHX - American Funds Growth Fund of America (benchmarked against the S&P 500 Index)
  2. PTTAX - PIMCO Total Return Fund (benchmarked against Barclays Aggregate Bond Index)
  3. AEPGX - American Funds EuroPacific Growth Fund (benchmarked against the MSCI EAFE Index)

The results are summarized below: (1/1/2006 to 12/31/2010)

Fund     Average Excess Return   Standard Deviation   T-Statistic   # of Years Needed to Get a T-Stat of 2
AGTHX    0.73%                   5.60%                0.29          235
PTTAX    1.82%                   3.38%                1.20          14
AEPGX    2.54%                   4.96%                1.15          15
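The figures in the table can be checked from the summary numbers alone. The sketch below assumes that the second percentage in each row is the standard deviation of the annual excess returns (this reading reproduces the reported t-statistics) and that the number of years needed is rounded to the nearest whole year. The function names are hypothetical.

```python
import math


def t_from_summary(mean, stdev, n_years):
    """t-statistic from the average, standard deviation, and number of years."""
    return mean / stdev * math.sqrt(n_years)


def years_needed(mean, stdev, target_t=2.0):
    """Years required to reach target_t if mean and stdev stay unchanged.

    Solves target_t = sqrt(N) * mean / stdev for N.
    """
    return (target_t * stdev / mean) ** 2


# (average excess return %, standard deviation %) over the five years 2006-2010
funds = {
    "AGTHX": (0.73, 5.60),
    "PTTAX": (1.82, 3.38),
    "AEPGX": (2.54, 4.96),
}

for ticker, (mean, stdev) in funds.items():
    t = t_from_summary(mean, stdev, 5)
    years = years_needed(mean, stdev)
    print(f"{ticker}: t = {t:.2f}, years for t = 2: {years:.0f}")
```

Because the required number of years grows with the square of the ratio of standard deviation to average excess return, a fund with a small average and a large standard deviation (AGTHX here) needs a very long record before its outperformance becomes statistically distinguishable from zero.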

For all three of these funds, we are unable to reject the null hypothesis that their expected returns are no higher than those of their benchmarks at a 95% confidence level. 401k plan sponsors who have incorporated these funds based on short-term performance have no right to be surprised if the future returns captured by plan participants fall short of expectations. The chart below shows different sample sizes needed to get a t-stat of 2.

If you have your own set of returns data and you would like to see how many years you would need to get a t-statistic of 2, the calculator below can do that for you.


Furthermore, if you wish to calculate the t-statistic for your returns data, please use the calculator below.


In the next example, we will use the t-stat calculation to evaluate an asset class that has seen an enormous increase in popularity in the last few years: commodities. Using 19 years of calendar-year returns data for the Dow Jones UBS Commodity Index, we will test the hypothesis that the expected return of commodities is no different from the risk-free rate (One-Month T-Bills).

1/1/1992 to 12/31/2010 (19 years)

Index                    Average Excess Return   Standard Deviation   T-Statistic   # of Years Needed to Get a T-Stat of 2
DJ UBS Commodity Index   5.20%                   18.94%               1.20          53

Again, we are unable to reject the null hypothesis at a 95% confidence level, so those investors who have poured their hard-earned money into commodities (or commodity futures) might be setting themselves up for bitter disappointment.
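The same conclusion can be reached through the confidence interval described earlier. The sketch below is an approximation under stated assumptions: it uses the article's rule-of-thumb critical value of 2 rather than an exact t-distribution quantile, and the function name is hypothetical.

```python
import math


def confidence_interval(mean, stdev, n, critical_t=2.0):
    """Approximate 95% confidence interval around the observed mean."""
    half_width = critical_t * stdev / math.sqrt(n)
    return mean - half_width, mean + half_width


# Commodity index figures from the table: 5.20% average excess return,
# 18.94% standard deviation, 19 years of data.
low, high = confidence_interval(5.20, 18.94, 19)
null_value = 0.0  # null hypothesis: no excess return over T-Bills

print(f"approx. 95% interval: {low:.2f}% to {high:.2f}%")
if low <= null_value <= high:
    print("interval contains zero: fail to reject the null hypothesis")
else:
    print("interval excludes zero: reject the null hypothesis")
```

The interval runs from roughly -3.5% to 13.9%, so it contains zero, which matches the t-statistic of 1.20 falling short of 2.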

Lastly, we will examine IFA's statistical justification for tilting a portfolio towards the factors of small cap and value. Based on 83 years of data, we will test the hypothesis that the expected return of the IFA US Small Cap Value Index is no different from that of the IFA US Large Company Index.

1/1/1928 to 12/31/2010 (83 years)

Index                          Average Excess Return   Standard Deviation   T-Statistic
IFA US Small Cap Value Index   5.29%                   16.64%               2.90
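The length of the data set matters a great deal for this result. As a rough illustration, the sketch below recomputes the t-statistic from the table's average and standard deviation for the full 83 years, and then for a hypothetical 19-year sample that is simply assumed to have the same average and standard deviation (this shorter sample is not data from the article).

```python
import math


def t_from_summary(mean, stdev, n_years):
    """t-statistic from the average, standard deviation, and number of years."""
    return mean / stdev * math.sqrt(n_years)


mean, stdev = 5.29, 16.64  # percent, from the table

for n_years in (19, 83):
    t = t_from_summary(mean, stdev, n_years)
    print(f"{n_years} years: t = {t:.2f}")
```

With 83 years the t-statistic is about 2.90, as reported. With only 19 years of identical average and volatility it would be about 1.39, which would not clear the threshold of 2.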

Since the t-statistic is greater than 2, we are able to reject the null hypothesis and conclude that small value has a higher expected return than the large blend segment of the market. In this case, we can also offer an explanation: small value stocks are riskier and thus should carry a higher expected return. In general, no conclusion should ever be drawn from data alone because, as we all know, if the data is tortured for long enough, it will confess to anything. It is crucial to have a sound explanation for the observed data.

Although it is possible to find actively managed funds that have shown outperformance with a t-statistic greater than 2, IFA strongly cautions investors against throwing their money at these managers even when there appears to be a statistical justification for doing so. The reason, quite simply, is that since there are thousands of active managers, by chance alone, we expect to see some that have outperformed their benchmark after expenses. The problem is that the number that we actually do see is, in fact, no higher than what we would expect from chance alone (i.e., it is no higher than what we would observe if all active managers were monkeys throwing darts at the Wall Street Journal). This means that when an active manager appears to exhibit outperformance, there is no reliable way to determine if it was due to luck (i.e., a false positive) or skill. Two papers that elegantly address this point are:

  1. "False Discoveries in Mutual Fund Performance: Measuring Luck in Estimating Alphas" by Barras, Scaillet, and Wermers, which evaluated 2,076 fund managers over 32 years and found that 99.4% of active fund managers showed no genuine stock-picking ability.
  2. "Luck versus Skill in the Cross Section of Mutual Fund Alpha Estimates" by Fama and French, which evaluated 819 actively managed funds over 22 years and found that 97% could not be expected to beat a risk-appropriate benchmark.
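The false-positive argument above can be illustrated with a small simulation. This is a toy sketch, not the method used in either paper: every simulated manager has a true expected excess return of zero, and the number of managers, the track-record length, the 5% standard deviation, and the normal distribution of returns are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def share_of_lucky_managers(n_managers=2000, n_years=10, stdev=5.0, seed=0):
    """Fraction of zero-skill managers whose track record shows t > 2."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    lucky = 0
    for _ in range(n_managers):
        # True mean excess return is zero: any outperformance is pure luck.
        returns = [rng.gauss(0.0, stdev) for _ in range(n_years)]
        t = math.sqrt(n_years) * statistics.mean(returns) / statistics.stdev(returns)
        if t > 2.0:
            lucky += 1
    return lucky / n_managers


share = share_of_lucky_managers()
print(f"{share:.1%} of zero-skill managers show a t-statistic above 2")
```

Even though no simulated manager has any skill, a few percent of them end up with a t-statistic above 2. Among thousands of real funds, that many apparent winners would be expected from chance alone.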

IFA has always encouraged investors to obtain as much education as possible so that they can make informed decisions. Most investors will find that having a good understanding of statistics is incredibly helpful. Whenever they come across an advertisement such as "Fund XYZ beat its 5-year Lipper average", they would do well to ask, "What is the t-statistic behind that number?" Odds are, it will not be included in the advertisement, and investors should not waste their time or their money on such spurious claims.