T Stat Curve

The Active vs. Passive Debate's Missing Link: Statistical Significance of TWO Time Periods

We live in a world that is neither black nor white, but mostly grey. Uncertainty is present in almost all of our daily actions, and while we constantly look for order among the chaos, uncertainty is something we ultimately end up accepting.

In a world of no certainty (other than death and taxes), we must be able to speak the language of statistics in order to describe or estimate the variables in our universe. Essentially, statistics can help us describe what we “expect” to happen, i.e. the odds. For example, there is a 60% chance of rain today, a 51% chance that the S&P 500 will be positive today, or a 77.5% chance that it will be positive over the next 12 months.

Uncertainty obviously has its role in the wide world of investing. We don’t believe that it is a very bold statement to suggest that return on an investment is always joined at the hip with risk. Most investors who experienced the deep decline in stock prices in late 2008 to early 2009 understand this concept very well.

Where we believe statistics are seldom mentioned is in the debate over active versus passive investing. Most arguments are framed in the “made-up” world of black and white, leaving most investors scratching their heads over whether they should buy index funds, strategic beta funds, active funds, some of each, etc. The question is not which approach is going to outperform the other; rather, it is what the odds are that one approach will outperform the other based on all available information or data. And, as a plausible next question, are you willing to stake a substantial amount of money on those odds?

Most serious practitioners of investment management, as well as most academics, agree that it is virtually impossible to consistently beat a properly measured benchmark over time, especially when considering the possibility of luck or illegal activity as the reason for outperformance. Financial magazines will always highlight the top money managers in any single calendar year, and Morningstar (one of the most highly cited rating systems for mutual funds) constantly ranks managers by performance or stars, implying that its rating system provides useful information for investors. But once again, these organizations apply no real test of statistical significance to their claims.

Here is an example:

The Fidelity Contrafund was Morningstar’s 2007 Domestic Equity Manager of the Year, outperforming its Morningstar-assigned benchmark (the Russell 1000 Growth Index) by 7.97% in that year. Wow! Sounds like a great opportunity, right? Well, it depends on how you look at it. In the world of black and white, this sounds great: the Fidelity Contrafund is better than buying a low-cost large growth index fund. But let’s approach this from the view of reality, or the world of “grey.” What are the odds that the Fidelity Contrafund is going to be such a great investment that I am willing to commit a significant amount of my savings to it? An investor who bought the Contrafund after its stellar performance in 2007 would have experienced the fund's substantial underperformance just two years later (-7.98% versus the Russell 1000 Growth Index).

The point is that basing our decisions on one year of performance will surely set us up for disappointment. In academic and statistical terms, drawing conclusions from a sample size of one will get you laughed out of the room. In order to have confidence that we have hired the right manager for our portfolio, we must see high excess returns that are also consistent over an acceptable period of time. The chart below shows the relative over- and underperformance of the Fidelity Contrafund versus the Russell 1000 Growth Index since the fund's inception.

As you can see, there are periods of both significant outperformance and underperformance. There is no question that the historical track record of the fund has been quite impressive. But remember, in the world of “grey,” there is uncertainty and with uncertainty we must know the odds.

Fortunately, we can calculate those odds using the t-statistic to measure statistical significance. The t-statistic of an average alpha equals the average alpha divided by the standard deviation of the alpha, multiplied by the square root of the number of periods. So by knowing the average alpha and the standard deviation of the alpha, we can determine the track record required to obtain a t-stat of 2 (the most widely accepted threshold for statistical significance). You can use the calculator below to estimate any required number of periods on your own.
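To make the arithmetic concrete, here is a minimal sketch of that calculation. It assumes the t-stat is the average alpha divided by the standard deviation of alpha, times the square root of the number of years, and that annual alphas are independent. The function name required_years is our own label for illustration, not the name of any calculator on the site.

```python
def required_years(mean_alpha, sd_alpha, target_t=2.0):
    """Years of annual data needed for the average alpha to reach target_t.

    Assumption: t = (mean_alpha / sd_alpha) * sqrt(years), so
    years = (target_t * sd_alpha / mean_alpha) ** 2.
    Both inputs are in the same units (percent per year here).
    """
    if mean_alpha <= 0 or sd_alpha <= 0:
        raise ValueError("mean_alpha and sd_alpha must be positive")
    return (target_t * sd_alpha / mean_alpha) ** 2


# A 2% average alpha with a 6% standard deviation of alpha
years = required_years(2, 6)
print(f"Required track record: about {round(years)} years")
```

Note that the requirement grows with the square of the ratio of standard deviation to average alpha, so halving the alpha quadruples the years needed.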

The chart below shows the required number of years given a combination of average alphas and standard deviation of the alphas.

Minimum Track Record in Years for a Statistically Significant Alpha (t-stat > 2)

Standard Deviation    Average Alpha
of Alpha              1%     2%     3%     4%
4%                    64     16      7      4
6%                   144     36     16      9
8%                   256     64     28     16

For the Fidelity Contrafund, we can see that the fund has delivered an average outperformance, or average alpha, of 2.28% per year versus the Russell 1000 Growth Index. But there has been substantial deviation around that average, as you can see from both the green and red bars in the chart. To be 95% confident that the Fidelity Contrafund management team is skillful and not just lucky, we would need a sample of 50 years' worth of data. 50 years! Not 1 or 3 or 5. Yet most of the financial media uses these short time intervals to push investors into making decisions. To an investor, we would summarize the decision to invest in something like the Fidelity Contrafund as follows: “We cannot be 95% confident that the management team has genuine skill, and you would need your entire life to decipher whether or not it was in fact skill.” Do you like those odds, and would you make that bet?
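As a rough check on the Contrafund figures, the same relationship can be run in reverse to see what standard deviation of alpha a 50-year requirement implies. This is a sketch under the assumption that t = (average alpha / standard deviation of alpha) x sqrt(years). The resulting standard deviation is our back-of-the-envelope inference, not a number reported in the article, and the 50-year figure may itself be rounded.

```python
import math


def implied_sd_alpha(mean_alpha, years, target_t=2.0):
    """Standard deviation of alpha implied by a required track record.

    Assumption: t = (mean_alpha / sd_alpha) * sqrt(years), solved for sd_alpha.
    """
    return mean_alpha * math.sqrt(years) / target_t


# Figures from the text: 2.28% average alpha, 50 years needed for t = 2
sd = implied_sd_alpha(2.28, 50)
print(f"Implied standard deviation of alpha: roughly {sd:.1f}% per year")
```

Under these assumptions the implied standard deviation is roughly 8% per year, several times the size of the average alpha, which is why so long a record is needed.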

You can find more examples of some of the most “impressive” active mutual funds in history in Step 3 under “Charts” on our website to see if you like the odds.

We have yet to find a single one.

But there are still two caveats even with the 50-year requirement. The first is that at the 95% confidence level there is still a 1 in 40 chance that we would be wrong (having to do with the far right tail of the distribution). In fact, we would expect 1 out of every 40 active managers with no skill at all to have a t-statistic greater than 2 by random chance alone.
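The 1-in-40 figure can be checked with a short sketch. It assumes a normal approximation to the t-distribution (reasonable for long samples) and uses 1.96, the exact two-sided 95% cutoff that "a t-stat of 2" rounds up from. The count of 1,000 managers is a hypothetical number chosen only for illustration.

```python
import math


def upper_tail_probability(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p_lucky = upper_tail_probability(1.96)
print(f"Chance of clearing the bar by luck: {p_lucky:.4f} (about 1 in {1 / p_lucky:.0f})")

# Hypothetical universe of managers, none of whom has any true skill
managers = 1000
print(f"Expected lucky 'significant' managers: about {managers * p_lucky:.0f}")
```

With a cutoff of exactly 2 rather than 1.96, the same calculation gives a slightly smaller chance, closer to 1 in 44.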

The second caveat is that this first data sample would be considered only an "in-sample" period. To avoid a data mining error, a second period, known as an "out-of-sample" test, would be required. For example, we can look at a fund’s performance in two independent time periods to make sure that there is in fact a persistence of skill. So in the case of the Fidelity Contrafund, if we wanted to look at two independent time periods, we would need 100 years of data (double the 50 years) in order to ensure that skill was the explanation for the excess return over the benchmark. We believe it is fair to say that all investors will be dead by the time they make that determination.

The table below summarizes the required number of years necessary for a t-stat greater than 2 for two independent time periods, using a combination of average alphas and their standard deviations. Try to find the standard deviation of alphas in the data provided by fund rating services. If you would like IFA to calculate it for you, please give us a call.

Minimum Track Record in Years for Two Independent Periods with Statistically Significant Alpha (t-stat > 2)*

Standard Deviation    Average Alpha
of Alpha              1%     2%     3%     4%
4%                   128     32     14      8
6%                   288     72     32     18
8%                   512    128     56     32

Here is an example of an in-sample and out-of-sample test. Let's assume that it is December 31, 1997 and we decide to analyze Warren Buffett's alpha over a large value index, like the Russell 1000 Value. The data available today only goes back to 1980, even though Buffett took over Berkshire Hathaway in 1965. In the first 18-year period (1980-1997), Berkshire Hathaway demonstrated a highly unusual 16.65% alpha with a t-stat of 3.07. Based on that one sample, you would think that Buffett had exceptional skill at identifying mispriced investments and that you could expect that same statistically significant alpha in future periods. Now look at the second 18-year period, 1998-2015: the out-of-sample period ended up with an alpha of 2.45% and a large standard deviation of alpha of 16.26% (meaning highly variable alpha), yielding a highly insignificant t-stat of 0.64. What happened to the Oracle of Omaha? Many Peter Lynch and Bill Miller investors suffered similar fates.
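The out-of-sample t-stat quoted above can be reproduced from the summary figures alone. The sketch below assumes t = (average alpha / standard deviation of alpha) x sqrt(years). The first-period standard deviation is not reported in the text, so the value derived here is our own inference from the stated alpha and t-stat.

```python
import math


def t_stat(mean_alpha, sd_alpha, years):
    """t-statistic of an average annual alpha, assuming independent yearly alphas."""
    return mean_alpha / sd_alpha * math.sqrt(years)


# Out-of-sample period (1998-2015): figures taken from the text
out_of_sample_t = t_stat(2.45, 16.26, 18)
print(f"Out-of-sample t-stat: {out_of_sample_t:.2f}")

# In-sample period (1980-1997): the standard deviation is NOT in the text;
# it is backed out here from the stated 16.65% alpha and t-stat of 3.07.
implied_sd_first_period = 16.65 * math.sqrt(18) / 3.07
print(f"Implied in-sample standard deviation of alpha: roughly {implied_sd_first_period:.0f}%")
```

The first print reproduces the 0.64 quoted above, well short of the threshold of 2.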

Once we frame investment decisions like a good statistician, we can start to have more clarity about which is the right investment strategy to follow: active or passive. For over 16 years, IFA has championed a passive investment approach because active managers have failed to provide a long enough track record to establish reasonable and prudent levels of confidence in their future outperformance over an index fund that tracks their benchmark. If you would like to benefit from our fiduciary wealth services, please call us at 888-643-3133.