Overview: Benchmarks and Attribution Analysis
Kathryn Dixon Jost, CFA
Vice President, Educational Products
Rightly or wrongly, all portfolio managers live and die by their benchmarks, or rather, the benchmarks their clients choose for them. Therefore, it is imperative that managers thoroughly understand the components of their benchmarks and educate their clients in this regard. Merriam-Webster’s Collegiate Dictionary defines “benchmark” as “a point of reference from which measurements may be made.” The point of reference must be relevant to the task at hand, and herein lies the challenge of choosing and constructing a benchmark. The benchmark must not only represent the client’s goals and objectives but also be a reasonably achievable target for the manager given the manager’s investment style.
A primary consideration in constructing a relevant benchmark for taxable investors is having a methodology that accurately introduces the effect of income and capital gains taxes on the portfolio. This inclusion of tax considerations is a new frontier for both clients and managers but one whose time has come. AIMR is participating in fleshing out standards for the presentation of after-tax returns through the AIMR Taxable Portfolios Subcommittee and for the construction of after-tax benchmarks through the AIMR Benchmarks and Performance Attribution Subcommittee.1 Although compliance with the standards promulgated by the subcommittees is not mandatory, with the demand by clients for after-tax reporting and with the U.S. SEC jumping on the bandwagon to require after-tax performance reporting for mutual funds, the momentum is growing for after-tax performance reporting across the board, and with it, the need for after-tax benchmarks.
Just as important as the client’s concerns in choosing a benchmark are the manager’s concerns of not being saddled with a benchmark that inadequately represents his or her management style.
1 AIMR’s Benchmarks and Performance Attribution Subcommittee Report, available at www.aimr.org/standards/pps/benchmark.html, provides a summary of standard benchmark rules.
©2001, AIMR®
An inappropriate benchmark can set the bar too high from both the manager’s and the client’s perspective. If a mismatch exists between the manager’s style and the benchmark, the comparison of the manager’s performance with the index’s performance will not provide any meaningful information to either the client or the manager. The client will lack the tools
necessary to accurately judge the contribution of either the investment style or the manager’s skill to the overall portfolio, which may lead to unnecessary manager turnover. Benchmarks also play an important role in the entire risk-budgeting and performance attribution process—a process that can be applied in the portfolio construction, periodic rebalancing, and performance analysis stages of investment management. The value of the attribution process depends on the consistency between the attribution factors and the actual investment parameters within which the investment manager operates. And the attribution analysis can provide even greater beneficial feedback if undertaken at both the manager level and the macro (or total portfolio) level. In an endeavor to improve the benchmarking process and thus the information that the process can provide, the client has the opportunity to not only choose a standard benchmark as the reference point for performance but also to construct an individualized benchmark that more effectively matches the client’s long-term objectives. And if the client’s choice remains a standard index, knowing the construction parameters of the standard indexes, such as the inherent classification and rebalancing differences in the various standard index families, can be an important step in making the appropriate benchmark choice. Choosing knowledgeably can ultimately limit client and manager misunderstandings about performance and enhance the successful achievement of long-term client objectives. Therefore, defining the point of reference for investment management performance is a critical decision from the inception of a client–manager relationship.
Choosing a Benchmark
Christopher Luck’s examination of the most commonly used indexes in the United States (those of Frank Russell Company and Standard & Poor’s) reveals that choosing an index has become more complicated than ever before, partly because of the boom in technology stocks and their ensuing impact on the indexes. To guarantee the effectiveness of an investment strategy, the plan sponsor must understand benchmark construction and make sure that the benchmarks chosen do not conflict with each other.
www.aimr.org • 1
Luck’s conclusion that variations in the construction of indexes have exacerbated the differences between the indexes means that, for the most part, managers cannot simply substitute one index for another. When the indexes are compared along such dimensions as return correlations, tracking error, and turnover, the issue of noncomparability becomes readily apparent. And although concentration levels in the indexes do not pose the huge risk that some people claim, people are correct in their belief that the S&P indexes are more conservative than the Russell indexes when it comes to incorporating new companies. That is, S&P does not include new companies (potentially ones without any reported earnings) in its indexes as readily as Russell does. In terms of performance measurement, then, plan sponsors should choose an index that best supports their manager-allocation, asset-allocation, and performance-measurement decisions.
Although most people agree on the properties of a good benchmark, questions arise as to whether the benchmark selected is representative of the manager’s style and whether the selection of the index will affect manager behavior. Certainly, in passive management, the choice of index affects manager behavior, but index choice also has an impact on an active manager’s behavior. Thomas Richards emphasizes that the performance differential between a manager’s active portfolio and the portfolio benchmark should be attributable to the level of manager skill, not the result of systematic market factors unrelated to the investment process. To guarantee that a manager’s benchmark is not simply a clone of the manager’s active investment process, Richards separates portfolio risk and return into three components: systematic risk, investment-style bias, and manager skill.
In reviewing the four types of benchmarks currently used in the institutional investment community—peer group comparisons, broad market indexes, investment-style indexes, and custom benchmarks—he analyzes the various strengths and weaknesses of each. Past performance, return correlations, and general characteristics of the benchmark provide most of the information that managers typically need, but further analysis of systematic risk, tracking error, and industry or sector risk exposure will result in choosing more-appropriate benchmarks and help prevent surprise “fat tail” outcomes.
Performance Attribution
Careful choices must also be made when using benchmarks to evaluate the investment process (i.e., as part of a performance attribution analysis). As Kevin Terhaar explains, the goal of performance evaluation is to determine the extent to which the tradeoffs that portfolio managers make add value. But if portfolio managers and analysts do not maintain consistency between the investment process and the performance attribution analysis used to evaluate the process, the attribution may not be accurate. Furthermore, performance analysis is not helpful if the attribution is to an area over which the portfolio manager has no control. Terhaar uses three examples to illustrate the importance of maintaining consistency between the performance evaluation and the investment process so that the results are attributed to those responsible for making the investment decision. For example, if country selection or industry selection is not a relevant decision variable, then the attribution should not be performed on the basis of country or industry variables.
The total fund’s performance, not just the portfolio managers’ performance, should also be subject to an attribution analysis. Jeffery Bailey explains that performance evaluation should include three steps: performance measurement, attribution, and appraisal. Each element provides valuable information about the account’s performance over the performance period, most importantly whether the performance is a result of luck or skill. Bailey explores the possibilities of macro-attribution as a framework for better understanding how each investment policy decision has or has not added value to the portfolio. The ability to evaluate the performance of the total portfolio in addition to the performance of each separate asset class is important at the plan sponsor level. A macro-attribution should encompass allocations to asset categories, investment styles, and investment managers at the macro level and quantify the consequences of investment policy decisions in terms of dollars, not just returns.
Macro-attribution thus encourages those involved in the investment process to be objective and systematic in choosing appropriate benchmarks and requires a long-term perspective.
Applications of Attribution Analysis
Understanding how benchmarks are constructed and used in attribution analysis sheds light on the ultimate usefulness of benchmarks: accurately assessing a portfolio’s relative return and risk levels. Ideally, according to Wayne Kozun, risk attribution and performance attribution should both be part of an integrated attribution system. But even though such a system is not currently available, risk budgeting (the allocation of risk among the different asset classes of a fund) is possible and is a worthwhile tool that can, and should, be intrinsic in the creation of an investment program. Kozun explains in detail the process of risk budgeting used by the Ontario Teachers’ Pension Plan Board, a plan sponsor that embraced a risk management system after it was privatized more than a decade ago.
The value of risk budgeting is that it compels managers to focus on earning maximum return for each unit of risk taken. Once the risk tolerance and risk assumptions of the plan are determined, risk can be defined in a quantifiable, standard way for all asset classes in a given portfolio. And once risks are defined and budgeted and a methodology for calculating risk is established, other issues (from how to price illiquid assets to modeling the event risk of the total fund) can be tackled. Nevertheless, risk budgeting is viewed by a few in the profession as a radical concept. Consequently, the successful implementation of a risk-budgeting system may require changing the way a plan sponsor’s staff and board think about risk and performance measurement. Clearly, once the components of portfolio return have been determined, one can see who the most active managers are and who is really adding value and where. Using benchmarks appropriately helps to determine what “bets” are being made by managers and these bets’ impact on performance and risk.
Another critical area in terms of measuring the total risk–return trade-off of a taxable portfolio is after-tax benchmarks. Lee Price, who has long been involved in AIMR’s Taxable Portfolios Subcommittee, provides a straightforward method for calculating after-tax returns. This method can be used to create after-tax benchmarks for measuring the performance of taxable accounts.
The same rules for calculating after-tax returns according to AIMR-PPS™ standards apply to calculating after-tax benchmarks. The problem with after-tax benchmarks is that no single after-tax performance number applies to all users of the benchmark. Price recommends a methodology to introduce specific, individualized tax rates into the calculations of after-tax returns and benchmarks. Factoring the capital gain realization rate (CGRR) into the equation is the key to assessing after-tax returns. Using a constant CGRR for the performance period is an improvement over no CGRR at all, but using the actual CGRR for each performance period greatly contributes to the relevancy of the after-tax calculation. Finally, shadow portfolios (i.e., benchmarks) that vary depending on each client’s situation can be created to offer the most precise and detailed after-tax benchmark possible.
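To make the role of the CGRR and of client-specific tax rates concrete, the following is a minimal sketch of a one-period after-tax benchmark return. It is not the AIMR-PPS formula or Price's method, neither of which is reproduced in this overview. It assumes a simplified model in which income is taxed in full at the client's income rate and only the realized fraction of the period's price return (the CGRR) is taxed at the client's capital gains rate. The function name, parameters, and all figures are hypothetical.

```python
def after_tax_return(price_return, income_return, cgrr,
                     cap_gains_rate, income_rate):
    """One-period after-tax return under a simplified, assumed model.

    price_return   -- benchmark price return for the period (decimal)
    income_return  -- benchmark income (dividend) return (decimal)
    cgrr           -- capital gain realization rate: share of the
                      period's price return treated as realized
    cap_gains_rate -- client's capital gains tax rate
    income_rate    -- client's tax rate on income
    """
    # Only the realized portion of the gain is taxed this period.
    cap_gains_tax = cgrr * price_return * cap_gains_rate
    # Income is assumed to be fully taxable in the period received.
    income_tax = income_return * income_rate
    return price_return + income_return - cap_gains_tax - income_tax


# Hypothetical period: 8% price return, 2% income return, 25% CGRR.
# The same benchmark gives different after-tax numbers for two clients.
client_a = after_tax_return(0.08, 0.02, cgrr=0.25,
                            cap_gains_rate=0.20, income_rate=0.40)
client_b = after_tax_return(0.08, 0.02, cgrr=0.25,
                            cap_gains_rate=0.15, income_rate=0.28)
print(f"client A: {client_a:.4f}, client B: {client_b:.4f}")
```

Under these assumptions the two clients see different after-tax results from an identical pretax benchmark, which is the reason no single after-tax benchmark number fits all users.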
Conclusion
Benchmarks are a critical tool in the investment management industry. Although an imperfect measuring stick, benchmarks, whether standard or customized indexes, are the accepted barometer of the market. Many analysts and managers are striving to improve the benchmarking process by educating themselves and clients about the components and construction of standard indexes and by adjusting benchmarks for appropriate tax rates to generate after-tax performance measures. A great deal of progress to this end has been made in the past decade, and more progress can be expected in the next few years. When so many investment decisions rest on comparative analyses with a benchmark, the quest for a method to design the most accurate benchmark possible is an extremely laudable endeavor.
The Importance of Index Selection
Christopher G. Luck, CFA
Partner and Director of Equity Portfolio Management
First Quadrant, LP
Pasadena, California
Although the most commonly used indexes in the United States share some similarities, their differences have recently sharpened, primarily because of the technology boom. No longer can one assume that the standard industry benchmarks serve as good substitutes for each other. Standard & Poor’s and Frank Russell Company indexes (the focus of the presentation) differ markedly in their construction parameters: weighting, rebalancing, and reinvestment procedures. Close examination of these parameters reveals that choosing indexes wisely will likely become even more complicated, and more important, in the future.
Although the number of market indexes available is quite large, the most commonly used market indexes in the United States are those created by Standard & Poor’s and Frank Russell Company. These indexes have been sliced and diced in an amazing number of ways, including mega cap (S&P 100 Index, Russell Top 200 Index), large cap (S&P 500 Index, Russell 1000 Index), midcap (S&P MidCap 400 Index, Russell Midcap Index), small cap (S&P SmallCap 600 Index, Russell 2000 Index), and small/midcap (Russell 2500 Index). The most common broad market indexes are the Russell 3000 Index, the S&P SuperComposite 1500 Index, and the Wilshire 5000 Index. Value and growth variants exist for all these indexes as well.
Conventional wisdom holds that the standard domestic equity benchmarks are fairly good substitutes for each other. There is some truth to this belief, but more support is evident for the opposite proposition: The standard domestic equity benchmarks are not good substitutes for each other; in fact, they are in some ways more different than they are similar.
In this presentation, I will discuss the characteristics of publicly available indexes, primarily those of S&P and Russell, but I will also include the Wilshire 5000, another widely accepted broad market index. I will look at the changing characteristics of these standard market indexes as well as those of the specialized value and growth indexes. And because concentration levels in the indexes pose particular risks, I will use concentration as a characteristic with which to compare the indexes. I will also discuss managing portfolios to indexes and what implications the differences in the indexes have for active managers.
Index Methodologies
S&P and Russell differ in how they construct their indexes. For example, they do not select stocks for inclusion in the indexes in the same manner. The makeup of the S&P indexes is committee driven; a committee of about nine people decides which companies should enter or be dropped from the indexes. S&P looks for leading companies in leading industries and does not state explicitly that the S&P 500, for example, is a large-cap index by design (even though it is). In contrast, Russell’s indexes are completely formula driven. As such, the management of the Russell indexes is extremely transparent. There are rules to follow, and those rules are followed religiously, although Russell has made changes to its rules over time.
Weighting. S&P indexes are market-cap weighted, whereas Russell’s are market-float weighted. Cap weighting is more transparent than float weighting because determining the number of shares outstanding is easier than determining the number available for purchase. That is, making float-weighted adjustments for employee stock ownership plans, concentrated holdings, and cross-ownership is a more subjective endeavor than simply determining the number of shares outstanding. In the United States, that distinction is relatively minor, but internationally, a far greater divide exists between the two categories. Although little difference usually exists between the two methodologies in the U.S. markets, noticeable differences have been apparent at times. Float-weighted adjustments led to the odd occurrence that for about a year (1999), Microsoft Corporation was the largest holding in the S&P 500 while General Electric Company (GE) was the largest holding in the Russell 1000. The float-weighting rules for the Russell indexes required a float adjustment that lowered the number of Microsoft shares counted. Thus, the price per share times the number of Microsoft shares, calculated differently by the two index providers, resulted in Microsoft being a smaller holding in the Russell 1000 than it was in the S&P 500. As of November 2000, GE was the largest holding in both indexes.
Rebalancing. A wide disparity is apparent in how S&P and Russell periodically rebalance their indexes. For active managers, rebalancing presents a serious problem, especially for those who manage to the small-cap indexes. S&P rebalances its indexes continuously. Corporate actions, such as mergers and takeovers, primarily drive the rebalancing in the S&P indexes. Because the S&P indexes contain a fixed number of companies, when one company goes out of business or is acquired, S&P finds a replacement. S&P can also decide that a particular company simply no longer belongs in an index. At the end of December 2000, for example, several S&P small-cap index names, including Spartan Motors and the Bombay Company, were removed because of their lack of representation.
The removal of certain companies for lack of representation has been common in the S&P small-cap index because S&P has been striving to keep the index representative of this segment, but the problem of lack of representation has been occurring more frequently across all the indexes, albeit relatively rarely. For S&P, the idea is to have an “exit strategy” for companies other than that of corporate actions in order to ensure that the indexes broadly represent their respective capitalization segments.
Russell makes all index changes on June 30. On May 31, company capitalizations are used to pick the 1,000, 2,000, and 3,000 largest companies, and on June 30, Russell weights the indexes by the outstanding float. So, the Russell 3000, Russell 2000, and Russell 1000 have 3,000, 2,000, and 1,000 stocks, respectively, on June 30. Company spin-offs and companies that go out of business are not immediately replaced or dropped in response to the event; rather, any resulting changes to the indexes are not made until the following June 30. In fact, at the beginning of November 2000, the Russell 3000, 2000, and 1000 indexes did not contain exactly 3,000, 2,000, and 1,000 stocks, respectively.
Broad Market Indexes. I do not know any clients who use the S&P 1500 as a broad market index. My clients typically choose the Russell 3000 or, just as often, the Wilshire 5000, the index that holds almost every company headquartered in the United States and that contains about 7,000 stocks. Unlike the Russell indexes, the Wilshire 5000 is rebalanced monthly.
Value and Growth Indexes. Another point of comparison is that Russell and S&P construct their value and growth indexes differently. I would have to say that the Russell indexes are generally more widely accepted as measures of the value and growth sectors than the S&P indexes.
■ S&P/Barra. S&P, in cooperation with Barra, has created value and growth indexes based on the S&P 500, S&P MidCap 400, and S&P SmallCap 600. S&P differentiates between value and growth strictly according to a company’s book-to-price ratio (B/P), and the indexes are reconstituted twice a year on June 30 and December 31. On these days, the stocks in the index universe are classified as either value or growth; 50 percent of the total capitalization goes to the value component, and 50 percent goes to the growth component. Although each index does not have an equal number of names, both indexes have equal capitalization, at least on June 30 and December 31. S&P would argue that its base indexes (e.g., the S&P 500) can be reconstituted by simply adding up everything in the growth and value indexes (e.g., S&P 500/Barra Growth Index and S&P 500/Barra Value Index).
■ Russell. Russell constructs its style indexes by using two criteria for the differentiation between value and growth: B/P (like the S&P/Barra indexes) and I/B/E/S International’s forecast earnings-growth measure.
The Russell value and growth indexes are reconstituted once a year on June 30, with a nonexclusive style split; 70 percent of the total number of companies in the indexes is exclusively in one index or the other, and the remaining 30 percent is split between the two indexes. Thus, some companies end up in both the Russell 1000 Value Index and the Russell 1000 Growth Index, for example, even though the capitalizations from each index should add up to the overall capitalization in the Russell 1000.
Number of Companies. As Table 1 shows, compared with their S&P counterparts, the Russell indexes are more broadly diversified. The Russell value and growth indexes are much larger than those of S&P. As of September 30, 2000, the Russell 1000 consisted of 978 companies. (Remember that the Russell indexes are reconstituted only once a year on June 30 and that Russell does not have an automatic replacement rule for companies in the index that are acquired.) The Russell 1000 Value included 742 companies, and the Russell 1000 Growth included 529, which is indicative of the overlap between the two indexes. The S&P 500, however, had 500 companies in the base index: 389 in the value index and 111 in the growth index. Notice that the Wilshire 5000, with 6,793 names, is much broader than the Russell 3000, with 2,916 names. For additional information on these companies’ indexes, I suggest visiting their Web sites.1
Table 1. Number of Names in Indexes as of September 30, 2000
Index             Base Index    Value Index    Growth Index
S&P 500                  500            389             111
S&P 400                  400            266             134
S&P 600                  600            461             179
Russell 1000             978            742             529
Russell Midcap           781            606             403
Russell 2000           1,941          1,253           1,271
Russell 2500           2,427          1,637           1,518
Russell 3000           2,916             na              na
Wilshire 5000          6,793             na              na
na = not applicable
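The S&P/Barra rule described under Value and Growth Indexes (rank by B/P, then assign half of total capitalization to value) explains why the two style indexes can hold very different numbers of names, as with the 389 value and 111 growth names of the S&P 500 in Table 1. The following is a minimal sketch of that rule. The tickers, capitalizations, and B/P figures are hypothetical, and the handling of the stock that straddles the 50 percent boundary is an assumption of this sketch, not a documented S&P/Barra convention.

```python
def split_value_growth(stocks):
    """Split a universe into value and growth by book-to-price (B/P).

    stocks -- list of (name, market_cap, book_to_price) tuples.
    Stocks are ranked from highest to lowest B/P and assigned to value
    until value holds half of total capitalization; the rest are growth.
    Assumption: a stock is placed in value whenever the running value
    capitalization is still below the halfway mark.
    """
    total_cap = sum(cap for _, cap, _ in stocks)
    ranked = sorted(stocks, key=lambda s: s[2], reverse=True)
    value, growth = [], []
    running_cap = 0.0
    for name, cap, _ in ranked:
        if running_cap < total_cap / 2:
            value.append(name)
            running_cap += cap
        else:
            growth.append(name)
    return value, growth


# Hypothetical universe: four smaller high-B/P stocks, one large low-B/P stock.
universe = [
    ("A", 10, 0.9),
    ("B", 15, 0.8),
    ("C", 10, 0.7),
    ("D", 15, 0.6),
    ("E", 50, 0.2),
]
value_names, growth_names = split_value_growth(universe)
print(value_names, growth_names)
```

In this hypothetical universe the value index holds four names and the growth index one, yet each carries half of the capitalization, because the low-B/P stock is much larger than the others.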
1 See www.russell.com, www.barra.com, www.spglobal.com, and www.wilshire.com.
Index Comparisons
As I stated, the S&P and Russell indexes can be compared along various dimensions: weighting, rebalancing, and composition of subindexes. Other dimensions, including return correlations, tracking error, and turnover, are also useful in distinguishing between the two sets of indexes. Their similarities and differences affect the degree to which an S&P index can be substituted for a Russell index, so I will emphasize that issue. I will also examine the behavior of the two sets of indexes in relation to concentration risk.
Lack of Substitutability. The Russell and S&P indexes can be compared in several areas that highlight their differences and raise the issue of noncomparability. By and large, these differences are readily apparent.
■ Return correlations. Table 2 shows historical return correlations for various S&P and Russell indexes for their common time periods. These time periods are somewhat different for each index, but the large-cap indexes cover the period 1979–2000; the midcap indexes, 1991–2000; and the small-cap indexes, 1994–2000. The data indicate high correlation for these index pairs. For example, the S&P 500 and the Russell 1000 have a correlation of 0.9952. Note that the value indexes are more highly correlated than the growth indexes and that the large-cap indexes are more highly correlated than the small-cap or midcap indexes. But the historical correlation of two other broad market indexes, the Russell 3000 and the Wilshire 5000, is even higher (0.998). Correlation, however, is a relatively simple measure of substitutability. The closer one looks at the indexes, the more obvious the differences become.
Table 2. Historical Return Correlations
Russell Index            S&P Index         Correlation
Russell 1000             S&P 500                0.9952
Russell 1000 Value       S&P 500 Value          0.9907
Russell 1000 Growth      S&P 500 Growth         0.9881
Russell Midcap           S&P 400                0.9653
Russell Midcap Value     S&P 400 Value          0.9626
Russell Midcap Growth    S&P 400 Growth         0.9427
Russell 2000             S&P 600                0.9721
Russell 2000 Value       S&P 600 Value          0.9735
Russell 2000 Growth      S&P 600 Growth         0.9683
Note: Data for Russell 1000 and S&P 500, February 1979–September 2000; data for Russell Midcap and S&P 400, January 1991–September 2000; and data for Russell 2000 and S&P 600, January 1994–September 2000.
■ Return regressions. A return regression on each of the index pairs provides further analysis. Table 3 shows the alpha and beta coefficients from these regressions over the same time periods as the correlations in Table 2. The alpha coefficients for the midcap sector are unusually large. Likewise, the beta coefficients differ from 1 (perfect covariance), which implies that the two indexes have not substantially covaried. Even though the correlation coefficients indicate that these indexes are good substitutes for each other, the beta figures suggest otherwise.
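The pairwise statistics used in this comparison (the correlation, the regression alpha and beta, and the annualized tracking error reported in Table 4) can all be computed from two series of periodic returns. The sketch below is one plausible way to do so, not the presenter's actual procedure: it assumes monthly returns in percent, an ordinary least-squares regression of the first series on the second, and tracking error defined as the annualized sample standard deviation of the return differences. The return figures are hypothetical.

```python
import math


def compare_indexes(a, b, periods_per_year=12):
    """Correlation, OLS alpha/beta (a regressed on b), and tracking error.

    a, b -- equal-length lists of periodic returns, in percent.
    Tracking error is the sample standard deviation of (a - b),
    annualized by the square root of periods_per_year (an assumption).
    """
    n = len(a)
    if n != len(b) or n < 3:
        raise ValueError("need two equal-length series of at least 3 returns")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))

    correlation = s_ab / math.sqrt(s_aa * s_bb)
    beta = s_ab / s_bb                 # slope of a on b
    alpha = mean_a - beta * mean_b     # intercept, per period

    diffs = [x - y for x, y in zip(a, b)]
    mean_d = sum(diffs) / n
    te_periodic = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    tracking_error = te_periodic * math.sqrt(periods_per_year)

    return {"correlation": correlation, "alpha": alpha,
            "beta": beta, "tracking_error": tracking_error}


# Hypothetical monthly returns: index_a is an exact linear function of index_b.
index_b = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
index_a = [0.9 * r + 0.1 for r in index_b]
stats = compare_indexes(index_a, index_b)
print(stats)
```

The hypothetical pair is perfectly correlated, yet its beta is 0.9 rather than 1 and its tracking error is not zero. This is the sense in which a high correlation alone is a weak test of substitutability.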
Table 3. Regression Differences: Alpha and Beta
Russell Index            S&P Index          Alpha     Beta
Russell 1000             S&P 500            –0.16    1.006
Russell 1000 Value       S&P 500 Value       0.16    0.980
Russell 1000 Growth      S&P 500 Growth     –0.04    1.032
Russell Midcap           S&P 400             1.33    0.858
Russell Midcap Value     S&P 400 Value       0.85    0.914
Russell Midcap Growth    S&P 400 Growth      1.04    0.894
Russell 2000             S&P 600            –0.77    1.021
Russell 2000 Value       S&P 600 Value      –0.26    1.116
Russell 2000 Growth      S&P 600 Growth     –0.06    0.845
Note: Data for Russell 1000 and S&P 500, February 1979–September 2000; data for Russell Midcap and S&P 400, January 1991–September 2000; and data for Russell 2000 and S&P 600, January 1994–September 2000.
■ Tracking error. Table 4 gives the tracking error of the index pairs, again over their common time history. Even at the large-cap level (S&P 500 and Russell 1000), the tracking error (almost 1.5 percent a year) is significant. For the large-cap value and growth index pairs, the tracking error is about 2.5 percent, and for the midcap and small-cap pairs, it is even larger (as high as 6.62 percent). These large tracking-error numbers provide further evidence that these indexes are not good substitutes for each other.
Table 4. Historical Tracking Error (annualized)
Russell Index            S&P Index          Tracking Error
Russell 1000             S&P 500                     1.48%
Russell 1000 Value       S&P 500 Value               1.97%
Russell 1000 Growth      S&P 500 Growth              2.68%
Russell Midcap           S&P 400                     3.66%
Russell Midcap Value     S&P 400 Value               4.34%
Russell Midcap Growth    S&P 400 Growth              6.56%
Russell 2000             S&P 600                     4.33%
Russell 2000 Value       S&P 600 Value               3.81%
Russell 2000 Growth      S&P 600 Growth              6.62%
Note: Data for Russell 1000 and S&P 500, February 1979–September 2000; data for Russell Midcap and S&P 400, January 1991–September 2000; and data for Russell 2000 and S&P 600, January 1994–September 2000.
■ Turnover. Table 5 indicates the turnover of the S&P and Russell indexes for the last 10 years and
the 12 months ending September 2000. Turnover is clearly a drag on performance. Whether a manager manages actively or passively versus a benchmark, the turnover in the indexes affects the turnover in the manager’s portfolio. The greater the turnover in the portfolio, the greater the leakage to the brokerage community. Historically, the larger the stock capitalization, the smaller the amount of turnover in the index. The S&P 500’s turnover rate has been 4.5 percent; the Russell 1000’s, 6.3 percent. On the midcap side, the S&P 400 has had a turnover rate of 16.6 percent, and the Russell Midcap, 20.8 percent. The higher turnover for midcap and small-cap indexes is predominantly caused by midcap and small-cap stocks growing out of their universes, which forces them into the next index level (e.g., from the small-cap to the midcap index). Such changes have a commensurately larger impact than when, for example, a large-cap stock falls from the large-cap range into the midcap range. Turnover for the value and growth indexes is much higher than that of the base indexes, which is no surprise because value and growth indexes are using a smaller subset of the universe. Turnover is high in these indexes not only because of capitalization changes but also because stocks migrate across the growth–value divide.
Table 5. Historical Turnover Levels (annualized)
Large Cap Index        Turnover    Midcap Index             Turnover    Small Cap Index        Turnover
Since January 1990
S&P 500                    4.5%    S&P 400                     16.6%    S&P 600                   17.0%
S&P 500 Value             22.7     S&P 400 Value               37.1     S&P 600 Value             42.0
S&P 500 Growth            23.9     S&P 400 Growth              59.5     S&P 600 Growth            52.6
Russell 1000               6.3     Russell Midcap              20.8     Russell 2000              30.3
Russell 1000 Value        16.7     Russell Midcap Value        30.0     Russell 2000 Value        38.9
Russell 1000 Growth       16.1     Russell Midcap Growth       47.3     Russell 2000 Growth       45.4
Trailing 12 months
S&P 500                   12.5     S&P 400                     34.5     S&P 600                   25.8
S&P 500 Value             26.5     S&P 400 Value               43.4     S&P 600 Value             51.7
S&P 500 Growth            28.9     S&P 400 Growth              89.7     S&P 600 Growth            70.6
Russell 1000              13.3     Russell Midcap              40.2     Russell 2000              35.0
Russell 1000 Value        26.0     Russell Midcap Value        36.2     Russell 2000 Value        44.1
Russell 1000 Growth       30.5     Russell Midcap Growth       74.7     Russell 2000 Growth       52.8
Note: Data through September 2000.
Compared with the data compiled from January 1990, the trailing 12-month figures have increased significantly across the board. The extraordinarily large 12.5 percent turnover for the S&P 500 is probably the result of the technology boom, because the stocks that made it into the index were stocks such as Yahoo!, JDS Uniphase, and other relatively large-cap stocks that were not previously included. The Russell base indexes exhibit much higher turnover than the S&P base indexes, especially in the small-cap and midcap sectors, but that trend has not appeared in the value and growth indexes. The Russell Midcap Growth Index had 75 percent turnover for the trailing 12-month period. The S&P 400 Growth Index had nearly 90 percent turnover, which is an extraordinarily high turnover to manage a portfolio against. Even the Russell 2000, which is a much broader index than either the Russell Midcap or the S&P 400, has had growth and value turnover levels close to 50 percent. That amount of turnover can be troubling for an indexer who, by definition, must match the index change for change. Active managers, presumably, do not have to replicate the holdings of an index, but having a benchmark with a high level of turnover compounds the turnover levels already encountered with actively managed portfolios. Essentially, managing any portfolio with a high level of turnover is extremely difficult, not least because of the transaction costs incurred.
The data show that the base S&P indexes have lower turnover than the Russell base indexes, which is primarily a function of S&P’s “management” of its indexes. S&P appears to want lower turnover, particularly in the S&P 500, relative to the competition, and
with committee-driven rules, S&P can use low turnover as one of its criteria. The key point is that S&P manages its indexes, whereas Russell uses formulas to create and modify its indexes. For the growth and value sectors, the S&P indexes have higher turnover than the similar Russell indexes. That higher level of turnover is primarily a function of the rebalancing that S&P does twice a year; Russell rebalances only once a year. The turnover difference is also a result of the either/or distinction that S&P imposes on each company. S&P does not allow the same company in more than one index, but Russell does.
■ Inclusion of unseasoned companies. The inclusion of “unseasoned” companies is an increasingly meaningful criterion for analyzing these indexes. This issue was particularly relevant during the initial public offering (IPO) and technology boom of the late 1990s. For about four or five years, the technology sector seemed invincible, and consequently, the number of companies that never reported earnings increased dramatically in the indexes; some indexes even included companies that had reported nothing but negative earnings since going public. From about 1987 to 1996, whether looking at the S&P 500 or the Russell 1000, the number of companies in the index that never reported positive earnings was small. Starting roughly in 1997, however, the number of such companies rose significantly, and the rise was particularly dramatic in the Russell indexes. The reason this phenomenon occurred in the Russell indexes is that the Russell indexes are formula
driven. Russell looks at the capitalization of a company, and if the company meets the threshold for a certain index, it enters that index. When, for example, the capitalizations of Amazon.com and priceline.com grew large enough to be included in the top 1,000 companies, they were moved into the Russell 1000. If a company was a growth stock, it also was moved into the growth variant of the index. S&P has been more conservative about adding such companies to its indexes. Table 6 shows the technology orientation of the indexes. The Russell indexes tend to have a slightly higher technology orientation, but interestingly, the difference is not large and is fairly consistent. Russell has been admitting technology-oriented companies more readily than S&P, whereas S&P, with its bias toward seasoned companies, has been including technology-oriented companies more slowly.
Table 6. Percent of Technology-Oriented Stocks in S&P and Russell Indexes
Index              Base Index    Value Index    Growth Index
S&P 500            35.9%         14.2%          56.8%
S&P 400            32.9          14.8           48.7
S&P 600            30.9          18.2           46.1
Russell 1000       37.2          13.7           60.4
Russell Midcap     29.1          12.8           55.3
Russell 2000       38.2          10.4           47.7
Notes: Technology orientation based on Barra industry classification, September 30, 2000. Internet industry weight as of September 30, 2000, was 1.8 percent for the S&P 500, 3.4 percent for the Russell 1000, 0.3 percent for the S&P 600, and 3.3 percent for the Russell 2000.
The same pattern appears in the midcap and small-cap Russell indexes. Beginning in 1997, Russell’s indexes experienced an increase in the number of companies with no reported earnings, especially in the small-cap indexes. For the Russell 2000, 16 percent of its companies never reported positive earnings. For the S&P 600, the number is only 1.5 percent. The technology orientation underscores one key philosophical disparity between the two indexes: the inclusion rate of “unseasoned” companies.
Substitutability. Although the S&P and Russell indexes differ, they have some similarities that make them good substitutes for each other under certain circumstances. The S&P 500 and the Russell 1000, the large-cap indexes, are well correlated and have been good substitutes for each other. They have a strong historical correlation and reasonably low tracking
error. The growth and value variations of these two indexes have also been reasonably good substitutes for each other, although to a lesser extent than the base indexes. All the large-cap indexes may be better substitutes for each other in the future because, since the technology correction in the market, the difference between the two indexes arising from their technology versus seasoned-company orientations is likely to narrow; in fact, their correlations may well increase. If the market continues to reward new-economy companies relative to the seasoned companies favored by S&P, the degree of substitutability of the two indexes will be altered.
For the small-cap and midcap sectors, the base indexes have not been good substitutes for each other; they have had high relative tracking error. The numbers are even worse for the value and growth indexes, especially for the growth sector. Because of the new-economy orientation (the speculative technology exposure within the Russell indexes), it is unlikely that the S&P and Russell small-cap and midcap indexes will track each other closely, which has strong implications for plan sponsors and anyone managing to these indexes. If one person manages to the S&P 400 and another to the Russell Midcap, they will be managing different portfolios. Similarly, if one person manages to the Russell 2000 and another to the S&P 600, they, too, will potentially produce widely disparate portfolios. These results are even more pronounced for the value and growth indexes.
As for the broad market indexes, the Russell 3000 and the Wilshire 5000 are good substitutes for each other. Although the number of companies in each index differs significantly, the indexes have a historical correlation of 0.998 and a historical tracking error of 1.05 percent. There is no reason to predict that this high degree of correlation will change.
Concentration Risk. Several market analysts claim that the U.S.
indexes have become more concentrated, and thus riskier, in recent years. To examine this hypothesis, I chose the seven largest international markets (United Kingdom, Japan, Canada, Germany, France, the Netherlands, and Switzerland) and looked at the primary large-cap index in each market. For each index, I found the top stock and compared it with the top stock in the S&P 500 and Russell 1000. At the end of September 2000, General Electric was the largest stockholding in the S&P 500 and the Russell 1000 with, respectively, a 4.6 percent and 4.5 percent weighting. Compared with the non-U.S. indexes, the U.S. indexes do not seem concentrated. As of September 30, 2000, the FTSE 100 (United Kingdom) had an 11.0 percent weight in Vodafone Group, the TOPIX
(Japan) had 7.7 percent in NTT DoCoMo, the TSE 100 (Canada) had 34 percent in Nortel Networks Corporation, the DAX (Germany) had 13.0 percent in Deutsche Telekom, the CAC 40 (France) had 11.3 percent in France Telecom, the AEX (Netherlands) had 13.8 percent in ING Groep, and the SMI (Switzerland) had 21.2 percent in Novartis. Therefore, merely by looking at the largest stock in each index (a very basic dimension), the U.S. indexes seem well diversified in both absolute and relative contexts.
The top-tier names (in capitalization terms) in the S&P 500 today make up a slightly higher percentage of the total index capitalization than they did in the late 1980s, as shown in Panel A of Figure 1. Although the concentration level has jumped modestly in the past several years (currently 24.2 percent for the top 10 names), it is not all that different from the historical norm (20.3 percent for the top 10 names). So, the jump has not been particularly significant relative to historical concentration levels. If I had enough data, I suspect I would find that the low concentration through the 1990s, not the current high concentration, is the exception.
Some of the Russell indexes, in terms of their historical norms, are not particularly concentrated at the present time. Panel B of Figure 1 shows that the Russell 1000 has less concentration than that found in the S&P 500. The top 10 names in the Russell 1000 currently make up 20.9 percent of the index, but the historical average for this time period is 16.6 percent. Again, compared with historical norms, the Russell 1000 has a slightly higher concentration at the present time but not significantly so. The Russell 2000, shown in Panel C of Figure 1, yields some interesting results.
Because Russell rebalances only once a year, the concentration of stocks in its indexes becomes more pronounced until rebalancing occurs, when the largest-cap stocks in the Russell 2000 are moved to the Russell 1000 and the smallest-cap stocks in the Russell 1000 are moved to the Russell 2000. So, in effect, Russell reduces concentration levels on every rebalance date (June 30). The result is the jagged line shown in Panel C. Historically, the top 250 names composed 33.0 percent of the index; the current figure is 33.7 percent. The biggest difference appeared in the 1998–99 period, when the rebalancing dramatically affected the weights of the holdings. The recent changes in company capitalizations have had a much bigger impact on the Russell 2000 than on the Russell 1000 or S&P 500. Russell has considered rebalancing twice a year, probably because of the violent concentration swings that have occasionally occurred since 1997. The world has changed, and the amount of change that occurs within a year is much greater now than in the
past, a fact that would easily justify more frequent rebalancing. And although the concentration levels in the Russell 2000 are not vastly different from the historical averages, since 1997, the extent of the changes in concentration levels as a result of rebalancing has been significant. These changes have been driven, more or less, by the technology sector.
Summary. Although both sets of indexes are actively managed, their approaches differ. S&P’s active-management process consists of a committee deciding which companies to move in and out of the S&P indexes. The active-management decisions for the Russell indexes come from a group of people who establish “rules” for inclusion and exclusion. Russell establishes fairly transparent rules for its indexes, which is to be applauded, but from period to period, the rules nevertheless can change. Some of Russell’s decisions are proprietary, such as the weightings in growth and value stocks, so its approach is not as transparent as it may seem. In addition, certain biases accompany each of the index sets. Wilshire, by contrast, simply includes the stock of every company headquartered in the United States, so no decision making is involved in creating the Wilshire 5000 Index. Other points of comparison can be summarized as follows:
• High turnover is more evident in the Russell indexes, except for the small-cap growth and value sectors, than in the S&P indexes.
• Russell, to its credit, is much more transparent about its methodology than S&P. Transparency is not inherently right or wrong, but more transparency is generally better than less transparency.
• Russell indexes are broader than S&P indexes. Russell’s midcap and small-cap indexes are particularly well-suited for managers seeking a broad index.
• S&P indexes tend to be more conservative than Russell’s. S&P indexes have fewer post-IPO names (i.e., more seasoned companies), and this conservatism has created dissimilarities between the indexes.
Finally, although some managers and sponsors combine S&P and Russell indexes, the overlap between the indexes creates a measurement problem. Simply put, combinations of S&P and Russell indexes are not optimal.
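The concentration measure used in the Concentration Risk discussion (the share of total index capitalization held by the largest N names) can be sketched in a few lines of Python. This is only an illustration: the function name `top_n_share` and the capitalizations of the 12-stock toy index are hypothetical and are not taken from any of the indexes discussed.

```python
def top_n_share(market_caps, n):
    """Percent of total capitalization held by the n largest names."""
    if n <= 0:
        raise ValueError("n must be positive")
    total = sum(market_caps)
    if total <= 0:
        raise ValueError("total capitalization must be positive")
    largest = sorted(market_caps, reverse=True)[:n]
    return 100.0 * sum(largest) / total


# Hypothetical capitalizations (in $ billions) for a 12-stock toy index.
caps = [460, 300, 250, 200, 150, 120, 100, 90, 80, 70, 60, 120]

top1 = top_n_share(caps, 1)    # weight of the single largest name
top10 = top_n_share(caps, 10)  # weight of the ten largest names

print(f"largest name: {top1:.1f}% of index capitalization")
print(f"top 10 names: {top10:.1f}% of index capitalization")
```

Applying the same function at each date in a history of constituent capitalizations would give the kind of time series plotted in Figure 1.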
Figure 1. Number of Names as a Percent of Total Index Capitalization
A. S&P 500, December 1979–September 2000
B. Russell 1000, December 1990–September 2000
C. Russell 2000, December 1990–September 2000
[Line charts; vertical axis: Percent of Total Capitalization; series: Top 10, Top 25, Top 50, Top 100, and Top 250 Names.]
Managing to Indexes
Managers use benchmarks, or indexes, for two reasons: They use them to make asset allocation and style decisions, which is an ex ante use of benchmarks, and for performance measurement, an ex post use of benchmarks. So, whether making an ex ante decision about allocations to a manager or an ex post decision about manager performance, an obvious question is whether the index selected is representative of the manager’s style. That is, is it the right index to use? If a plan sponsor asks a manager to manage to a benchmark that is not the one the manager normally uses, the sponsor will likely be disappointed in the results, because the manager is being asked to manage a risk budget that differs from the one the manager has historically managed. The point is that the index should be representative of the manager’s style. Another issue is that all of the indexes, on a weighted basis, selected by the sponsor should total to the sponsor’s overall fund target. I have seen cases in which a sponsor has S&P 500 mandates as well as Russell mandates, yet ensuring that such combinations add up to a coherent whole at the fund level is difficult. Some consultants believe indexes cause management difficulties associated with liquidity, turnover, and availability (cap weighting versus float weighting, for example). Such issues can be pronounced in the small-cap and midcap sectors of the marketplace. Therefore, a good question to ask when selecting an index, for both passive and active management, is whether introducing the index will lead to greater management difficulties or higher costs in managing the portfolio. Another good question is
whether the selection of the index will affect manager behavior. Certainly, the choice of the index on the passive side will affect manager behavior, but it will also have an impact on behavior on the active side, particularly for structured managers. Regarding performance measurement, the plan sponsor should ask whether the index provides a meaningful comparison with the manager’s actual return and whether the manager could have passively invested in the index.
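The point that the sponsor's indexes, on a weighted basis, should total to the overall fund target can be checked mechanically. The sketch below is a minimal illustration under stated assumptions: the three mandates, their fund weights, the capitalization segments, and the fund target are all hypothetical, and `aggregate_exposure` is a name introduced here for illustration.

```python
def aggregate_exposure(mandates):
    """Weight each mandate's benchmark exposures by its share of the fund."""
    total = {}
    for fund_weight, exposures in mandates:
        for segment, weight in exposures.items():
            total[segment] = total.get(segment, 0.0) + fund_weight * weight
    return total


# Hypothetical mandates: (share of fund, benchmark exposure by cap segment).
mandates = [
    (0.50, {"large": 1.00}),               # large-cap index mandate
    (0.30, {"large": 0.40, "mid": 0.60}),  # index that overlaps with large caps
    (0.20, {"small": 1.00}),               # small-cap index mandate
]

# Hypothetical fund-level target set by the sponsor.
target = {"large": 0.70, "mid": 0.20, "small": 0.10}

agg = aggregate_exposure(mandates)
segments = sorted(set(agg) | set(target))
gaps = {s: agg.get(s, 0.0) - target.get(s, 0.0) for s in segments}

for s in segments:
    print(f"{s:>5}: aggregate {agg.get(s, 0.0):.2f}, "
          f"target {target.get(s, 0.0):.2f}, gap {gaps[s]:+.2f}")
```

A nonzero gap in any segment signals that the combination of mandates does not add up to the fund target, which is the measurement problem that overlapping index families can create.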
Conclusion
In portfolio management, the choice of the benchmark index is surprisingly important and will become even more important as the economy and market structure evolve. Some of S&P’s biases, for example, reflect that it is more conservative than Russell, and if that trend continues, the S&P and Russell indexes are more likely to diverge than to converge in the future. The philosophical differences between the two sets of indexes have begun to accentuate the significant differences in performance between the two. Sponsors should be cognizant that the index they have chosen may encourage manager behavior that differs from that which caused them to hire the manager in the first place. Sponsors must choose the index that best supports their manager allocation, asset allocation, and performance-measurement decisions.
Alternatives to Broad Market Indexes
Thomas M. Richards, CFA
Principal
Richards & Tierney, Inc.
Chicago
A good benchmark should be unambiguous, investable, appropriate, and specified in advance and should contain securities for which the manager has an informed opinion. The strengths and weaknesses of four types of benchmarks (peer group comparisons, broad market indexes, investment-style indexes, and custom benchmarks) used in the institutional investment community are presented. In addition, a case study is used to discuss the selection, quality criteria, and evaluation of a good benchmark.
In this presentation, I will define a benchmark portfolio, list the properties of a valid benchmark, and review four types of benchmarks currently used in the institutional investment community. My remarks are intended to raise questions and issues about these various benchmarks with respect to their usefulness for investment managers, consultants, and fund sponsors. To address the topic of benchmark selection fully, I will provide a case study that highlights the problems of choosing a benchmark and delves into the strengths and weaknesses of the various alternatives. At the end of the presentation, I will make some recommendations that pertain to investment managers, consultants, fund sponsors, and even AIMR.
Definition of Benchmark
At Richards & Tierney (R&T), we believe a benchmark should be an investable portfolio that incorporates the prominent fundamental and performance characteristics of an investment manager’s actual, active portfolios in the absence of active management. (Keep in mind the phrase “in the absence of active management.” It is important.) Just as fishermen have their favorite fishing holes, investment managers have their favorite places to “fish” for stocks. Thus, an investment manager’s fishing hole is analogous to his or her investment style. The portfolio or benchmark that represents this fishing hole should be investable, presumably in a passive manner. Furthermore, no systematic biases or risks should be evident between the manager’s actual, active portfolios and the benchmark portfolio that represents the manager’s investment style. Essentially, the beta of the active portfolio versus the benchmark should be 1.0. The performance differential between the manager’s active portfolio and the benchmark should be solely the result of the manager’s idiosyncratic skill of identifying attractive and unattractive investment opportunities, not the result of any systematic market factors unrelated to the manager’s investment process. Finally, regarding the phrase “in the absence of active management,” a manager’s investment style or benchmark should not be a clone of the manager’s active investment process.
As shown in Equation 1, all portfolios, P, are exposed to three types of risk and return. The largest risk, and the one with the greatest return impact on a portfolio, is the risk of the asset category or market target, M, which is often referred to as the portfolio’s systematic risk. A second risk involves the manager’s investment style. As mentioned earlier, managers focus their investment analyses on certain types of securities. A benchmark portfolio that represents a manager’s investment style is typically different from the asset category market target. The performance differential between a manager’s benchmark, B, and the market target, M, is referred to as investment-style bias or misfit risk. This risk is generally the second largest risk in a portfolio. The third risk in a portfolio involves a manager’s investment skill. In particular, it is the risk of whether a manager can identify the better performing securities from the universe of securities followed by the firm. This risk is represented by the performance differential
between a manager’s actual portfolio, P, and the benchmark portfolio, B, or P – B (which equals A for active management). Thus, the relationship may be stated as
P = M + (B – M) + A.     (1)
Therefore, a portfolio’s risk and return is one part market, one part investment-style bias, and one part active management.
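Equation 1 can be applied directly to period returns. The following is a minimal sketch, assuming returns are expressed in percent; the three return figures are hypothetical, and `decompose` is a name introduced here for illustration.

```python
def decompose(portfolio, benchmark, market):
    """Split a portfolio return into the three parts of Equation 1.

    P = M + (B - M) + A, where A = P - B.
    """
    misfit = benchmark - market     # investment-style bias, B - M
    active = portfolio - benchmark  # value of active management, A = P - B
    return {"market": market, "misfit": misfit, "active": active}


# Hypothetical one-year returns, in percent.
parts = decompose(portfolio=14.0, benchmark=12.5, market=10.0)

for name, value in parts.items():
    print(f"{name:>6}: {value:+.2f}%")

# The three parts add back to the portfolio return by construction.
print(f" total: {sum(parts.values()):+.2f}%")
```

In this made-up case, the style was in favor (misfit of +2.5 percent) and the manager added a further 1.5 percent through security selection within that style.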
Properties of a Valid Benchmark
A valid benchmark should have five distinct properties.
Unambiguous. The benchmark should be unambiguous, which might seem obvious, but in many cases, performance standards and/or client expectations are ambiguous. A lack of specificity almost always leads to problems.
Investable. A valid benchmark should be investable. Often, clients and consultants specify “top-quartile” performance as the manager’s benchmark. But because no one can guarantee such performance, it is not an investable alternative. It is a viable investment objective but not a valid benchmark.
Appropriate. A valid benchmark should be appropriate and incorporate the prominent fundamental risk and performance characteristics of the manager’s portfolio.
Specified in Advance. For a benchmark to be valid, it needs to be specified in advance. Again, this property might seem obvious, but numerous cases exist in which the performance standard was changed after the fact by managers, consultants, and clients. This situation is most likely to occur when the manager’s benchmark is ambiguous and uninvestable.
Informed Opinion. Finally, managers should have an informed opinion about the securities included in the benchmark. Otherwise, the manager could be held accountable for the performance of securities he or she knows nothing about. The importance of this property might not be transparent, but as the case study will show, it is an essential property.
Types of Benchmarks
The first type of benchmark, peer group comparisons, has been in place for a long time. Peer group comparisons are not investable; they are ambiguous; they are not specified in advance; and the composition and source of returns are unknown to all. Although I agree that top-quartile performance (a peer group
comparison) is a reasonable and viable investment objective, I do not consider it to be a fair and appropriate performance evaluation standard.
One reason broad market indexes, the second type of benchmark, were adopted as investment benchmarks was the index fund movement. Fund sponsors reasoned that if an active manager could not beat a low-cost index fund representing the total market, then the manager should not be retained. The problem with this logic is the time frame, which is usually defined as “over a market cycle.” But what is a market cycle? Investment styles come into favor and fall out of favor over various periods of time. For example, small-cap stocks fell out of favor in the early 1980s and have not yet recovered to any meaningful degree.
Thus, investment-style indexes, the third type of benchmark, were introduced as a means to deflate the impact of investment style and to better assess a manager’s investment skill. Although investment-style indexes are a step in the right direction, they represent a “90 percent solution”; they work most of the time, but when they are most needed, they can and do fail, as the case study will demonstrate.
The fourth, and final, type of benchmark is the custom benchmark. It is simply the universe of securities that are actively researched by an investment manager and weighted in a fashion to properly reflect the normative risk characteristics of the manager’s actual portfolios over time.
I am not biased against market indexes, investment-style indexes, or custom benchmarks as long as they satisfy the properties I listed. Unfortunately, in many cases, broad market indexes and investment-style indexes do not satisfy the necessary properties and quality criteria. Consequently, misinformation and bad decisions can and do result.
Case Study
The case study presented here centers on Investment Manager ABC, a large-cap value manager. The firm’s investment philosophy is to invest in securities that have certain dividend-yield and business characteristics. As a result, the firm’s research and portfolio-selection process focuses on a unique universe of stocks. This universe consists of approximately 300 stocks and has evolved slowly over time in response to changes in companies’ dividend-yield and business characteristics. Because the firm was interested in closely monitoring its investment process and in evaluating its analysts and portfolio managers, a custom benchmark was built and has been maintained since the late 1980s. It was built by weighting the firm’s research universe of stocks based on the long-term normative risk characteristics of Manager
ABC’s actual portfolios. The stocks in the benchmark are neither market-cap weighted nor equally weighted. This custom benchmark has been used primarily for internal management and evaluation purposes.
In Figure 1, the area labeled “M” represents the market, in this case the domestic equity market. If the domestic market is defined as the opportunity set available to institutional investors, it contains approximately 2,500–3,500 securities and can be represented by a variety of broad market indexes, such as the Russell 3000, the Wilshire 2500, and the R&T Institutional indexes. The securities in these broad market indexes are usually market-cap weighted. The area labeled “I” represents the securities in a generic style index. In this case, it could be the S&P 500/Barra Value Index, the Russell 1000 Value Index, or a variety of other large value indexes. These indexes contain from 400 to more than 1,000 securities, and again, the securities are typically market-cap weighted. The area labeled “U” represents the universe of securities researched by Manager ABC that are in the custom benchmark, and as mentioned earlier, it contains approximately 300 securities that are weighted in a specified fashion. Finally, the area labeled “A” represents the actual portfolio, which is a subset of “U.” The actual portfolio represents the most attractive investment opportunities within the research universe, as determined by Manager ABC’s investment analysis. The intersection of “U” and “I” represents the securities in the style index about which
Manager ABC holds an investment opinion. Much of index “I,” however, is outside “U”; these are securities in the index about which Manager ABC holds no opinion and for which he should not be held accountable. (Note that some stocks followed by Manager ABC lie outside index “I” and do not qualify for the generic index.) The intersection of “M,” “U,” and “I” represents the securities that are in the market, the style index, and Manager ABC’s universe. These are the securities that Manager ABC has analyzed and about which he presumably holds an informed investment opinion, but as can be seen, much of the market, “M,” and index, “I,” lie outside “U.” This area represents the securities that are in the market and the style index but not in Manager ABC’s universe and about which Manager ABC holds no opinion. The question I want to explore is whether these securities matter: Should Manager ABC and his clients care about these stocks?
Figure 1. Venn Diagram of Investment Universe
[Venn diagram with regions labeled “A,” “U,” “I,” and “M.”]
Choosing the Benchmark. Manager ABC needs to choose a benchmark for public dissemination. The time is the early 1990s. Clients are asking about benchmarks. Investment-style indexes have been introduced to the investment community, and Manager ABC must decide which benchmark to use with clients, prospects, and consultants. One way not to choose a benchmark is to consider past performance. Some investment managers nevertheless take past performance into account and let it affect their thinking. In this case, Manager ABC
allowed past performance to enter into his decision-making process. Table 1 shows four benchmark options, along with their past performance as of the early 1990s. Based on past performance, the custom benchmark and the S&P 500 Index do not appear to be good choices. That is, their past performance was much better than the two generic style indexes, so they would have been tougher indexes to beat, at least during the 1980s. Manager ABC chose to go public with the generic style indexes and a preference for the S&P 500/Barra Value Index. In fairness to Manager ABC, the decision was not entirely based on past performance. It was influenced by a variety of factors, including return correlations and general portfolio risk characteristics. The general risk characteristics of the actual portfolio, custom benchmark, and generic style indexes were similar, and the correlations of Manager ABC’s actual portfolios with the custom benchmark and generic style indexes were high.
Analyzing the Benchmark Choice. Although many managers consider only past performance when deciding on a benchmark, additional analysis is needed. And although using only (1) past performance, (2) return correlations, and (3) general risk characteristics to choose a benchmark works most of the time, the problem is that the time this system does not work is usually the time when the manager most needs a good benchmark. When Genius Failed, the story of the rise and fall of Long-Term Capital Management, clearly shows that in the world of finance and investments, 90 percent solutions are not good.1 The author discusses
“fat tails” and points out that surprise, negative outcomes occur much more frequently than one might expect. One cannot be too careful.
■ Systematic risk. Earlier, I indicated that no systematic biases or risks should exist in the benchmark relative to the active portfolio. Therefore, one would expect the beta of the active portfolio relative to the benchmark to be close to 1.0. As Table 2 shows, the beta of the active portfolio relative to the generic style indexes was materially less than 1.0. The usual explanation is that generic style indexes and broad market indexes do not contain any cash, but active portfolios typically have some cash for transaction purposes. Because custom benchmarks also have some cash to reflect the active portfolio, the beta gets closer to 1.0. Other measures of potential systematic bias are the correlations shown in Table 2. The first correlation, VAM/MFT (value of active management, which is P – B, or A in Equation 1, versus misfit, which is B – M in Equation 1), measures the independence of the manager’s investment process relative to the manager’s investment style. What it means is that how a manager performs versus the benchmark should not be related to whether the manager’s style is in or out of favor relative to the market. One can measure this criterion with a correlation coefficient that should not be significantly different from zero. As Table 2 shows, this correlation is significantly different from zero for the generic style indexes.
1 Roger Lowenstein, When Genius Failed: The Rise and Fall of Long-Term Capital Management (New York: Random House, 2000).
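The two systematic-risk checks just described (the beta of the active portfolio against the benchmark, and the VAM/MFT correlation) can be computed from periodic return series. The sketch below is illustrative only: the eight periodic returns for P, B, and M are hypothetical, the helper names are introduced here, and sample (n – 1) covariances are an assumption, because the estimation details behind Table 2 are not stated.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equal-length return series."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def beta(portfolio, benchmark):
    """Slope of portfolio returns on benchmark returns."""
    return covariance(portfolio, benchmark) / covariance(benchmark, benchmark)


def correlation(x, y):
    """Pearson correlation coefficient."""
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5


# Hypothetical periodic returns, in percent.
P = [4.1, -1.8, 3.0, 5.2, -2.5, 1.9, 2.7, -0.6]  # active portfolio
B = [3.8, -2.0, 3.3, 4.9, -2.9, 2.2, 2.4, -0.9]  # benchmark
M = [3.5, -1.2, 2.1, 5.6, -3.6, 2.9, 1.5, -0.2]  # market target

vam = [p - b for p, b in zip(P, B)]  # value of active management, P - B
mft = [b - m for b, m in zip(B, M)]  # misfit, B - M

portfolio_beta = beta(P, B)
vam_mft = correlation(vam, mft)

print(f"beta of portfolio vs. benchmark: {portfolio_beta:.2f}")
print(f"VAM/MFT correlation:             {vam_mft:+.3f}")
```

For a good benchmark, the first number should be close to 1.0 and the second should not be significantly different from zero; whether a given correlation is significant depends on the number of observations, which is why Table 2 reports a significance level.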
Table 1. Past Performance of Four Benchmarks
Benchmark                       1 Year     3 Years    5 Years    10 Years
Periods ending December 1991
Custom benchmark                28.70%     15.08%     15.02%     20.46%
S&P 500/Barra Value Index       22.56      12.92      12.68      17.32
Russell 1000 Value Index        24.62      12.77      12.16      16.86
S&P 500 Index                   30.51      18.38      15.29      17.50
Periods ending December 1992
Custom benchmark                12.69      10.29      17.02      18.90
S&P 500/Barra Value Index       10.52       8.06      14.13      16.26
Russell 1000 Value Index        13.81       9.24      14.99      16.24
S&P 500 Index                    7.71      10.82      15.83      16.08
Periods ending December 1993
Custom benchmark                13.48      18.07      14.28      17.55
S&P/Barra Value Index           18.59      17.12      13.55      15.30
Russell 1000 Value Index        18.12      18.77      14.03      15.28
S&P 500 Index                   10.05      15.65      14.48      14.86
Table 2. Active Portfolio versus Benchmarks: Beta and Return Correlations
                                           Correlations(a)
Benchmark                       Beta       VAM/MFT    MFT/EXC
Custom benchmark                0.98       –0.130     0.895
S&P 500/Barra Value Index       0.86       –0.236     0.652
Russell 1000 Value Index        0.91       –0.182     0.710
(a) Significance level = 0.181.
Table 3. Tracking Error of Active Portfolio versus the Benchmarks
Benchmark                       Tracking Error (standard deviation)
Custom benchmark                1.92
S&P 500/Barra Value Index       3.33
Russell 1000 Value Index        3.06
S&P 500 Index                   4.27
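Table 3 reports tracking error as a standard deviation. A minimal sketch of that calculation, taking tracking error to be the sample standard deviation of the period-by-period return differences, is shown below. The return series are hypothetical, the function name is introduced here, and no annualization is applied because the periodicity behind Table 3 is not stated.

```python
from statistics import stdev


def tracking_error(portfolio, reference):
    """Sample standard deviation of portfolio-minus-reference returns."""
    diffs = [p - r for p, r in zip(portfolio, reference)]
    return stdev(diffs)


# Hypothetical periodic returns, in percent.
P = [4.1, -1.8, 3.0, 5.2, -2.5, 1.9, 2.7, -0.6]  # active portfolio
B = [3.8, -2.0, 3.3, 4.9, -2.9, 2.2, 2.4, -0.9]  # candidate benchmark
M = [3.5, -1.2, 2.1, 5.6, -3.6, 2.9, 1.5, -0.2]  # broad market index

te_benchmark = tracking_error(P, B)
te_market = tracking_error(P, M)

print(f"tracking error vs. benchmark: {te_benchmark:.2f}")
print(f"tracking error vs. market:    {te_market:.2f}")

# A benchmark that captures the manager's style should track more
# closely than the broad market does.
print("benchmark tracks closer:", te_benchmark < te_market)
```

The comparison in the last line mirrors the pattern in Table 3, where the custom benchmark shows the lowest tracking error and the broad market index the highest.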
The second correlation, MFT/EXC (misfit, or B – M, against excess, or P – M), measures the relationship of the portfolio and benchmark relative to the market. In other words, if a manager’s style is in favor, one would expect both the portfolio and the benchmark to perform better than the market. When it is out of favor, one would expect them to perform worse than the market. Again, this criterion can be measured with a correlation coefficient that should be positive and close to 1.0.

■ Tracking error. Another measure of the goodness of a benchmark is its tracking error. A good benchmark should reduce some of the “noise” in the performance evaluation process. That is, the volatility of an active portfolio relative to its benchmark should be less than the volatility of the active portfolio versus a broad market index. Otherwise, the benchmark is not capturing any of the manager’s investment-style risk. As noted earlier, a portfolio’s risk and return relative to the market is one part investment-style bias and one part investment management skill, and the purpose of a benchmark is to capture a manager’s investment-style bias or risk. Therefore, if tracking error is used as a measure of risk, one can reasonably expect that a portfolio will track closer to its benchmark than to a broad market index, because performance relative to the benchmark involves only the risk and return of active investment skill, whereas performance relative to a broad market index contains both active-investment-skill risk and investment-style-bias risk. As Table 3 illustrates, the generic style indexes show improved tracking relative to that of the market (3.33 and 3.06 versus 4.27). The custom benchmark has even better tracking (1.92).

■ Industry, sector, and risk exposures. As previously noted, the general risk characteristics of the active portfolio, generic style indexes, and the custom benchmark are similar. Basically, all these portfolios are large-cap, value-oriented portfolios. Although the general risk characteristics of the portfolios should be similar, verifying that no material exposures exist to any particular industry, economic sector, or risk factor is also important. For example, Figure 2 shows the weightings in the utility sector. The custom benchmark’s weighting approximately tracks the active portfolio’s weighting, which is desirable. The generic style indexes, however, seem to have a systematically greater weight than the actual portfolio. This bias can cause significant distortion. In general, generic style indexes have certain exposures and characteristics that differ from what is seen in many investment managers’ portfolios. This situation results largely because the weightings found in most active managers’ portfolios are less extreme than the style indexes’ capitalization weightings. Keep in mind that the objective of a good benchmark is to eliminate all systematic biases and risks, not to eliminate all risks. Active management risk should exist. What is desirable is that the risks that exist in a manager’s portfolio relative to the benchmark represent investment opportunities as perceived by the investment manager.

Figure 2. Value Manager Economic Sector Weight: Utilities, 1992–2000
(Line chart. Vertical axis: Portfolio Weight (%); horizontal axis: 1/92 to 9/00; series: Russell 1000 Value, S&P/Barra Value, Custom Benchmark, Actual.)

■ Coverage. In Figure 1, I showed the relationship among the securities in the various portfolios (i.e., the market, the generic style indexes, the custom benchmark, and the active portfolio) and pointed out that many of the securities in the market and in the style indexes lie outside the manager’s universe. These are securities about which Manager ABC holds no opinion but for which he is held accountable if the benchmark is either the broad market or a generic style index. The question I raised was: “Do these securities matter as long as the general risk characteristics of the benchmark and the active portfolio are the same?” This question captures the issue of coverage. Figure 3 illustrates the coverage ratios of the two generic style indexes versus the manager’s universe. Historically, 25–50 percent of the securities in the two generic style indexes were not in Manager ABC’s world or universe. The percentage is much greater relative to a broad market index. I cannot speak for anyone else, but I would not want to be held accountable to a benchmark for which I covered only half of the securities. As a rule of thumb, we prefer to see coverage ratios indicating that only 15 percent or less of the benchmark securities are not researched by the manager.
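The quantitative checks described above (tracking error, the MFT/EXC correlation, and coverage) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the return series and security names are hypothetical, the function names are invented for this sketch, and coverage is counted by number of securities, which is one plausible reading of the text rather than a confirmed methodology.

```python
from math import sqrt
from statistics import mean, stdev

def tracking_error(portfolio, reference):
    """Sample standard deviation of period-by-period return differences."""
    return stdev([p - r for p, r in zip(portfolio, reference)])

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def uncovered_share(benchmark_names, researched_names):
    """Fraction of benchmark securities the manager does not research."""
    benchmark = set(benchmark_names)
    return len(benchmark - set(researched_names)) / len(benchmark)

# Hypothetical periodic returns in percent (P = portfolio, B = benchmark, M = market).
P = [2.0, -1.0, 3.5, 0.5, 4.0, -2.5]
B = [1.8, -0.6, 3.0, 0.9, 3.6, -2.0]
M = [3.0, 1.0, 1.5, 2.0, 2.5, 0.5]

te_vs_benchmark = tracking_error(P, B)
te_vs_market = tracking_error(P, M)

misfit = [b - m for b, m in zip(B, M)]   # MFT = B - M
excess = [p - m for p, m in zip(P, M)]   # EXC = P - M
mft_exc = correlation(misfit, excess)

# Hypothetical security identifiers.
gap = uncovered_share(["A", "B", "C", "D"], ["A", "B", "C", "X"])

print(f"tracking error vs. benchmark: {te_vs_benchmark:.2f}")
print(f"tracking error vs. market:    {te_vs_market:.2f}")
print(f"MFT/EXC correlation:          {mft_exc:.3f}")
print(f"share not researched:         {gap:.0%} (rule of thumb: 15% or less)")
```

With a well-specified benchmark, one would expect the first tracking error to be smaller than the second and the MFT/EXC correlation to be positive and close to 1.0.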
Figure 4 highlights the annual performance differentials of the stocks that were in the two style indexes but not in Manager ABC’s world (i.e., his custom benchmark). The data for Figure 4 were obtained by calculating the performance of the securities that were in the generic style indexes but not in Manager ABC’s world and then comparing the performance of these securities with the performance of the custom benchmark. In the 1988–90 period, the performance differentials were negative in the range of 5 percent, and in the 1991–93 period, the performance differentials were positive in the range of 5 percent. In the 1994–98 period, the performance differentials were minimal. So, through 1998, one might conclude that coverage does not matter. The performance impact of the stocks not researched by Manager ABC was not great. But in 1999, the performance differential was very large (about 20–25 percent), which had a significant impact on Manager ABC.

In fact, 1999 was an unusual year in many respects. The domestic equity broad market indexes were up 20–21 percent; however, a wide spread of returns existed within the market. As Table 4 shows, the performance differential between large growth and large value was about 55 percent; between small growth and small value, it was more than 60 percent. In addition, within an investment style, a wide spread of returns existed. For example, in the large-cap value investment style, the S&P 500/Barra Value Index was up 12.72 percent, the Russell 1000 Value Index was up 7.35 percent, and the R&T Large Value Index was down nearly 12 percent. So, a spread of roughly 25 percent existed within this investment style, yet these indexes are supposed to be measuring investment skill. In short, 1999 was a narrow market. Individual securities mattered.
Figure 3. Percentage of Stocks in Style Indexes Not Researched by Manager, December 1987–December 1999
(Chart. Vertical axis: Percent, 0 to 60; horizontal axis: 12/87 to 12/99; series: Russell 1000 Value, S&P/Barra Value.)

Figure 4. Return Differentials of Stocks in Style Indexes Not Researched by Manager, 1988–99
(Chart. Vertical axis: Return Differential (%), –10 to 30; horizontal axis: 1988 to 1999; series: Russell 1000 Value, S&P/Barra Value.)

Table 4. Returns for Investment-Style Indexes and Large-Cap Value Style Benchmarks, 1999

Item                            Return
Investment styles
  Large growth                  44.12%
  Large value                   –11.63
  Small growth                  55.34
  Small value                   –5.29
Large-cap value benchmarks
  S&P 500/Barra Value Index     12.72
  Russell 1000 Value Index      7.35
  R&T Large Value Index         –11.63
■ Manager ABC results for 1999. In 1999, Manager ABC’s custom benchmark was up about 2 percent. Manager ABC’s clients’ portfolios returned 0–2 percent, essentially in line with the custom benchmark. That performance, which was not particularly good in absolute terms, was in line with expectations relative to the custom benchmark, but it was not in line with the broad market indexes or the two generic style indexes I have been discussing. Manager ABC’s active portfolios significantly underperformed the market and the generic style indexes. Additionally, the return series of these various benchmarks was materially different from the one shown in Table 1. The custom benchmark no longer outperformed the generic benchmarks. If Manager ABC had his druthers, surely he would have preferred for clients to evaluate his 1999 performance relative to the custom benchmark, but changing horses in midstream became impossible. As a result, Manager ABC lost a lot of business. Unfortunately, the business was lost because of random noise in the evaluation standard, not because the manager lacked investment skill.
Suggestions

The case study illustrates some important points. What happened in 1999 can and will happen again. “Once-in-a-century” floods have a way of happening more than once a century. The investment world is replete with “fat-tail” outcomes. Money managers should thus be serious about their benchmarks and be willing to assume ownership. Knowing their world and the details of what they are being held accountable to is crucial. If they are going to let clients and consultants specify their benchmark, they should be prepared to structure portfolios that perform relative to a variety of benchmarks. Unfortunately, doing so will cause considerable return dispersion among active portfolios, which many consultants and clients will dislike.

Additionally, AIMR needs to recognize that managers can have different benchmarks within a particular discipline. As a result, portfolio returns will be different. Therefore, managers should be required to publish benchmark returns along with active returns. Instead of aggregating returns across accounts, managers should provide individual account returns along with the benchmark returns and related information.

With respect to fund sponsors, they should recognize that investment managers are in the best position to define their benchmarks, and fund sponsors should be willing to listen to their recommendations. Fund sponsors should conduct in-depth analyses of whichever benchmarks are agreed to by both parties. Conducting ongoing quality control analyses of chosen benchmarks and monitoring and controlling aggregate investment risks should also be priorities.
Conclusion

Investment managers need to be extremely careful and serious about which benchmarks they specify or adopt. In the past, managers were willing to agree to anything as long as they captured the account. Once the account came in-house, nothing would change internally and lots of dancing would be done to keep the account. Unfortunately, this practice generated too much unnecessary hiring and firing of managers. Investment managers must understand their world and adapt to their clients’ world.

Fund sponsors need to better understand benchmark portfolios. If they are going to employ active managers, they have an obligation to appreciate the potential problems in whichever benchmark is adopted. Rather than wasting time hiring and firing managers, fund sponsors should focus on monitoring and controlling their fund’s aggregate investment risks.

AIMR should rethink its investment performance reporting standards. It needs to appreciate the implications of the plethora of benchmarks across all investment disciplines. For example, a large-cap value manager whose client specifies the S&P 500 as a benchmark should produce a portfolio whose returns are materially different from those of a manager whose client specifies a large-cap value generic index or a custom benchmark as the benchmark.

Also, consultants should understand the strengths and weaknesses of various benchmarks. They should not be cavalier about specifying a manager’s benchmark and think that they know more about the manager’s process than the manager does.

Finally, caveat emptor: All parties (managers, sponsors, and consultants) should grasp that even against the most appropriate benchmarks, skillful investment managers will underperform, even for extended periods of time.
Return, Risk, and Performance Attribution
Kevin Terhaar, CFA
Executive Director
Brinson Partners
Chicago
To identify sources of manager skill and added value, portfolio managers and analysts must strive for consistency between the investment process and the performance attribution analysis used to evaluate the process. Otherwise, the attribution can yield erroneous results. Three examples explain why consistency is paramount in deciphering risk-adjusted performance and evaluating investment expertise.
In this presentation, I provide three examples to explain why, when evaluating performance, the attribution must be consistent with the investment process, the risk modeling, and all other aspects of the portfolio management process. If they are not consistent, some bizarre, or even erroneous, results can occur. In all three detailed, but fictitious, examples, I ignore currency effects in order to simplify the exposition. I also assume that the returns are continuously compounded, or logarithmic, so that I can aggregate returns through simple addition.

Performance evaluation and attribution should accurately reflect the decision-making process and should operate within the context of relevant and controllable variables. Performance analysis is not helpful if the attribution is to an area that the portfolio manager has no control over. The results should be attributed to those responsible for making the investment decision, and the credit for outperformance and the responsibility for underperformance should be given to whomever it is due.

Evaluating performance is a backward-looking, retrospective process. If a manager wants to measure the outcome of certain investment choices, he uses the historical weights and data on the returns and risks of the portfolio for the evaluation period. Strategy, however, is a forward-looking, or forecasting-based, exercise. Managers set active strategy weights for specific positions in order to garner added value. Therefore, managers accept more risk when they believe the risk–return trade-off is going to be positive. The purpose of performance evaluation is to determine whether the trade-offs made by portfolio managers add value or whether they add risks without adding return.
Example 1

This first example illustrates the importance of performance evaluation being consistent with the investment process. Consider an international equity portfolio with a two-country benchmark: 50 percent in France and 50 percent in Germany. For the performance-measurement period, the passive, or index, return for France was 6.0 percent; the German index return was 6.7 percent. Thus, the benchmark return was 6.35 percent. The portfolio manager took an active bet in both of these countries. She was overweight France, with 53 percent exposure versus 50 percent for the benchmark, and underweight Germany, with 47 percent exposure versus 50 percent for the benchmark. This portfolio manager was thus overweight the poorer performing market and underweight the better performing market, and therefore, an attribution along the country dimension shows that country selection was negative:

Country selection = (Active weight of France × French market return)
                  + (Active weight of Germany × German market return)
                  = (3% × 600 bps) + (–3% × 670 bps)
                  = –2 bps.

The manager had roughly a 2 basis point detraction from return based on her country selection.
Given an actual portfolio return of 6.45 percent, the outperformance was 10 bps. How did the portfolio add 10 bps of value? Country selection had a negative impact of 2 bps, so the plan sponsor (or analyst) could attribute a positive return of 12 bps to security selection. In this case, the stock pickers should be the ones who get the credit for outperformance and receive the big bonuses because they were responsible for the added value.

But what if the plan sponsor considers the breakdown of the benchmark along industry dimensions rather than along country dimensions? The benchmark is weighted 40 percent in autos and 60 percent in semiconductors; the auto industry return was 4.75 percent, which was well below the semiconductor industry return of 7.42 percent. Taking those passive industry weights and multiplying them by the index returns in autos and semiconductors also generates the 6.35 percent return of the benchmark: 40%(475 bps) + 60%(742 bps) = 6.35%.

The manager’s active strategy was to overweight autos by 2 percent and to underweight semiconductors by 2 percent, contrary to the relative returns. As a result, the attribution shows negative industry selection:

Industry selection = (Active weight of autos × Auto return)
                   + (Active weight of semiconductors × Semiconductor return)
                   = (2% × 475 bps) + (–2% × 742 bps)
                   = –5 bps.

Recall that the portfolio’s actual return was 6.45 percent (i.e., added value of 10 bps). The portfolio lost 5 bps of return because of poor industry selection. Therefore, 15 bps must have been earned from security selection. This outcome seems to reinforce the fact that the portfolio manager’s macrobets were ill chosen, even though the stock pickers did a great job of adding value. The problem with such a conclusion is that the portfolio was not managed according to just a country decision or an industry decision; it was managed along both dimensions simultaneously.

Table 1 shows the benchmark weights by industry and country, and Table 2 shows the index returns by industry and country. French autos had the lowest return (4 percent), but French semiconductors had the highest return (8 percent). German semiconductors had a better return than both German and French autos. The manager’s bets by industry and country are given in Table 3.

Table 1. Benchmark Weights by Industry and Country

Industry          France    Germany    Total
Autos             25%       15%        40%
Semiconductors    25        35         60
Total             50%       50%        100%

Table 2. Returns by Industry and Country

Industry          France    Germany    Total
Autos             4.00%     6.00%      4.75%
Semiconductors    8.00      7.00       7.42
Combined          6.00      6.70       6.35

Table 3. Manager’s Active Bets: Industry and Country

Industry          France    Germany    Total
Autos             –3%       5%         2%
Semiconductors    6         –8         –2
Total             3%        –3%
If the benchmark weights from Table 1 are applied to the index returns in Table 2, the actual benchmark return of 6.35 percent can be computed (shown in Table 2): 25%(400 bps) + 25%(800 bps) + 15%(600 bps) + 35%(700 bps) = 6.35%.
This decomposition allows the plan sponsor to look at both industry and country dimensions concurrently, as the portfolio manager did. Remember that the manager was 3 percent overweight in France and 3 percent underweight in Germany, and she was 2 percent overweight in autos and 2 percent underweight in semiconductors. The manager’s semiconductor and auto bets, or positions, were mostly consistent with the simultaneous country and industry returns. Within semiconductors, the portfolio was overweight France and underweight Germany. Table 2 shows that in terms of return, the French semiconductor industry was better performing than the German. Within autos, the portfolio was overweight in Germany and underweight in France, which was also consistent with the returns (higher returns for German autos than French autos). The one area where the portfolio was not consistent with relative returns was within Germany, where autos were overweighted and semiconductors were underweighted. For this period, German semiconductors outperformed German autos. The decision to underweight the outperforming industry hurt the portfolio’s performance.

Multiplying the returns in France and Germany in the auto and semiconductor industries by the manager’s actual weights, or positions, the plan sponsor can calculate the portfolio’s actual return of 6.45 percent, as shown in Table 4. This analysis indicates that there was no positive security selection; the added value came from joint industry and country strategies. When the attribution is done by industry or country alone, the stock pickers are rewarded for what appears to be positive security selection, and the country and industry analysts are blamed for what appears to be poor country and industry selection. In other words, the first two attributions rewarded the wrong people. The attributions were not consistent with the investment process, which took into consideration both industries and countries when setting strategy.

In this admittedly contrived example, the first two attributions assumed an investment process in which the country and industry selection happened first, followed by the security selection within the industries or countries. But if the portfolio is not managed in such a hierarchical manner, the performance attribution will point to the wrong conclusion. If the portfolio is managed by industry analysts across countries, as this portfolio was, then performance must be evaluated within that framework. An attribution that slices the performance results in ways that are inconsistent with the management process can lead to erroneous results.
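The three attributions in Example 1 can be reproduced with a short Python sketch. The inputs are the benchmark weights, index returns, and active bets quoted in Tables 1–3; the function and variable names are invented for this illustration, and the "implied security selection" figure is simply the residual of added value after each allocation effect, which is how the text derives it.

```python
# Cells are (industry, country); weights and returns are in percent.
benchmark_weight = {  # Table 1
    ("Autos", "France"): 25.0, ("Autos", "Germany"): 15.0,
    ("Semiconductors", "France"): 25.0, ("Semiconductors", "Germany"): 35.0,
}
index_return = {  # Table 2
    ("Autos", "France"): 4.0, ("Autos", "Germany"): 6.0,
    ("Semiconductors", "France"): 8.0, ("Semiconductors", "Germany"): 7.0,
}
active_weight = {  # Table 3
    ("Autos", "France"): -3.0, ("Autos", "Germany"): 5.0,
    ("Semiconductors", "France"): 6.0, ("Semiconductors", "Germany"): -8.0,
}

def weighted_return(weights):
    """Return (percent) of a set of cell weights applied to the index returns."""
    return sum(weights[c] * index_return[c] for c in weights) / 100.0

def one_dimensional_selection(axis):
    """Selection effect in bps along one axis only (0 = industry, 1 = country)."""
    effect = 0.0
    for group in {cell[axis] for cell in benchmark_weight}:
        cells = [c for c in benchmark_weight if c[axis] == group]
        group_weight = sum(benchmark_weight[c] for c in cells)
        group_return = sum(benchmark_weight[c] * index_return[c] for c in cells) / group_weight
        group_active = sum(active_weight[c] for c in cells)
        effect += group_active * group_return  # percent x percent = basis points
    return effect

benchmark_return = weighted_return(benchmark_weight)
actual_weight = {c: benchmark_weight[c] + active_weight[c] for c in benchmark_weight}
portfolio_return = weighted_return(actual_weight)
added_value = (portfolio_return - benchmark_return) * 100.0  # bps

country_effect = one_dimensional_selection(axis=1)
industry_effect = one_dimensional_selection(axis=0)
# Joint view: each industry-country cell is its own active bet.
joint_effect = sum(active_weight[c] * index_return[c] for c in active_weight)

for label, effect in [("country only", country_effect),
                      ("industry only", industry_effect),
                      ("industry and country jointly", joint_effect)]:
    residual = added_value - effect
    print(f"{label}: allocation {effect:+.1f} bps, implied security selection {residual:+.1f} bps")
```

The one-dimensional views leave a residual of roughly 12 bps and 15 bps that looks like stock picking, whereas the joint view accounts for essentially all 10 bps of added value, leaving nothing to attribute to security selection.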
Table 4. Manager’s Active Bets and Index Returns: Industry and Country

                  France                        Germany
Industry          (actual weight × return)      (actual weight × return)      Total
Autos             0.22 × 4.0 = 0.88        +    0.20 × 6.0 = 1.20        =    2.08%
Semiconductors    0.31 × 8.0 = 2.48        +    0.27 × 7.0 = 1.89        =    4.37
Total             3.36%                    +    3.09%                    =    6.45%

Example 2

This example examines the decisions of a European equity analyst who is responsible for consumer products companies in a non-U.S. portfolio. The equity analyst’s manager is interested in the analyst’s stockpicking ability. To get this information, the manager wants to evaluate the analyst’s performance solely based on security selection, so he needs to control for market exposure. In other words, he must eliminate any influences that are solely the result of the analyst’s market risk exposures. The analyst should not be moving into high-beta stocks simply because her market view is positive; nor should she be moving into low-beta stocks when she believes a market decline is forthcoming if these market “calls” are not hers to make. Consequently, the design of an attribution system or a performance evaluation system should control for, or ignore, any bets an analyst might take in a beta or market dimension.

The attribution system must be designed so that all market effects attributable to beta adjustments are removed. This provision allows the manager to strip out the market effects, leaving only the alpha portion, the superior specific return. The analyst can then be judged on her ability to generate alpha instead of the generation of total value added. The capital asset pricing model (CAPM) states that

$$R_i - R_f = \alpha_i + \beta_{i,m}\,(R_m - R_f),$$

where
$R_i$ = the return of security i
$R_f$ = the risk-free rate
$\alpha_i$ = alpha (or specific return) for security i
$\beta_{i,m}$ = beta (or systematic risk) for security i relative to market m
$R_m$ = return of the market

In other words, the return on any asset in excess of the risk-free rate is equal to alpha plus the stock’s systematic excess return. To evaluate whether the analyst has produced any alpha, the systematic return (beta times the market excess return) must be stripped off the total return.

Suppose that one of the stocks the analyst picked was Unilever. The company has dual headquarters and market listings in the Netherlands and in the United Kingdom. For an analyst looking at Unilever, the fundamental characteristics of the company (the cash flows, earnings, dividends, and so on) should be invariant to trading location. And whether the stock trades in Amsterdam in euros or in London in sterling should not have any effect on Unilever’s business or its evaluation by the analyst. As a result, the decision to hold Unilever, and the subsequent performance attribution, should be unaffected by the choice of trading location. But it is not. The analyst can potentially influence the performance evaluation system through the choice of which Unilever shares (London’s or the Netherlands’) to hold in the portfolio. When the time comes for the evaluation of security selection, alpha will be calculated by taking out the market effect (beta times the market excess return).

The savvy analyst knows that the market baskets of the United Kingdom and the Netherlands differ: Because of their dissimilar compositions, the British and the Dutch equity markets do not always behave in similar ways. Nonetheless, these two markets can be compared by using 10 years of monthly data and calculating the betas of Unilever in the United Kingdom and the Netherlands against the respective MSCI (local) country indexes. The resulting historical beta in the United Kingdom is 0.8, and in the Netherlands, 0.9. Using these values, the alpha is twice as large in the United Kingdom as in the Netherlands. Thus, if an analyst suspected the return on Unilever to be similar in each market, it would be to the analyst’s advantage to pick either the low-beta or the high-beta market, depending on the analyst’s expectations for market return. Such a strategy can produce an alpha that magnifies an already good security selection. Thus, betas influence alphas and can produce apparently outstanding security selection ability. Any inconsistencies in measuring risk characteristics of the stock, the portfolio, and the market can affect the outcome of the equity analyst’s performance attribution.

In thinking about this example, one might ask whether the local market, either British or Dutch, is in fact the market that should be used in the evaluation of the analyst’s performance. The answer is almost certainly no. Because this is a European consumer products analyst, the better choice is likely to be an index of European consumer products companies. This answer logically leads to the next question: Was the decision to overweight Unilever a good one compared with the other European consumer products companies vying for the analyst’s recommendation? Comparing the analyst against an appropriate industry or market index solves one problem but can raise other issues of concern.
The performance attribution must account for any responsibility the analyst may have had in determining the overall portfolio’s exposure to consumer products companies. If the analyst had input into the degree of exposure of European consumer products companies in the non-U.S. equity portfolio (as opposed to that decision being made at a more senior portfolio management level), then the adjustment of the analyst’s selections for beta effects might be exactly the wrong thing to do. To clarify, consider that exposure to consumer products companies can be added to the portfolio in two ways. One is to increase the allocation or weight of these companies in the overall portfolio. If the portfolio is overweight consumer products companies on the recommendation of the analyst, the resulting performance can and should be attributed to the analyst. The other way to add exposure in consumer products companies is to let the analyst put consumer products stocks with high betas into the portfolio. In this situation, stripping out the beta contribution to return in the performance attribution would be incorrect, because the analyst had the mandate to choose high-beta stocks.

Clearly, many issues need to be considered when designing an equity performance attribution so that the information culled from the data is correct in light of the specific investment process and responsibility for decision-making in the portfolio.
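The sensitivity of measured alpha to the choice of reference market in Example 2 can be illustrated with the CAPM rearranged as alpha = (Ri – Rf) – beta × (Rm – Rf). In the sketch below, only the two betas (0.8 and 0.9) come from the text. The excess returns are hypothetical values, assumed identical in both markets and chosen purely so that the two-to-one alpha ratio is easy to see; the function name is likewise invented for this illustration.

```python
def capm_alpha(stock_excess_return, beta, market_excess_return):
    """Alpha implied by the CAPM: stock excess return minus its systematic part."""
    return stock_excess_return - beta * market_excess_return

# Hypothetical inputs (percent): the same stock and market excess return in both listings.
stock_excess = 10.0
market_excess = 10.0

# Historical betas quoted in the text.
beta_uk = 0.8
beta_nl = 0.9

alpha_uk = capm_alpha(stock_excess, beta_uk, market_excess)
alpha_nl = capm_alpha(stock_excess, beta_nl, market_excess)

print(f"alpha measured against the U.K. market:  {alpha_uk:.2f}%")
print(f"alpha measured against the Dutch market: {alpha_nl:.2f}%")
```

Under these assumed returns, the same holding shows twice the alpha when measured against the lower-beta market, even though nothing about the company or the analyst's judgment differs. With a negative market excess return the ordering would reverse, which is why the advantageous listing depends on the analyst's market expectations.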
Example 3

This third, and final, example focuses on risk-adjusted performance, specifically on the information ratio. Because two definitions of the information ratio exist, I want to highlight the importance of clarity in communication and terminology. All participants must understand what is contained in the performance evaluation and compliance materials used within their organization, and relevant terms should be precise and well defined. In this example, I use a balanced benchmark that has the characteristics shown in Table 5. The benchmark return was 9.25 percent, and the correlation between equities and bonds was 0.35. Using the correlation, weights, and risk numbers, the benchmark volatility can be calculated as slightly more than 11 percent.
Table 5. Benchmark Characteristics

Asset Class    Weight    Return    Risk
Equity         70%       10.00%    15%
Bonds          30        7.50      5
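The benchmark return and volatility quoted above follow from the Table 5 inputs and the 0.35 correlation via the standard two-asset variance formula. The Python sketch below works through that arithmetic; the function name is invented for this illustration.

```python
from math import sqrt

def two_asset_volatility(w1, sigma1, w2, sigma2, rho):
    """Volatility of a two-asset mix from weights, volatilities, and correlation."""
    variance = (w1 * sigma1) ** 2 + (w2 * sigma2) ** 2 + 2 * w1 * w2 * rho * sigma1 * sigma2
    return sqrt(variance)

# Table 5 inputs: weights as fractions, returns and risks in percent.
w_equity, w_bonds = 0.70, 0.30
r_equity, r_bonds = 10.0, 7.5
sigma_equity, sigma_bonds = 15.0, 5.0
rho = 0.35  # equity-bond correlation stated in the text

benchmark_return = w_equity * r_equity + w_bonds * r_bonds
benchmark_volatility = two_asset_volatility(w_equity, sigma_equity, w_bonds, sigma_bonds, rho)

print(f"benchmark return:     {benchmark_return:.2f}%")
print(f"benchmark volatility: {benchmark_volatility:.1f}%")
```

The variance works out to 110.25 + 2.25 + 11.025 = 123.525, whose square root is the "slightly more than 11 percent" cited in the text.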
Suppose the manager was underweight in equities and overweight in fixed income in the portfolio, as shown in Table 6. Table 5 shows the passive return of equities to be 10 percent and the passive return of fixed income to be 7.5 percent; thus, the active strategy adversely affected the return of the portfolio. The actual return in the equity portion of the portfolio was 11 percent—100 bps greater than the passive benchmark return. The equity portfolio had a tracking error (active risk) of 3 percent and earned 100 bps, for a risk–return trade-off of 0.33. Thus, the equity manager added 33 bps of return for every 100 bps of risk taken in that portion of the portfolio.
Table 6. Portfolio Strategy, Return, and Risk

Asset Class    Weight    Return    Active Risk
Equity         40%       11.00%    3.00%
Bonds          60        8.00      1.00
Fixed-income management also added value. Because the actual return in the bond portfolio was 8 percent, versus 7.5 percent for the benchmark, the fixed-income portion of the portfolio added 50 bps. The tracking error in bonds was 1 percent, so the fixed-income portfolio had a better risk–return trade-off (0.5) than did the equity portfolio. Thus, the information ratio for bonds in this portfolio was 0.5, and for equities, 0.33.

Using the actual equity and fixed-income returns and the allocation information, one can see that the return of the overall portfolio was 9.2 percent. The benchmark return was 9.25 percent, so the portfolio underperformed the benchmark by 5 bps. Even though the portfolio’s equity and bond positions did well, the asset allocation strategy of underweighting equities, which outperformed bonds by a wide margin, hurt the portfolio return; however, the underweighting in equities reduced the total risk of the portfolio. Remember that the correlation between equities and bonds was 0.35 and that equities had three times the risk of fixed income. The benchmark’s volatility was 11.1 percent, but the portfolio’s total volatility, or absolute risk, was 7.7 percent. The manager thus decreased the return of the portfolio slightly, by 5 bps, while improving the overall risk of the portfolio rather dramatically, which means that the portfolio’s Sharpe ratio was much higher than that of the benchmark. The Sharpe ratio is the risk-adjusted excess return of the portfolio, where risk is measured on an absolute basis as opposed to a relative basis (i.e., benchmark).

Additional risk statistics can also be calculated for the portfolio. Although familiarity with betas of individual stocks relative to an equity index is widespread, betas can also be calculated for fixed-income securities relative to a benchmark or for a balanced portfolio against a benchmark portfolio. Beta is simply the price sensitivity of one instrument, or group of instruments, relative to the price sensitivity of another instrument, or group of instruments. With the reduction in overall portfolio risk from 11.1 percent to 7.7 percent, the manager reduced the beta for the portfolio relative to the benchmark from 1.00 to 0.65, which is a significant reduction. In this case, the 0.65 means that for every 1 percent gain (loss) in the benchmark, the manager can expect the portfolio to gain (lose) only 65 bps because of systematic risk.
Referring back to the CAPM equation, one can see that the return on the portfolio is the result of two components: alpha, which is not benchmark related, and beta, which is benchmark related. Thus, in risk terms, the portfolio can have two types of risk. The manager can take bets, or assume risks, relative to the benchmark or take bets, or assume risks, that are unrelated to the benchmark. Tracking error combines both those risks into one number; however, it can be split into two pieces. One piece is the risk that results from having a beta different from 1.00. In the example just described, the risk of underweighting equities and overweighting bonds led to a beta of 0.65. The second part of tracking error is the residual risk, which is related more to security selection or security-specific sources. And just as active risk can be decomposed into benchmark-related and residual components, active return (or value added) can too. As stated earlier, the main point of this example is to evaluate the information ratio of the portfolio. Remember that the portfolio had 5 bps of negative added value. It also had a tracking error of 4.7 percent and residual risk of 2.7 percent. The information ratio can be calculated two ways. One calculation involves taking the added value, the negative 5 bps, and dividing it by tracking error. This methodology produces an information ratio of –0.01, which indicates that the risk taken by the manager relative to the benchmark went unrewarded. Alternatively, if the information ratio is calculated using alpha (residual return) and residual risk, the information ratio is high. The alpha for the portfolio was 1.08, and the residual risk was 2.7 percent, which yields an information ratio of 0.40. This information ratio falls between the risk-adjusted return numbers for the equity portion of the portfolio, 0.33, and the bond portion, 0.50. 
Risk-related portfolio attribution systems can use either method to calculate an information ratio, and each produces quite different feedback. In this example, the same portfolio generated both a negative information ratio and a positive information ratio. One measure conveys a complete absence of skill, whereas the other indicates good selection ability on the part of the portfolio manager. Both approaches are valid, but in the world of investment consulting, calculating the information ratio by using the value added return to tracking error is probably more common than using alpha to residual risk. This preference is because tracking error is the total risk taken relative to the benchmark in the effort to add value, and thus, the measure of added value shows the amount of return the portfolio earned from the risk taken.
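The two information ratio calculations described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in the example; the function and variable names are illustrative assumptions, not part of any attribution system.

```python
# Figures quoted in the example above, all expressed in percent.
value_added = -0.05     # negative 5 bps of added value relative to the benchmark
tracking_error = 4.7    # total active risk relative to the benchmark
alpha = 1.08            # residual return
residual_risk = 2.7     # risk unrelated to the benchmark


def information_ratio(active_return: float, active_risk: float) -> float:
    """Return earned per unit of risk; both inputs must share the same units."""
    return active_return / active_risk


# Method 1: value added divided by tracking error.
ir_total = information_ratio(value_added, tracking_error)

# Method 2: alpha divided by residual risk.
ir_residual = information_ratio(alpha, residual_risk)

print(f"Value added / tracking error: {ir_total:.2f}")
print(f"Alpha / residual risk:        {ir_residual:.2f}")
```

The same portfolio yields roughly –0.01 under the first method and 0.40 under the second, which is why the choice of method needs to be stated whenever an information ratio is reported.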
Conclusion
Performance attribution must measure and be consistent with the relevant variables in investment decision making. If country selection or industry selection is not a relevant decision variable, then performing an attribution on the basis of country or industry variables can produce completely erroneous results. Furthermore, the dimensions along which performance is measured should be controllable. If portfolio managers or analysts cannot control certain aspects of the investment process and have no influence in the original decision, then saddling them with the responsibility for the decision (either penalizing them for a bad outcome or rewarding them for a good outcome) is improper. Finally, someone must be explicitly accountable for decisions with respect to major sources of risk and return. If a source of risk exists in the portfolio that the portfolio manager has control over, that is, a risk that the manager can increase, minimize, or eliminate (by holding the benchmark), then the manager needs to be explicitly responsible for that risk and its effect on the portfolio.
Return, Risk, and Performance Attribution
Question and Answer Session
Kevin Terhaar, CFA
Question: How common is the problem discussed in Example 1, a mismatch between the benchmark and the actual management process?
Terhaar: Example 1 reflects the changes under way at Brinson Partners. We have seen the global equity markets become more industry focused and less top-down country focused. Our analysts are currently set up to be industry analysts across countries, but if you look at the typical global equity risk and attribution process, it was built as a top-down country model first, so country decisions are given primary importance in both risk modeling and in return attribution. Thus, using that approach to do performance attribution for one of our analysts can give misleading results.
Question: In Example 1, you initially found 10 bps of added value from stock selection, but then it dissipated. Where did it go?
Terhaar: The added value dissipated because of the actual allocations within French and German autos and French and German semiconductors. When I ignored everything except the country decision, it looked like the manager made a poor country decision, given the French and German returns, because she overweighted France and underweighted Germany. When I ignored everything other than the industry decisions, it looked like the manager made a poor decision, given the index returns, on the industry allocation; she overweighted autos and underweighted semiconductors. There was no currency effect, so if the added value was not from the country or industry decisions, it must have been from stock selection. But when I looked at the combined overweighted French semiconductors, which was the highest return industry in either country, and the underweighted French autos, which was the lowest return industry in either country, I could see that those joint industry and country decisions accounted for all the added value and that stock selection did not add any value.
Question: How can you go from tracking error to residual risk?
Terhaar: Put simply, the answer is A² + B² = C². Total tracking error is C², and say the benchmark risk is B²; then, A² is the residual risk because it is everything unrelated to the benchmark. By taking the total active risk (tracking error), squaring it, and subtracting the squared benchmark (beta) risk, you get the squared residual risk.
Question: If an analyst adds value by predicting specific country betas, why is that analyst not rewarded?
Terhaar: If it is the analyst’s decision to make, then the analyst should be rewarded. But if predicting country betas (or more realistically, making bets on overall market performance) is not the analyst’s decision to make, you probably do not want that person taking that risk. If you do not want that person to take that risk, you shouldn’t reward him or her for it. Ultimately, the answer depends on how the portfolio is managed, that is, where the responsibility for the decision lies. Does the responsibility lie with the analyst or with the portfolio manager? If it lies with the portfolio manager, you do not want the analyst taking that kind of bet. And you do not want to reward the analyst for taking the bet if he or she is not responsible for it; analysts can introduce risks into the portfolio that you have no control over, and you may not realize it until too late.
Question: Is attribution related only to performance measurement, or is it also related to risk management?
Terhaar: Attribution is related to risk management. At Brinson Partners, we have two distinct groups: risk management and performance evaluation. Ultimately, we’d like to have a system that incorporates our views of risk, the portfolio investment process, and the management process so that all three things work together to tell us what we need to know about who makes what decisions and whether they are good or bad decisions. Making the investment process identify the risks you want to take is important, and the performance evaluation can then show whether the risks that were taken were adequately compensated.
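Returning to the question on moving from tracking error to residual risk, the squared-risk relationship can be checked against the figures from the earlier example (tracking error of 4.7 percent and residual risk of 2.7 percent). This is only a sketch: it assumes the benchmark-related and residual pieces are uncorrelated, and the benchmark-related figure it derives is implied by those two numbers rather than reported in the presentation.

```python
import math

tracking_error = 4.7   # total active risk, percent (C in the answer above)
residual_risk = 2.7    # residual risk, percent (A in the answer above)

# C^2 = A^2 + B^2, so the benchmark-related (beta) piece is B = sqrt(C^2 - A^2).
# ASSUMPTION: the two pieces are uncorrelated, so their variances add.
beta_risk = math.sqrt(tracking_error**2 - residual_risk**2)

# Going the other way, as in the question: start from tracking error,
# subtract the squared beta risk, and take the square root.
recovered_residual = math.sqrt(tracking_error**2 - beta_risk**2)

print(f"Implied benchmark-related risk: {beta_risk:.2f} percent")
print(f"Recovered residual risk:        {recovered_residual:.2f} percent")
```

Note that the risks themselves do not add (3.85 plus 2.7 is not 4.7); only the squared risks do.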
Benchmarks and Attribution Analysis for the Total Fund
Jeffery V. Bailey, CFA
Director, Benefits Finance
Target Corporation
Minneapolis
Total fund benchmarks and performance evaluation play crucial roles in setting and monitoring investment policy. Performance evaluation includes three steps: performance measurement, attribution, and appraisal. Macro-attribution, in particular (despite its disadvantages), offers a framework for evaluating the value added by each primary investment policy decision in an investment program. An example of macro-attribution highlights the pros and cons in applying the process and analyzing the feedback.
Target Corporation is a typical plan sponsor, in that equities are a large component of its pension fund investment program; however, Target is concerned about more than just the investment management of its equities portfolio. Like most plan sponsors, Target maintains a global portfolio composed of a variety of asset classes. Consequently, being able to evaluate the performance of the pension fund’s total portfolio and each separate asset class is important. The focus of this presentation is performance attribution for the total fund, but I will begin by discussing the basic elements of performance evaluation. I will then explore performance attribution at the “macrolevel” in some detail. To illustrate the macro-attribution process, I will present a case study that I will use as the basis of a performance appraisal. Finally, I will end with a discussion of total fund benchmarks and the benefits and drawbacks of macro-attribution.
Performance Evaluation
Performance evaluation is a broad concept that involves measuring, dissecting, and critiquing the performance of an investment portfolio. It takes an informed look at the past in an attempt to understand the quality of investment results. Performance evaluation comprises three elements: performance measurement, performance attribution, and performance appraisal.
Performance Measurement. Performance measurement, which involves calculating account and benchmark returns, seeks to answer the simple question: What was the account’s performance? The concept of calculating time-weighted returns on benchmarks and accounts has been around for a long time. Only recently, however, have account owners attempted to gain a broader understanding of how and why an account achieved a certain level of performance relative to an assigned benchmark.
Performance Attribution. Performance attribution tries to answer the question: Why did the account produce the observed performance? The sponsor seeks to explain the causes of differences between the account’s return and the return of the benchmark. Performance attribution extends the results of performance measurement to investigate both the sources of the account’s relative performance and the relative contributions of those sources.
Performance Appraisal. Performance appraisal, the most complex of the three elements of performance evaluation, attempts to answer the question: Was the account’s performance caused by luck or skill? Attribution analysis might explain that the account’s results were caused by superior stock selection, but that answer is incomplete. The plan sponsor wants to assess the likelihood that the account’s superior performance was caused by sustainable management skill rather than the manager’s good fortune. Keep in mind that the odds of a truly superior manager underperforming his or her benchmark for a sustained period of time (e.g., three to five years) are surprisingly high. Past performance does not say anything definitive about future performance. If a plan sponsor believes a manager to be skillful, then the sponsor should retain that manager regardless of past performance. But to make this determination, the sponsor has to devise a way to distinguish between manager skill and luck, which is a difficult challenge but one that sponsors constantly face.
Performance Attribution
Performance attribution requires an appropriate benchmark and an analytical framework for decomposing the account’s performance relative to the specified benchmark. This framework, however, must correspond to the particular problem being addressed. For example, when an individual investment manager is under review by a plan sponsor, the sponsor could conduct an attribution analysis on the manager’s performance. The framework could be a multifactor risk model. The manager’s exposures to the risk-model factors could be broken out, and the benchmark exposures to those same risk-model factors could be identified. Then, the total performance relative to the benchmark could be attributed to those differences between the manager’s and the benchmark’s risk exposures. Such an approach might be reasonable if the sponsor is evaluating the performance of all the managers. But if the sponsor attempts to discuss the results of the analysis with a particular manager who is unfamiliar with the risk-model-based analysis, then the discussion will likely be unproductive.
Much of the attribution work on individual managers’ portfolios is done from the perspective of sector selection or stock selection. As such, the attribution breaks down performance by how the manager weighted sectors within his or her portfolio relative to the benchmark’s sector weights or by how the manager chose securities within sectors relative to the performance of the corresponding sectors of the benchmark. Most investment managers are attuned to this type of analysis. Yet for a plan sponsor to carry out this analysis without understanding the manager’s underlying strategies for the period studied is fruitless. The attribution framework should be consistent with the context of the investment issues being analyzed.
In that spirit, performance attribution conducted at the plan sponsor level (which I will refer to as macro-attribution) should relate to key investment policy decisions that pertain to allocations among weights and asset categories, investment styles, and investment managers. On a macrolevel, an attribution system is needed that takes into account this high-level decision-making process. Macro-attribution should quantify the consequences of investment policy decisions for the performance of the plan’s assets not only in terms of returns but also in terms of dollars. The plan sponsor can say that the outcome of its tactical asset allocation decisions of overweighting stocks and underweighting bonds has added 50 basis points to the total fund return, but the company’s senior managers would rather hear that these decisions have added $50 million to the fund’s performance. Investment professionals often focus so single-mindedly on returns that they forget the rest of the world thinks in terms of dollars.
Macro-Attribution Inputs
The macro-attribution process requires three categories of inputs: policy allocations; asset category and manager benchmark returns; and account valuations, cash flows, and returns.
Policy Allocations. Macro-attribution requires that the plan sponsor set policy allocations. These are the “normal” weightings assigned to various asset categories and investment managers. For example, a sponsor might select a 60 percent equities/40 percent bonds mix as the fund’s long-term asset allocation. Policy allocations define the neutral positions that the plan sponsor desires to hold in the absence of information about the near-term relative attractiveness of particular asset categories or managers. These allocations depend on the plan sponsor’s risk tolerance and long-term risk–reward expectations with regard to the capital markets and individual managers. Policy allocations apply not only to asset categories but also to individual managers. A plan sponsor may conduct an asset/liability study to develop the long-term policy allocations for individual asset categories, but an additional analysis should be carried out to develop the long-term allocations among the plan’s investment managers. As I will discuss later, the policy allocations assigned to the fund’s managers should, when combined, provide a coherent whole that makes economic sense for the plan.
Asset Category and Manager Benchmark Returns. Macro-attribution also requires obtaining asset category and manager benchmark returns. A benchmark is a set of securities and associated weights representing the persistent and prominent characteristics of an asset category or a manager’s investment style. Benchmarks can be thought of as index funds designed to represent the asset category or manager style. As Thomas Richards discussed, benchmarks should be unambiguous, investable, measurable, and specified in advance, and they should reflect ownership.1
Account Valuations, Cash Flows, and Returns. The final inputs for macro-attribution are account valuations, cash flows, and returns. To compute a macro-attribution in a dollar format, the size of the investment positions and the movements of cash into and out of those positions are needed.
1 See Mr. Richards’s presentation in this proceedings.
Macro-Attribution Example
Macro-attribution defines the various levels of investment policy decision making into which the total fund’s performance is decomposed. Each of these levels actually represents an investable alternative (or investment strategy) for the sponsor. I will refer to these investment policy decision-making levels collectively as the investment policy hierarchy.
In concept, macro-attribution is simple. The sponsor has a beginning-period and an ending-period asset value for the total fund. Macro-attribution explains how the total fund’s value changed in the intervening period between the beginning- and ending-period dates by attributing that change to the various investment policy hierarchy levels. The components of the investment policy hierarchy that I will consider are net contributions, risk-free assets, investment policy (i.e., asset categories), benchmark misfit (style bias), managers’ contribution, and allocation tactics.
Net Contributions. Table 1 shows the changes in asset value for a fund over 11.5 years, from December 31, 1988, to June 30, 2000. The beginning value of this fund was $296.1 million, and the ending value was $1.0351 billion. The objective of the macro-attribution analysis is to explain how the fund grew by $739 million in 11.5 years. At the first level of the macro-attribution analysis (net contributions), the focus is solely on the cash that is flowing into and out of the total fund. According to Table 1, in the sample period, a net $72.5 million was paid out in benefits from the pension fund. That is, the plan sponsor made contributions to the fund and paid out benefits in amounts so that the net result was an outflow of $72.5 million. The cumulative value of the fund at the end of the period, if the fund had been invested at a zero rate of return for the period, was, therefore, $223.6 million. Although the sponsor is unlikely to ever pursue such an investment strategy, the net contributions level provides a baseline for evaluating the investment strategies that follow.
Risk-Free Assets. Table 1 also shows another route the fund could have taken, as defined by the second level of the macro-attribution analysis (risk-free assets). The beginning value of the fund, $296.1 million, could have been invested in a risk-free asset, such as 90-day U.S. T-bills. During the period, 90-day T-bills averaged a 5.4 percent return. If the $296.1 million (accounting for the inflows and outflows that netted a total outflow of $72.5 million in the sample period) was invested at the available 90-day T-bill return each month for the entire period, the fund would have had a value of $532.8 million at the end of the period. Therefore, the fund would have increased in value by $309.2 million solely as a result of investing in the risk-free asset. Remember, I am trying to explain the difference between $296.1 million in beginning assets and $1.0351 billion in ending assets, or $739 million. A risk-free investment explains about $300 million of the difference. Investors often forget how powerful risk-free investing can be over time.
Cash can earn a good return and prevent a lot of heartache. Most plan sponsors would not consider holding only risk-free assets to be a desirable investment approach, but it is an investable strategy. A fund could simply buy T-bills and produce a consistently positive result over the long term.

Table 1. Attribution of Changes in Asset Value: Net Contributions and Risk-Free Assets
Investment Alternatives                         Incremental Contribution (millions)   Cumulative Value (millions)   Annualized Return
Net contributions (preserve capital)            ($72.5)                               $223.6                        0.0%
Risk-free assets (preserve purchasing power)    309.2                                 532.8                         5.4
Note: Data for December 31, 1988, to June 30, 2000. Beginning value = $296.1 million; ending value = $1.0351 billion; final return = 14.0%.

Investment Policy. Most plan sponsors are willing to accept some degree of risk. They certainly take capital market risks. As a rule, plan sponsors set policy allocations that involve holding risky assets. Table 2 lists the policy allocation at the asset category level for the fund in this example: a 35 percent allocation to domestic equities, 20 percent to international equities, 25 percent to domestic fixed income, 20 percent to alternative assets, and no allocation to cash. A plan’s investment policy can change over time, so Table 2 is a snapshot of the policy at a particular point in time (June 30, 2000). The Asset Category Benchmark column shows the targets for these particular asset categories. Interestingly, the benchmark for the alternative assets is simply the alternative asset investment itself. That is, the fund is using the actual performance of the alternative assets as the benchmark. The reason is that it is difficult to measure the performance of limited partnerships or illiquid investments over the short term. This situation is a typical problem encountered by any performance measurement and performance attribution system.
The fund grew to $532.8 ($296.1 less $72.5 plus $309.2) million simply by investing in a risk-free asset. If the total fund used the policy allocations from Table 2 and invested monthly in each of the asset category benchmarks indicated by the plan sponsor’s investment policy (35 percent of assets in the Wilshire 5000 Index, 20 percent in the MSCI ACWI ex United States, and so on) the annualized return earned by the fund for accepting this additional risk would have been 12.8 percent, as shown in Table 3. This strategy represents the third level of macro-attribution (investment policy). The incremental dollar return of taking capital market risk was about $417 million. Notice that the cumulative value of the fund at this point is $950.1 ($296.1 less $72.5 plus $309.2 plus $417.3) million. Essentially, the investment policy level of the macro-attribution analysis can be viewed as a pure index fund approach. If the total fund’s assets were invested in the Wilshire 5000, the Lehman Aggregate, and so on at zero cost and in accordance with the sponsor’s designated policy allocations, the total fund would have grown to $950.1 million. Again, keep in mind that I am trying to explain the difference between the $296.1 million in beginning assets and $1.0351 billion in ending assets, and with only about $85 million left to account for, I am a good portion of the way along.
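The return of the investment policy level for any single month is the policy-weighted sum of the asset category benchmark returns. The sketch below uses the policy weights from Table 2; the one-month benchmark returns are hypothetical placeholders chosen purely for illustration and are not data from the fund, and the function name is likewise an assumption.

```python
# Policy weights from Table 2.
policy_weights = {
    "Domestic equity (Wilshire 5000)": 0.35,
    "International equity (MSCI ACWI ex United States)": 0.20,
    "Domestic fixed income (Lehman Aggregate)": 0.25,
    "Alternative assets": 0.20,
    "Cash reserves (90-day T-bills)": 0.00,
}

# HYPOTHETICAL one-month benchmark returns, for illustration only.
hypothetical_returns = {
    "Domestic equity (Wilshire 5000)": 0.012,
    "International equity (MSCI ACWI ex United States)": 0.008,
    "Domestic fixed income (Lehman Aggregate)": 0.005,
    "Alternative assets": 0.010,
    "Cash reserves (90-day T-bills)": 0.004,
}


def policy_benchmark_return(weights: dict, returns: dict) -> float:
    """Weighted sum of benchmark returns; weights must sum to 100 percent."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("policy weights must sum to 1.0")
    return sum(weight * returns[name] for name, weight in weights.items())


monthly_policy_return = policy_benchmark_return(policy_weights, hypothetical_returns)
print(f"Policy benchmark return for the month: {monthly_policy_return:.3%}")
```

Applying this calculation each month to the fund's asset value, with the actual benchmark returns and cash flows, is what would produce the dollar contribution of the investment policy level.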
Table 2. Investment Policy: Asset Category Allocations
Asset Category           Allocation Policy   Asset Category Benchmark
Domestic equity          35.0%               Wilshire 5000
International equity     20.0                MSCI All Country World Index (ACWI) ex United States
Domestic fixed income    25.0                Lehman Aggregate
Alternative assets       20.0                Alternative assets
Cash reserves            0.0                 90-day T-bills
Total                    100.0%              Policy asset mix
Table 3. Attribution of Changes in Asset Value: Net Contributions, Risk-Free Assets, and Investment Policy
Investment Alternatives                         Incremental Contribution (millions)   Cumulative Value (millions)   Annualized Return
Net contributions (preserve capital)            ($72.5)                               $223.6                        0.0%
Risk-free assets (preserve purchasing power)    309.2                                 532.8                         5.4
Investment policy                               417.3                                 950.1                         12.8
Note: Data for December 31, 1988, to June 30, 2000. Beginning value = $296.1 million; ending value = $1.0351 billion; final return = 14.0%.
Tables 2 and 3 highlight the importance of strategic asset allocation (i.e., how much and which capital market risks the fund should bear) to the success of a plan sponsor’s investment program. The decisions that follow in the investment policy hierarchy are definitely interesting and entertaining for the plan sponsor, but they generally do not add a tremendous amount to the total value of the fund.
Benchmark Misfit. Although the investment policy level of the investment hierarchy dominates the other levels, most plan sponsors are not content with this simple indexing approach. They typically take it a step further by hiring active managers to make judgments about the relative value of securities. Table 4 shows the investment manager structure for the total fund in the example. Again, it represents a snapshot in time, and the structure can change. The Policy Allocation column shows the sponsor’s desired long-term allocations to asset classes and managers, and the Actual Allocation column shows the weightings actually in effect. For example, Table 4 shows how the sponsor would like to divide the 35 percent allocation to domestic equities among Managers 1 through 5 and compares those targets with the actual allocations. The Benchmark column lists the benchmark assigned to each manager.

Table 4. Investment Policy: Investment Manager Structure
Asset Category/Manager    Actual Allocation   Policy Allocation   Benchmark
Domestic equity           33.4%               35.0%               Wilshire 5000
  Manager 1               30.2%               30.0%               Custom
  Manager 2               11.1                11.0                S&P 500
  Manager 3               15.6                17.0                Custom
  Manager 4               13.3                12.0                Custom
  Manager 5               29.8                30.0                DCF benchmark
International equity      19.3                20.0                MSCI ACWI ex United States
  Manager 1               42.3                42.5                MSCI ACWI ex United States
  Manager 2               43.1                42.5                MSCI Europe, Australasia, Far East Index
  Manager 3               14.5                15.0                MSCI Emerging Markets Free Index
Domestic fixed income     24.4                25.0                Lehman Aggregate
  Manager 1               24.8                25.0                Lehman Aggregate
  Manager 2               25.1                25.0                Lehman Aggregate
  Manager 3               24.9                25.0                Lehman High Yield
  Manager 4               25.2                25.0                Custom
Alternative assets        18.8                20.0                Alternative assets
  Various managers        18.8                20.0                Managers’ returns
Cash reserves             4.1                 0.0                 90-day T-bills
  Manager 1               100.0               100.0               90-day T-bills

When a plan sponsor hires an investment manager, the sponsor’s total fund acquires an exposure to two sources of return and risk: the manager’s investment style (as represented by the manager’s benchmark) and the manager’s active management decisions. From a macro-attribution perspective, distinguishing between these two sources of return and risk is important. The sponsor has control over investment style decisions through its manager selection choices; however, the sponsor has no direct control over the active-management value provided by its managers. Just as I calculated the investment policy level contributions by using the benchmarks for each asset category in Table 2, I can calculate the contributions of each manager’s investment style. So, to calculate the contribution to the total fund’s growth derived from Domestic Equity Manager 2’s investment style, I can take 11 percent of 35 percent of the fund’s total assets and multiply this amount by the return of the S&P 500 Index. Most plan sponsors would (or should, unless they are actively timing style exposures) expect that returns calculated in this manner for all the domestic equity managers’ benchmarks will add up to the total return for the asset class (domestic equities), but that is often not the case. The difference in return on the aggregate of the managers’ benchmarks within an asset category compared with the return on the asset category benchmark is known as benchmark misfit or style bias. Table 5 shows that benchmark misfit, or style bias (the fourth level of macro-attribution), produced a negative incremental dollar return of $10.5 million over the period and an annualized return of 12.5 percent. Although this is not a big number, it cannot be ignored. Benchmark misfit can be quite large, yet many plan sponsors are careless about the allocation process among managers. Because benchmark misfit is typically unintended and its expected return is zero, it represents a source of uncompensated risk that should be minimized.
Managers’ Contribution. Plan sponsors hire active managers in anticipation of outperforming the assigned benchmarks. That is, the plan sponsor expects the managers to generate a positive value of active management (VAM). Accordingly, the fifth level of macro-attribution (managers’ contribution) involves calculating the contribution of managers’ VAM to the growth in the total fund. For instance, Table 4 shows that Domestic Equity Manager 1 used a custom benchmark and that Domestic Equity Manager 2 had the S&P 500 as a benchmark. In comparing the managers’ actual returns with their benchmark returns and weighting the resulting VAM by the managers’ respective policy allocations, Table 5 shows that the VAM of all the managers created $98.5 million more in incremental fund value than would have been generated from investing in the benchmarks. These managers generated an annualized return of 14.2 percent, and the benchmarks generated an aggregated return of 12.5 percent. Thus, added value from manager skill is present in this fund for the 11.5 years.
Allocation Tactics. As I mentioned previously, only rarely will the actual manager and asset category allocations match the policy targets. These differences generate an allocation-tactics impact, which represents the sixth and final level of macro-attribution. Table 5 shows that the dollar return from allocation tactics for this plan sponsor was relatively minor, although slightly negative (–$3.0 million). Thus, allocation tactics lowered the annualized return for the fund in the 11.5 years from 14.2 percent to 14.0 percent. This sponsor had procedures in place to keep allocation tactics to a minimum. Other sponsors, either intentionally or through neglect, may find their actual manager and asset category allocations deviating significantly from policy targets and thereby generating sizable contributions (both positive and negative) to total fund performance.
Accounting for allocation tactics concludes this macro-attribution analysis. I have decomposed the $739 million increase in the total fund’s value into the six “investment strategies” of the investment policy hierarchy and have expressed the contributions of each policy decision-making level in terms of both returns and dollars. Making sense of these attribution results is now up to the plan sponsor.
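The dollar side of the decomposition is a running sum: each level's incremental contribution is added to the cumulative value left by the level before it. The short sketch below rebuilds the cumulative values reported in Table 5 from the beginning value and the six incremental contributions; the variable names are illustrative only.

```python
beginning_value = 296.1  # $ millions, December 31, 1988

# Incremental contributions in $ millions, as reported in Table 5.
levels = [
    ("Net contributions", -72.5),
    ("Risk-free assets", 309.2),
    ("Investment policy", 417.3),
    ("Benchmark misfit (style bias)", -10.5),
    ("Managers' contribution (VAM)", 98.5),
    ("Allocation tactics", -3.0),
]

cumulative = beginning_value
cumulative_values = []
for name, increment in levels:
    cumulative += increment
    # Round only for display and storage; the running sum stays unrounded.
    cumulative_values.append(round(cumulative, 1))
    print(f"{name:<32}{increment:>10.1f}{cumulative:>12.1f}")

total_change = round(sum(increment for _, increment in levels), 1)
print(f"Ending value: {cumulative_values[-1]} million")
print(f"Total change from beginning value: {total_change} million")
```

The six increments sum to $739.0 million, which is exactly the difference between the $296.1 million beginning value and the $1,035.1 million ending value, so the hierarchy leaves no unexplained residual.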
Performance Appraisal
The macro-attribution analysis has explained the growth, or the progression, of fund assets from $296.1 million to $1.0351 billion. The third phase of the investment policy hierarchy (the investment policy level) generated by far the largest income for the fund. Looking across a large universe of total funds, one usually finds that all the subsequent investment policy hierarchy components show negative contributions. Investment policy added $417 million to the fund’s overall performance; the managers’ contribution added only $98.5 million, and small negative increments were associated with benchmark misfit and allocation tactics. The vast majority of the $98.5 million that was generated by the active managers came from the domestic equity management program ($48.5 million). International equity contributed $38.5 million, and fixed income, $9.4 million.

Table 5. Attribution of Changes in Asset Value: Total Fund
Investment Alternatives                         Incremental Contribution (millions)   Cumulative Value (millions)   Annualized Return
Net contributions (preserve capital)            ($72.5)                               $223.6                        0.0%
Risk-free assets (preserve purchasing power)    309.2                                 532.8                         5.4
Investment policy                               417.3                                 950.1                         12.8
Benchmark misfit (style bias)                   (10.5)                                939.6                         12.5
Managers’ contribution (VAM)                    98.5                                  1,038.1                       14.2
Allocation tactics (rebalancing strategy)       (3.0)                                 1,035.1                       14.0
Note: Data for December 31, 1988, to June 30, 2000. Beginning value = $296.1 million; ending value = $1.0351 billion; final return = 14.0%.

Plan sponsors often summarize fund performance with figures or tables that show total fund performance versus that of the investment policy benchmark for various time periods (such as the last quarter, 1 year, 3 years, 5 years, 10 years, and since inception of the fund). Remember that the investment policy benchmark is the collection of asset category target returns weighted by their respective policy allocations. For example, based on the plan sponsor’s assigned weights and asset category benchmarks shown in Table 2, the one-year return for the investment policy benchmark is 9.4 percent, as shown in Table 6. The total fund, however, earned a 12.8 percent annualized return after fees and expenses for the year, so by that yardstick, the sponsor’s investment program added value during that period (and as Table 6 indicates, for longer periods as well).
Table 6. Total Fund Performance vs. Benchmark Performance
Time Period        Fund      Benchmark
Last quarter       –0.7%     –1.0%
1 year             12.8      9.4
3 years(a)         14.7      12.3
5 years(a)         16.6      14.4
10 years(a)        14.0      12.4
Since 1/1/89(a)    14.0      12.8
Notes: Data as of June 30, 2000. First available fund data started on 1/1/89.
(a) Annualized.
But the data in Table 6 do not tell the whole story. The statistical significance of those results must also be considered. Figure 1 shows the actual return produced by the total fund compared with the return produced by an investment in the investment policy benchmark. The zero line indicates what would have happened if the sponsor’s investment program had produced zero added value during the cumulative time period. The solid dark line is the actual added value for the period. Over the 11.5 years, the actual value added is 1.1 percent above the zero line on an annualized basis. The cone, or trumpet, shape of Panel A indicates statistical probability. The thin solid line and the dotted line in Panels A and B mark the upper and lower 10 percent confidence bands. Thus, given the amount of volatility experienced by this program (annualized standard deviation of 1.89 percent over the entire period), a 20 percent chance exists that the fund’s return would have fallen beyond the upper or lower confidence band purely by chance. Effectively, I am testing a null hypothesis that the sponsor’s investment program has no ability to add value to the investment policy benchmark. Considering that the total fund’s cumulative return exceeded the upper 10 percent confidence level, the fund appears to have benefited from skill, not luck.

Panel B of Figure 1 shows the added value of the sponsor’s investment program for rolling 18-month periods. In this panel, the volatility measurement used to test the null hypothesis of “no skill added” changes over time as the rolling window moves along. Panel B has a jagged confidence interval as opposed to the smooth confidence band in Panel A because Panel A uses only one volatility number for the entire period.

The information ratio (the ratio of active return to the volatility of active return) for this total fund is 0.58 (as shown in Table 7), which is a fairly large number and one that is atypical in this type of analysis. The probability statistic (97.4 percent), which is similar to the confidence interval, indicates that the actual 1.1 percent value added for this fund is at the 97.4th percentile of possible results, given an expectation of zero value added and the observed standard deviation of value-added returns. Again, one is left to decide whether the plan sponsor was extremely lucky or whether the null hypothesis (i.e., no skill was brought to bear on the investment decision making) should be rejected.

Table 8 shows the attribution contributions for the investment policy hierarchy for different periods of time ending June 30, 2000. For this fund, the result is essentially the same in all periods: Investment decision making added value.
Such consistent results are not true for all funds, and even this fund could have had subperiods when it underperformed or when various decision-making levels showed negative contributions (e.g., net flow and style bias for the 10-year period).
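The information ratio and probability statistic reported in Table 7 can be approximated with a few lines of arithmetic. The sketch below is not the author's calculation: it assumes that value added is normally distributed and independent from period to period, so that the test statistic is the information ratio times the square root of the number of years. The function names are hypothetical. Under these assumptions, the result lands close to, but not exactly on, the 97.4 percent shown in Table 7.

```python
import math


def information_ratio(annual_value_added, annual_std_value_added):
    """Ratio of annualized active return to annualized active risk."""
    return annual_value_added / annual_std_value_added


def probability_under_null(ir, years):
    """Percentile of the observed value added if true value added were zero.

    Assumption (not stated in the source): value added is normal and
    independent across periods, so the z-score is IR * sqrt(years).
    """
    z = ir * math.sqrt(years)
    # Standard normal cumulative distribution via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Inputs from Table 7: 1.10% value added, 1.89% standard deviation, 138 months
ir = information_ratio(1.10, 1.89)
probability = probability_under_null(ir, 138 / 12)

print(f"Information ratio: {ir:.2f}")
print(f"Probability statistic: {probability:.1%}")
```

A longer track record raises the probability statistic for the same information ratio, which is why the confidence cone in Panel A of Figure 1 narrows over time.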
Figure 1. Value of Active Management, December 31, 1988, to June 30, 2000
[Figure not reproduced. Panel A: Cumulative since Inception, annual VAM return (%), 12/88 to 6/00. Panel B: Rolling 18 Months, annual VAM return (%), 6/90 to 6/00. Lines plotted: VAM, Upper (10%), Lower (10%).]

Table 7. Investment Program Value Added over Investment Policy Benchmark

Item                                             Amount
Months                                           138
Annualized value added                           1.10%
Annualized standard deviation of value added     1.89%
Information ratio                                0.58
Probability                                      97.4%
Cumulative value added (millions)                $98.5

Total Fund Benchmarks A single correct total fund benchmark does not exist, because the benchmark depends on the context of the analysis. The liability stream of a pension fund could be used as a benchmark for a long period of time, for example. That is, a company maintains a pension fund to pay its promised retirement benefits, so the liability stream of the fund might be a reasonable, very long-run benchmark for the fund to use. But when attempting to explain performance to an investment committee, the most appropriate total fund benchmark is a composite of the asset category benchmarks that I have referred to as the investment policy benchmark. The benchmark weights are the same as the policy allocations to the asset categories, so this investment policy benchmark satisfies the properties of a valid benchmark: unambiguous, investable, specified in advance, and so on.

Although I cannot definitively say which total fund benchmark is the most appropriate one to use, I can unequivocally state what constitutes an inappropriate total fund benchmark. Peer groups (i.e., fund universes) are not appropriate total fund benchmarks. Too many differences exist in the amount of risk that pension plans and endowment funds are willing to take. As a result, comparisons are usually not valid from one plan to another. Another inappropriate benchmark is a noninvestable target, such as the rate of change in the U.S. Consumer Price Index. The plan sponsor has no influence over either the direction of capital markets or capital market returns relative to inflation. Outperforming or underperforming inflation is out of the sponsor’s control. Another benchmark to avoid is the actuarial rate of return. Pension funds have different actuarial rates of return, and they are not relevant to the opportunities the funds face in the investment world. In addition, the actuarial rate of return is raised or lowered at will by the sponsor; it has no direct bearing on the types of securities in which the plan invests and is thus not a valid benchmark.
Macro-Attribution Pros and Cons Macro-attribution is an extremely useful way to look at the performance of a fund. Although macro-attribution analysis has many advantages, some serious drawbacks nevertheless exist.

Pros. One of the most important benefits of macro-attribution is that it integrates investment policy with performance measurement and provides an appropriate context for evaluating performance results. It makes sense for a plan sponsor to try to understand how each decision in the investment policy hierarchy has or has not added value to an investment program. Macro-attribution requires the plan sponsor to articulate fully the major inputs to investment policy, which I consider to be the most important aspect of macro-attribution. The plan sponsor must specify several key items: the policy asset category mix, the asset category benchmarks, the manager benchmarks, and the manager allocations. Some plan sponsors view having to make these decisions as an imposition, but I disagree. A plan sponsor’s primary responsibility is to make these decisions. If the plan sponsor cannot or will not do so, it is not running the fund in an appropriate manner.

Macro-attribution also provides a framework in which to evaluate the performance of the plan sponsor. By working within the context of the investment policy hierarchy, which represents the primary investment policy decisions that the plan sponsor must make, the investment committee has a tool for trying to understand how well the plan sponsor has implemented its investment program. The plan sponsor maintains control over most, but not all, items in the investment hierarchy. For instance, the plan sponsor does not directly control the managers’ added value. But the plan sponsor does control benchmark misfit and always has the opportunity to realign or eliminate managers. Therefore, performing macro-attribution allows the investment committee to evaluate the plan sponsor because it makes apparent the relative importance of plan sponsor decisions versus the contributions of investment managers.

Finally, macro-attribution produces the “right” metric to use in total fund universe comparisons. Total fund returns deflated by the investment policy benchmark returns and adjusted for volatility are appropriate inputs to universe (peer) comparisons. Different plan sponsors have different risk policies, so they have different allocations among asset categories. Thus, if investment committees feel compelled to compare the performance of their funds with those of other plan sponsors, then they should compare deflated fund returns rather than raw fund returns.
Table 8. Total Fund Performance Attribution for Periods Ending June 30, 2000 (millions)

Time Period                             Net Flows   Risk-Free   Investment   Style    Managers’      Allocation   Change in
                                                    Assets      Policy       Bias     Contribution   Tactics      Value
10 years (beginning value = $323.0)     ($35.5)     $270.4      $389.5       ($3.5)   $84.5          $6.6         $712.1
5 years (beginning value = $475.1)      21.0        182.9       282.4        0.1      64.7           8.8          560.0
3 years (beginning value = $628.1)      93.7        126.6       160.8        1.7      45.4           8.8          407.0
12 months (beginning value = $968.4)    (54.0)      52.7        36.4         (0.2)    30.3           1.6          66.7

Note: Ending value for all periods is $1.0351 billion.
Cons. Macro-attribution has difficulty dealing with zero net asset value accounts. For example, when an account invested in futures is not equitized, breaking out the dollar impact or the return impact is complicated. Macro-attribution also has problems in the area of alternative investments, which have appraisal/cost-based pricing. The plan sponsor cannot get a market value for its alternative investments on a regular basis, so determining whether a particular investment’s performance has fallen behind relative to a particular benchmark poses a challenge. And for an investment in a limited partnership, the sponsor often cannot get a reasonable valuation on the investment for years. Macro-attribution also requires having appropriate benchmarks and a long-term perspective. Doing this analysis without good benchmarks makes no sense because the attribution merely explains noise, which is a waste of time. Therefore, a concentrated effort must be made to find appropriate benchmarks. A consistent, long-term perspective evaluated in appropriate intervals is also needed for macro-attribution to be effective in guiding investment decision making. Conclusions based on short-term attribution results are of little value.
Conclusion Macro-attribution is not a panacea. Many issues remain unanswered. As with many tools available in the investment management business, macro-attribution takes those who use it on a journey rather than to a specific destination. The discipline of having to specify the inputs for the macro-attribution creates the value in the process. Identifying the policy allocations for a particular investment program and evaluating the performance results in terms of the investment policy hierarchy allow the participants in the macro-attribution process to better understand their investment program and make more-intelligent decisions. The questions of whether the fund managers added value for a particular period of time or whether a misfit problem existed for that same period of time are not easily answered because of the extreme amount of noise inherent in investment performance data. Nonetheless, macro-attribution increases the chances that such issues are approached in an objective and systematic manner. It creates a more disciplined investment management environment and enhances opportunities to add value to the total fund’s investment results.
The Integration of Risk Budgeting into Attribution Analysis
Wayne A. Kozun, CFA
Vice President, Tactical Asset Allocation and Real Return Assets
Ontario Teachers’ Pension Plan Board
North York, Ontario
Risk budgeting, defined as the allocation of tracking error among the different asset classes of a fund, is a complex task but a valuable tool. In adopting a risk-budgeting system, the plan sponsor must define the risk to be budgeted, determine a methodology to calculate the risk, and juggle the issues involved with implementation. These issues include accurate and timely pricing of all (even illiquid) assets, modeling the effects of event risk on the total fund, choosing an internal or external risk management system, integrating the portfolio accounting system with the risk system, and educating staff and the plan’s board members about what a risk-budgeting system can and cannot accomplish.
Before risk can be analyzed and put into the proper context, the analyst must understand the plan sponsor’s goals and objectives. At the Ontario Teachers’ Pension Plan Board (OTPP), we manage a defined-benefit plan for about a quarter million teachers in Canada’s largest province, Ontario. A characteristic of our plan, one that is common in Canada for public pension plans, is that the pensions paid by the plan are indexed to inflation. Every year, the pensioners’ annual benefits increase based on the Canadian consumer price index. This feature significantly changes the risk characteristics of our plan vis-à-vis a plan that pays a defined benefit without making inflation adjustments because our risk-free asset is indexed to inflation.

The OTPP’s interest in risk management originated when the plan was privatized in 1990. At that time, we had about C$19 billion in assets, all government bonds. We wanted to diversify into an asset mix of 70/30 or 60/40 (equities/bonds) but were constrained by cash flow needs. The bonds were not marketable; they could not be liquidated easily, so we had to come up with another solution. To overcome this problem, we synthetically sold the bonds using derivatives, both interest rate and equity swaps. Because plan trustees often perceive derivatives negatively, we set up a proactive and elaborate risk management system to address, and hopefully alleviate, their concerns. Thus, the OTPP entered the risk management arena.

Besides the obvious diversification benefit of expanding the asset mix beyond government bonds, other benefits were associated with our using derivatives. Canada limits the amount of foreign investment for registered pension assets, which include pension plans plus Registered Retirement Savings Plans (the Canadian equivalent of IRAs): Only 25 percent of plan assets can be invested in non-Canadian assets. But because of the way the laws are structured, derivatives effectively circumvent this restriction.
Risk Budgeting Although risk budgeting has been a hot topic recently, many investors do not know what it means. We define risk budgeting as using the risk tolerance and risk assumptions of the plan to drive the plan’s overall investment strategy. Given the information ratio and the required return for the plan, the aggregate risk, or tracking error, of the plan can be calculated. Once the tracking error is known, it can be allocated among the different asset classes in the fund. This process characterizes risk budgeting. For the most part, we focus on the budgeting of active risk, although the plan does have some absolute risks, such as surplus risk (i.e., the risk of the
funding ratio falling), which is the most important risk for a pension plan. Active risk is not always the most important risk because the bulk of the plan’s risk comes from the investment policy asset mix. This system of risk budgeting allows us to assign active risk in a quantifiable, standard way for all asset classes in our portfolios. As a result, we can consistently apply risk to domestic equity, fixed-income, and currency overlay portfolios as well as to hedge fund or hedge-fund-type portfolios that run long–short strategies. We manage the bulk of our assets internally, and our compensation plan is based on the return a manager earns on his or her allocated risk; this compensation can be a significant amount, in some instances in excess of 100 percent of salary. Until the risk-budgeting process was integrated into the managers’ compensation equation, the process did not get the attention it deserved from managers; once it affected their paychecks, however, they began to pay attention.
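The risk-budgeting step described at the start of this section, deriving an aggregate tracking error from the required return and the information ratio and then allocating it among asset classes, can be sketched as follows. This is a minimal illustration, not OTPP's method: every number is hypothetical, the function names are invented for the example, and the allocation assumes that the active returns of the asset classes are uncorrelated so that their variances simply add.

```python
import math


def required_tracking_error(target_active_return, information_ratio):
    """Tracking error needed to earn a target active return at a given IR."""
    return target_active_return / information_ratio


def allocate_tracking_error(total_tracking_error, variance_shares):
    """Split total tracking error by each asset class's share of active variance.

    Assumption: active returns are uncorrelated across asset classes, so the
    squared tracking errors sum to the squared total.
    """
    if not math.isclose(sum(variance_shares.values()), 1.0):
        raise ValueError("variance shares must sum to 1")
    return {
        name: total_tracking_error * math.sqrt(share)
        for name, share in variance_shares.items()
    }


# Hypothetical inputs: 1.0% target active return, information ratio of 0.5
total_te = required_tracking_error(target_active_return=1.0, information_ratio=0.5)

# Hypothetical shares of active variance assigned to each asset class
budget = allocate_tracking_error(
    total_te,
    {"domestic equity": 0.50, "fixed income": 0.25, "currency overlay": 0.25},
)

# Recombining the pieces recovers the total under the zero-correlation assumption
recombined = math.sqrt(sum(te ** 2 for te in budget.values()))

print(f"Total tracking error: {total_te:.2f}%")
for name, te in budget.items():
    print(f"  {name:<18}{te:.2f}%")
```

Because risks combine through their squares, the individual budgets add up to more than the total tracking error; with correlated active returns, the allocation would need the full correlation structure.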
Risk versus Performance Attribution In many ways, risk attribution and performance attribution represent two sides of the same coin. Risk attribution uses historical data to forecast how much the plan’s portfolio will gain or lose given different market scenarios. Performance attribution takes historical returns and tries to explain why or how the manager performed as he or she did. Risk attribution is thus a forward-looking methodology, whereas performance attribution is the opposite, but the two go together hand in hand. In the perfect investment management system, perhaps 10 or 15 years from now, we will have a single, integrated system that will consider performance attribution and risk attribution in a coherent, unified framework. Unfortunately, such a system is not yet available.
Risk Management at OTPP Prior to 1995, asset-mix studies incorporating an overall portfolio risk policy were done infrequently at OTPP. Once we started to perform these risk policy asset-mix studies, however, we were able to reexamine the optimal levels of active and passive risk for a portfolio and budget these types of risk. But many rules constrained our investment choices and proved to be unworkable, ineffective, and suboptimal. For example, we could not have sector or country weights that deviated substantially from those of the benchmark. If a manager wanted to underweight Germany and overweight the Netherlands, even though the bet was not that risky because of the high correlation between those two markets, the manager
could not pursue it. The problem was that these outdated asset allocation rules did not consider the portfolio in its entirety or the effect that relationships, such as correlations, can have on the aggregate risk of the portfolio. We learned quickly that risk should be analyzed at as high and broad a level as possible. When risk is evaluated at a low level, such as that of the individual manager or portfolio, an excessive number of constraints have to be considered. But when risk is looked at from a larger perspective or in a broader type of framework, many of these constraints disappear or are diversified away because of imperfectly correlated assets. Therefore, in the mid-1990s, we adopted a value at risk (VAR) framework to measure the daily market risk of the portfolios managed by our internal managers. That framework led, in turn, to the risk-budgeting framework we use today and our integration of risk with manager compensation levels. Our goal is to have the risk-budgeting process be the main driver of the tactics and strategies of our plan. Recently, we have begun to use other types of risk modeling. Our asset/liability model is at the heart of our ability to set a strategic asset mix. The asset/liability model uses a 10-year investment horizon and can adjust for changes in the benefits paid to pensioners based on demographic and mortality assumptions. In the future, we hope to use other methods that we have not yet had the time or the capability to explore, such as integrating credit risk and market risk or evaluating risk in terms of a structural approach in which risk can be attributed to a variety of factors, including interest rates, economic growth, oil prices, and so on. Currently, our model is more of a statistical artifact than a tool for assessing structural relationships.
Implementation Issues After a VAR system is implemented, the first step is to define the risk level. For our plan, we defined risk to be a 1-in-100 event on an annual basis. In terms of standard deviation, assuming normality, this 1-in-100 event corresponds to 2.33 standard deviations. Typically, however, capital market returns are not normally distributed; they have fat tails. To build in a buffer zone, we use 2.60 standard deviations rather than 2.33. We also use a historical full valuation methodology, which means that we take all the assets in our portfolio as of the day before and reprice them for each day of the past 14 years. Applying this methodology is difficult when the portfolio contains illiquid assets, particularly private equity and real estate, two of the most prevalent illiquid asset categories. Most banks with VAR-type risk management systems use relatively short time
horizons, from less than a year to three years, but we prefer to have a consistent time horizon between our risk system and investment strategy. Because a pension plan has, theoretically, an infinite time horizon, we like to have at least a 10-year time horizon when calculating risk. Another consideration when setting up the system was creating the ability to look at such events as the 1987 stock market crash to see how our present portfolio would withstand a similar event. These critical occurrences are not that rare. In fact, in 1998 and 1999, volatility occasionally reached levels that had not been seen since 1987.

As part of our risk budgeting, we isolate three types of risk. The first is basic VAR, which is the amount of money that could be lost in a certain time horizon. The second type of risk, and perhaps the most important risk for a pension plan, is surplus risk. Surplus risk is found by subtracting plan liabilities from plan assets and evaluating how much that residual would change as a result of various market-volatility scenarios. The third type of risk, the one that I refer to most and that seems to be the easiest to quantify, is relative risk, or tracking error. In our system, we call this type of risk “management effective risk” (MEaR).

Once we decided to implement a risk management system, we had to address several issues. The first issue was whether we should design and operate the system internally or employ a service bureau. When we adopted our risk management system in the mid-1990s, not many service bureaus could handle the task. More firms now provide this type of service, and more custodians have entered the business of offering risk management services or risk calculations.
Yet a concern that many plan sponsors have about using a service bureau is whether confidentiality about plan assets is sufficient, particularly if the firm performing the risk analysis is a potential trading partner that could use such knowledge to its advantage. The biggest hurdle in building a risk management system is integrating the portfolio accounting system with the risk system. All asset positions must be valued by the portfolio accounting system; that is, all assets in the portfolio must be priced, even though pricing the more esoteric instruments is not easy. For example, the inputs needed to value options can be difficult to get, particularly for exotic options with two or three different strike prices that require both the price and volatility of the underlying stock or indexes to accurately price the option. Basic portfolio accounting systems generally do not have that capability and thus must be customized to handle the task. As with any quantitative analysis, the data for risk management systems must be culled to control
for errors. A variety of data problems can arise. Something as simple as a typo or a stock split that occurred a few years ago for which the price was not properly adjusted (so that the stock’s value in the portfolio exhibits drastic swings from period to period) creates big problems for a risk management system. Furthermore, not all data are available when they are needed. Sometimes, the portfolio requires nondomestic data, which may take longer to acquire than domestic data or may not even exist. Basic information system (IS) constraints pose other challenges. These risk management systems can make a plan sponsor change the architecture for its overall IS framework. We had to determine whether the system we wanted to buy would run on our computer’s operating system, which was Windows NT. Because the system was available only in UNIX, we had to consider purchasing UNIX equipment and hiring people with UNIX skills. If we had not been able to hire the right people, we would have had to find a service bureau to take care of the UNIX system. Ultimately, we decided to acquire UNIX equipment and people with UNIX skills to get the system we wanted. But until we started to implement the risk management system, we had no UNIX systems in our department. Another time-consuming issue, and one of the most challenging pieces of the puzzle, is fully educating internal staff and the plan’s board of trustees. The learning curve can be much steeper than expected. Having outside confirmation, whether from consultants or other experts, to validate the plan’s internal risk management system is helpful. We made an effort to communicate to board members and internal staff that the system we were implementing was not the perfect solution for risk management or for risk budgeting but was an improvement over our prior system; we told them that it was a 60 percent solution at best but that it was better than a 10 percent or 20 percent solution. 
A risk management system is a tool, not a magic wand. Performance measurement is calculated on a regular basis, and the same should be true for risk measurement. If risk is not measured and monitored over time, how can it be managed? Also, the portfolio benchmark must be identified in advance so that relative portfolio risk can be measured; otherwise, the fund cannot be managed properly. And most important, to sell the risk management idea and find support for implementing the system, the idea must be presented as simply as possible. The math should be kept to a minimum so the plan does not seem intimidating.
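The risk definition given at the start of this section (a 1-in-100 annual event, which corresponds to 2.33 standard deviations under normality and is widened to 2.60 as a buffer for fat tails) can be checked with the standard library. The volatility figure below is purely hypothetical and serves only to show how the two multipliers translate into a risk number; the function name is likewise illustrative.

```python
from statistics import NormalDist

# One-sided 1-in-100 event under a normal distribution
confidence = 0.99
z_normal = NormalDist().inv_cdf(confidence)

# Buffered multiplier quoted in the text to allow for fat tails
z_buffered = 2.60


def risk_at_multiplier(annual_volatility, z):
    """Loss (same units as volatility) at z standard deviations."""
    return z * annual_volatility


# Hypothetical annual volatility of 10 percent, for illustration only
hypothetical_volatility = 10.0

print(f"Normal multiplier: {z_normal:.2f}")
print(f"Risk at normal multiplier:   {risk_at_multiplier(hypothetical_volatility, z_normal):.1f}%")
print(f"Risk at buffered multiplier: {risk_at_multiplier(hypothetical_volatility, z_buffered):.1f}%")
```

The buffer raises the reported risk by roughly 12 percent relative to the pure normal assumption (2.60 versus 2.33).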
Calculating Risk At the OTPP, we use a historical methodology to calculate risk. Using current assets—everything in the portfolio, whether a stock, bond, foreign exchange forward, option, or future—we calculate the daily profit and loss (the difference between these numbers is the surplus risk) for the portfolio for each day from October 31, 1986, to the present. This process allows us to see what the profit on our current portfolio was on particularly risky days, such as those experienced in 1987 and 1991. Performance is reported on a total portfolio basis, but it can be broken down into subportfolios. Once these daily historical profits and losses are calculated, they are sorted from worst to best. We then have to decide which risk level to report. Should we report profits and losses that occurred 1 percent or 5 percent of the time? Or should risk be framed by another reference? We also have to choose a time horizon to use for reporting the risk. We generally report on a daily basis but then scale the data to an annual basis for greater relevance. Figure 1 shows the daily changing surplus, essentially the daily profit and loss on a surplus basis, of our portfolio for each day since 1987. Figure 1 is an obvious representation of the risk inherent in our current portfolio under the previously experienced volatile market conditions. The greatest volatility experienced in the markets in recent history occurred in 1987. Figure 1 also indicates high volatility in the 1990–91 period, followed by relative market calm in the 1992–94 period, until the past couple of years when volatility once again picked up. The increasing
magnitude of the lines toward the end of the decade illustrates the riskiness of the assets in the portfolio. Figure 2 shows the results of sorting the portfolio surplus shown in Figure 1 from worst to best. The black bars on the left are the 1 percent left-hand tail of the distribution (i.e., the worst 1 percent change in surplus), which is our surplus risk. Figures 1 and 2 show our daily risk, but to annualize the data, we would multiply the daily risk by 16, the square root of the number of trading days (about 250). VAR can be calculated in various ways, but we have chosen to use this historical methodology. A Monte Carlo simulation could also be used to create a history based on specified inputs. Typically, similar results are obtained with the Monte Carlo simulation and our VAR methodology. The Monte Carlo methodology, however, is often a necessary corollary to a historical system. Say, for example, that a portfolio holds Egyptian stocks. Finding 14 years of history on Egyptian stocks is impossible, so a history based on certain assumptions must be created. To create a history of returns for Egyptian stocks, a proxy, such as data on stocks in Malaysia, can be chosen to simulate a manager’s view on the riskiness of Egyptian stocks. To model risk on Canadian and U.S. stocks, for example, we measure risk on an individual stock basis. We have daily stock prices from 1986 through the present, and for each stock, we calculate the profits and losses we expect to earn by holding those stocks. This process is complicated, but it is doable
[Figure 1. Daily Change in Surplus with Policy Mix, 1987–99. Figure not reproduced. Vertical axis: C$ millions, –2,000 to 2,000; horizontal axis: 1987 to 1999.]
[Figure 2. Change in Surplus Ordered from Worst to Best. Figure not reproduced. Vertical axis: C$ millions, –2,000 to 2,000.]
for domestic stocks. It is more difficult for international portfolios, given the large number of international stocks (roughly 1,800) and the lack of data. International stock data are not as readily available as those for domestic stocks, so modeling the risks for an international portfolio is much trickier than for a domestic portfolio. Therefore, we use a capital asset pricing model for international equity.

Portfolio risks cannot be measured or managed individually. Portfolio risks must be evaluated at the fund level because a change in an individual risk may increase surplus risk. Thus, correlation is a necessary, if often complicated, concept in the evaluation of portfolio risk. The problem with understanding correlation arises because the calculation is not simply additive. To illustrate this point, consider three portfolios, each containing two assets, Asset A and Asset B, where $\text{Risk}_A = \text{Risk}_B = \$100$ (the worst 1 percent outcome with an annual horizon). We then calculate the risk for each portfolio using the following equation:

$$\text{Risk}_{A+B} = \sqrt{\text{Risk}_A^2 + \text{Risk}_B^2 + (2 \times \text{Correlation}_{AB} \times \text{Risk}_A \times \text{Risk}_B)}.$$

For the first portfolio, both Assets A and B are $100 of Microsoft stock, so the correlation between Asset A and Asset B is 1 because they are the same asset. In this case, the risk calculation is rather simple because the risk is additive. The total risk of the portfolio, as defined by the above equation, is the sum of the individual risks, or 200:

$$\sqrt{100^2 + 100^2 + (2 \times 1 \times 100 \times 100)} = 200.$$

In the second portfolio, Asset A is $100 in Toronto Stock Exchange stocks and Asset B is $100 in commodities. The correlation of the two assets is zero. Thus, the risk of the portfolio holding Assets A and B is less than the sum of the risk of the two individual assets. The total risk is roughly 140:

$$\sqrt{100^2 + 100^2 + 0} \approx 141.4.$$

Because of the zero correlation, the portfolio risk is not additive, but it is not subtractive either.

In the third portfolio, Asset A is $100 in real-return (inflation-indexed) bonds and Asset B is $100 in liabilities of the fund; they are mirror images of each other. In this instance, the correlation is –1. The risk is completely subtractive. The total risk is, in fact, zero. So, the assets are perfectly negatively correlated, and the risk of one cancels the risk of the other:

$$\sqrt{100^2 + 100^2 + [2 \times (-1) \times 100 \times 100]} = 0.$$

This situation is obviously ideal. Therefore, correlation plays a crucial role in the calculation of risk for the overall portfolio. Unfortunately, correlation is neither neatly defined nor easily deconstructed. Distinguishing where each of a portfolio’s risks originates is difficult. And if one risk is removed, the whole structure of what remains can change. Therefore, risks cannot be viewed on an individual basis.
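The calculations described in this section (sorting historical daily profits and losses from worst to best, reading off the 1 percent tail, scaling daily risk to an annual horizon, and combining two risks through their correlation) can be sketched in a few functions. This is an illustration under stated assumptions rather than the OTPP system: the function names are hypothetical, the tail cut-off uses one simple indexing convention among several possible ones, and the sample profit-and-loss series is synthetic.

```python
import math

TRADING_DAYS_PER_YEAR = 250  # approximate figure quoted in the text


def historical_tail_loss(daily_pnl, tail=0.01):
    """Loss at the tail cut-off of historical daily P&L, as a positive number.

    The series is sorted from worst to best; the cut-off index is a simple
    convention chosen for this sketch (the last observation inside the tail).
    """
    ordered = sorted(daily_pnl)
    index = max(int(len(ordered) * tail) - 1, 0)
    return -ordered[index]


def annualize_daily_risk(daily_risk):
    """Scale daily risk by the square root of the number of trading days."""
    return daily_risk * math.sqrt(TRADING_DAYS_PER_YEAR)


def combine_risk(risk_a, risk_b, correlation):
    """Combined risk of two assets given the correlation between them."""
    variance = risk_a ** 2 + risk_b ** 2 + 2 * correlation * risk_a * risk_b
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


# Synthetic daily P&L series (1,000 values from -500 to 499), illustration only
sample_pnl = list(range(-500, 500))
daily_tail = historical_tail_loss(sample_pnl)

print(f"Daily 1% tail loss: {daily_tail}")
print(f"Annualized: {annualize_daily_risk(daily_tail):.0f}")

# The three two-asset portfolios from the text, each asset with risk of $100
for label, rho in [("same asset", 1.0), ("uncorrelated", 0.0), ("mirror images", -1.0)]:
    print(f"{label:<14} correlation {rho:+.0f}: risk = {combine_risk(100, 100, rho):.1f}")
```

The square root of 250 is about 15.8, which the text rounds to 16. The uncorrelated case gives 141.4, which the text rounds to 140.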
OTPP Portfolio Risk

Our risk-free asset, if we want to assume little or almost no risk, is Canadian government real-return bonds (RRBs). These are similar to U.S. Treasury inflation-indexed bonds (commonly referred to as TIPS), except they are indexed to Canadian inflation. The return on RRBs is currently 50 basis points less than our required actuarial rate of return, so if we invested solely in RRBs, we would lose 50 bps of surplus every year and destroy the wealth of the plan beneficiaries. We thus accept some risk by buying higher-yielding assets, such as equities. Our assumption is that equities have a real return of about 6.5 percent. So, we accept the trade-off of increasing return at the price of increasing risk.

Our current portfolio of 65 percent equities, 20 percent nominal fixed income, and 15 percent real-return assets has a risk of 22 percent and a 1.3 percent expected surplus return. Thus, our policy asset mix should allow our surplus to grow at 1.3 percent a year, provided that all of the assumptions in this model, such as the return on equities and the return on bonds, are correct. (Those assumptions are never right, by the way, but we have to start somewhere to do this type of analysis.) Our normal asset mix is slightly below the efficient frontier, which often is the case because of constraints that prohibit reaching optimality. On average, we should be increasing our surplus and exceeding our actuarial required rate of return, but we are taking 22 percent annual risk to do so. The odds are that for 1 year in 100, our surplus could fall by 22 percent. With assets currently at more than C$70 billion, we could lose about C$15 billion if we experienced that 1-in-100-year event.

The bell-shaped distributions shown in Figure 3 illustrate another way of looking at portfolio risk. The mean of our policy surplus growth is 1.30 percent. Panel A shows our additional costs (30 bps) for achieving our policy return: management costs for a passive portfolio, rebalancing costs, and so on.
The 1 percent tail in the risk distribution is the surplus at risk (SaR) of 22 percent, or the risk that the portfolio could lose 22 percent of asset value. In addition to the policy surplus risk is active-management risk. Our goal, before expenses, is to earn 84 bps in value from active management, which translates into about C$550 million. Essentially, we want to move the distribution to the right by 84 bps, and by doing so, we assume risk of about 3.4 percent of our portfolio, as shown in Panel B. In other words, in aggregate, we will move the mean of our actual surplus growth to the right, as shown in Panel C. So, we get a mean actual surplus return of 1.3 percent plus 0.84 percent minus 0.3 percent, and our SaR has increased to only 22.2 percent. When we add the risk of active management to our policy risk, our overall risk does not increase much; our total risk is the square root of 22 percent squared plus 3.4 percent squared, assuming a zero correlation (which for the most part is true) between our active risk and our policy risk.

[Figure 3. Effect of Active Management on Surplus Growth. Panel A: Policy Surplus Growth (SaR = –22%; mean at 1.30 – 0.30). Panel B: Active-Management Return (MEaR = –3.4%; mean at 0.84). Panel C: Actual Surplus Growth (SaR = –22.2%; mean at 1.30 + 0.84 – 0.30). Horizontal axis in all panels: Return (%).]

We budget 3.4 percent active risk on a total portfolio basis, so in 1 year out of 100, we could underperform our benchmark by 3.4 percent. In most years, however, we expect to earn a first-quartile return on this risk. We tie this expectation into the compensation system. We want to have first-quartile managers, and our managers will receive the maximum bonus if they achieve first-quartile performance. So, we are looking for a 25 percent return on risk at the total fund level (84 bps of value added on 3.4 percent of active risk). We have nine basic management programs (e.g., domestic equities, fixed income, and private equities) that are relatively uncorrelated with each other.
Therefore, the total active risk is less than the sum of all the risks. It is the square root of the sum of the squares of the individual risks. Consequently, the sum of the individual risks is about 2.5 times the total risk. Our undiversified risk is thus 2.5 × 3.4 percent, or 8.5 percent. If every portfolio manager earns a 10 percent return on his or her active risk, the total fund will have a 25 percent return on diversified risk. Keep in mind that returns are additive, but risk is not.

Our policy surplus risk (the risk of having asset growth below liability growth) is 22.2 percent. That is, 1 year in 100, we could lose 22.2 percent. Our equity management risk is our risk of underperforming the benchmark. On a total fund basis, our active-management risk is 3.4 percent of assets. Adding the two together (actually taking the square root of the sum of the squares), the total risk (the actual surplus at risk) is about 22.4 percent of assets. Remember that these numbers are for the benchmark portfolio; the numbers for our actual assets will differ. For example, in 1999, our policy VAR was C$14.6 billion (21.8 percent of assets); policy surplus at risk was C$14.9 billion (22.2 percent of assets); actual management effective risk was C$2.0 billion (3.0 percent); actual VAR was C$13.4 billion (19.9 percent); and actual surplus at risk was C$13.7 billion (20.4 percent). So, our actual surplus at risk was substantially less than our policy surplus at risk.

The reason for this discrepancy is that we take a rather defensive stance in our portfolio. We are underweight equities to a certain extent, and equities add the bulk of the risk to the portfolio. And within equities, we have a value bias. Our managers, therefore, own stocks that are less risky than their benchmarks. These tactics have allowed our actual surplus at risk to be substantially less than our policy surplus at risk, even though we have a fair amount of active-management risk (about 3 percent of our portfolio).
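The root-sum-of-squares arithmetic used in this section can be sketched as follows. The helper name `root_sum_of_squares` and the nine equal program budgets are hypothetical; the text does not give the individual program risks, so the equal-budget case is shown only to illustrate the diversification effect.

```python
import math


def root_sum_of_squares(risks):
    """Total risk of mutually uncorrelated risks."""
    return math.sqrt(sum(r * r for r in risks))


# Figures quoted in the text (percent of assets).
policy_surplus_risk = 22.2
active_risk = 3.4
total_surplus_at_risk = root_sum_of_squares([policy_surplus_risk, active_risk])
print(f"Total surplus at risk: {total_surplus_at_risk:.1f}%")

# Hypothetical: nine uncorrelated programs with equal stand-alone risk budgets.
program_risks = [1.0] * 9
undiversified = sum(program_risks)
diversified = root_sum_of_squares(program_risks)
diversification_ratio = undiversified / diversified
print(f"Undiversified / diversified risk: {diversification_ratio:.1f}")

# Return on risk at the fund level: 84 bps of value added on 3.4% active risk.
return_on_risk = 0.84 / active_risk
print(f"Return on diversified active risk: {return_on_risk:.0%}")
```

With nine equal, uncorrelated budgets the ratio is exactly 3; the figure of about 2.5 quoted in the text would be consistent with unequal budgets or some residual correlation between programs, although that is an inference rather than something stated in the source.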
Allocating Risk Part of risk management involves deciding where to accept risk: Where do we want to actively manage, and where do we want to passively manage? If we anticipate that returns from active management will be relatively low in an asset class, we do not allocate any active assets there; we do not want to chase efficient markets, such as U.S. equities and bonds. We take most of the active-management risk in investment programs that allow a longer time horizon. As a pension plan, we have a long time horizon, unlike many investors who are in the market for short-term gains. We perceive this difference in time horizons as a strength of the OTPP and try to exploit it. For example, when other investors were buying
high-tech companies, we were purchasing undervalued assets, such as oil companies and timber companies. In our view, over a 10- or 15-year horizon, the oil and timber sectors could end up being the better performers. We can also vary the type of risk we take based on the perceived opportunities for a particular asset class. For example, we may want to put more money into real estate when the real estate market is depressed. Alternatively, we may prefer to have equal amounts of risk in all our investment programs, assuming a lack of correlation between them, because that is the easiest way to diversify. Such a strategy produces the highest aggregate return on the risks taken.
Controlling Risk Use We have a process that controls the level of risk that an investment manager can take. We use a three-zone methodology—green, yellow, and red. For example, the green, yellow, and red zones might be entered at C$120, C$300, and C$400 million, respectively. (The dollar amounts differ for each program, but the yellow limit is always 75 percent of the red limit.) In the green zone, the portfolio manager (or whoever is responsible for running the program) has full investment discretion, as long as investments are made according to the basic guidelines of the portfolio. The portfolio manager can buy whatever assets he or she chooses until the end of the green zone is reached. Once in the yellow zone, approval from a senior vice president is needed before additional investments can be made. The portfolio manager has to explain why he or she needs to be in the yellow zone, but being in the yellow zone is usually not a problem, although the manager is monitored more closely while there. Once the manager hits the red zone, however, he or she must immediately close out some positions to reduce the level of risk enough to fall back into the yellow zone. This system gives the manager some flexibility, and it complements the portfolio managers’ compensation system. For example, if the return the manager expects to earn is 10 percent on the risk of C$300 million, the manager will be expected to earn C$30 million to receive points for his or her bonus allocation. The allocation follows a linear scale, so if the manager earns only a C$15 million return on a C$300 million investment, the manager will receive half of the full allocation of bonus points appropriate for the risk taken. The manager never has to take all the possible risk, but the more risk the manager takes, the more likely it is for him or her to hit performance targets. The system is not able to capture intraday risk,
because we measure the risk of the assets only at the end of the day. Thus, the system is not perfect, but it does manage to capture the bulk of the risks in a given portfolio.
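The zone limits and the linear bonus scale described above can be sketched in Python. The function names, the treatment of everything below the yellow limit as green, and the cap of the bonus fraction between 0 and 1 are assumptions made for illustration; the text specifies only the example limits, the 75 percent relationship, and the linear scale.

```python
def risk_zone(risk_used: float, red_limit: float, yellow_share: float = 0.75) -> str:
    """Classify a program's risk usage (C$ millions) into a zone.

    The yellow limit is 75 percent of the red limit, as stated in the text.
    Treating all usage below the yellow limit as green is an assumption.
    """
    yellow_limit = yellow_share * red_limit
    if risk_used >= red_limit:
        return "red"      # must close positions to fall back into yellow
    if risk_used >= yellow_limit:
        return "yellow"   # needs senior vice president approval to add risk
    return "green"        # full discretion within portfolio guidelines


def bonus_fraction(earned: float, risk_taken: float,
                   target_return_on_risk: float = 0.10) -> float:
    """Share of the full bonus-point allocation on a linear scale.

    Capping the result to the range [0, 1] is an assumption.
    """
    target = target_return_on_risk * risk_taken
    if target <= 0:
        return 0.0
    return max(0.0, min(1.0, earned / target))


print(risk_zone(250.0, red_limit=400.0))
print(risk_zone(320.0, red_limit=400.0))
print(risk_zone(400.0, red_limit=400.0))
print(bonus_fraction(earned=15.0, risk_taken=300.0))
```

In the example from the text, a manager who earns C$15 million on C$300 million of risk against a 10 percent target receives half of the full allocation.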
Supporting Evidence

For the most part, our portfolio returns have been above our benchmark returns as well as the median managers' active returns. OTPP's investment strategy recently took a defensive stance because we had a few rough years and our performance was below the benchmark a few times. Typically, our performance falls in the range of 84–168 bps above the benchmark return, which places it in the first quartile and sometimes even the top decile. Keep in mind that a good strategy can be implemented perfectly, but markets can turn against the manager. A risk system such as ours will pinpoint how much a portfolio can lose in volatile markets, or at least give the manager a better grasp of what may be encountered.
Cultural Change after VAR

Implementing our risk system has brought about some changes in the culture of our organization. We have changed our active-management programs so that we budget active-management risk, and we measure the assets under management more by risk allocation than by dollars under management. In the past, a manager may have said, "I manage C$500 million in assets," but this type of quantification no longer matters. What is relevant is the amount of risk someone manages, and this approach has led to some alpha-transport-type strategies. One result is that our Canadian active equity managers, who previously had C$4 billion under management, now run a long–short strategy and have much greater flexibility in managing their portfolios. They can short a wide variety of stocks, not just stocks that are weighted in the benchmark. And they do not have to own a stock simply because it has a large weighting in the benchmark.

Our risk management system has taught us that active risk is small. In a typical organization, managers worry about the stocks they pick, duration, country bets, and so on. We have learned that such decisions matter far less than is usually assumed, because active risk is relatively small. The big issue is the policy asset mix. Our system thus compels managers to focus on the important issues. In a typical portfolio management environment, if a stock has a large weighting in a manager's benchmark, that manager is obliged to hold that stock. By focusing on risk, managers no longer have to buy stocks they do not want. They can focus on those stocks they think will add value and make larger allocations to those assets.

The key to our risk system being accepted throughout the organization was building into our compensation system the concept of return relative to active-management risk. Once managers realized their bonuses would be predicated on the return earned for each unit of risk taken, they took notice.
Conclusion

If a pension plan sponsor adopts a risk-budgeting system, the sponsor must face head on some of the preconceived notions of staff, board members, and others who will oversee the plan. Some people seem to think that a manager should never underperform the benchmark if he or she is a first-quartile manager. But the worst-case scenario indicates that even first-quartile managers underperform the benchmark at least one or two years every decade. People must simply be aware of that fact. Some managers also argue that the assets they are buying diversify the portfolio, so losing a small amount of money is not a big deal. Plan sponsors cannot allow such thinking to become prevalent because if they do, managers will diversify and lose money. Plan sponsors must also watch out for managers who manage their own career risk rather than the fiduciary obligations of the plan.
Question and Answer Session
Wayne A. Kozun, CFA

Question: How do you deal with changing correlations?

Kozun: The historical methodology uses whatever correlation is inherent in the actual data, so we do not specify a fixed correlation. That is one of the advantages of using a historical risk, rather than a Monte Carlo, approach.

Question: What percentage of your portfolio is in alternative assets, and how do you come up with correlations and valuations for them?

Kozun: About 12–14 percent of our portfolio's assets are in alternative investments. For private equity investments, we use another stock or sector that is a good proxy for the private equity investment. For example, if the investment is in a software company, we use a software index as a proxy for that individual stock. For real estate, we use Monte Carlo modeling superimposed on actual data. We can get quarterly data for 10–15 years. On top of that quarterly data, we throw in some daily noise to get the volatility that seems suitable for real estate investments on an annual basis, but those are the hardest assets to handle.

Question: How do you determine your liability surplus?

Kozun: To price liabilities, we use a real-yield curve. It is just like pricing a bond, but in this case, the discount rate is the actual rate these real-return bonds earn in the market. As long as we have data on those bonds, we can use them to price our liabilities. The problem is that in Canada, we have had them only since 1991. We had to use quantitative modeling to develop a regression that regressed real yields against nominal yields, inflation, and some noise.

Question: If your liabilities were not indexed, would the inflation-indexed bonds make more sense?

Kozun: They would make a lot of sense, mostly because pension benefits resemble inflation-indexed bonds. But even if your liabilities are not indexed, holding inflation-indexed bonds to hedge your active liabilities makes sense; the plan payroll needs to keep up with inflation, and inflation-linked bonds are the best hedge against inflation risk.

Question: You set up some parameters for risk managers. How do you manage these managers to make sure they stay within those limits?

Kozun: The data are updated every day. The managers use a Web-type interface to see what the risk is in their portfolios and how it has changed in the last couple of months or so. If someone goes over the risk limit, the risk management department sends an e-mail as a warning that the portfolio is in the yellow zone. Sometimes, managers consult with the risk management department if they want to add assets to their portfolios. We want to do a lot more work in that area to improve "what if" analyses. Right now, we use a back-office tool from our investment accounting department. A portfolio manager may have a good idea of what his or her return is, but until the data are released, no one knows for sure. We want to push that information to the front office as much as possible so that the portfolio manager knows how a particular trade will affect the risk of the portfolio.
Taxable Benchmarks: The Complexity Increases Lee N. Price, CFA CEO Price Performance Measurement Systems, Inc. Palo Alto, California
After-tax benchmarks must adhere to standard benchmark rules while incorporating tax-related concerns (such as income tax rates), but a big hurdle in establishing appropriate benchmarks is choosing which tax rate to use. An after-tax benchmark can best be constructed by using a combination of three levels of approximation as well as a shadow portfolio that allows for adjustments in cash flows and calculations of portfolio-specific cost bases.

Of those who manage taxable portfolios or represent taxable clients, only a small percentage report after-tax returns. One of the reasons managers give for not calculating after-tax returns is the lack of generally available after-tax benchmarks. After addressing the issue of benchmarks in general, I will explain how to calculate after-tax returns according to AIMR-PPS™ standards; the same rules apply to calculating after-tax benchmarks. I will then discuss three levels of approximation for the calculation of after-tax benchmark returns and potential combinations of these approaches.
Standard Benchmark Rules A number of well-established principles exist for creating benchmarks. A benchmark should be (1) appropriate to the manager’s asset class and investment strategy, (2) unambiguous, (3) specified in advance, (4) investable, and (5) measurable. When constructing an after-tax benchmark, a sixth rule is also applicable: The benchmark should be subject to the same (or similar) tax considerations as those of the clients whose portfolios are being evaluated against it. Obviously, an after-tax benchmark must have the same or similar tax considerations as those of the accounts that are being managed against it, but that does not mean the benchmark can be ambiguous; nor does it mean the benchmark cannot be specified in advance. Most important, the benchmark must still be appropriate to the manager’s investment strategy. The problem with after-tax benchmarks is that no single after-tax performance number applies to all
users of the benchmark. One size does not fit all. Whereas pretax benchmark users can expect to have a single number for benchmark performance, after-tax benchmark users should never expect a single value. The S&P 500 Index's pretax return in 1999 was 21.04 percent, according to Ibbotson Associates. But managers who want to compare their results with an after-tax benchmark must recognize the complications involved. The benchmark has to take into account not only the different tax rates of clients but also the variation in the capital gains tax bite, which is dependent on the client's starting cost basis. A nuclear decommissioning trust with a 20 percent flat tax rate on capital gains should have a different after-tax benchmark from that of an individual with a 46 percent total state and federal tax rate.

Equally important is the fact that the after-tax benchmark return depends on the inception date of the portfolio. If the account began in 1998, the 1999 after-tax return will reflect only a minimal amount of capital gains. If the account began in 1989, however, then the 1999 after-tax benchmark return will have a much larger capital gains component generated by every stock sold. The account will reflect 10 years of compounded gains built into the portfolio return, and when the manager sells the stocks, the account will realize much larger capital gains than an account that had been in existence for only one year. Consequently, the after-tax benchmark return for any given year tends to be smaller the longer the portfolio has been held.
AIMR After-Tax Standards The initial task of the Taxable Portfolios Subcommittee of the AIMR Performance Presentation Standards Implementation Committee, which I chaired when it was formed in 1994, was to evaluate the various ways of computing after-tax returns. We considered everything from cash basis (using only custodian-computed, tax-related cash flows) to full liquidation, partial liquidation, and a present value methodology that would account for the potential tax liability of future portfolio liquidation in current dollars. The committee decided that the realized-basis method was the only acceptable way to report after-tax performance. Advantages of Realized Basis. In the committee’s view, the most important advantage of reporting after-tax returns using realized-basis accounting (which the U.S. SEC now calls “preliquidation”) is that implied taxes are linked directly, and in the same period, to the taxable event giving rise to them. Regardless of when taxes are actually paid, the realized-basis method forces managers to be aware of the tax impact of portfolio trading and security selection. This realization of the tax impact is true of both taxes on dividend and interest income and of capital gains taxes on security sales. A second important advantage is that after-tax performance computed in this manner will be completely in sync with pretax performance calculated according to the AIMR-PPS standards. All of the same rules regarding interest and dividend income accrual apply in the after-tax arena as well. Disadvantages of Realized Basis. The biggest disadvantage of using the realized-basis methodology is that it requires complicated accounting—accurate tax lots, calculation of accrued interest, and accretion of OID discounts/premiums—and a great deal of precision. Many investment managers lack the necessary capabilities in their computer systems, even though some software vendors have been working hard to solve that problem. 
Another disadvantage of this approach is that it slightly understates performance for all assets by charging taxes before they are actually due.

AIMR-PPS Standards. For pretax performance, calculations are generally done according to a balance-sheet approach. The balance-sheet approach means that the manager uses the asset values at the end and the beginning of the period and adjusts for both the income received and the cash flows (positive and negative) during the period; the calculation is basically the difference between ending and beginning asset values divided by the average asset value for the period:

\[ \text{Pretax performance} = \frac{\text{Ending market value} - \text{Cash flows} - \text{Beginning market value}}{\text{Beginning market value} + \text{Weighted cash flows}}. \]

This method works well for pretax performance calculations but not for the analysis of tax implications. A completely analogous method, using exactly the same numbers, is an approach based on investment flows. The manager looks at the flow activity during the period rather than focusing on the ending and beginning asset values. The manager can divide the return, in terms of flow, among the various sources of return to the portfolio (realized gains, unrealized gains, and income) and then apply the appropriate tax rate to each type of flow. The denominator is the same as in the pretax calculation, namely, the average assets for the period:

\[ \text{After-tax performance} = \frac{\text{Unrealized gains} + \text{Realized gains}\,(1 - t) + \text{Income}\,(1 - t)}{\text{Beginning market value} + \text{Weighted cash flows}}. \]

With this methodology, the after-tax implication is clearer because no taxes are incurred on the unrealized gains during the period. This equation is a simplification because realized gains and income are taxed at different rates depending on holding period and type of income, but this equation is useful for conceptualizing the process. An easier way to calculate the same result is: After-tax performance equals pretax performance minus the tax burden, where

\[ \text{Tax burden} = \frac{(\text{Realized gains} \times \text{Capital gains rate}) + (\text{Income} \times \text{Income tax rate})}{\text{Beginning market value} + \text{Weighted cash flows}}. \]
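As an illustration of the tax-burden shortcut, the following Python sketch computes pretax performance, the tax burden, and after-tax performance for one period. The function names and all of the input figures are hypothetical; only the formulas come from the text.

```python
def pretax_performance(begin_mv, end_mv, cash_flows, weighted_cash_flows):
    """Balance-sheet form of the pretax return."""
    return (end_mv - cash_flows - begin_mv) / (begin_mv + weighted_cash_flows)


def tax_burden(realized_gains, income, capital_gains_rate, income_tax_rate,
               begin_mv, weighted_cash_flows):
    """Implied taxes for the period as a fraction of average assets."""
    taxes = realized_gains * capital_gains_rate + income * income_tax_rate
    return taxes / (begin_mv + weighted_cash_flows)


# Hypothetical one-period account with no external cash flows.
begin_mv, end_mv = 1000.0, 1098.0
realized_gains, income = 30.0, 23.0
capital_gains_rate, income_tax_rate = 0.28, 0.396

pretax = pretax_performance(begin_mv, end_mv, 0.0, 0.0)
burden = tax_burden(realized_gains, income, capital_gains_rate,
                    income_tax_rate, begin_mv, 0.0)
after_tax = pretax - burden

# Cross-check with the flow form, applying each flow's own tax rate.
unrealized_gains = (end_mv - begin_mv) - realized_gains - income
flow_form = (unrealized_gains
             + realized_gains * (1 - capital_gains_rate)
             + income * (1 - income_tax_rate)) / begin_mv

print(f"pretax {pretax:.4f}, tax burden {burden:.4f}, after-tax {after_tax:.4f}")
```

The two routes agree because the flow form simply distributes the same taxes across the sources of return.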
Importance of the Capital Gain Realization Rate The realization of capital gains plays a vital role in after-tax performance. The driving force behind the impact of taxes on a portfolio is the relative size of realized capital gains and the frequency with which they are realized. Table 1 shows the long-term return for a growth portfolio. I assume that the annual capital gains from price appreciation are 7.5 percent, the percentage of gain realized each year is 40.0 percent, the capital gains tax rate is 28.0 percent, the percentage average dividend yield is 2.3 percent, and the client’s income tax rate is 39.6 percent with no dividend exclusion. The long-term pretax return based on these assumptions is 9.8 percent (7.5 percent + 2.3 percent), and the after-tax return is 6.8 percent. Table 1 shows that in the beginning, taxes have only a minor impact on the portfolio, but as the
performance period lengthens, the after-tax return decreases because the amount of embedded gains increases. After a holding period of about 20 years, the after-tax return drops to about 6.8 percent and remains constant. According to these specific assumptions, a 9.8 percent pretax return can convert to a 6.8 percent after-tax return depending on the holding period.

Table 1. Implications of Varying the Rate of Realization of Capital Gains

Item                            Year 0    Year 1    Year 4    Year 8
Price index (untaxed)           100.0%    107.5%    133.5%    178.3%
Cost basis                      100.0     100.0     111.5     141.6
Pretax gain this year             0.0       7.5       9.3      12.1
Pretax value                    100.0     107.5     133.6     173.7
Unrealized gain (cumulative)      0.0       7.5      22.1      32.1
Realized gain                     0.0       3.0       8.8      12.8
Tax                               0.0       0.8       2.5       3.6
After-tax value (a)             100.0     108.0     132.9     172.3
Compound after-tax return                   8.0       7.4       7.0

Note: Assume all capital gains taxes paid and dividends received at year-end.
(a) Including dividends after tax.

When looking at after-tax performance (particularly long-term after-tax performance), the capital gain realization rate (CGRR) is an important concept. The CGRR is not necessarily the turnover rate. The Taxable Portfolios Subcommittee concluded that the measure of the CGRR should be the net gains or losses realized during the period divided by the average of the available gains during the period. The average stock of available capital gains during the period is ½(Stock of unrealized gains at start + Realized gains + Stock of unrealized gains at end).
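The CGRR definition above translates directly into a short calculation. The function name and the sample figures below are hypothetical and serve only to show the arithmetic.

```python
def capital_gain_realization_rate(unrealized_start: float,
                                  net_realized: float,
                                  unrealized_end: float) -> float:
    """Net realized gains divided by the average stock of available gains."""
    average_available = 0.5 * (unrealized_start + net_realized + unrealized_end)
    if average_available == 0:
        raise ValueError("no available gains during the period")
    return net_realized / average_available


# Hypothetical period: 20 of unrealized gains at the start, 5 realized
# during the period, and 25 of unrealized gains at the end.
cgrr = capital_gain_realization_rate(20.0, 5.0, 25.0)
print(f"CGRR = {cgrr:.0%}")
```

Because the numerator is net of realized losses, intentional loss harvesting lowers this measure even when it raises turnover.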
Although turnover alone is not the measure that defines CGRR, keep in mind that portfolio turnover is not necessarily bad. For example, turnover may
include the selling, or turnover, of cash equivalents. This type of turnover does not affect taxes at all because the tax basis is always 100 percent of the market value. Or the manager may have intentionally harvested losses, which increases turnover but reduces the portfolio’s net capital gain realization. The effect of the CGRR on after-tax returns is rather dramatic. Table 2 shows the after-tax returns calculated with this same model under slightly different assumptions to illustrate the two things an investment manager can control: the CGRR—at the left of the table—and the investment style (namely, dividend yield)—at the top of the table. A manager cannot control the direction or the volatility of the market, but he or she can control the amount of turnover—a proxy for CGRR—in the portfolio. And the manager can control the investment style, whether he or she invests in high-dividend-yield stocks or low-dividend-yield stocks, growth versus value, and so on. Table 2 uses exactly the same assumptions presented in Table 1, with one exception: The portfolio’s rate of total pretax return is 10 percent (rather than 9.8 percent) a year for the next
20 years, regardless of how the portfolio is structured. So, based on a completely hypothetical efficient market assumption, the after-tax returns range from about 9 percent for a portfolio with low turnover and a low dividend yield to 5.5 percent for a portfolio with high turnover and a high dividend yield.

Table 2. Effect of CGRR on After-Tax Return for Various Combinations of Appreciation and Dividend Yield

                     Appreciation/Dividend Yield (%)
CGRR (a)    4.0/6.0   5.0/5.0   6.0/4.0   7.5/2.5   8.0/2.0   9.8/0.2
 5%           6.9%      7.3%      7.6%      8.2%      8.4%      9.0%
10            6.5       6.8       7.2       7.7       7.9       8.5
20            6.1       6.4       6.7       7.2       7.4       7.9
40            5.7       6.0       6.3       6.8       6.9       7.4
60            5.6       5.9       6.2       6.6       6.7       7.2
80            5.5       5.8       6.1       6.5       6.6       7.1

Note: Assumes a 28 percent tax rate.
(a) Percent of gains realized each year.
Converting a Standard Pretax Benchmark

One approach to constructing an after-tax benchmark is to convert a standard pretax benchmark to an after-tax one. Roughly 50–100 pretax benchmarks are used by managers, with 10–15 used widely. The after-tax benchmarks can be converted using various tax rates and investment periods (different inception dates). As I mentioned earlier, even if one adopts the AIMR-PPS standards' realized-basis method, there are three levels of approximation in converting a pretax benchmark to an after-tax benchmark, and I will describe those three levels in this section. At the end of this section, I will cover some of the special problems associated with converting a standard pretax benchmark to an after-tax benchmark.

First Level of Approximation. To convert a pretax benchmark into an after-tax benchmark, at the first level of approximation, the manager must first split the pretax return between the sources of return: for example, dividend income and appreciation (realized and unrealized). (I will use the S&P 500 as an example, and fortunately, Ibbotson has already split the returns for the S&P 500.) For this first level of approximation, the manager must also assume that the pretax benchmark has a fairly constant CGRR. (I will assume that the CGRR is 5.5 percent for the S&P 500.) Keep in mind that the CGRR can vary widely depending on the index chosen and the number of years used to construct the data. The CGRR could be 25 percent for the Russell 2000 Index or even 50–70 percent for some of the value indexes. And although the S&P's CGRR has averaged 5.5 percent for the past 12 years, the average depends on which years are used to calculate the measure. The manager must also assume a capital gains tax rate (28 percent) and an income tax rate (39.6 percent) for dividends. Finally, the manager must apply the CGRR to the assumed portfolio appreciation and compound the remaining unrealized gains.
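The equations for creating an after-tax benchmark for the S&P 500 are as follows:

\begin{align*}
\text{Price} &= \text{Price}(-1) \times (1 + \text{Appreciation}) \\
\text{Realized gain} &= \text{CGRR} \times [\text{Price} - \text{Cost}(-1)] \\
\text{Cost} &= \text{Cost}(-1) + \text{Realized gain} \times (1 - \text{Capital gains tax}) + \text{Dividends} \times (1 - \text{Dividend tax}) \\
\text{Tax} &= \text{Realized gain} \times \text{Capital gains tax} + \text{Dividends} \times \text{Dividend tax} \\
\text{After-tax value} &= \text{After-tax value}(-1) \times (1 + \text{Appreciation} + \text{Dividends} - \text{Tax}) \\
\text{After-tax return} &= \text{After-tax value} / \text{After-tax value}(-1) - 1.
\end{align*}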
Note that the realized gains and dividends are both calculated as a percentage of Price(–1). To start the calculation, the manager must have a beginning price, which is then multiplied by 1 plus the portfolio appreciation. Next, the manager calculates the potential realized gain, which is the difference between the price at the end of the period and the cost basis. Cost(–1) denotes the cost basis of the previous period, which is rolled forward each period. The manager multiplies the potential amount of realized gains by the CGRR, and the cost is incremented to reflect the reinvestment of proceeds, but only arithmetically. The U.S. IRS does not allow managers to compound the cost basis, so the manager calculates the cost using the cost of the previous period plus the realized gains from security sales in the current period times the quantity (1 minus the capital gains tax rate) plus the dividends received in the current period times the quantity (1 minus the income tax rate on the dividends). And then finally, the manager can calculate an after-tax return, which is computed on a running basis as the current after-tax value divided by the previous period's value, minus 1. The methodology is straightforward and, most important, sensitive to the manager's assumed CGRR and assumed tax rates.

Table 3 shows the results of calculating after-tax performance according to this methodology. For this example, I used the performance of the S&P 500 for the past 10 years. The table shows the pretax return for the S&P 500 for each of the years from 1990 to 1999 and the after-tax performance for each of the same years. Beginning in 1991, more than one after-tax performance number exists for each year because the starting year for the calculation (for the cost) varies. For example, if the portfolio's inception date was 1989 and the market was down 3.2 percent (pretax) in 1990, the after-tax return was a negative 4.4 percent, which may seem odd.
Even though the portfolio was started in 1989 and should not have realized many gains on average in 1990, thus creating only a minimal capital gains tax liability, the average dividend earned by the S&P 500 in 1990 was high. Thus, the income tax liability alone would have had a significant tax impact on the portfolio. Because of the high dividend tax rate assumed in this analysis, the income taxes paid exceeded the net effect of realizing losses, assuming a 5.5 percent portfolio turnover rate.
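The steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the exact model behind Table 3: the names `TaxAssumptions` and `after_tax_returns` are hypothetical, the cost basis is assumed to start at the inception price, and taxes are assumed to be the realized gain times the capital gains rate plus dividends times the dividend rate. The sample rates (5.5 percent CGRR, 28 percent capital gains, 39.6 percent dividends) are the ones quoted elsewhere in this presentation; the return inputs are made up.

```python
from dataclasses import dataclass


@dataclass
class TaxAssumptions:
    cgrr: float            # assumed capital gain realization rate (e.g. 0.055)
    cap_gains_rate: float  # assumed tax rate on realized capital gains
    dividend_rate: float   # assumed tax rate on dividend income


def after_tax_returns(appreciation, dividend_yield, tax, start_price=100.0):
    """Per-period after-tax returns for a benchmark under a constant CGRR.

    appreciation and dividend_yield are per-period decimals, both expressed
    as a fraction of the previous period's price.
    """
    price = start_price
    cost = start_price  # assumption: cost basis starts at the inception price
    results = []
    for app, div in zip(appreciation, dividend_yield):
        prev_price = price
        price = prev_price * (1.0 + app)
        dividends = div * prev_price
        # Potential gain is end-of-period price minus the rolled-forward cost.
        realized = tax.cgrr * (price - cost)  # negative when losses are realized
        taxes = realized * tax.cap_gains_rate + dividends * tax.dividend_rate
        # Cost basis grows arithmetically by the after-tax amounts reinvested.
        cost += realized * (1.0 - tax.cap_gains_rate)
        cost += dividends * (1.0 - tax.dividend_rate)
        results.append(app + div - taxes / prev_price)
    return results


# Hypothetical inputs: 10% appreciation and a 2% dividend yield for two periods.
assumed = TaxAssumptions(cgrr=0.055, cap_gains_rate=0.28, dividend_rate=0.396)
print(after_tax_returns([0.10, 0.10], [0.02, 0.02], assumed))
```

Because the cost basis lags the price more and more as gains accumulate, the second period's after-tax return in this sketch comes out slightly lower than the first, which is the holding-period effect discussed in the text.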
Table 3. S&P 500 After-Tax Return for 1990–99 as a Function of Starting Year
Item           1990    1991    1992    1993    1994    1995    1996    1997    1998    1999
Pretax(a)     –3.17%  30.55%   7.67%   9.99%   1.31%  37.43%  23.07%  33.36%  28.58%  21.04%
Starting year for after-tax return
1989          –4.43   28.35    6.05    8.46   –0.07   35.13   20.96   31.14   26.43   19.08
1990                  28.21    5.94    8.36   –0.17   35.04   20.89   31.08   26.38   19.04
1991                           6.22    8.62    0.08   35.28   21.08   31.24   26.51   19.14
1992                                   8.66    0.11   35.32   21.11   31.27   26.52   19.16
1993                                           0.19   35.39   21.16   31.31   26.56   19.19
1994                                                  35.34   21.13   31.28   26.54   19.17
1995                                                          21.48   31.58   26.76   19.35
1996                                                                  31.81   26.94   19.49
1997                                                                          27.27   19.75
1998                                                                                  20.05
(a) S&P 500 pretax return.
As the holding period increases, the after-tax numbers are always lower than the pretax numbers, and they get lower the longer the investor holds the portfolio, even with a relatively low 5.5 percent CGRR. This effect is most noticeable in Table 3 for the 1999 period. For a portfolio started in 1998, the after-tax return was about 1 percentage point lower (20.05 percent versus the pretax return of 21.04 percent). But if that portfolio had been started in 1989, the after-tax return would have been 19.08 percent. Table 4 highlights the difference between the pretax and after-tax returns for the S&P 500. The one-year difference between pretax and after-tax returns is roughly between 1.0 percent and 2.3 percent, with a 1.75 percent average reduction across the 10 sample years (1990–1999). But for portfolios with a holding period of three years, the difference in the pretax and after-tax return is roughly 1.25–2.1 percent, and by six years, the range is 1.9–2.3 percent. By the time the investor has had the portfolio nine years, the difference is consistently above 2 percent.
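The differences discussed above are simple subtractions of the Table 3 figures, which the following sketch makes explicit. The helper names `differentials` and `range_by_holding_period` are hypothetical; the holding period is taken to be the return year minus the starting year, which is an assumption about how the ranges in the text were grouped.

```python
# Table 3 values (percent): pretax return by year, and after-tax return by
# starting year, where each list begins in the year after the starting year.
PRETAX = {1990: -3.17, 1991: 30.55, 1992: 7.67, 1993: 9.99, 1994: 1.31,
          1995: 37.43, 1996: 23.07, 1997: 33.36, 1998: 28.58, 1999: 21.04}
AFTER_TAX = {
    1989: [-4.43, 28.35, 6.05, 8.46, -0.07, 35.13, 20.96, 31.14, 26.43, 19.08],
    1990: [28.21, 5.94, 8.36, -0.17, 35.04, 20.89, 31.08, 26.38, 19.04],
    1991: [6.22, 8.62, 0.08, 35.28, 21.08, 31.24, 26.51, 19.14],
    1992: [8.66, 0.11, 35.32, 21.11, 31.27, 26.52, 19.16],
    1993: [0.19, 35.39, 21.16, 31.31, 26.56, 19.19],
    1994: [35.34, 21.13, 31.28, 26.54, 19.17],
    1995: [21.48, 31.58, 26.76, 19.35],
    1996: [31.81, 26.94, 19.49],
    1997: [27.27, 19.75],
    1998: [20.05],
}


def differentials():
    """Map (start_year, year) to pretax minus after-tax return, in points."""
    out = {}
    for start, row in AFTER_TAX.items():
        for offset, after in enumerate(row):
            year = start + 1 + offset
            out[(start, year)] = round(PRETAX[year] - after, 2)
    return out


def range_by_holding_period(years_held):
    """Smallest and largest differential for a given holding period."""
    values = [d for (start, year), d in differentials().items()
              if year - start == years_held]
    return min(values), max(values)


for n in (1, 3, 6, 9):
    print(n, range_by_holding_period(n))
```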
The significant cumulative effect of taxes on portfolio returns explains the reason for the SEC's proposal that mutual funds be required to report after-tax returns to their clients. Table 5 shows some of the cumulative differences (for portfolios with holding periods of up to 10 years) in pretax and after-tax returns for the years 1995–1999. Again, the difference is negligible in the first year, only 1–2 percent. But at 5 years, the difference accumulates to 19 percent on average, and at 10 years, the cumulative difference is about 77 percent. In other words, the cumulative 10-year pretax return for the S&P 500 in 1999 was about 300 percent, whereas the cumulative after-tax return was about 225 percent, for an approximate difference of 75 percent, which is a fairly striking number. This large cumulative difference is obviously why the SEC is concerned about taxable investors investing in mutual funds to provide for their retirement years. Their realized return will be much lower than what is reported to them on a pretax basis.
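The compounding behind the cumulative figures can be illustrated with the annual returns from Table 3. The sketch below assumes that the cumulative difference is the gap between the compounded pretax growth and the compounded after-tax growth, expressed in percentage points; the function names are hypothetical. Under that assumption, the 1989 starting year gives results close to the 77.43 and 17.92 shown in Table 5 for 1999 and 1995, with small discrepancies because the inputs are rounded to two decimals.

```python
def cumulative_growth(returns_pct):
    """Compound a sequence of annual percentage returns into a growth factor."""
    growth = 1.0
    for r in returns_pct:
        growth *= 1.0 + r / 100.0
    return growth


def cumulative_gap(pretax_pct, after_tax_pct):
    """Cumulative pretax minus cumulative after-tax return, in percentage points."""
    return 100.0 * (cumulative_growth(pretax_pct) - cumulative_growth(after_tax_pct))


# Table 3 values for 1990-99: pretax, and after tax for a 1989 starting year.
PRETAX = [-3.17, 30.55, 7.67, 9.99, 1.31, 37.43, 23.07, 33.36, 28.58, 21.04]
AFTER_TAX_1989 = [-4.43, 28.35, 6.05, 8.46, -0.07, 35.13, 20.96, 31.14, 26.43, 19.08]

print(round(cumulative_gap(PRETAX, AFTER_TAX_1989), 2))          # through 1999
print(round(cumulative_gap(PRETAX[:6], AFTER_TAX_1989[:6]), 2))  # through 1995
```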
Table 4. S&P 500 Example: Difference between Pretax and After-Tax Returns
Start year     1990    1991    1992    1993    1994    1995    1996    1997    1998    1999
1989           1.26%   2.20%   1.62%   1.53%   1.38%   2.30%   2.11%   2.22%   2.15%   1.96%
1990                   2.34    1.73    1.63    1.48    2.39    2.18    2.28    2.20    2.00
1991                           1.45    1.37    1.23    2.15    1.99    2.12    2.07    1.90
1992                                   1.33    1.20    2.11    1.96    2.09    2.06    1.88
1993                                           1.12    2.04    1.91    2.05    2.02    1.85
1994                                                   2.09    1.94    2.08    2.04    1.87
1995                                                           1.59    1.78    1.82    1.69
1996                                                                   1.55    1.64    1.55
1997                                                                           1.31    1.29
1998                                                                                   0.99
Table 5. S&P 500 Example: Cumulative Difference between Pretax and After-Tax Returns
Starting Date      1995     1996     1997     1998     1999
1989              17.92%   26.07%   39.87%   57.77%   77.43%
1990              16.84    25.04    38.85    56.87    76.78
1991               8.66    13.77    22.37    33.91    47.01
1992               5.93    10.19    17.32    27.09    38.37
1993               3.59     7.00    12.70    20.69    30.11
1994               2.09     5.20    10.34    17.69    26.51
1995                        1.59     4.28     8.41    13.60
1996                                 1.55     4.16     7.63
1997                                          1.31     3.22
1998                                                   0.99
Second Level of Approximation. The second level of approximation entails the same general concept as in the first level, but rather than make the assumption that the CGRR is constant every year, the manager must go further and determine the actual CGRR of the index for each period. Historically, companies were dropped from the S&P 500 if their market capitalization shrank or if they declared bankruptcy, events that were not likely to create large capital gains. But recently, companies are being dropped because they have been acquired. This heightened merger and acquisition activity has had a noticeable effect on indexes such as the Russell 2000. In addition, the best performers, those that rise to the top, often no longer meet the capitalization requirements and are pushed out of the index. Huge capital gains are associated with such high turnover.
Looking at the CGRR in detail, not on an average basis but analyzing it year by year, can add a lot of value to the after-tax benchmarking process. The manager has to determine for each year which companies left the index (because of bankruptcies, buyouts, or mergers) and whether the transition event was taxable. If the transition event was a merger, for example, and the company being dropped from the index was merged in a tax-free exchange with a company in the same index, then no capital gains tax would be incurred. That kind of turnover does not affect after-tax returns. But if the company being dropped was bought out by a company that was not in the index (perhaps a non-U.S. company), then a manager benchmarked against that index would have to sell that stock and take the capital gain. That is, the manager would have to calculate capital gains based on the actual capital gains realized each period as a function of the tax rate and starting cost basis.
The question then arises as to what happens to after-tax returns as a function of CGRRs. Table 6 shows the differential between pretax and after-tax returns for various CGRRs and holding periods. The return differential in Year 1 does not vary greatly as the CGRR varies, but by Year 10, the difference between a CGRR of 5.50 percent (return differential of 1.96 percent) and a CGRR of 30 percent (return differential of 6.54 percent) is substantial.
Therefore, calculating an after-tax benchmark using the second level of approximation creates a valuable tool. If the manager assumes a constant CGRR (as in the first level of approximation) when calculating the after-tax return of the benchmark, the result will be better than not accounting for the tax implications at all, even though it will not accurately reflect what really occurred in the index. If, in fact, the manager's benchmarked index is changing with time (which it is), by incorporating the true CGRR in the after-tax return calculations, the benchmark's after-tax performance numbers will differ from those calculated using the first level of approximation and will more accurately portray reality.
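One way to picture the year-by-year determination of the CGRR is to tally the gains on companies that left the index in taxable events and divide by the index's total unrealized gain. The sketch below is a hypothetical illustration of that bookkeeping, not a documented procedure: the `Deletion` record, the `period_cgrr` function, and the choice of denominator are all assumptions, chosen to be consistent with the first-level calculation in which realized gains equal the CGRR times the potential gain.

```python
from dataclasses import dataclass


@dataclass
class Deletion:
    name: str            # company removed from the index during the period
    market_value: float  # value of the benchmark's holding when removed
    cost_basis: float    # benchmark's cost basis in that holding
    taxable: bool        # False for, e.g., a tax-free exchange within the index


def period_cgrr(deletions, index_market_value, index_cost_basis):
    """Realized gains from taxable deletions as a share of total potential gain."""
    potential_gain = index_market_value - index_cost_basis
    if potential_gain == 0:
        return 0.0
    realized = sum(d.market_value - d.cost_basis for d in deletions if d.taxable)
    return realized / potential_gain


# Hypothetical period: one cash buyout, one tax-free merger, one bankruptcy.
events = [
    Deletion("Acquired for cash", 30.0, 10.0, taxable=True),
    Deletion("Tax-free merger", 20.0, 5.0, taxable=False),
    Deletion("Bankruptcy", 5.0, 8.0, taxable=True),
]
print(period_cgrr(events, index_market_value=1000.0, index_cost_basis=600.0))
```

The resulting rate for each period could then replace the constant CGRR in the first-level calculation.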
Table 6. Average Annual Difference in Pretax and After-Tax Returns as a Function of the CGRR: S&P 500 Index
CGRR      Year 1   Year 2   Year 3   Year 4   Year 5   Year 6   Year 7   Year 8   Year 9  Year 10
5.50%       1.50%    1.69%    1.74%    1.84%    1.96%    2.10%    2.08%    2.10%    2.08%    1.96%
10.00       1.71     2.04     2.20     2.39     2.61     2.86     2.92     3.03     3.05     2.93
20.00       2.16     2.79     3.13     3.50     3.90     4.39     4.59     4.88     5.00     4.86
30.00       2.61     3.49     3.96     4.47     5.02     5.73     6.04     6.49     6.70     6.54
Third Level of Approximation. The third level of approximation involves tracking the actual
dividend reinvestment income and reinvesting at the then-current (i.e., at the time of the reinvestment) prices. The manager also rebalances the benchmark portfolio whenever a capital action occurs and tracks the new cost basis.
Special Problems. The Dow Jones Industrial Average is probably the index most commonly used by taxable investors. The Dow is price weighted rather than market-cap weighted, which means that every time a corporate action occurs (a stock dividend, a stock split, and so on), the index must be adjusted. For example, if IBM splits two for one, a manager benchmarking against the Dow has to sell half of the IBM shares in the portfolio, regardless of the investment implications. As a result, the manager's portfolio will realize capital gains, pay capital gains taxes, and reinvest the proceeds in the other 29 stocks in the index. So, an entirely new class of events becomes taxable for the price-weighted Dow that would not be considered taxable events for the S&P 500 or most of the other indexes that are constructed according to a market-cap weighting.
Style indexes also have some unique problems, such as when a stock falls in value and drops out of the Russell 1000 Index (a large-cap index) and goes into the Russell 2000 Index (a small-cap index). A manager benchmarking against the Russell 1000 must sell that stock and realize the gain. In this case, the gain realized may not be large because the stock has dropped in value, but the process applies in the other direction as well. That is, when a stock moves from a small-cap index into a midcap or large-cap index, huge gains might be realized if that stock has to be sold from a small-cap manager's benchmark portfolio.
Fixed-income indexes pose even greater problems for adjustment to an after-tax basis because index providers frequently do not list the securities in the index.
Fixed-income indexes tend to be created by percentages of exposure to sectors—a certain percentage of Treasuries, mortgage-backed collateralized bond obligations, corporates, and so on. Thus, managers typically do not know which specific bonds (issuers, coupons, and maturities) are in the index. Figuring out this index composition can be difficult, if not impossible. Most fixed-income performance, however, comes from income rather than appreciation, so fixed-income indexes do not usually have the problem of accumulating unrealized capital gains, unless there has been a long period of declining interest rates.
Shadow Portfolios
Converting a standard pretax benchmark is one way to construct an after-tax benchmark. A more precise methodology, however, is to create a shadow portfolio that varies according to the client. In other words, the shadow portfolio pays the same pro rata capital gains taxes for withdrawals as the actual portfolio. And every time the client gives the manager more money, the shadow portfolio brings that money in at the cost basis at that time. Clients have different cash flows, and the shadow portfolios (i.e., the benchmarks) will be different for each client.
Table 7 shows a shadow portfolio of the S&P 500 for a single year. The starting point at the end of 1998 is 100, and then the various monthly returns are shown. Table 7 shows a single withdrawal (half of the initial value, which is admittedly an extreme example) by the client at the end of 1999. This withdrawal causes the return for the year to drop significantly, from 21.02 percent pretax to 15.75 percent after tax, as a result of the capital gains tax paid on the gains realized from the security sales needed to generate the proceeds for the withdrawal.
Table 8 shows the impact on the after-tax return for the period (1999) if the withdrawal had been made in each month of the year (January, February, and so on). The return for the "No cash flows" row is the same, 20.2 percent, as in the standard benchmark conversion approach. The benchmark return using a shadow portfolio and a 50 percent withdrawal in January, however, is 16.7 percent. So, this withdrawal makes a big difference in the after-tax return, but the impact of the withdrawal varies by month. If the withdrawal had been made in February, the return would have been 18.4 percent; if it had been made in May, 17.3 percent. Because the index price varies, the benchmark, which is the S&P 500 in this case, also varies as a function of when the withdrawal is made.
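The cash flow mechanics of a shadow portfolio can be sketched as follows. This is a simplified illustration and not the model behind Tables 7 and 8: the `ShadowPortfolio` class is hypothetical, a withdrawal is assumed to be funded by selling a pro rata slice of the holdings, the resulting tax is assumed to be paid out of the remaining assets, and dividends and ongoing turnover are ignored.

```python
from dataclasses import dataclass


@dataclass
class ShadowPortfolio:
    value: float       # current market value of the shadow benchmark
    cost_basis: float  # tax cost basis carried by the shadow benchmark

    def apply_return(self, period_return):
        """Grow the market value by one period's benchmark return."""
        self.value *= 1.0 + period_return

    def contribute(self, amount):
        """New client money enters at a cost basis equal to the amount added."""
        self.value += amount
        self.cost_basis += amount

    def withdraw(self, amount, cap_gains_rate):
        """Sell a pro rata slice to fund a withdrawal; return the tax paid."""
        if not 0 < amount < self.value:
            raise ValueError("withdrawal must be positive and below the value")
        fraction = amount / self.value
        realized_gain = fraction * (self.value - self.cost_basis)
        tax = realized_gain * cap_gains_rate  # negative if a loss is realized
        self.cost_basis -= fraction * self.cost_basis
        self.value -= amount + tax  # assumption: tax comes out of the portfolio
        return tax


# Hypothetical client: 100 invested, a 20% gain, then half the value withdrawn.
shadow = ShadowPortfolio(value=100.0, cost_basis=100.0)
shadow.apply_return(0.20)
tax_paid = shadow.withdraw(60.0, cap_gains_rate=0.28)
print(tax_paid, shadow.value, shadow.cost_basis)
```

Because the realized gain depends on the gap between value and cost basis at the moment of the withdrawal, the same withdrawal made in different months produces a different tax bill, which is the pattern Table 8 illustrates.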
Conclusion
Constructing after-tax benchmarks is not easy, which is why I have been involved in the AIMR-sponsored effort to create a standardized methodology. Perhaps the most important aspect of constructing an after-tax benchmark is starting with the correct pretax index. The next step is to carry out the first level of approximation: splitting the appreciation and income return sources because of different tax rates and then applying a constant CGRR. The second level of approximation, calculating the after-tax returns with the appropriate CGRR each year, yields an even more accurate view. And finally, a combination of these approximations, plus adjusting for significant cash flows through a shadow portfolio to calculate a portfolio-specific cost basis, produces the most detailed and accurate after-tax benchmark.
Table 7. S&P 500 Example Shadow Portfolio
                                          Tax Rates                     Benchmark Rates                     Returns
               Cost                              Capital  Pretax Return  Pretax Return  Capital Gains
Date   Value  Basis  Inflow  Outflow  Dividends    Gains          Price         Income    Realization  Pretax  After Tax
12/98    100    100                       39.60%   28.00%          5.64%          0.12%          0.46%   5.76%      5.71%
01/99                                     39.60    28.00           4.10           0.08           0.46    4.18       4.14
02/99                                     39.60    28.00          –3.23           0.12           0.46   –3.11      –3.17
03/99                                     39.60    28.00           3.88           0.12           0.46    4.00       3.94
04/99                                     39.60    28.00           3.79           0.08           0.46    3.87       3.82
05/99                                     39.60    28.00          –2.50           0.14           0.46   –2.36      –2.43
06/99                                     39.60    28.00           5.44           0.11           0.46    5.55       5.49
07/99                                     39.60    28.00          –3.20           0.08           0.46   –3.12      –3.17
08/99                                     39.60    28.00          –0.63           0.13           0.46   –0.50      –0.57
09/99                                     39.60    28.00          –2.86           0.11           0.46   –2.75      –2.81
10/99                                     39.60    28.00           6.25           0.07           0.46    6.32       6.28
11/99                                     39.60    28.00           1.91           0.13           0.46    2.04       1.97
12/99                             50      39.60    28.00           5.78           0.11           0.46    5.89       1.90
Total                                                                                                   21.02      15.75
Note: Model from David Stein, Parametric Portfolio Associates.
Table 8. S&P 500 Example for 1999 with 50 Percent Withdrawal in Various Months
Withdrawal month    After-tax return
No cash flows                 20.20%
January                       16.73
February                      18.36
March                         17.53
April                         16.87
May                           17.32
June                          16.47
July                          17.01
August                        17.12
September                     17.71
October                       16.66
November                      16.38
December                      15.75
Question and Answer Session
Lee N. Price, CFA
Question: Can active managers compete against index funds when after-tax performance is evaluated?
Price: As an active equity manager, I hate to admit that for a highly taxed client, particularly one who lives in a state with 10 or 12 percent state tax on top of the federal taxes, the manager has to generate a huge alpha to justify even relatively low turnover (say 10–20 percent). If the manager, particularly a growth manager, can reinvest without paying much in income taxes and can keep those capital gains without paying any taxes, over a long period of time, that manager will make a fortune. But if the manager has to realize those gains because of huge turns in the market, performance will suffer. I admit that for the active manager, the after-tax bogey is a much tougher one than the pretax bogey.
Question: If mutual funds were to report after-tax returns, would they use their actual turnover rate or some sort of generic turnover rate to make the calculations?
Price: The answer depends on SEC requirements. The proposal issued by the SEC makes after-tax reporting relatively easy for mutual funds because the SEC defines a taxable event as the point in time when the mutual fund declares its dividends to shareholders. A lot of funds, Vanguard's 500 Index Fund for example, declare dividends only once or twice a year. Thus, for mutual fund calculations, the details related to the amount of income and turnover on a day-to-day basis are not important. So, it will be much easier for a mutual fund to abide by the SEC proposal than for an active manager of a taxable account to comply with the AIMR after-tax performance guidelines. That said, the mutual fund's calculations will have to include its actual turnover (gains, losses, short term versus long term, etc.), not a generic estimate.
Question: Is AIMR considering any revisions to the AIMR-PPS standards for after-tax reporting?
Price: The Taxable Portfolio Subcommittee has been reconvened under Douglas Rogers, CFA, as chair and expects to issue a revised version of the AIMR-PPS standards for after-tax reporting around July 2001. The original after-tax standard in 1995 suggested that each composite use the maximum federal tax rate for that class of clients and ignore global and state taxes. At that time, the committee was aiming for comparability between managers. The current committee thinks that giving clients what they want is more important, which in many cases means that managers should calculate after-tax returns based on their clients' own tax rates. Accordingly, AIMR's draft proposal gives managers additional leeway to use the actual average tax rate of the portfolios in a particular composite. For example, if you had 10 taxable clients and some expected their tax rate to be 35 percent (including federal, state, and local taxes) and others expected 46 percent, you could provide the after-tax calculations for each of those accounts based on their own tax rates and then create a weighted-average performance for the composite and a weighted-average tax rate, which would have to be disclosed. So, after-tax numbers wouldn't necessarily have anything to do with maximum federal rates.
The SEC may adopt something similar. Originally, we encouraged the SEC to follow the AIMR standard of using the maximum federal tax rate to allow for comparability across mutual funds. But many mutual fund managers protested that their average client is not in the 46 percent tax bracket and wanted to use a tax rate that reflects the average mutual fund holder. The SEC is seriously considering that request.1
Question: How many firms report after-tax returns?
Price: I am aware of two or three firms (asset managers) that have gone to the trouble of incorporating the current AIMR-PPS after-tax standards. Most of them have done so primarily for large institutional taxable clients. Those might be corporate holding companies, insurance companies, nuclear decommissioning trusts, or family trusts, all accounts in which millions of dollars are involved and the corporation or the entity must report on an after-tax basis to a third party. Such clients often want to see after-tax performance because they want to drive into managers' minds that a taxable account ought to be managed differently from a tax-free account and that they may even measure managers on after-tax, not pretax, performance.
1 The final SEC rule has since been released and requires the use of the maximum federal tax rates.