The Design and Use of Political Economy Indicators: Challenges of Definition, Aggregation, and Application

Author: King Banaian | Bryan Roberts

Edited by

King Banaian and Bryan Roberts

THE DESIGN AND USE OF POLITICAL ECONOMY INDICATORS

Copyright © King Banaian and Bryan Roberts, 2008. All rights reserved.

First published in 2008 by PALGRAVE MACMILLAN® in the United States, a division of St. Martin’s Press LLC, 175 Fifth Avenue, New York, NY 10010.

Where this book is distributed in the UK, Europe and the rest of the world, this is by Palgrave Macmillan, a division of Macmillan Publishers Limited, registered in England, company number 785998, of Houndmills, Basingstoke, Hampshire RG21 6XS.

Palgrave Macmillan is the global academic imprint of the above companies and has companies and representatives throughout the world.

Palgrave® and Macmillan® are registered trademarks in the United States, the United Kingdom, Europe and other countries.

ISBN-13: 978-0-230-60083-6
ISBN-10: 0-230-60083-2

Library of Congress Cataloging-in-Publication Data
The design and use of political economy indicators : challenges of definition, aggregation, and application / edited by King Banaian and Bryan Roberts.
p. cm.
ISBN 0-230-60083-2
1. Economic indicators. I. Banaian, King, 1957- II. Roberts, Bryan W.
HB3711.D43 2008
330.01'5195 dc22
2008017400

A catalogue record of the book is available from the British Library.
Design by Newgen Imaging Systems (P) Ltd., Chennai, India.
First edition: November 2008
10 9 8 7 6 5 4 3 2 1
Printed in the United States of America.

We dedicate this book to our wives, Barbara Banaian and Malgorzata Pskit-Roberts

Contents

List of Figures and Tables

List of Contributors

1 The Challenges of Measurement in Political Economy
King Banaian, Marta Podemska, and Bryan Roberts

2 So You Want to Use a Measure of Openness?
H. Lane David

3 Measuring Central Bank Independence: Ordering, Ranking, or Scoring?
King Banaian

4 Fiscal Indicators
John E. Anderson

5 A New and Better Measure of Capital Controls
Pariyate Potchamanawong, Arthur T. Denzau, Sunil Rongala, Joshua C. Walton, and Thomas D. Willett

6 Measuring Welfare
Bryan Roberts

7 Why and How to Move from Capturing Perception of to Quantifying Corruption?
Omer Gokcekus and Justin Myzie

8 New Interpretations of Indices of Economic Freedom
King Banaian and William Luksetich

9 On the Methodology of the Economic Freedom of the World Index
Robert A. Lawson

10 Government Structure, Strength, and Effectiveness
Joshua C. Walton, Apanard Angkinand, Marina Arbetman, Marie Besançon, Eric M. P. Chiu, Suzanne Danis, Arthur T. Denzau, Yi Feng, Jacek Kugler, Kristin Johnson, and Thomas D. Willett

Index

Figures and Tables

Figures
5.1 Malaysia
5.2 Korea
5.3 India
5.4 Mexico
6.1 UNDP HDI versus Per Capita Income
6.2 Income and Happiness in the United States

Tables
3.1 Weightings in Various Central Bank Independence Indices
3.2 Inflation and CBI in Transition Economies
3.3 Unscrambled Principal Components Analysis of CBI in Transition Economies
3.4 Classification of OECD Central Banks
4.1 CRI and ORI Indices of Fiscal Reform
4.2 Index of Tax Policy Reform
4.3 Index of the Capture Economy (ICE), 1999
4.4 Heritage Foundation Fiscal Freedom (FF) and Freedom from Government (FFG) Scores, 2007
4.5 Doing Business (DB) Indicator Components
4.6 Regression Analysis of the Doing Business
5.1 IMF AREAER Components
5.2 Capital Control Indices in 1996 by Country
6.1 Per Capita National Income and MEW (in 1958 Dollars)
6.2 Official and Extended Account Long-Run Growth Rates
6.3 Inequality in Distribution of Welfare Measures across Countries
6.4 Empirical “Macroeconomic” Characteristics of Welfare Measures
6.5 Happiness and Income Data: South Korea
6.6 Regression Results
8.1 Correlation Matrix of Variables
8.2 Living Standards and Aspects of Economic Freedom
8.3 Regressions Treating Steps in Freedom Components as Not Equidistant
8.4 Property Rights Plus Other Measurements
8.5 Property Rights, Trade, and Other Elements of Freedom
8.6 Principal Component Analysis (PCA) of Heritage Economic Freedom Index
8.7 Principal Component Regression
9.1 The Areas and Components of the EFW Index

Contributors

John E. Anderson, University of Nebraska-Lincoln
Apanard Angkinand, University of Illinois, Springfield
Marina Arbetman, Transnational Consortium
King Banaian, St. Cloud State University
Marie Besançon, Harvard University
Eric M. P. Chiu, National Chung-Hsing University, Taiwan
Suzanne Danis, Claremont Graduate School
H. Lane David, Indiana University South Bend
Arthur T. Denzau, Claremont Graduate School
Yi Feng, Claremont Graduate School
Omer Gokcekus, School of Diplomacy and International Relations, Seton Hall University
Kristin Johnson, University of Rhode Island
Jacek Kugler, Claremont Graduate School
Robert A. Lawson, Capital University
William Luksetich, St. Cloud State University
Justin Myzie, School of Diplomacy and International Relations, Seton Hall University
Marta Podemska, St. Cloud State University
Pariyate Potchamanawong, Bank of Thailand
Bryan Roberts, U.S. Department of Homeland Security
Sunil Rongala, Claremont Graduate School
Joshua C. Walton, Claremont Graduate School
Thomas D. Willett, Claremont Graduate School

CHAPTER 1

The Challenges of Measurement in Political Economy
King Banaian, Marta Podemska, and Bryan Roberts

Do you really think you can get good conclusions from bad data?
Simon Kuznets, 1976

1 Introduction

In March 2003, U.S. Treasury Undersecretary John Taylor went before the Senate Committee on Foreign Relations to testify for the Millennium Challenge Act (MCA) of 2003.1 MCA was a centerpiece of the Bush administration’s new vision for foreign aid. The vision was that there were three requirements to accelerate the growth of developing economies: there should be more aid; there should be incentives for better performance of these economies; and there had to be measurement of the results. Undersecretary Taylor proceeded to outline a series of measurements produced by governmental and nongovernmental agencies such as Freedom House, the World Bank Institute, the UN’s World Health Organization (WHO) and UNESCO, and the Heritage Foundation. Never before had foreign aid been driven by such measurements; the Bush administration promised $5 billion in new aid to be allocated according to these standards through a new private agency to be called the Millennium Challenge Corporation (MCC). The measure passed.

Measurement is not just driving policy in donor countries. The Heritage Foundation reported in 2006 that recipient countries were interested in how its Index of Economic Freedom measures trade policy, one of the variables used by MCC in determining who gets their funds.2 The World Bank, in its “Doing Business” survey of government regulation of businesses, titled one section “What Gets Measured Gets Done.”

Publishing comparative data on the ease of doing business inspires governments to reform. Since its start in October 2003, the Doing Business project has inspired or informed 48 reforms around the world. Mozambique is reforming several aspects of its business environment, with the goal of reaching the top rank on the ease of doing business in southern Africa. Burkina Faso, Mali and Niger are competing for the top rank in West Africa. Georgia has targeted the top 25 list and uses Doing Business indicators as benchmarks of its progress.3

In 2006 the World Bank produced estimates of “intangible wealth” as a residual in classic Solow growth accounting measurements. Along with proxies for human capital formation such as education and literacy rates, the Bank included measures of “governance.” While admitting there were multiple dimensions of governance, they chose to focus on only one, the rule of law. They argued that all the measures are correlated with each other and with measures of social capital or trust, and therefore it was not particularly important to decide which of these measures mattered more. Using an index of rule of law, the authors offered a point estimate of 0.83 as an “elasticity of intangible wealth with respect to the rule of law.” The authors admit:

With respect to the rule of law variable, the implications for policy making are less obvious since the partial derivative depends on the scale on which the rule of law index is measured (1 to 100 in this instance), not to mention the difficulty in deciding what it means, in terms of changing real institutions, to increase rule of law by one point on the scale.4
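
The scale dependence that the authors concede can be illustrated with a small numerical sketch. It assumes, purely for illustration, a constant-elasticity relationship W = A * R**0.83 between intangible wealth W and the rule of law index R; only the 0.83 figure comes from the text, while the functional form, the constant A, and the index values are hypothetical.

```python
# Illustrative sketch only: assumes W = A * R**e with e = 0.83 from the text.
# The constant A and the index values below are made up.
ELASTICITY = 0.83
A = 1000.0  # hypothetical scale constant


def wealth(rule_of_law, scale_factor=1.0):
    """Intangible wealth for an index value reported on a rescaled axis.

    scale_factor converts the reported value back to the original 1-100
    scale (for example 10.0 if the index is reported on a 0.1-10 scale).
    """
    return A * (rule_of_law * scale_factor) ** ELASTICITY


def marginal_effect(r, scale_factor=1.0, h=1e-6):
    """Numerical partial derivative dW/dR: wealth gained per index point."""
    return (wealth(r + h, scale_factor) - wealth(r - h, scale_factor)) / (2 * h)


def elasticity(r, scale_factor=1.0):
    """Percentage change in W per one percent change in R."""
    return marginal_effect(r, scale_factor) * r / wealth(r, scale_factor)


# The same country, scored 50 on a 1-100 scale or 5 on a 0.1-10 scale.
d_100 = marginal_effect(50.0)
d_10 = marginal_effect(5.0, scale_factor=10.0)
e_100 = elasticity(50.0)
e_10 = elasticity(5.0, scale_factor=10.0)

print(d_100, d_10)  # the per-point effect changes with the choice of scale
print(e_100, e_10)  # the elasticity does not
```

Under these assumptions the effect of "one more point" is ten times larger on the compressed scale, while the elasticity is unchanged. This is why a per-point interpretation says little about real institutional change.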

Popular debates are also fought through the use of statistics. After the release of the documentary Sicko, several newspapers carried stories about the data the producer and director, Michael Moore, had taken from WHO’s comparative study of health systems. But as one study has pointed out, WHO’s ranking of overall health attainment placed more weight on the distribution of health care than on its quality, by measuring the share of one’s income spent on health expenditures. If health care is a necessity (having an income elasticity of demand under one), then in most countries with sufficiently large income, more is spent by the poor as a share of their income than by the rich. This is reduced by the share of health expenditures made by the public. Therefore, the measure favors those countries that rely on progressive income taxes to pay for health care, regardless of its quality.5

The desire to measure seems to consume much of comparative political economy. There is an index of gun control legislation in the 50 states of the United States, an international failed state index and a “happy planet index.” There are Big Mac indices and the Cost of Doing Business Index. Most of these indicators result in measures that either range over a limited set of integers or are bounded by some upper limit. They are also used to say something about the laws, the economics, the culture, or the politics of two or more places in the world. And they are increasingly influential for policy and decision making. Most generators of such indices assert that while such measures are imprecise, they contain much useful information. Kaufmann, Kraay, and Zoido-Lobatón, in creating measures of governance, argue as well that the data are informative since they are highly correlated across sources and that by using statistical techniques the imprecision can be reduced.
They nonetheless conclude that “the control of corruption varies widely across countries” and that “even efficient aggregate indicators are relatively imprecise, because many countries’ likely ranges of governance overlap.”6 Yet even in the face of this imprecision, the World Bank claims that “when governance is improved by one standard deviation, incomes rise about threefold in the long run, and infant mortality declines by two-thirds” and that these improvements would happen if the level of corruption in Equatorial Guinea were to fall to that of Burundi, or the level in Burundi were to fall to that of Lithuania.7 This is only one of several issues in laying different types of institutions on a linear, continuous scale. The fact that these measures are increasingly used to drive key policy decisions such as whether or not a country will receive large amounts of foreign aid makes it all the more important that they carefully and accurately measure a coherent concept that is important for growth, development, and welfare. Much investment in these measures, however, seems to focus on promotion rather than methodology. It is also sometimes not clear whether a political economy concept is simply an input into society achieving something desirable, or is desirable in and of itself. For example, is economic freedom valuable because having it directly boosts the welfare of a population, or is it a means to
an end such as a higher growth rate of GDP? If it is valuable in and of itself, how can we measure how it contributes to our welfare? Consider the concept of welfare or happiness or well-being, arguably the most fundamental political economy indicator given that all actions and policies are ultimately aimed at increasing it.8 There is no consensus on how to define and measure well-being. Modern economic theory often argues that welfare is inherently unmeasurable and ordinal in nature, and that the welfare level of one person cannot be compared to that of another. However, most economic analysis usually assumes cardinal utility functions. Since the 1980s, enthusiasm for trying to measure welfare directly has grown dramatically. Some want to focus on how well an economy satisfies the basic needs of its poorer members. Others want to measure how well a society creates capabilities for its members. And others measure and analyze welfare as reported by individuals themselves in responses to questions such as “how happy are you overall with your life on a scale of 1 to 10?” Attempting to measure welfare or the quality of life often raises the very difficult issues of how to choose the key factors that determine welfare and how to aggregate them together. Van Praag and Frijters (1999, p. 416) make the point:

The quality of life is usually defined as a weighted average of specific country statistics. The statistics used include, for instance, the literacy level of the entire population, the literacy level of women, infant mortality rates, income levels per head, life expectancies of men and women, indicators of political stability, energy consumption per capita, average household size, the number of persons per physician, levels of civil liberties, and so on. It is clear that these variables may be very important for the utility levels of individuals and nations; however, the utility levels themselves are not measured by these variables.
An obvious problem is then, how should these statistics be weighted? Does the quality of life increase more when the female literacy level increases by 1 percent or when the civil liberty index improves by 1 percent? It is clear that if one does not want to use evaluations of individuals themselves as a weighting method, the opinions of the researcher become the deciding criterion. The problem of how to weight these different variables into a composite quality-of-life index is, not surprisingly, the main source of dispute in this literature.9
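
The weighting problem described in the passage above can be made concrete with a toy composite index. The two countries, their scores, and both weighting schemes below are entirely hypothetical; the sketch only shows that a ranking produced by a weighted average can reverse when the researcher's weights change.

```python
# Hypothetical, normalized (0-1) indicator scores for two made-up countries.
countries = {
    "A": {"female_literacy": 0.90, "civil_liberties": 0.40},
    "B": {"female_literacy": 0.60, "civil_liberties": 0.75},
}


def composite(scores, weights):
    """Weighted average of indicator scores; weights must sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[name] * scores[name] for name in weights)


def ranking(weights):
    """Country names ordered from highest to lowest composite score."""
    return sorted(
        countries,
        key=lambda c: composite(countries[c], weights),
        reverse=True,
    )


# Two equally defensible (and equally arbitrary) weighting schemes.
equal = {"female_literacy": 0.5, "civil_liberties": 0.5}
literacy_heavy = {"female_literacy": 0.7, "civil_liberties": 0.3}

rank_equal = ranking(equal)
rank_literacy = ranking(literacy_heavy)
print(rank_equal, rank_literacy)
```

With equal weights country B ranks first; with the literacy-heavy weights country A does. Nothing in the underlying data changed, only the opinion embedded in the weights.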

The problems of selection and aggregation appear in many of the chapters in this volume. Political economy measures typically resolve the problem by either simplifying the measure down to one dimension so that there is no aggregation issue, or making ad hoc assumptions on
selection and weighting. The former solution risks ignoring obviously important dimensions of a measure, while the latter solution risks poorly measuring a concept. Ad hoc approaches to aggregation often take the approach of weighting things equally. The implications of this are strong. Will reducing the index of civil liberties by 1% and increasing the female literacy rate by 1% lead to no change in the welfare of a community? It is unlikely that such trade-offs will usually be one-for-one. In the chapters of this book, we lay out the case for better measurement in political economy. If measures are going to be as consequential as they seem to be becoming for decision making, the quality of these measures must be assured. Unfortunately, time and again we find significant flaws in how political economy measures are defined and constructed.

2 Input versus Output Measures, Causality, and the Time Dimension

Production functions are causal relationships in which an output is determined by input variables. Many measures in political economy do not carefully distinguish between inputs and outputs and make the mistake of aggregating them together into one index. This problem has arisen forcefully in the measurement of national income, for example. The closely related problem of identification in estimating relationships between variables must also be confronted in empirical political economy research. Consider the relationship between real income and self-reported happiness as described above.10 Happiness is believed to be determined by the level of income. It would therefore be a mistake to create an index that combines happiness and income into a single indicator. Estimating the empirical relationship between happiness and income is also tricky, because underlying determinants of a person’s happiness might also help determine how much income they are capable of earning. Estimating the relationship between happiness and income should ideally attempt to control for this.
The economics profession has become steadily more sophisticated and creative at identifying causal relationships through finding instruments, conducting randomized experiments, and taking advantage of natural experiments, but the difficulty in achieving identification should never be underestimated. In political economy analysis, the causality problem is particularly complicated because of the dynamic relationship between outputs and outcomes and the functioning of institutions. Mancur Olson (1982) refers to this as a sclerosis of institutions. Douglass North (2005, pp. 123–124) makes the point as well.

Not only does each factor and product market require different specific constraints so that it will provide the right incentive structure for the players, but economic change will require continuing alteration in the institutional structure in order to maintain efficiency. This is particularly critical for capital markets, which however well they may serve to facilitate growth at one time, may become obstacles to growth at another time; and there is no guarantee that they will automatically evolve as the economy evolves. The structure of the market will determine the incentives of the players and with changes in the aforementioned conditions the incentives that at time t would induce the players to make an efficient capital market may in time t + 1 induce the players to engage in activities that undermine, weaken, or indeed destroy the capital markets with consequent adverse effects on the economy as a whole. The history of Japan in the 1990s is a classic instance of a capital market that initially fueled extraordinary development, that of post–World War II, only to develop the sclerosis that followed.

It is critical to differentiate between inputs and outputs in comparative analysis by paying particular attention to the time period of study. Stages of development lead to a movement in institutions over time, as economies of scale or scope overcome the transactions costs of moving to a new institutional arrangement. Unfortunately, as North notes, policy prescriptions tend to be developed using “a static analysis in a dynamic setting” (p. 125). Most studies of growth (or inflation, investment, or any number of other economic phenomena) depend crucially on finding some set of exogenous factors that may explain variations across countries. Unfortunately, as already noted, most factors are plausibly endogenous. But even if causal relationships can be identified, this does not guarantee that the estimated relationship is helpful for policy and decision-making purposes. Pranab Bardhan (2005) finds the evidence on the linkage between democracy and growth “unhelpful and unpersuasive.”

It is unhelpful because usually it does not give us much of a clue into the mechanisms through which democracy may help or hinder the process of development. It is generally unpersuasive because many of the studies are beset with serious methodological problems (like endogeneity of political regimes to economic performance, selection bias from the survival of particular regimes, and omitted variables) and the usual problems of data quality and comparability. (p. 88)

Bardhan also points out that in measuring corruption, the use of surveys and ordinal rankings is colored by the respondents’ perception of the economic performance of the country itself, introducing measurement error of a particularly troubling kind (p. 138). To what degree are political economy measures based on subjective surveys plagued by this serious type of measurement error?

3 Scales, Continuity, and Linearity

A consistent theme in the chapters in this book is the proliferation of scales with which one compares countries. A map of economic freedom of the world colors countries as “free,” “mostly free,” “mostly unfree,” and “unfree.” Numerous chapters have used the simple one-dimensional integer classification of economic freedom that ranges from 1 to 4 as an independent variable in empirical analysis. In some cases, the concept that is being measured clearly requires a greater degree of sophistication than this. For example, capital controls can come in many varieties as Potchamanawong et al. discuss later in this book. There can be controls on inflows or outflows, on portfolio investment but not direct investment, and so on. A simple 0–1 binary variable does not capture the complexity of capital controls, and a great deal of subjective judgment is required to condense this complexity to a one-dimensional scale. Unfortunately, many analysts are quite willing to use such simple measures without worrying very much about their quality. The easiest path, and often the only path, is to use the highly simplified measures that are publicly available. Scales create two types of issues. First, not all types of classifications will have the same value in explaining various economic outcomes, and different types of institutional arrangements will matter more for different outcomes. Second, movement along a linear scale does not correspond to linear or even logarithmic movement in economic phenomena.
Banaian and Luksetich (2001) show, for example, that any conflict resolution mechanism for monetary policy short of complete autonomy is not effective in controlling inflation. What researchers seek is not just a classification system of institutional features. What they want is a means of measuring quality. This is why we believe researchers continue to seek scales for measurement. But scales are frequently introduced in linear fashion (Fedderke, Klitgaard, and Akramov [2006] provide a useful exception). There may be a range of institutional features over which a linear scale or homogeneity can be found, but there are others that have nonlinear
features. In this case we believe that it is preferable to use categorical variables. In a review of Bueno de Mesquita, Smith, Siverson, and Morrow (2003), Kling (2007) discusses the kind of problems we find with the attempt to measure institutions using scales rather than categories.

Throughout the book, I struggled with how these concepts might be given precise definitions that permit real-world measurement. The authors present many regression results involving these variables, but the first time I skimmed through the book I did not look closely at either the regressions or the descriptions of the variables. When I did take a closer look, here is what I found on pages 134–135:

“The POLITY IV collection of data . . . include a number of institutional variables . . . We use another POLITY variable, Legislative Selection (LEGSELEC), as an initial indicator of S . . .”

“POLITY codes this variable as a trichotomy, with 0 meaning that there is no legislature. A code of 1 means that the legislature is chosen by heredity or ascription or is simply chosen by the effective executive. A code of 2, the highest category, indicates that members of the legislature are directly or indirectly selected by popular election . . . We divide LEGSELEC by its maximum value of 2 so that it varies between 0 and 1.”

When I teach statistics in high school, one of the basic concepts is the difference between a quantitative variable (something like inches or dollars) and a categorical variable (something like Democrat, Republican, or independent). The authors’ theory of W and S describes quantitative variables. Instead, to obtain S, they use a categorical variable that has three categories. To get from a categorical variable to a quantitative variable, they treat the coding convention (0, 1, or 2) as if it were a scale.11
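
The distinction Kling draws can be sketched in a few lines. The category descriptions follow the quoted passage; the function names and the dummy-variable labels are hypothetical and chosen only for this illustration. Treating the code as a scale forces the two steps between categories to be equal, whereas indicator (dummy) variables leave each category free to have its own estimated effect in a regression.

```python
# LEGSELEC categories as described in the quoted passage.
LEGSELEC_LABELS = {
    0: "no legislature",
    1: "chosen by heredity, ascription, or the effective executive",
    2: "selected directly or indirectly by popular election",
}


def as_scale(code):
    """The treatment criticized above: divide the code by its maximum of 2."""
    return code / 2


def as_dummies(code):
    """Categorical treatment: one 0/1 indicator per non-baseline category.

    Category 0 (no legislature) is the omitted baseline. The variable
    names are hypothetical labels for this sketch.
    """
    return {"legsel_1": int(code == 1), "legsel_2": int(code == 2)}


for code, label in LEGSELEC_LABELS.items():
    print(code, label, as_scale(code), as_dummies(code))

# On the scale, moving 0 -> 1 and moving 1 -> 2 are forced to be equal steps.
step_low = as_scale(1) - as_scale(0)
step_high = as_scale(2) - as_scale(1)
print(step_low, step_high)
```

Whether a hereditary legislature really sits exactly halfway between no legislature and an elected one is an empirical question. The scale coding answers it by assumption, while the dummy coding leaves it to the data.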

The chapters by Banaian and by Banaian and Luksetich address these issues. Robert Lawson acknowledges elsewhere in this book that conversion to linear scales is arbitrary but that it nevertheless meets this desire to describe one place as freer than another. Yet two places could be different in significant ways and have the same freedom “score.” We believe, as do most chapters in this volume, that the scoring obscures more than it reveals.

4 The Course of This Book

The logic of this book is to first examine input measures, which we define as attempts to measure institutional features. These measures are unlikely ever to appear in statistical analysis as dependent variables; instead, they are used as explanatory variables for phenomena like growth.

H. Lane David shows in the next chapter that trade openness is perceived as the most important factor in determining economic growth, along with quality of institutions and geography. The issue of trade openness is a contentious one, and its definition is still not fully formulated. The many understandings of the term have resulted in an even greater number of measures. David collects many measures into a single dataset. Having this dataset and its abundance does not answer the question of what trade openness is, but it will be useful for policymakers and analysts who seek to use these measures. David first reviews the theory of the relationship between trade and economic growth, and then reviews the literature on measuring trade openness. He then categorizes available measures and discusses their strengths and weaknesses. Six broad types of measures are considered: trade shares, adjusted trade flows, price-based, tariffs, nontariff barriers, and composite indices. Correlations between measures and across groups of measures are evaluated to determine to what degree these measures capture some common aspects of trade policy. David concludes that much work remains to be done in terms of measuring trade openness. Currently available tools are not sufficient to determine the consequences of specific policies and barriers to trade, although they do provide some understanding of trade openness. In the third chapter, King Banaian evaluates measures of central bank independence (CBI). CBI became popular in the literature after models of monetary policy found an inflationary bias in monetary policy conducted by democratic governments. Banaian reviews all of the composite indices of CBI that have been created and tries to ascertain which ones are more relevant to central bank goals. These measures include the early measures that relied on ranking central banks and the cardinal measures created by Cukierman and later by Grilli, Masciandaro, and Tabellini.
The linear scale and averaging aspects of these measures are scrutinized. Banaian concludes that none of the approaches taken to measuring CBI is adequate. A single index of CBI that embraces the many possible dimensions of independence loses sight of why one wishes to measure CBI. Banaian offers a new approach to measurement in which what matters and what does not matter in terms of CBI is determined a priori on the basis of theory, not on data availability or other circumstances extraneous to the phenomenon itself. In the fourth chapter, John Anderson explores fiscal indicators and fiscal policy measurement. Intense interest in making cross-country comparisons of fiscal conditions and evaluating fiscal reform in transition
countries has emerged in recent years, as well as interest in assessing the quality of fiscal institutions such as tax systems of different countries and measuring the size and intrusiveness of government across countries. Anderson reviews the major indicators of fiscal health and reform that are employed in the literature, both in transition countries specifically and all countries generally. Fiscal reform measures include the Martinez-Vazquez and McNab Cumulative Reform Index (CRI) and Overall Reform Index (ORI), the Ebrill and Havrylyshyn Index of Tax Policy Reform (ITPR), and the Hellman et al. Index of the Capture Economy (ICE). Fiscal condition indicators include aspects of the Heritage Foundation Economic Freedom Index (EFI) and the World Bank/EBRD Doing Business (DB) indicators. For each fiscal indicator, analysis and critique is made of both the components that are used in building the indicator and the methodology used to construct it. Suggestions are then made for the improvement of these indicators. In the fifth chapter, Pariyate Potchamanawong, Arthur Denzau, Sunil Rongala, Joshua Walton, and Thomas D. Willett offer a new and better measure of capital controls. There are many opinions on how capital controls affect an economy, and the debate has not settled on their impact. To understand how capital controls influence economies, we first need to know how to effectively measure them. This is complicated and made difficult by several problems that range from data availability to coding methodology. Potchamanawong et al. provide an in-depth discussion of the different types of measures, with inclusion of methods that are no longer used and were replaced by newer ones. Popular in the past, 0–1 measures treated all types of capital controls equally. Since different types of capital flows behave differently under different circumstances, they need to be examined separately in order to gain full understanding of their impact on the economy.
Such 0–1 measures are far too crude to capture important distinctions among flows. More recent measures can be divided into two groups: measures of breadth and measures of intensity. Measures of breadth focus on the extensiveness of controls across different types of capital flows, and measures of intensity focus on the severity of restrictions. Most studies to date have failed to adequately capture the severity of restrictions, and most available measures of intensity employ gradations of the degree of restrictiveness of capital controls, typically either by coding the extensiveness of controls across types of capital flows, or by coding the stringency of controls. The authors then present a measure
that combines both the breadth and intensity dimensions and also separately codes restrictions on capital inflows and outflows. A comparison of results using previous methods with this new approach suggests that the Claremont Potchamanawong measure is the most effective one. We then turn to measures of the outputs of political economies such as welfare. In the sixth chapter, Bryan Roberts critically evaluates the many indicators of utility, welfare, and well-being that have been proposed and measured over the past century. This political economy measure is arguably of the highest importance in the political economy field, because ultimately choices, policies, and actions will be judged on how they affect the welfare of an individual or a society. Roberts reviews the neoclassical conception of welfare, real income, composite social indicators, the capabilities approach and the human development measure, subjective well-being, and objective happiness. This review shows that there is no consensus on how to define and measure welfare. Serious aggregation problems that arise in most approaches are discussed, and several interesting paradoxes are identified that help illustrate why we are far from having any settled consensus. Some thoughts are offered on making progress, but the chapter does suggest that it will not be easy to make progress toward resolving difficult issues and achieving a consensus on how to measure welfare. In the seventh chapter, Omer Gokcekus and Justin Myzie focus on the issue of corruption and difficulties with quantifying it. Even if it is possible to define and measure corruption and determine how it influences economic performance, and even how to fight it, we must still answer the most crucial question: at what cost should it be eradicated?
Corruption is an undesirable phenomenon in market economies, as it impedes real income growth and reduces productivity, but fighting it involves cost, and in a world of limited resources, any action should be undertaken only as long as benefits are greater than costs. Properly measuring corruption and the costs that it brings about is crucial to determine how much should be spent on fighting it. Corruption is clandestine, and great effort has been put in to keep it secret, so direct measurement of corruption is probably impossible. Most current efforts to measure corruption rely on surveys. Gokcekus and Myrie differentiate between measurement of the perceived and actual extent of corruption and argue that what is actually being measured by surveys is the perception. Attention is given to two main methods of measuring corruption. One is to track a country’s institutional features, and the other is to audit specific projects. Neither of these methods is perfect, and both need to be followed with special care. The authors conclude with a

recommendation that policymakers stay aware of the strengths and weaknesses of perception-based indicators and auditing projects. In the eighth chapter, Banaian and William Luksetich evaluate the usefulness of composite measures of economic freedom by analyzing the Heritage Foundation/Wall Street Journal Index of Economic Freedom. This index is the best known index of economic freedom and consists of an unweighted average of 10 factors deemed to be equally important in determining the level of economic freedom in a given country. The relationship between economic welfare and the 10 components of the economic freedom index is analyzed using principal components techniques, and Banaian and Luksetich conclude that the majority of these factors are irrelevant and that property rights are the strongest determinant of economic growth and thus the best indicator of economic freedom. They argue that countries that incorporate the concept of negative liberty, in which the role of government is to only protect life and property, achieve the highest rates of economic growth. They also show that the relationship between welfare and freedom is not continuously linear: a minimum level of economic freedom must be established before substantial improvements in economic welfare are attained. Banaian and Luksetich conclude that although there is no need for a new index of economic freedom, there is a need to narrow the definition of economic freedom and focus on fewer determinants in its measurement. In contrast, Robert Lawson defends the use of another measure of economic freedom, the Economic Freedom of the World (EFW) Index. He first discusses why measurement of economic freedom is important: accurate measurement of economic freedom in different countries is crucial to answer the question of whether economic freedom has a significant impact on income, growth, equality, and poverty. 
If a positive and significant relationship can be identified, policymakers can formulate courses of action accordingly. The EFW Index was constructed almost two decades ago and is now widely used. It is an aggregated index created from a combination of hard data, surveys, expert panels, and case studies obtained from various third-party sources. It was designed to fulfill two tasks: to rate a large number of countries using a transparent methodology, and to stay resistant to political pressure. Lawson examines the various conceptual and methodological issues and trade-offs involved in constructing the EFW Index. The advantages and disadvantages of different types of indices and the kinds of information that can be used to construct them are reviewed, including surveys, expert panels, case studies, hard data, and aggregations. The methodological choices involved in the creation of the EFW Index are

also carefully examined. Attention is given to problems resulting from use of third-party data, issues of transparency, difficulties of index numbers, decision on weights assigned to the components of economic freedom, and issues with comparability between countries and over time. Lawson concludes that the EFW Index is well suited to achieve its intended tasks. Despite the many problems associated with trying to summarize a multidimensional concept like economic freedom into a single number, the availability of this measure has energized empirical work in this area, and has greatly reduced the ideological content within the larger debate.

In the last chapter Joshua C. Walton, Apanard Angkinand, Marina Arbetman, Marie Besançon, Eric M. P. Chiu, Suzanne Danis, Arthur T. Denzau, Yi Feng, Jacek Kugler, Kristin Johnson, and Thomas D. Willett look at a cornucopia of cross-national datasets designed to measure aspects of political institutions and the strength and stability of governments. The abundance of studies results in considerable confusion among the researchers trying to use them, and questions have arisen about what the measures mean and which might be useful for a particular empirical purpose. Different concepts of good governance are discussed, and measures of political institutions present in the literature are exhaustively reviewed. Most of the existing measures suffer from similar problems. The first one is collinearity: adding more measures conveys almost no new information that is not already contained in the smaller subset. Collinearity presents daunting interpretational and statistical problems to the analyst, and needs to be dealt with very carefully. A further problem in the attempts to create a single aggregate index is the lack of serious analysis of the proper weighting of the index’s components. Another troubling feature of the institution measures is their data sources.
One possible differentiation is whether the data measure is reasonably objective or largely subjective. Most of the indices confuse these two data source types. After defining the most common problems in measures of governance, political strength, and institutions, Angkinand et al. look at specific measures in different categories, including different measures of political instability, veto players, and measures of winning coalitions and the size of selectorate. Causality behind the measures and what they are meant to indicate are reviewed. State failure measures are also examined, and a critique of Foreign Policy’s Failed States Index is offered, as is a set of state failure evaluation criteria. The concept of Relative Political Capacity is an alternative and represents the ability of a state to carry out its policies. Measures of democracy, and measures of governance created by the World Bank, are also addressed.

Notes

1. Office of Public Affairs, United States Treasury Department, statement JS-80, March 4, 2003, online at http://www.treas.gov/press/releases/js80.htm. (September 24, 2007)
2. Brett Schaefer. Promoting Economic Prosperity through the Millennium Challenge Account. Heritage Lectures 920 (January 13, 2006): 13.
3. The World Bank. Doing Business 2007: How to Reform. Washington DC: World Bank, 2007.
4. The World Bank. Where Is the Wealth of Nations? Measuring Capital for the 21st Century. Washington: World Bank, 2006, p. 95.
5. Glen Whitman. “WHO’s Fooling Who?: The World Health Organization’s Problematic Ranking of World Health Systems.” Washington: Cato Institute Briefing Paper No. 101, February 28, 2008.
6. Daniel Kaufmann, Aart Kraay, and Pablo Zoido-Lobatón. “Governance Matters: From Measurement to Action.” Finance and Development 37 (2) (June 2000): 10–13.
7. The World Bank. A Decade of Measuring the Quality of Governance. Washington, DC: World Bank, 2007, p. 1.
8. We will use the term well-being without noting that the concept is inherently subjective. For more, see the article by Bryan Roberts later in this book.
9. Bernard M. S. van Praag and Paul Frijters. “The Measurement of Welfare and Well-Being: The Leyden Approach.” In Well-Being: The Foundations of Hedonic Psychology, ed. Daniel Kahneman, Ed Diener, and Norbert Schwarz, 413–433. New York: Russell Sage, 1999.
10. The answer to the question “how happy are you generally with your life on a scale of 1 to 10?”
11. Arnold Kling. “Data Molesters.” EconLog Weblog, http://econlog.econlib.org/archives/2007/03/data_molesters_1.html. (February 18, 2008)

References

Banaian, K. and W. A. Luksetich. 2001. Central Bank Independence, Economic Freedom, and Inflation Rates. Economic Inquiry 39 (1): 149–161.
Bardhan, P. 2005. Scarcity, Conflicts, and Cooperation. Essays in the Political and Institutional Economics of Development. Cambridge, MA: MIT Press.
Fedderke, J., R. Klitgaard, and K. Akramov. 2006. Heterogeneity Happens: How Rights Matter in Economic Development. Working Paper, University of Cape Town, South Africa.
North, D. C. 2005. Understanding the Process of Economic Change. Princeton, NJ: Princeton University Press.
Olson, M. 1982. Rise and Decline of Nations. Economic Growth, Stagflation, and Social Rigidities. New Haven: Yale University Press.

CHAPTER 2

So You Want to Use a Measure of Openness?

H. Lane David

It is quite wrong to try founding a theory on observable magnitudes alone. . . . It is the theory which decides what we can observe. Albert Einstein

1 Introduction

Trade policy is, and has been for a long time, an area of great contention in the public arena. There are many who believe strongly in the benefits of free trade, and there are many who believe with passion that trade policy must be used to protect domestic interests. Despite a large academic literature supporting the idea of a positive relationship between trade openness and economic growth1 and belief by many policymakers that “Openness to trade and more liberal trade policies are associated with faster rates of economic growth both in the United States and abroad,”2 trade restrictions are pervasive even after numerous rounds of GATT (General Agreement on Tariffs and Trade) negotiations. We continue to hear frequent calls from the governments of the United States and other industrialized countries for greater trade liberalization and openness.

Systems of trade policy are quite complex, involving many different types of instruments, and it is difficult to assess the trade stance of a given country or, perhaps more importantly, to assess the impacts of changes in a country’s trade policies. The focus of this chapter, however, is not on the effects of trade policy liberalization on economic growth but rather on how we measure trade openness and policy.

Beyond a general understanding that “openness” refers to trade barriers, and how restrictive they may or may not be, there is not a clear definition of the term. There is, however, a great desire on the part of analysts, researchers, and policymakers for measures that can somehow aggregate the many restrictions placed on trade and be used to estimate the impact of restrictions as well as predict the effects of changes in those restrictions. As a result, a large number of measures of trade openness and policy have been created.3 Harrison (1996) observed over a decade ago that there existed “. . . a dizzying array of ‘openness’ measures, methodologies, and sample countries . . .”4 That observation holds true today given the continued “production” of new measures of trade openness and policy since then. It is, in fact, the case that identifying an appropriate measure from the many available for a given stream of research or policy proposals is a challenging exercise.5

The purpose of this chapter is to provide an overview of the variety of measures of trade openness and policy, highlighting briefly the two most widely used (which I count among the worst), and then examine a newer set of measures, known as Trade Restrictiveness Indices (TRIs), that are theoretically sound, can be calculated (though they are data intensive6), and produce index numbers with the potential for domestic as well as international policy prescriptions.

2 Classes of Measures

In David (2005) I present a taxonomy in which measures of trade openness and policy are divided into categories and then review the strengths and weaknesses of each category. I divide the measures into six groups:

1. Trade ratios;
2. Adjusted trade flows;
3. Price-based;
4. Tariffs;
5. Nontariff barriers (NTBs);
6. Composite Indices.

It is important to note that the first three categories contain measures based on trade flows or levels of prices while the latter three seek to assess trade restrictions directly; that is to say, the first three categories focus on outcomes while the last three focus on policies. Ideally, one would want to measure trade restrictions directly to determine the level

of protection of a country. Unfortunately, in practice, it is easier to measure flows and prices than barriers. The availability of trade volume and price data, however, is not a sufficient condition to create good measures of trade openness and policy. What are needed are measures that are economically meaningful, not simply statistics. The first group we look at, trade ratios, serves to illustrate this point.

2.1 Trade Ratios

This category contains the most widely used measure of trade openness and policy, the simple and intuitively appealing trade ratio, calculated as (Exports + Imports) / GDP. Commonly referred to as openness (a misnomer), it is also known as trade intensity. Variants of the overall trade ratio measure include import ratios (the ratio of imports, by sector or in aggregate, to GDP) and export ratios. The measure is popular because data are readily available for many countries and, as it is very commonly used, it allows for comparability across studies.

Despite the overwhelming popularity of the simple trade shares measure, analysts and policymakers should be aware that this is a measure of country size (and integration into international markets) rather than trade policy orientation. A few examples serve to illustrate the point.7 First, the five least open countries are (in order) Japan, Argentina, Brazil, the United States, and India. Second, many of the comparisons across countries are not plausible. While it is not surprising that Singapore and Hong Kong are ranked as the most open countries (according to the trade ratios they are 11.2 and 9.7 times more open than the United States), what is surprising are the comparisons involving less developed countries still struggling with economic (and social and political) development. Jamaica is almost 4 times more open than the United States, and African nations such as Ghana and Congo are 4.5 and 5.1 times more open.
It is hard to claim that this measure produces satisfactory ordinal rankings, much less the cardinal ranking implied by the precision of the numbers. What trade ratios do show is that small countries are relatively more engaged in international trade than large countries.8 This is reflective of smaller countries being constrained by the size of their domestic markets and needing to trade to achieve specialization and economies of scale.
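To make the arithmetic of the trade ratio concrete, the following minimal sketch computes (Exports + Imports) / GDP for two invented economies. The names and figures are hypothetical and are not taken from the data discussed in this chapter; they are chosen only to show how a small trading economy can appear many times "more open" than a large one without any information about policy entering the calculation.

```python
# Hypothetical figures (in billions of a common currency), chosen only to
# illustrate the arithmetic; they are not drawn from the chapter or any dataset.
economies = {
    "small_entrepot": {"exports": 300.0, "imports": 280.0, "gdp": 150.0},
    "large_economy": {"exports": 1500.0, "imports": 2000.0, "gdp": 14000.0},
}


def trade_ratio(exports: float, imports: float, gdp: float) -> float:
    """Return (Exports + Imports) / GDP, the trade-intensity ratio."""
    if gdp <= 0:
        raise ValueError("GDP must be positive")
    return (exports + imports) / gdp


ratios = {name: trade_ratio(**row) for name, row in economies.items()}

# How many times "more open" the small economy looks relative to the large one.
relative_openness = ratios["small_entrepot"] / ratios["large_economy"]

for name, value in ratios.items():
    print(f"{name}: trade ratio = {value:.2f}")
print(f"small / large = {relative_openness:.1f}")
```

Nothing in the function refers to tariffs, quotas, or any other policy instrument, which is the point of the criticism: the ratio is driven by trade volumes and the size of the economy alone.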

There are a number of other reasons to question the use of trade ratios. First, if we believe that there is a connection between trade policy and economic growth, then we need to have separate measures for each. Trade ratios mix the two concepts by construction; they should not be used to explain economic growth given that they themselves vary with income. Next, it is very important to note that trade ratios are completely lacking in theoretical foundations. It can be argued that the usefulness of some of the measures discussed below may depend upon the uses to which they are put, but atheoretical measures cannot be considered useful measures of trade policy or openness under any circumstances.

Trade ratios are the most widely used measures of trade policy and openness in academic and policymaking work. For all of the previously mentioned reasons, I do not recommend the use of any trade ratios for this purpose. Analysts (and teachers of international trade theory) need to bear in mind that their wide usage is not proof of their ability to capture a country’s trade policies or openness.

2.2 Adjusted Trade Flows Measures

The Adjusted Trade Flows measures use deviations of actual trade flows from predicted free trade flows (the counterfactual) to form measures of trade policy. The counterfactuals are assumed to represent what would have happened under different policy choices, that is, free trade policies. These measures are produced using factor proportion models and, increasingly, gravity models. These models have the important advantage that they have theoretical foundations (unlike the many measures whose creation is simply driven by data availability) to which users can turn for guidance in their usage.9 It is important to note that all such outcome measures are sensitive to the model chosen to construct the counterfactual.10 Choosing among these measures requires consideration of the models underlying the measures.
Users must think carefully about what explanatory variables they consider important and what should be the form of the relationship between those variables. As with all measures, the ones in this category have disadvantages. The largest concern is that there is no way of assuring that the counterfactual accurately produces the volume of trade that would occur under free trade. Care must also be taken to separate out the effects of business cycles and large shocks to economies.11
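The logic of measuring policy as a deviation from a predicted counterfactual can be sketched as follows. This is a deliberately simplified stand-in, not a factor proportions or gravity model: it assumes a single explanatory variable (log GDP), fits a log-linear relationship by ordinary least squares, and reports each country's residual. The country labels and figures are invented for illustration.

```python
import math

# Hypothetical (GDP, trade ratio) pairs; the values are invented for
# illustration and do not come from any real dataset.
sample = {
    "A": (50.0, 1.60),
    "B": (200.0, 0.90),
    "C": (1000.0, 0.55),
    "D": (5000.0, 0.20),
    "E": (12000.0, 0.25),
}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


names = list(sample)
log_gdp = [math.log(sample[n][0]) for n in names]
log_ratio = [math.log(sample[n][1]) for n in names]

slope, intercept = fit_line(log_gdp, log_ratio)

# Residual = actual minus predicted (in logs). A positive value means the
# country trades more than this particular model predicts for its size.
residuals = {
    n: y - (intercept + slope * x)
    for n, x, y in zip(names, log_gdp, log_ratio)
}

for n, r in residuals.items():
    print(f"country {n}: deviation from predicted log trade ratio = {r:+.3f}")
```

Changing the explanatory variables or the functional form changes every residual, which is the sensitivity to model choice noted above; the residuals also absorb anything the model omits, not only trade policy.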

2.3 Price-Based Measures

Price-based measures attempt to capture trade policy stance by seeking price distortions in either goods markets (by comparison with international prices) or currency markets (generally through the black market premium). Advocates of price-based measures claim that they capture the effects of both tariffs and NTBs and that economic interpretation is easier than with the flow-based measures, as countries with high price levels over time would be seen as countries with relatively high levels of protection. However, simply looking at levels of prices of tradable goods across countries may be misleading. Trade policies work by altering relative prices within an economy but the effects of trade policies on the level of prices in one country relative to another are not clear-cut. Countries can and do use multiple trade restrictions simultaneously on both imports and exports. This holds true for industrialized nations as well as developing countries. It is also, at best, dubious that the law of one price holds continuously.12

The most popular currency price measure is the black market premium, which is measured as the deviation of the black market exchange rate from the official exchange rate. The black market premium is not a direct measure of trade openness but rather measures the extent of rationing in the market for foreign currency. The argument for using the black market premium as a measure of trade openness is that foreign exchange restrictions act as a barrier to trade. However, it is not a good measure of trade policy stance, as it most likely reflects a wide range of policy failures (poor macroeconomic policy, weak government, lack of rule of law, and corruption) as well as macroeconomic and political crises. The black market premium is thus serving as a proxy for many variables that are unrelated to trade policy.
Crises and policy failures, rather than the trade-restricting effects of the black market premium, would be the reasons for low growth in this case.

2.4 Tariffs Measures

Tariffs are highly visible restrictions of trade and are viewed as the most direct indicators of restrictions. They are also popular because data are available. A number of different measures of tariffs have been used by trade economists: simple tariff averages, trade-weighted tariff averages (which are the most widely used), and revenue from duties as a percentage of total trade.
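The three tariff statistics just listed can be computed side by side. The sketch below uses four invented tariff lines and an invented export total; none of the numbers come from the chapter, and duties are assumed to be collected in full on every line. It includes one line whose tariff is high enough that nothing is imported, so that line receives a weight of zero in the trade-weighted average.

```python
# Hypothetical tariff schedule: (ad valorem tariff rate, import value).
# All values are invented for illustration.
tariff_lines = [
    (0.00, 500.0),
    (0.05, 300.0),
    (0.20, 150.0),
    (1.50, 0.0),  # prohibitive tariff: nothing is imported on this line
]
total_exports = 850.0  # hypothetical, needed only for the revenue ratio

total_imports = sum(value for _, value in tariff_lines)

# Simple (unweighted) average: every tariff line counts equally.
simple_average = sum(rate for rate, _ in tariff_lines) / len(tariff_lines)

# Trade-weighted average: each rate is weighted by its import value.
duties_collected = sum(rate * value for rate, value in tariff_lines)
trade_weighted_average = duties_collected / total_imports

# Revenue from duties as a share of total trade (imports plus exports),
# assuming duties are collected in full on every line.
revenue_share_of_trade = duties_collected / (total_imports + total_exports)

print(f"simple average tariff:         {simple_average:.4f}")
print(f"trade-weighted average tariff: {trade_weighted_average:.4f}")
print(f"duty revenue / total trade:    {revenue_share_of_trade:.4f}")
```

In this stylized schedule the most restrictive line contributes nothing to the trade-weighted average or to duty revenue, while it dominates the simple average. The three statistics can therefore tell quite different stories about the same schedule.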

However, these measures are plagued by a number of problems. First, and foremost, none of the tariff measures mentioned above has a theoretical foundation. They are simply statistical measures. If we are interested in the welfare cost of trade policy then trade-weighted average tariffs are of little or no use.13 Second, gathering the data required for calculating weighted tariffs is daunting. Countries do not report their weighted average tariff rate or even their simple average tariff rate every year, so the most recent data may be several years old. The data for both tariff and nontariff indicators (discussed in the next section) are measured with error due to weaknesses in the underlying data (issues of both collection and coding) and there are frequently problems with missing data. Third, problems also arise when aggregating tariff data into a tariff index measure. As is well known in the trade literature, weighted averages will tend to understate the impacts of tariffs14 while the opposite occurs with unweighted averages.

While these are the most direct measures of trade restriction available, I caution against relying solely on them. As noted, the measures are far from perfect and, in addition, other policy actions are important in determining the pattern of trade. Tariff-based measures might work well in combination with other measures, such as NTBs, and that will be the focus of the section on TRIs.

2.5 NTB Measures

As tariff levels have declined (on manufactured goods in particular) NTBs have become increasingly important.15 NTBs are policies other than tariffs that alter, directly or indirectly, the prices and/or quantities of traded goods and services.
Official (i.e., mandated or legislated by the government) NTBs come in a wide variety of forms: import quotas, voluntary export restraints, government procurement and domestic content provisions, restrictions on services trade, trade-related investment measures, administrative classification, health and labeling requirements, and so on. They also come in forms that do not appear to be “barriers” to trade but rather serve to stimulate trade (at least from a domestic viewpoint) such as export subsidies. Not all forms of NTBs are “official” barriers. Market structures vary across countries and national governments differ in how much they promote competition. In some cases there is extensive government involvement in industry, often allowing extensive collusion among firms or creating government monopolies. This is viewed internally as domestic

policy though it has implications for the international trade policy of the country. Another source of barriers is the cultural, social, or even political institutions operating within a country. For example, countries such as Japan and Korea allow intricate relationships among firms (the keiretsu and chaebol, respectively) across industries and these in a very real sense create trade policy on their own. The effects of these forms of trade protection are more difficult to assess than the official NTBs.

NTBs by themselves are poor indicators of trade restrictions, both because broad coverage by NTBs does not necessarily mean a higher distortion level and because (as always) there are difficulties of estimation owing to data limitations. Thus these measures are often excluded from empirical studies.16

2.6 Composite Indices

Composite indices strive to combine different aspects of trade policy and openness into a single index number. The category contains measures that can be based on both objective and subjective evaluations of trade barriers, structural characteristics, and institutional arrangements. As noted by Baldwin (2002, p. 27) high barriers to trade are very frequently found in conjunction with poor macroeconomic policies, corruption, and unstable governments. In recognition of this a number of indices that combine various indicators (such as macroeconomic, exchange rate and educational indicators in addition to trade openness and policy indicators) into a single index have been developed.

The most well known is the Sachs and Warner (1995) measure of openness. Quite widely used in cross-national research on economic growth, the openness indicator they construct is a binary dummy variable that takes the value 0 (indicating that the economy is closed) if any one of the following conditions is met, and the value 1 otherwise:

• the country had average tariff rates higher than 40%;
• the country’s NTBs covered more than 40% of imports;
• the country had a socialist economic system;
• the country had a state monopoly for major exports; or
• the country’s black market premium exceeded 20% during either the 1970s or the 1980s.
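The five conditions above translate directly into a classification rule. The sketch below is one hedged reading of that rule: the class and field names are hypothetical, rates are expressed as fractions, and how the black market premium is summarized for each decade is left to the analyst, since the text gives only the 20% threshold.

```python
from dataclasses import dataclass


@dataclass
class CountryProfile:
    """Inputs to the openness dummy; all field names are illustrative."""
    avg_tariff: float                  # average tariff rate, as a fraction
    ntb_import_coverage: float         # share of imports covered by NTBs
    socialist_system: bool
    state_export_monopoly: bool
    black_market_premium_1970s: float  # premium summarized for the decade
    black_market_premium_1980s: float


def sachs_warner_open(c: CountryProfile) -> int:
    """Return 0 (closed) if any one condition is met, and 1 (open) otherwise."""
    closed = (
        c.avg_tariff > 0.40
        or c.ntb_import_coverage > 0.40
        or c.socialist_system
        or c.state_export_monopoly
        or c.black_market_premium_1970s > 0.20
        or c.black_market_premium_1980s > 0.20
    )
    return 0 if closed else 1


# Two hypothetical countries that differ only marginally in average tariffs.
just_below = CountryProfile(0.39, 0.10, False, False, 0.05, 0.05)
just_above = CountryProfile(0.41, 0.10, False, False, 0.05, 0.05)

print(sachs_warner_open(just_below), sachs_warner_open(just_above))
```

The two hypothetical profiles differ by two percentage points of average tariff yet fall on opposite sides of the classification, while a country with a 41% tariff and one with a 400% tariff would receive the same value.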

There are a number of grounds on which this measure can be criticized. An obvious weakness is the binary nature of the measure. If one assumes that the effects of trade liberalization do not occur

instantaneously, but rather accrue over time, then a continuous measure would better reflect the process. Another problem is that the choice of numerical limits, as with the tariff, nontariff, and black market premium components of the measure, is arbitrary. The lack of any theoretical justification for the levels chosen casts doubt on the results from using such a measure and may encourage data-mining in order to find the “best fit” in measures of this type. Furthermore, it must be borne in mind that for a given level of tariffs, NTBs, or foreign exchange market manipulation we would not expect to find the same effects across countries.

The Sachs-Warner measure of trade openness and policy is the second most widely used measure after trade ratios. I again note that wide usage of a measure in no way establishes it as a good measure and I strongly argue against the usage of this measure in empirical or policy work.17 However, the Sachs-Warner measure does represent an important contribution in that it introduces the idea of combining tariff and nontariff measures into an index measure. This is appealing as it utilizes two direct measures of policy rather than outcomes. In the next section we see how this concept has been developed in the work of others.

3 Trade Restrictiveness Indices

We have seen in the previous sections that there has been little agreement on how to aggregate the restrictions imposed by different policies. Clearly, what are needed are measures that are directly based on policies, such as measures of tariffs. Surprisingly, until recently this avenue had been little pursued. Despite the fact that tariffs are the most direct indicators of trade policy, they have often been viewed as poor indicators of trade policy because of the shortcomings previously discussed. This is also in part because tariff protection affects producers and consumers differently, making it difficult to assess the impacts of tariffs.
Given that it is also quite reasonable to assume that import elasticities vary across products and countries, tariffs of a given magnitude may have different effects both for differentiated products18 in a single country and for the same product across countries. In addition, as is invariably the case when dealing with most longitudinal international data, the data is far from complete: collecting comprehensive tariff data on all product categories is a large undertaking not often carried out, much less with consistent methodology over countries and time. Reasonably good data for industrialized countries reaches back 50–60 years, for developing countries half of that if the researcher is fortunate.

And, finally, the typical trade regime in developing countries also restricts imports with NTBs. Attempting to measure and aggregate NTBs presents an even greater challenge, given the many different types and their lack of transparency. Nonetheless, given that measures of tariffs and NTBs are measures that directly capture the effects of trade policies, it seems a straightforward proposition that composite indices can be created using them.19 As it turns out, research along these lines has been underway since the 1990s. Three different streams of research have developed similar measures that are known as TRIs. These are reviewed in this section.

3.1 International Monetary Fund

The simplest of these measures was introduced by the International Monetary Fund (IMF) in 1997. The IMF TRI is used by the IMF as an internal tool for classifying the relative restrictiveness of trade regimes as well as a measure of the increase in trade liberalization during IMF programs. The TRI is calculated at the beginning of a program and then again after any trade-related program measures have been implemented. It consists of three components: the Tariff Restrictiveness Rating, the Nontariff Restrictiveness Rating, and the Overall Trade Restrictiveness Index. Countries are rated on a scale of 1–5 based on their simple unweighted average tariff rates (where 1 reflects the lowest average rate and 5 the highest) and on a scale of 1–3 based on the incidence of NTBs. The overall restrictiveness rating is a 10-point scale that places a greater emphasis on the nontariff rating, as NTBs are less visible but potentially more distortionary than tariffs. The tariff component is based on the simple unweighted average of a country’s MFN applied tariff rates, plus any additional surcharges or fees that apply only to imports.
The scale is as follows: (1) the simple average tariff rate plus surcharges is less than 10%; (2) the tariff plus surcharges is equal to or greater than 10% but less than 15%; (3) the tariff plus surcharges is equal to or greater than 15% but less than 20%; (4) the tariff plus surcharges is equal to or greater than 20% but less than 25%; and (5) the tariff plus surcharges is equal to or greater than 25%. The choice of break points was somewhat arbitrary (the scale was originally derived to have broadly equal numbers of countries in each category) and this is a weakness in the construction of the scale. There are a number of other reasons to be concerned with the tariff portion of the TRI calculation. The use of unweighted average tariffs is troubling as it assumes that the set of weights on the individual tariffs

are equal, thus potentially leading the index to overstate the impact of tariffs. However, it is quite difficult to calculate weights based on actual trade. Thus, the IMF chooses to use the simpler unweighted average tariff methodology on the grounds that it ensures the broadest country coverage. Another reason for concern is that the tariff component does not incorporate the effects of preferential trade agreements.

The nontariff rating uses a 3-point scale. A country was given a rating of 1 if NTBs were absent or minor (less than 1% of production or trade was subject to NTBs), a rating of 2 if NTBs were judged significant (they were applied to at least one important sector of the economy and between 1 and 25% of production or trade was affected by NTBs), and a rating of 3 if many sectors, or entire stages of production, were covered by NTBs (more than 25% of production or trade was affected). An obvious weakness of the NTB component is that a 3-point scale does not allow for differentiation between countries based on the types or numbers of NTBs employed. Thus, the measure cannot capture the intensity of NTB usage. Countries that just barely fall into the 2 rating will be indistinguishable from those that almost fall into the most restrictive rating.

Unfortunately, although the IMF calculates the TRI on an annual basis for 178 countries, it does not encourage cross-country comparisons and it neither promotes nor publishes the index. Thus, it is not possible for most researchers to make comparisons between the IMF TRI and other similar indices. Access to the index and/or the underlying data could be of use to analysts and policymakers and I hope that in the future the IMF will begin to release this information.

3.2 Anderson and Neary

Two alternative trade restrictiveness measures were developed by Anderson and Neary (hereafter referred to as AN) in a series of papers (1992, 1994, 1996, 2007, and 2004; Anderson 1998), and a book (2005).
Perhaps the greatest contribution of their measures is that they are the first measures of trade openness and policy that aggregate specific tariffs and NTBs and that are founded upon economic theory, not data availability. AN call the first index their TRI. In it they calculate the uniform tariff equivalent (UTE) that would keep welfare constant if it replaced the existing differentiated tariffs in the given country.20 They advocate the use of their TRI for studies of openness and growth where it is desirable to have a measure of trade policy restrictiveness that is independent of real income in order to examine the welfare effects of protection.

Want to Use a Measure of Openness?


The second AN measure is called the Mercantilist Index of Trade Policy (MTRI) and it calculates the UTE that would maintain the same volume of trade as the given set of tariffs and NTBs.21 The MTRI, because it holds trade volumes constant, uses trade as a reference rather than welfare, as in the AN TRI. The MTRI is thus a more appropriate measure than the TRI for purposes of trade negotiations, where a country’s focus is on the benefits to itself and not its trading partners. It is also easier to implement than the AN TRI.

The AN measures are calculated using a Computable General Equilibrium (CGE) approach. CGE models run simulations that combine an algebraic general equilibrium structure with economic data to solve numerically for the levels of supply, demand, and price that support equilibrium across a specified set of markets. CGE models are widely used to analyze the aggregate welfare and distributional impacts of policies whose effects may be transmitted through multiple markets, for example, changes in taxes, subsidies, or quotas. The CGE model that AN use is available at http://fmwww.bc.edu/EC-V/Anderson.fac.html. To focus on the distortionary effects of trade restrictions they employ a model that has a very simple production structure.22 Their model uses data on trade flows, tariffs, and NTBs for a detailed set of imports, much of which is available from the World Bank’s Trade and Production Database.23 They use about 1,200 4-digit HS code categories for 25 countries over a 2-year period.24 The import data are split into intermediate and final imports, with each accounting for roughly half of the product lines.

The MTRI tracks the average weighted tariff of most countries quite closely, while the TRI is consistently higher. This is because the MTRI includes the effects of real income changes on trade volumes while the TRI does not. The two measures coincide only when tariffs are uniform.
As one might expect, the correlations between the measures are quite high. As noted, AN calculate the TRI and MTRI for a fairly small number of countries. Again, data is the problem. Data requirements to calculate their indices are frequently beyond what developing countries are able to provide.

3.3 Kee, Nicita, and Olarreaga

The most recent TRIs are from Kee, Nicita, and Olarreaga (hereafter referred to as KNO) (2008a, 2008b) at the World Bank. Their goal is also to capture the combined effects of tariffs and NTBs and, along the lines of AN, they compute ad valorem equivalents (AVEs) for NTBs that are comparable to tariffs. Rather than using a CGE model as do AN, however, they use a factor endowments (comparative advantage) approach that follows the work of Leamer (1990) and Feenstra (1995). They use the model to predict imports, observe the deviations that presumably owe to the presence of NTBs, and then convert the quantity impact on imports into price effects by using import demand elasticities estimated in earlier work (KNO 2008b). Of particular note is that this model allows them to capture effects that vary by country and good.

KNO (2008b) tested a number of hypotheses about import demand elasticities across countries and products. Among their findings were (1) import demand for homogeneous goods is more elastic than for heterogeneous goods; (2) import demand is less elastic at the aggregate industry level than at the tariff line level; (3) large countries tend to have more elastic import demands; and (4) more developed countries tend to have less elastic import demands.

KNO (2008a) define three TRIs. The first, called TRI, corresponds to the AN TRI in that it estimates the equivalent uniform tariff that keeps real income, or welfare, constant. Their second index, the Overall Trade Restrictiveness Index (OTRI), parallels the MTRI of AN in determining the equivalent uniform tariff needed to keep imports at their observed levels. As with the AN TRI, the KNO TRI has the advantage over the IMF TRI that it does not use tariff averages to calculate the impacts of trade barriers. They also add a new index designed to capture the barriers faced by a country’s exporters, the Market Access OTRI (MA-OTRI). The MA-OTRI measures the trade restrictiveness that countries impose on each other using bilateral OTRI calculations (the MA-OTRI faced by one country for its exports is equal to the OTRI that the other country places on its own imports from the first country).
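The relationship between a welfare-based and an import-based index can be illustrated with a small numerical sketch. It assumes the linear partial-equilibrium approximations commonly associated with indices of this kind, which are not stated in this chapter: the OTRI as an average of tariffs (including AVEs of NTBs) weighted by imports times import demand elasticities, and the TRI as the square root of the same weighted average of squared tariffs. The function names and the three-good data are hypothetical.

```python
import math

def otri(tariffs, imports, elasticities):
    """Uniform tariff that keeps aggregate imports unchanged.

    Assumed linear approximation: sum(m * e * t) / sum(m * e), where e is
    the absolute value of the import demand elasticity.
    """
    weights = [m * e for m, e in zip(imports, elasticities)]
    return sum(w * t for w, t in zip(weights, tariffs)) / sum(weights)

def tri(tariffs, imports, elasticities):
    """Uniform tariff that keeps welfare unchanged.

    Assumed linear approximation: sqrt(sum(m * e * t**2) / sum(m * e)).
    """
    weights = [m * e for m, e in zip(imports, elasticities)]
    return math.sqrt(sum(w * t * t for w, t in zip(weights, tariffs)) / sum(weights))

# Hypothetical data: tariffs as fractions, imports in any common currency unit.
imports = [100.0, 50.0, 25.0]
elasticities = [1.0, 2.0, 1.5]
dispersed = [0.00, 0.10, 0.40]   # tariffs that differ across goods
uniform = [0.10, 0.10, 0.10]     # the same tariff on every good

print(otri(dispersed, imports, elasticities), tri(dispersed, imports, elasticities))
print(otri(uniform, imports, elasticities), tri(uniform, imports, elasticities))
```

Under these assumed formulas the two indices coincide when tariffs are uniform, and the welfare-based index exceeds the import-based one whenever tariffs are dispersed, because squaring gives extra weight to the highest tariffs.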
KNO calculated TRIs for 91 countries.25 The simple and trade-weighted core NTB frequency ratios move together closely (correlation coefficient of 0.9187). Combined with the large sample size as well as the AN results, this lends support to the idea that TRIs can be used to capture both the effects of NTBs and tariffs. However, data constraints hamper the construction of KNO TRIs, in particular the MA-OTRI, which depends on UNCTAD’s NTB data for its construction.

3.4 Some Additional Considerations

TRIs, in common with all measures that aggregate data, are subject to bias stemming from aggregation. A loss of explanatory power can result from the aggregation of data, such as individual tariff lines into industry or sectoral averages. For example, AN consider a given line of trade to be restricted by NTBs if at least 75% of the underlying categories are subject to NTBs. Given this high requirement, the level and effects of NTBs may be understated, and this could alter the trade restrictiveness ranking of countries.26 The problem is compounded by the fact that the tariff data used is based on upper bound duty rates. Given that actual rates are frequently lower (due to, say, most favored nation treatment or membership in a preferential trade agreement), there is also a bias toward overstating the effects of tariff barriers. One might be tempted to argue that the two effects could cancel each other out, but we have no way of knowing without less aggregated data.

One weakness of the AN trade restrictiveness indices has to do with the CGE model used. In the model each country is identical in structure, differing only in the data and parameter values used. O’Rourke (1997) pointed out that UTEs depend on model structure and that care should be taken in choosing a model structure for a given country. He suggests that estimates of UTEs derived from using an identical model for a number of countries may not produce useful rankings for cross-country comparisons of trade restrictiveness.

Tokarick (2006) raises an important point concerning the intertemporal behavior of TRIs. Both the IMF and AN calculate TRIs for two points in time and use the results as indicators of whether or not a given country has become more open to international trade. Their models do not, however, take into account structural changes that may have taken place in the economy over the relevant period. An important example of this is factor accumulation. Depending on the pattern of factor accumulation, a country’s TRI could potentially rise or fall, even if tariff rates and NTBs remain unchanged. In the case of balanced factor growth, the TRI will remain the same. However, in the case where the factor accumulation is sufficiently biased toward a protected sector, a reduction in overall welfare could result.

Finally, and this is a complaint common to all empirical international trade research, better data is needed. Consider, for example, the case of the KNO TRI. The tariff data is from 2000 to 2004, while the NTB data covers the period 1992–2001. Country-by-country examination of the year in which the tariff data was collected and the year in which the NTB data was collected reveals that, for a sample of 103 countries, the two years match in only one case. The mean difference between the two dates is 4.3 years (median: 4), with a maximum difference of 12 years. The tariff data is more recent in all the cases but two. This mismatch is of concern. Given that the use of NTBs is increasing, TRI measures may increasingly understate trade restrictiveness.

4 Conclusions

Much work remains to be done in the area of measuring barriers to trade. There is no consensus as to what is (are) the “best” measure(s) of trade openness and policy. Many of the existing measures lack theoretical foundations, are poorly specified, or are simply ad hoc measures driven by data availability. The majority of the measures available focus on outcomes, while what researchers really need are direct summary measures of policies and barriers to trade. Most seriously, the most popular measures are among the worst measures a researcher could use.

All said, readers should not give up hope. The trade restrictiveness indices reviewed in this chapter represent an important improvement in measuring trade openness and policy. Combining tariff and NTB data, TRIs use direct trade indicators and have solid theoretical foundations. For the future, the continued development of such composite indices holds the most promise in the quest to capture the multifaceted nature of trade openness in a single index. The reader is nonetheless cautioned that the words of Leamer (1988) are as applicable today as they were two decades ago:

The question is not whether a particular method produces perfect measures of openness, since none will. The real question is which method seems likely to produce the best measures.

Notes

1. See, for example, Edwards (1992), Krueger (1997), Wacziarg and Welch (2003).
2. USITC (2003).
3. For example, Leamer (1988), Dollar (1992), and Sachs and Warner (1995).
4. Page 420.
5. Unless one decides to choose a measure based on the number of times it has been cited, which is a mistake as will be demonstrated.
6. Limited, of course, as is all work in international trade policy, by data availability and quality.
7. Using 2000 data (in current prices) from Heston, Summers, and Aten (2002).
8. For a good exposition and a model of this idea see Alesina and Spolaore (2003), Chapter 6, pp. 81–94.
9. For the factor proportions models a useful overview is contained in Appleyard and Field (2001, pp. 118–139), while for the gravity model important works include Anderson (1979), Deardorff (1998), and Anderson and van Wincoop (2003).
10. Pritchett (1996, p. 312).
11. Hiscox and Kastner (2002) note: “A key problem here is that we cannot distinguish between the effects of changes in trade policies and other changes, specific to particular importing countries in particular years, that also affect trade flows and are not accounted for in the model.”
12. Substantial evidence indicates that deviations from the law of one price are frequent and that these deviations die out only in the very long run (Rogoff 1996).
13. Anderson (1992) shows that, under restrictive conditions, a mean-preserving tariff reduction is efficient if it reduces the tariff’s variance and that an average tariff reduction with constant variance is efficiency improving.
14. Imports of items that have high tariffs are likely to be small and thus given a small weighting, and prohibitively high tariffs will be given no weight at all.
15. Bhagwati (1988) and Edwards (1992).
16. It is worth noting that the exchange rate regime of a country can serve as an NTB. An overvalued currency works against exports, as it has the same effect as an export tax, which in the case of developing countries frequently implies both industrial protection and a bias against agriculture.
17. A number of papers have examined the Sachs-Warner measure in detail. See Rodriguez and Rodrik (2001) for a useful critique of this measure, as well as a number of others.
18. For example with imported beers, wines, cheeses, and so on.
19. Rodriguez and Rodrik (2001) recommend the use of tariffs and NTBs in their wide-ranging critique of openness indicators.
20. More formally, this measure is defined as a scaling factor that, applied to a country’s imports, would have the same effect on real income as the country’s existing differentiated structure of tariffs.
21. Defined as quotas, domestic taxes, and subsidies.
22. They have only two goods produced: an exportable good that is not consumed domestically and a nontraded good. Production is assumed to be a constant elasticity of transformation (CET) process while consumers’ tastes are represented by constant elasticity of substitution (CES) expenditure functions.
23. Available at http://www1.worldbank.org/wbiep/trade/tradeandproduction.html.
24. The countries (years in parentheses) are Argentina (1992), Australia (1988), Austria (1988), Bolivia (1991), Brazil (1989), Canada (1990), Colombia (1991), Ecuador (1991), Finland (1988), Hungary (1991), India (1991), Indonesia (1989), Malaysia (1988), Mexico (1989), Morocco (1984), New Zealand (1988), Norway (1988), Paraguay (1990), Peru (1991), Philippines (1991), Poland (1989), Thailand (1988), Tunisia (1991), United States (1990), and Venezuela (1991).
25. The countries are Albania, Argentina, Australia, Burkina Faso, Bangladesh, Bahrain, Belarus, Bolivia, Brazil, Brunei, Bhutan, Central African Rep., Canada, Switzerland, Chile, China, Côte d’Ivoire, Cameroon, Colombia, Costa Rica, Czech Rep., Algeria, Ecuador, Egypt, Estonia, Ethiopia, European Union, Gabon, Ghana, Equatorial Guinea, Guatemala, Hong Kong, Honduras, Hungary, Indonesia, India, Iceland, Jordan, Japan, Kazakhstan, Kenya, Kyrgyzstan, Lao People’s DR, Lebanon, Sri Lanka, Lithuania, Latvia, Morocco, Moldova, Madagascar, Mexico, Mali, Mozambique, Mauritius, Malawi, Malaysia, Nigeria, Nicaragua, Norway, Nepal, New Zealand, Oman, Pakistan, Peru, Philippines, Papua New Guinea, Poland, Paraguay, Romania, Russia, Rwanda, Saudi Arabia, Sudan, Senegal, El Salvador, Slovenia, Chad, Thailand, Trinidad and Tobago, Tunisia, Turkey, Tanzania, Uganda, Ukraine, Uruguay, United States, Venezuela, Vietnam, South Africa, Zambia, Zimbabwe.
26. AN consider this conservative estimate of NTBs preferable to the possibility of overstating their presence and effects.

References

Alesina, A. and E. Spolaore. 2003. The Size of Nations. Cambridge, MA: MIT Press.
Anderson, J. E. 1979. A Theoretical Foundation for the Gravity Equation. American Economic Review 69 (1): 106–116.
Anderson, J. E. 1998. Trade Restrictiveness Benchmarks. The Economic Journal 108 (449): 1111–1125.
Anderson, J. E. and J. P. Neary. 1992. Trade Reform with Quotas, Partial Rent Retention, and Tariffs. Econometrica 60 (1): 57–76.
Anderson, J. E. and J. P. Neary. 1994. Measuring the Restrictiveness of Trade Policy. The World Bank Economic Review 8 (2): 151–169.
Anderson, J. E. and J. P. Neary. 1996. A New Approach to Evaluating Trade Policy. The Review of Economic Studies 63 (1): 107–125.
Anderson, J. E. and J. P. Neary. 2005. Measuring the Restrictiveness of International Trade Policy. Cambridge, MA: MIT Press.
Anderson, J. E. and J. P. Neary. 2007. Welfare versus Market Access: The Implications of Tariff Structure for Tariff Reform. Journal of International Economics 71 (1): 187–205.
Anderson, J. E. and E. van Wincoop. 2003. Gravity with Gravitas: A Solution to the Border Puzzle. American Economic Review 93 (1): 170–192.
Appleyard, D. and A. Field. 2001. International Economics, 4th ed. New York: McGraw-Hill.
Baldwin, R. 2002. Openness and Growth: What’s the Empirical Relationship? In Challenges to Globalization: Analyzing the Economics, ed. R. E. Baldwin and L. A. Winters. New York: National Bureau of Economic Research Conference Report.
David, H. L. 2005. A Guide to Measures of Trade Openness and Policy. Working Paper. Claremont, CA: Department of Economics, Claremont Graduate University.
Deardorff, A. V. 1998. Determinants of Bilateral Trade: Does Gravity Work in a Neoclassical World? In The Regionalization of the World Economy, ed. J. A. Frankel. Chicago: University of Chicago Press.
Dollar, D. 1992. Outward-Oriented Developing Economies Really Do Grow More Rapidly: Evidence from 95 LDCs, 1976–1985. Economic Development and Cultural Change 40 (3): 523–544.
Edwards, S. 1992. Trade Orientation, Distortions and Growth in Developing Countries. Journal of Development Economics 39 (1): 31–57.
Harrison, A. 1996. Openness and Growth: A Time-Series, Cross-Country Analysis for Developing Countries. Journal of Development Economics 48: 419–447.
Heston, A., R. Summers, and B. Aten. 2002. Penn World Table Version 6.1. Center for International Comparisons at the University of Pennsylvania (CICUP), October.
Hiscox, M. J. and S. L. Kastner. 2002. A General Measure of Trade Policy Orientations: Gravity-Model-Based Estimates for 82 Nations, 1960–1992. Working Paper.
Kee, H. L., A. Nicita, and M. Olarreaga. 2008a. Estimating Trade Restrictiveness Indices. Economic Journal (forthcoming).
Kee, H. L., A. Nicita, and M. Olarreaga. 2008b. Import Demand Elasticities and Trade Distortions. Review of Economics and Statistics (forthcoming).
Krueger, A. O. 1997. Trade Policy and Economic Development: How We Learn. American Economic Review 87 (1): 1–22.
Leamer, E. E. 1988. Measures of Openness. In Trade Policy Issues and Empirical Analysis, ed. R. Baldwin, 147–200. Chicago: University of Chicago Press.
Leamer, E. E. 1990. The Structure and Effects of Tariff and Nontariff Barriers in 1983. In The Political Economy of International Trade: Essays in Honor of Robert E. Baldwin, ed. R. W. Jones and A. Krueger, 224–260. Cambridge: Basil Blackwell.
Pritchett, L. 1996. Measuring Outward Orientation in Developing Countries: Can It Be Done? Journal of Development Economics 49 (2): 307–335.
Rodriguez, F. and D. Rodrik. 2001. Trade Policy and Economic Growth: A Sceptic’s Guide to the Cross-National Evidence. In Macroeconomics Annual 2000, ed. B. Bernanke and K. S. Rogoff. Cambridge, MA: MIT Press for NBER.
Rogoff, K. 1996. The Purchasing Power Parity Puzzle. Journal of Economic Literature 34 (2): 647–668.
Rose, A. K. 2004. Do WTO Members Have More Liberal Trade Policy? Journal of International Economics 63 (2): 209–235.
Sachs, J. and A. Warner. 1995. Economic Convergence and Economic Policies. Brookings Papers on Economic Activity 1: 1–95.
Tokarick, S. 2006. Book Review: Measuring the Restrictiveness of International Trade Policy. World Trade Review 5 (October): 495–498.
Wacziarg, R. and K. H. Welch. 2003. Trade Liberalization and Growth: New Evidence. NBER Working Papers 10152.
World Bank. 1987. World Development Report 1987. New York: Oxford University Press.

CHAPTER 3

Measuring Central Bank Independence: Ordering, Ranking, or Scoring?

King Banaian

Central bank independence (CBI) as an area for international comparison and for study by international political economists has been around for approximately two decades, spurred on by the work of Bade and Parkin (1982). It probably reached its full fruition with the work of Cukierman and others, centering on work done at the World Bank. There are others too, and we should not ignore them, but since the mid-1990s most of the work done has centered on the Cukierman-type model.

Interest in CBI intensified after models of monetary policy found the likelihood of an inflationary bias in monetary policy operated by democratic governments. That analysis turned on the potential for monetary surprises being perpetrated by governments seeking electoral advantage. Later analysis found that if such incentives were fully anticipated by the public, inflation rates in democracies would be higher than they would be if somehow government could make a credible commitment to price stability. The search began for how to establish monetary institutions that can be viewed as credible commitments. Delegation of monetary policy to an independent central bank was one strand of that exploration. It is also believed that independent central banks would reduce the scope of monetization of government budget deficits and thereby put downward pressure on deficits. Cukierman, Webb, and Neyapti (1992) argued,

Economists and practitioners in the area of monetary policy generally believe that the degree of independence of the central bank from other parts of government affects the rates of expansion of money and credit and, through them, important macroeconomic variables, such as inflation and the size of the budget deficit.

Over time, views of CBI have evolved as our own understanding of institutions has. Central bank structures are chosen in a political system that reflects the nature of the polity. Forder (1998) points out, for example, that statutory CBI only matters if the law conditions behavior. Posen (1993) argues that without a political coalition that wishes to have monetary stability, legal independence would not be granted. Banaian and Luksetich (2001) show that countries with more economic freedom (particularly those with greater security of private property rights) tend to choose central bank structures with greater independence.1

Endogeneity issues are only one of the many discussions surrounding the measurement of CBI. Political economists have sought measures of these institutional arrangements, and while some researchers have used measures such as the turnover of central bank governors or survey data, legal independence measures continue to dominate the research agenda. These measures tend to focus on relatively large sets of central bank attributes rather than deciding which ones are more important.

In this chapter, I first examine what measures are used. My argument is that in the search for a measure that can embrace the many possible dimensions of independence we have lost sight of the reason to measure CBI. Along the way, we have made decisions regarding the scales on which we measure institutional arrangements that are arbitrary and atheoretic. An absence of theory also surrounds the decisions of averaging. Some measures use simple arithmetic averages while others place weights in ways that are difficult to justify by monetary theory.

In the second half of the chapter, I appeal to theory in order to justify using a classification scheme that is lexicographic and simplified. Rather than placing central banks on a scale, I suggest placing them in broad categories; a researcher who chooses to use an index number that is to be meaningful has to decide a priori which central bank attributes matter most. It may be that some matter more for inflation control, while others matter more for long-run economic growth (by reducing uncertainty over monetary policy) or for budget deficit control. My point is not to argue for a particular new ranking, though I will offer one. It is that the researcher cannot avoid deciding what counts, and why, by using a one-size-fits-all measure of CBI.
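The contrast between an additive score and a lexicographic classification can be sketched in a few lines. The three yes/no attributes and their priority order below are hypothetical placeholders, chosen only to illustrate the mechanics; deciding that order is precisely the a priori judgment the researcher has to make.

```python
from itertools import product

# Hypothetical attributes, listed from most to least important.
ATTRIBUTES = ("final_authority", "independent_appointments", "no_government_official")

def additive_score(bank):
    """Point count: every attribute contributes equally."""
    return sum(bank[name] for name in ATTRIBUTES)

def lexicographic_key(bank):
    """Tuple compared attribute by attribute, in priority order."""
    return tuple(bank[name] for name in ATTRIBUTES)

# All eight possible combinations of three binary attributes.
banks = [dict(zip(ATTRIBUTES, combo)) for combo in product((0, 1), repeat=3)]

# Two illustrative banks: a has only the top-priority attribute,
# b has the two lower-priority attributes.
a = {"final_authority": 1, "independent_appointments": 0, "no_government_official": 0}
b = {"final_authority": 0, "independent_appointments": 1, "no_government_official": 1}

print(additive_score(a), additive_score(b))
print(lexicographic_key(a) > lexicographic_key(b))

# Full ordering of the eight types, most independent first.
ranked = sorted(banks, key=lexicographic_key, reverse=True)
```

In this sketch the point count places bank b ahead of bank a, while the lexicographic ordering places a ahead of b; the two approaches disagree exactly when attributes are not equally important.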


1 Early Measures: Ranking Central Banks

Early iterations of the CBI measure centered on two legal characteristics: the appointment process for the central bank’s board, and whether the central bank maintained autonomy for monetary policy or the government held a veto. Bade and Parkin (1982) created eight classifications by marking three binary choices:

1. Who has final authority for monetary policy?
2. Are a majority of the members of the central bank board appointed independently of the government?
3. Is there a government official on the central bank board (whether or not she or he is a voting member)?

All of these are references in one way or another to central bank autonomy. As Akhtar (1995) notes, there is no reference to the goals of the central bank, no reference to price stability. Bade and Parkin then asserted from this a rank ordering of which structures were more independent than others by adding up how many of the three choices favored the central bank’s independence. Eijffinger and De Haan (1996) show that this is relatively accidental: Only four of the eight possible classifications appeared in the industrialized economies: some had none of the autonomy measures; some had only the absence of a government official on the bank board; some had the absence of a government official and final authority for monetary policy; and two had all three desirable qualities (the Bundesbank and the Swiss National Bank [SNB]). They thus gave the banks scores of 1–4.

Others tried to include additional criteria. Alesina (1988) adds a fiscal dimension to this list: are central banks required to purchase Treasury bills?2 This is the beginning of consideration of the concept of fiscal dominance, the idea that governments can force even the most autonomous central bank to issue base money if it must act as a backstop for debt issuance. This would mean that such central banks may not have operational independence. A lack of operational independence also characterizes central banks that are chartered to guard “financial stability.” As the problems stemming from the subprime mortgage crisis and resulting credit crunch made clear, the Federal Reserve has less than complete operational independence.3

The Bade-Parkin, Alesina, and Eijffinger-Schaling indexes are in fact not cardinal in any way.4 They describe four or five separate central bank types, as defined by these attributes. All three indexes largely agree on the ordering of the bank types by independence. Such ordering is lexicographic: Central banks that have final authority over monetary policy are always ranked ahead of those where the government has final authority (or that authority is shared, in the Eijffinger-Schaling index) regardless of the appointment process.

2 Cardinal Measurements: Cukierman and GMT

Most CBI indices that researchers use these days involve either some point count of various institutional features or some scale determined by a reading of the experts. The Grilli, Masciandaro, and Tabellini (1991) index is an example of the first type. A point-count index usually uses a yes/no choice for some institutional feature. For example, is the chair of the central bank appointed by the country’s chief executive? Does the government have a direct representative like a finance minister on the central bank board? Does the central bank’s constitution specify price stability as the sole objective of central bank policy? The “yes” answers are summed to construct the index. Sometimes these get combined with the second, more judgmental index type to get a blended measure, as in Alesina and Summers (1993).

The Cukierman, Webb, and Neyapti (CWN) index is quite different from these earlier versions. Like the Grilli, Masciandaro, and Tabellini (GMT) index, it adds up various features. But the CWN index places a much richer set of possible institutional arrangements along a variety of scales. Some of them are 2- or 3-point scales, others as high as a 7-point scale. They are then combined, sometimes in an unweighted average and other times in a weighted average (called LVAU and LVAW in their paper). For example, the conflict resolution variable in CWN is a 6-point scale as follows:

1. CB given final authority over issues clearly defined in the law as CB objectives.
2. Government has final authority only over policy issues that have not been clearly defined as CB goals in the case of conflict with CB.
3. In case of conflict, final decision is up to a council whose members are from CB, legislative branch, and executive branch.
4. Legislative branch has final authority on policy issues.
5. Executive has final authority on policy issues, but subject to due process and possible protest by CB.
6. Executive branch has unconditional authority over policy.
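A cardinal coding of the six categories above can be sketched as follows. The evenly spaced values from 1 down to 0 follow the CWN convention for this variable; the function names and the alternative, unevenly spaced coding are hypothetical and serve only to show that the spacing is a choice rather than something the ordering itself implies.

```python
def linear_codes(n_categories):
    """Evenly spaced codes from 1 (most independent) down to 0."""
    step = 1.0 / (n_categories - 1)
    return [round(1.0 - i * step, 10) for i in range(n_categories)]

def same_ordering(xs, ys):
    """True if both codings rank the categories identically."""
    def rank(values):
        return sorted(range(len(values)), key=lambda i: -values[i])
    return rank(xs) == rank(ys)

# Six categories coded on an evenly spaced scale.
cwn_style = linear_codes(6)

# A hypothetical alternative that keeps the same ordering but treats only
# the first category as meaningfully different from the rest.
threshold_style = [1.0, 0.10, 0.08, 0.06, 0.04, 0.0]

print(cwn_style)
print(same_ordering(cwn_style, threshold_style))
```

Both codings agree on which arrangement is more independent than which, yet they would produce very different index values once averaged with other variables.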


These are coded as 1, 0.8, 0.6, ..., 0. The authors then take each of these measures and collect a set of subaverages, and then average the subaverages for either a weighted or an unweighted number lying between 0 and 1 that is considered a measure of legal central bank independence. In table 3.1 I have arrayed the various components of the indexes, and shown the weights applied to each. An advantage of the GMT measurements is that the measure is an unweighted summation (though as discussed below, it assumes all values are equivalent in contributing to independence, without complementarities). When broken down, the CWN measure has an arbitrary set of weights.

Another issue with these broader measures is the need for a broader set of judgments. In addition, some central bank laws are silent on some measures. For example, few of the 34 central bank laws offered Cukierman, Miller, and Neyapti (2002, hereafter CMN) enough information to measure all 16 instruments. In this case, the measure averages up the values into the 4 subcategories and then averages the subaverages in the same way as if they had all 16 measurements of legal independence.

Further, it is quite difficult to imagine how central banks in transition economies could avoid some participation in the government debt markets. There are few countries with financial markets active enough to permit full private purchase of government debt. In Ukraine, for instance, few banks have the ability to hold any significant portion of the government’s debt. The debt “market” is simply the central bank wire, the closed network of computers that connect commercial banks with the National Bank of Ukraine (NBU). The auction of treasury bonds is conducted by the NBU in conjunction with the Ministry of Finance. At some points, the NBU has acted as “buyer of last resort” in the government debt market because there were no bids available at any interest rate.5 Since that debt is dominant as well in the central bank’s portfolio (with the exception of the Baltic states with their currency boards), there may be little choice but for central bank legislation to allow some participation in the debt market.

3 The Linear Scale and Averaging

The use of linear scales and averaging to create a single number presents two issues in measurement. First, the linear scale introduces the notion that the gap between each type of institutional arrangement within a certain measure, such as term of office, has an equal effect on independence or on inflation fighting.


So, for example, the conflict resolution variable in CWN implies that every step along the path from institutional arrangement 1 to institutional arrangement 6 has the same effect, for example, on reducing inflation or on reducing budget deficits. There is no reason to believe that this is true. Banaian and Luksetich (2001) show that only those central banks with the most independent of these six structures have had better inflation performance. This is a very basic insight of econometric analysis. When using categories such as those in the conflict resolution variable above, one can agree to the ordering without agreeing what the distances between them are. But assigning distances is exactly what CWN, and later CMN, do. The number created by GMT says that two central banks are “equal” in, say, political independence if each of them has six of the eight characteristics of politically independent central banks. It does not matter which six. And then one is tempted to place those numbers in a regression and derive a slope, or a partial derivative. I argue the measures are not to be used in that way. The researcher can measure the difference in means between inflation rates of countries with central banks of different types and gain insight, but the regression coefficient does not reveal anything more meaningful.6

As noted earlier, the CWN, CMN, and GMT measures also average or add up a set of institutional values. GMT is always an unweighted average, as shown in table 3.1. The other two measures, though, imply a set of weights because they average subgroups and then average the subaverages. The weights are quite arbitrary. Principal components give very different weights, placing most of the weight on three variables (see Banaian, Burdekin, and Willett 1998).

An example will illustrate this better. CMN extended the index to 34 countries in transition from planned to market economies. Their data set gave multiple indicators for eight countries that have changed central bank laws twice since transition began. Of those, five have changed toward giving the central bank complete final authority over meeting the goals stated in the central bank law as its objectives (Armenia, Kazakhstan, Lithuania, Poland, and Uzbekistan). The Central Bank of Mongolia already had that power in its earlier law. Those countries followed the lead of the Czech Republic, Estonia, Hungary, Latvia, and Slovakia. So of the 26 countries, 12 now have central banks with complete autonomy. Another, Belarus, says that the government can only act against the wishes of the CB on those items not in the central bank’s objectives. The remaining countries give the CB little autonomy, subject to either an unchecked parliamentary veto (5 of 12) or an unchecked executive veto.
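The point about implied weights can be made concrete with a small sketch. It assumes a hypothetical two-level index in which variables are averaged within groups and the group averages are then averaged; the group and variable names are illustrative and are not taken from any of the published indices. A variable that sits alone in its group ends up with several times the weight of a variable that shares a group with others, which is the same pattern as in the LVAU column of table 3.1, where each of the four CEO items carries 3.125% while limits on advances alone carries 12.5%.

```python
from fractions import Fraction


def implied_weights(groups):
    """Weight each variable receives when group means are themselves averaged.

    `groups` maps a group name to the list of variables in that group.
    Every group gets an equal share, split equally among its members.
    """
    group_share = Fraction(1, len(groups))
    weights = {}
    for members in groups.values():
        for name in members:
            weights[name] = group_share / len(members)
    return weights


# Hypothetical grouping, for illustration only.
groups = {
    "ceo": ["term", "appointment", "dismissal", "other_office"],
    "objectives": ["objectives"],
}

weights = implied_weights(groups)
for name, weight in weights.items():
    print(f"{name:12s} {float(weight):.3f}")
```

No one chose to make the lone variable four times as important as each CEO item; the ratio falls out of the grouping alone.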

Was the grant of autonomy a wise choice? I look at the data for price depreciation (D in their paper, equal to the inverse of 1 plus the average rate of CPI inflation) for the latter subperiods, which are defined by CMN based on the date of adoption of central bank laws.7 Their data indicate that the countries with complete central bank autonomy have an average rate of price depreciation of 17% per year, while those with any other form of conflict resolution had an average annual depreciation rate of 37%. Of course, the effects of price liberalization may reduce the size of that effect, as CMN demonstrate. However, the size of the effect of complete central bank autonomy may still be large.

To make a good comparison, in table 3.2 I have rerun their regression on just the posttransition periods.8 These regressions may be compared to their table 3.3, except for not using the pretransition subperiod. I then substituted a simple dummy variable that equals one if the central bank has complete autonomy (i.e., if CMN find that the “central bank is given final authority over issues clearly defined in the law as CB objectives,” then my autonomy variable equals 1; otherwise it equals 0). The results suggest that the simple measure of central bank autonomy is perhaps as useful a measure of CBI as the fuller measure CMN employ. This result confirms what was found in Banaian, Burdekin, and Willett (1995) for industrialized economies.

The first and third equations replicate the first and fourth columns of their table 3.3. The size and significance of most coefficients are similar, except for the index for internal price liberalization. Like CMN, I see little evidence of significant effects of CBI as measured by their LVAW index (with a p-value of 0.14, the coefficient is not significant at conventional levels). The measure of central bank autonomy fares little better.
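The construction of the autonomy dummy and the comparison of group means can be sketched as follows. The records, country labels, and category names are hypothetical and are not CMN's data; the only thing taken from the text is the rule that the dummy equals 1 when the central bank has final authority over its stated objectives and 0 otherwise.

```python
from statistics import mean

# Hypothetical records: (country label, conflict-resolution category, price depreciation D).
records = [
    ("A", "cb_final_authority", 0.12),
    ("B", "cb_final_authority", 0.20),
    ("C", "legislature_veto", 0.35),
    ("D", "executive_veto", 0.41),
    ("E", "legislature_veto", 0.30),
]


def autonomy_dummy(category):
    """1 if the central bank has final authority over its objectives, else 0."""
    return 1 if category == "cb_final_authority" else 0


def group_means(rows):
    """Average price depreciation for autonomous (1) and other (0) central banks."""
    by_group = {0: [], 1: []}
    for _, category, depreciation in rows:
        by_group[autonomy_dummy(category)].append(depreciation)
    return {group: mean(values) for group, values in by_group.items()}


means = group_means(records)
gap = means[0] - means[1]
print(f"autonomous: {means[1]:.2f}  other: {means[0]:.2f}  gap: {gap:.2f}")
```

The gap is a difference in means between two categories. It says nothing about the distance between intermediate arrangements, which is the property the linear indices have to assume.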
In the third and fourth columns I take advantage of CMN’s insight that the effectiveness of CBI may depend on creating a price system more like those in the industrialized economies, as measured by the cumulative liberalization index (CLI). I use a slope dummy that splits the slope of LVAW at a CLI measure greater than 4. They used a cut-off at 2, but since this is a cumulative index, it will naturally have higher values in later periods. The mean value of CLI for the third subperiod is 3.42 and only 5 countries had values less than 2. In their example, this brings the significance of the LVAW measure in total (for a country that has liberalized prices) to about 5–6%. CMN expected that CBI would only obtain anti-inflationary effects if the degree of price liberalization placed the country’s price system more in line with those in the West. Thus, they found that “The coefficients of (CBI) at low levels of cumulative liberalization remain insignificant and the coefficient of CLI (which was significant before) becomes insignificant at conventional levels, but its sign remains negative . . .” (p. 20).

My results show just the opposite when one resets the slope shift dummy to occur at CLI greater than 4. It now appears that the effects of CBI in reducing inflation are significant only for countries that have liberalized less. For countries that have CLI > 4, the effect of CBI is nil, while the effects of CLI continue to be as strong as in those regressions without the slope shift coefficient.

It might therefore be useful to run a regression with the principal components, as CMN have. One may use the principal components and then rearrange or “unscramble” the results to obtain coefficients on the original central bank attributes.9 This appears in table 3.3. I dropped the third principal component (which mostly loads the ability of the CB governor to hold another office) as it was insignificant. The result of that estimation is that the conflict resolution mechanism in the central bank law and the CB’s objectives are significantly correlated with a country’s price depreciation. One should approach these results with due caution, however, as they are based on only 20 central banks for which full data are available.

4 Issues with Other Measures

Some researchers have used turnover rates for central bank governors as an alternative means of testing central bank independence. The problem with this measure, however, is that turnover may be endogenous to economic performance (see De Haan and Kooi 2000 or Dreher, De Haan, and Sturm 2006). Central bank governors may change when governments themselves are unstable. And countries with different attitudes toward inflation (or more precisely, different dominant interest groups with different preferences for inflation) may in fact prefer longer or shorter terms in office.
The importance of commercial banks would be one example. It is somewhat of a stretch then to say that high rates of turnover of a central bank’s chief executive officer (CEO) are evidence for or against independence. Central bank accountability may call for a frequent review of performance, while granting high amounts of independence in the interreview period. It would be odd to view these reviews then as political interference. Cukierman (1992) found evidence of two-way causality between inflation and turnover. Dreher, De Haan, and Sturm (2006) show that CEOs are replaced more often when inflation is higher, along with higher degrees of political instability and turnover and the election of left-wing governments. Again, the problem arises: is this a measure of independence or accountability? As Eijffinger and De Haan (1996) note, a long term in office may just reflect a subservient central bank governor, while shorter terms could mean a central bank governor who stands up to the executive and/or legislative branches. Cukierman and Webb (1995) try to refine the turnover measure by looking only at those changes in central bank CEOs that happen within six months of a change in government. Eijffinger and De Haan (1996) argue that this measure may be quite useful in developing economies, where weak rule of law may mean the central bank’s legal and actual independence differ sharply.

Other attempts to measure autonomy have met with more success. For example, Oatley (1999) finds that when holding labor market structure and policy preferences of the government equal, simple measures of autonomy explain inflation outcomes better than either the GMT or CWN indices. Likewise, Banaian, Burdekin, and Willett (1995) find that the absence of a government override of central bank policy outperforms the CWN index. Fry, Goodhart, and Almeida (1998) include the results of a 1996 survey of central bankers in developing countries conducted by the Bank of England. Central bankers who saw themselves as more autonomous did not finance government deficits through the inflation tax or by financial repression. Cobham, Cosci, and Mattesini (2005), studying the central banks of France, Italy, and the United Kingdom, rely on a different set of measures of informal CBI, defined as a central bank being able to pursue price stability when it is not the central bank’s goal and without regard to government’s preferences. They look at seven attributes, none of which refers to a legal document. The resulting ranking is very subjective and, while interesting, the paper has so far not attracted much attention.
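The refinement of the turnover measure attributed to Cukierman and Webb (1995) earlier in this section can be sketched in a few lines. The dates are hypothetical, the six-month window is approximated as 183 days, and the sketch assumes that only governor changes following a change of government count; the function name and these details are illustrative assumptions rather than the authors' exact procedure.

```python
from datetime import date, timedelta


def political_turnover_share(governor_changes, government_changes, window_days=183):
    """Share of governor changes occurring within `window_days` after a change of government."""
    if not governor_changes:
        return 0.0
    window = timedelta(days=window_days)
    hits = 0
    for governor_date in governor_changes:
        # Count the change if it follows any government change within the window.
        if any(timedelta(0) <= governor_date - political_date <= window
               for political_date in government_changes):
            hits += 1
    return hits / len(governor_changes)


# Hypothetical dates, for illustration only.
government_changes = [date(1994, 5, 1), date(1998, 5, 1)]
governor_changes = [date(1994, 8, 15), date(1996, 3, 1), date(1998, 12, 20)]

share = political_turnover_share(governor_changes, government_changes)
print(f"politically timed share of governor changes: {share:.2f}")
```

Only the first governor change falls inside the window here. The other two would still count toward a raw turnover rate, which is the distinction the refinement is meant to capture.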
5 Back to the Future: A New Lexicography of Central Banks

Thus it appears from this analysis that the two or three most important factors in determining which central bank de jure features help reduce inflation are the CB’s focus on price stability and whether it has final authority in setting monetary policy. My strategy is to use that feature to return to a model such as that of Eijffinger and Schaling (1993). However, to do so requires a few adjustments to their process.

First, as central banks have focused on inflation targeting, many elements of political autonomy for central banks have ceased to have much variation between them. Arnone, Laurens, and Segalotto’s (2006b) recoding of the GMT index for the Organization for Economic Cooperation and Development (OECD) countries finds that only four countries have not provided legal protections to CBs to strengthen them in case of conflict with government. But three countries (Australia, Canada, and New Zealand) use an inflation target that is enacted by legislation or otherwise imposed by government. The same was true of the United Kingdom when inflation targeting was first introduced in 1993; the Bank of England gained independence only in 1997 after the election of the Blair government. Many countries in the OECD also placed greater emphasis in their laws on price stability. This and longer terms for CB governors constitute a great amount of the improvement in political autonomy in OECD central banks since 1990.

This means two things. First, as noted by Arnone, Laurens, and Segalotto (2006a), if the researcher uses a GMT index for central banks today, there is less variation for the OECD countries. The European Monetary Union countries all score 8 of 8 marks for political autonomy, and Switzerland has moved to 7 of 8 from the 5 it scored in the original coding by GMT. Second, the GMT index as recoded by Arnone et al. (2006b) gives 4 of the 5 lowest marks to the four Anglospheric central banks, which have inflation targeting imposed by legislation or by approval of the government. (The three remaining banks are Denmark, Japan, and the United States.)

Among emerging market economies, there is more heterogeneity in terms of conflict resolution, but only the South African Reserve Bank does not have inflation in its charter as its primary objective. Yet it adopted an inflation targeting rule in 2000 (for details of its relation to the government see van der Merwe 2004).10

Arnone et al. (2007) review the evidence on central bank independence and draw four “consensus views” of monetary policymakers from global trends.

1. “Set price stability as the primary objective of monetary policy.” The time-inconsistency argument for inflationary bias in democratic countries has led to broad agreement on the establishment of price stability as the sole goal as part of a credible commitment.
2. “Curtail direct lending to governments.” Consensus has formed among central bankers that any lending to government should be temporary, restricted by amount, and subject to market rates of interest.
3. “Ensure full autonomy for setting the policy rate.” This implies both instrument independence (in the sense of Debelle and Fischer 1995) and a consensus that a short-term interest rate is the best operational target for monetary policy.
4. “Ensure no government involvement in policy formulation.” There should be no veto by government in the decisions, and the structure of central bank laws should strengthen the position of the central bank when conflicts arise with the government.

I argue that this list constitutes an effective set of categories for classification of central banks. Rather than develop a new system of weights and steps, the method I propose takes these four consensus views and creates a category indicating which of these each country’s central bank has adopted, along the lines of Eijffinger and Schaling (1993) and Schaling (1995, Chapter 3), who create eight potential central bank structures but discover that only five of the eight were adopted by any of the central banks of the OECD countries.

Arnone et al. (2007) argue for a sequencing of reforms in which goals and basic autonomy of the central bank (in particular instrument independence) would come before the imposition of limits on central bank lending to government. In developing economies, central bank participation in government debt markets may help in countries with shallow money markets. Governments would demonstrate that direct lending is curtailed if they make their central bank completely autonomous. As noted in Banaian, Burdekin, and Willett (1995, 1998), direct lending does not provide any further explanation of inflation control in either developing or industrialized economies once autonomy is accounted for. Therefore, in the following discussion, I do not account for it.11

All three of the remaining criteria are political variables. Both the CWN and GMT indices measure these, and Arnone et al. update those measurements for newer central bank charters. As GMT uses a simple 0–1 measure it would seem easy to use their measurements, but there remains the question of where to draw the lines in converting them as Arnone et al. do. They consider the price stability objective criterion to be met in cases where price stability is mentioned with other goals, even those that would “potentially conflict” with price stability. This is quite outside the consensus view they claim.12 In the case of many laws governing central banks in the EU, laws are worded to state that price stability is the primary objective of monetary policy and task of the central bank, and then say that, “without prejudice to its primary objective,” the central bank can support macroeconomic policies of the government. In this case, I believe the subsidiarity of full employment or other objectives is sufficiently clear to fit the consensus, and I treat those central banks as if they had a sole objective.

A very important consideration in this would be whether objectives for financial stability in a central bank charter conflict with price stability, when those are the only two objectives listed in the law. The European Central Bank (ECB) law makes it quite clear that financial stability is secondary to price stability, but in the central bank laws of countries where central banks are said to have a great deal of autonomy, such as the Reserve Bank of New Zealand, the Bank of Canada, or the Riksbank, financial stability is provided more as a constraint on the pursuit of price stability.13 It is quite true, as Ferguson (2002) points out, that if the central bank does not produce price stability it will get financial instability, as expectations for macroeconomic outcomes are not met. But the question remains whether the reverse is true: can one have financial instability when the central bank is producing price stability, and if so, does financial instability then threaten price stability? Ferguson argues that it is not a question of whether one ignores financial stability in that case, but what weight one places on it. It is still a very open question, but in the classification that follows, I will treat financial stability as being consistent with price stability.

Also, for strengthening the hand of the central bank in conflicts with the government, Arnone et al. use a curious recoding of the CWN measure to say the hand is strengthened either if there is conflict resolution by a committee of the central bank, executive, and legislature, or if conflicts are decided by the legislature alone. Certainly, in the second case this cannot be considered removing political interference from central bank policy. I argue that in a negotiation with the central bank, parliaments and presidents will hold a great deal of sway and make it difficult for the central bank to hold onto the conflicting policy. There are arguments that the central bank has more autonomy the more transparent is the veto of the legislature or executive. I will nonetheless argue for a very clear autonomy, and thus the only veto that will be seen as still permitting a strong central bank hand in conflicts will be provisions that only allow for veto over matters not defined as the bank’s primary objective; that is, if the central bank has a sole objective of price stability but wanted to build new, ornate branch offices, the government could object to that. Just not the bank’s pursuit of price stability.14
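The classification that results from the three remaining criteria can be enumerated mechanically. The sketch below assumes the ordering used for the prototype letters in table 3.4 (government override first, then the price stability objective, then instrument independence); the function and variable names are illustrative.

```python
from itertools import product
from string import ascii_lowercase

OVERRIDE = ["Yes", "No"]                 # does government retain an override?
OBJECTIVE = ["None", "Multiple", "Sole"]  # status of the price stability objective
INSTRUMENT = ["No", "Yes"]               # does the bank have instrument independence?

# Twelve prototypes, lettered a through l in the order the combinations are generated.
prototypes = dict(zip(ascii_lowercase, product(OVERRIDE, OBJECTIVE, INSTRUMENT)))


def classify(override, objective, instrument_independent):
    """Return the prototype letter for one central bank's three attributes."""
    key = (
        "Yes" if override else "No",
        objective,
        "Yes" if instrument_independent else "No",
    )
    for letter, combination in prototypes.items():
        if combination == key:
            return letter
    raise ValueError(f"unknown combination: {key}")


for letter, combination in prototypes.items():
    print(letter, *combination)
```

The letters are labels for categories, not scores. Nothing in the enumeration says how far prototype j is from prototype l, which is the property the scheme is meant to preserve.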

In table 3.4 I show these classifications for the OECD banks in 1993 and 2006.15 The data reveal the broad movement of central banking toward this consensus view. Every central bank listed has moved in the past 20 years toward what theory would state is a better central bank structure, except Switzerland and the United States, both of which started with very good structures.16 All central banks that had none of the three desired central bank attributes in the consensus view have changed their laws to take at least one of them, and all have made price stability one of monetary policy’s objectives if not the only one.

In developing countries, a broad majority follow this advice. Mahadeva and Sterne (2000) found in a survey of 94 central banks that 26% had only monetary stability as a goal, while another 57% had monetary stability and other goals that did not compete with it, such as financial stability or stability of the payments system.

Most of the countries that have retained a government override do so within a framework of inflation targeting. In these cases the government has made the commitment to the inflation targeting regime and assigned the central bank the task of meeting that objective. Many developing and emerging market economies have also chosen this path. In this case the method may provide some accountability to government by reducing pressures from fiscal deficits. Australia is an interesting case insofar as it retains (in Section 10 of the Reserve Bank Act of 1959) both the goal of providing for “the stability of the currency of Australia” and that of “(maintaining) full employment in Australia.” The consensus view would find this one step below the independence of the other government-adopted inflation targeters in the OECD. In a strong sense, there is a parallel between these central banks and the pre-ECB Nederlandsche Bank.
As Burdekin and Willett (1991) argue, the Dutch government could provide for an override of the bank’s policy, but had to do so by an open directive that was laid before the parliament, with an explanation. Likewise, these inflation targeting central banks are under the control of government, but the government has to argue openly why its override is consistent with the agreed inflation target. Governments cannot use the central banks as scapegoats for a failed macroeconomic policy when they have a veto over policy.

It is tempting to place the central banks listed here on a scale, much as Alesina or Eijffinger and Schaling did using a similar strategy 15 years or more ago. But the nature of the differences in the scale would now be very different. The difference between the two most independent structures that we actually observe is over the possibility that the Federal Reserve and the Swiss National Bank are less inflation-averse because of their dual mandates. But Meyer (2001) points out that the sole goal of the ECB may not mean a 0 weight on output variability from full employment.17 I do not think we have yet enough data on the ECB to determine whether it has a weight on output variability greater than 0. Likewise, it is worth considering whether the step between Japan and Australia is the same as between Australia and the other inflation targeters (outside of the ECB, or Mexico). It is, however, quite reasonable to treat the Fed, the SNB, the ECB, and the Bank of Mexico as qualitatively more independent than those where the government has an override (even when providing accountability through an inflation targeting program). As argued earlier in this chapter, ordinal rankings make some sense but cardinal values do not.

6 Conclusions

It is more contentious to use the classification scheme described here, but it has precedent. The IMF (2006) classifies exchange rate regimes into eight categories, and monetary policy frameworks into five possible structures, without placing any numbers on them. Levy-Yeyati and Sturzenegger (2003) use cluster analysis to classify exchange rate regimes and frameworks. A cluster analysis uses a type of discriminant analysis that seeks groupings as I have in this paper, and assigns each central bank to a cluster depending on the similarity of experiences with some macroeconomic target. For exchange rate regimes, the variances of the exchange rate and of the change of the exchange rate are chosen. If one wanted to move from a de jure measure of central bank independence to a de facto measure, this would seem the path to take. The exchange rate classification uses theorized outcomes of exchange rate behavior to make the classification. Are the central bank’s structure and its legal mandate the only determinants of, say, price level variability?
If one wanted to include fiscal dominance, should budget deficit or government debt ratios (to GDP) be included as a criterion for grouping? Instead, I have argued for a return to a simpler measure of central bank independence that uses the coalescing of professional opinion in research since the development of these measures 15 years ago. By focusing on the price stability mandate, instrument independence, and the conflict resolution mechanism, I find that a group of banks led by the ECB has moved ahead of the Federal Reserve and Swiss National Bank. Using those criteria keeps the Fed and SNB ahead of the countries whose governments have imposed an inflation target on their central banks.
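The cluster-analysis route to a de facto classification mentioned above can be illustrated with a minimal sketch. It uses a plain k-means with fixed starting centroids as a simple stand-in; it is not claimed to reproduce the procedure of Levy-Yeyati and Sturzenegger (2003). The observations are hypothetical pairs of (variance of the exchange rate, variance of its change), and all names are illustrative.

```python
from math import dist


def kmeans(points, centroids, max_iter=50):
    """Plain k-means from fixed starting centroids; returns (labels, centroids)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i])) for p in points]
    return labels, centroids


# Hypothetical observations: (variance of exchange rate, variance of its change).
observations = [(0.10, 0.10), (0.20, 0.15), (0.15, 0.05),
                (2.00, 1.80), (2.20, 2.10), (1.90, 2.00)]

labels, centers = kmeans(observations, centroids=[(0.0, 0.0), (2.0, 2.0)])
print(labels)
```

Applied to central banks, the features would be outcome variables such as price level variability, and possibly deficit or debt ratios, rather than legal attributes. Choosing those features is the open question raised in the text.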

Most importantly, I argue that central bank independence needs to be thought of as a set of categories, not a continuous variable. While the latter is tempting for the purposes of statistical analysis, the process of creating continuous variables leads to problems in interpretation, and these problems are not solved by better computing. The method used instead is quite arbitrary, in particular the ordering of which criterion goes first. I believe it is better to make the choice and do so explicitly than to provide any sense of evenhandedness or numerical certainty through an aggregation scheme.

Appendix

Table 3.1 Weightings in Various Central Bank Independence Indices

Variable | Bade/Parkin (%) | ES 1993 (%) | C92 LVAU (%) | C92 LVAW (%) | GMT91 Political (%) | GMT91 Economic (%)
Term of office of CEO | 0.000 | 0.000 | 3.125 | 5.000 | 12.500 | 0.000
Who appoints CEO | 0.000 | 0.000 | 3.125 | 5.000 | 12.500 | 0.000
Dismissal provisions | 0.000 | 0.000 | 3.125 | 5.000 | 0.000 | 0.000
Can CEO hold another office? | 0.000 | 0.000 | 3.125 | 5.000 | 0.000 | 0.000
Other board members appointed by someone other than the government | 25.000 | 33.333 | 0.000 | 0.000 | 12.500 | 0.000
Board appointment term of office | 0.000 | 0.000 | 0.000 | 0.000 | 12.500 | 0.000
Government sits on CB board | 25.000 | 33.333 | 0.000 | 0.000 | 12.500 | 0.000
Who forms monetary policy? | 0.000 | 0.000 | 3.125 | 3.750 | 12.500 | 0.000
Conflict resolution | 25.000 | 33.333 | 6.250 | 7.500 | 12.500 | 0.000
CB advises budget | 0.000 | 0.000 | 3.125 | 3.750 | 0.000 | 0.000
CB objectives | 25.000 | 0.000 | 12.500 | 15.000 | 12.500 | 0.000
Limits on advances | 0.000 | 0.000 | 12.500 | 15.000 | 0.000 | 14.286
Limits on securitized lending | 0.000 | 0.000 | 12.500 | 10.000 | 0.000 | 0.000
Who controls terms of lending | 0.000 | 0.000 | 12.500 | 10.000 | 0.000 | 0.000
Width of circle of borrowers from CB | 0.000 | 0.000 | 12.500 | 5.000 | 0.000 | 0.000
Lending limits | 0.000 | 0.000 | 3.125 | 2.500 | 0.000 | 14.286
Maturity limits | 0.000 | 0.000 | 3.125 | 2.500 | 0.000 | 14.286
Interest rate limits | 0.000 | 0.000 | 3.125 | 2.500 | 0.000 | 14.286
CB prohibited from primary market | 0.000 | 0.000 | 3.125 | 2.500 | 0.000 | 14.286
Discount rate set by CB | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 14.286
Bank supervision | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 14.286

Source: C92, Cukierman (1992, pp. 379–380); GMT, EconPolicy (1991, pp. 368–370); Bade and Parkin (1988); ES93, Eijffinger and Schaling (1993, p. 65).

Table 3.2 Inflation and CBI in Transition Economies

Variable | Regression 1 | Regression 2 | Regression 3 | Regression 4
Cumulative liberalization index | -0.07307 (0.05) | -0.08308 (0.02) | -0.11543 (0.01) | -0.09477 (0.01)
War dummy | -0.06471 (0.39) | 0.07214 (0.32) | 0.04479 (0.53) | 0.06371 (0.36)
Index of internal price liberalization | -0.48445 (0.19) | -0.45589 (0.21) | -0.47241 (0.18) | -0.50993 (0.15)
CMN index (“LVAW”) | -0.28473 (0.14) | | -0.33270 (0.08) |
LVAW slope shift (for CLI > 4) | | | 0.24414 (0.07) |
Central bank autonomy | | -0.09679 (0.11) | | -0.16563 (0.02)
Autonomy slope shift (for CLI > 4) | | | | 0.16620 (0.07)

Joint significance of central bank measure and adjusted R2 (column assignment not recoverable): 0.62, 0.63, 0.67, 0.04, 0.65, 0.66

Note: p-values for significance in parentheses. Sample size = 31.
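The slope-shift specification in table 3.2 can be read with a small worked calculation. The sketch assumes the usual interpretation of a slope dummy: the effect of the CBI measure is the base coefficient when CLI is at or below the cut-off, and the base coefficient plus the shift when CLI exceeds it. The two numbers are the autonomy coefficient and autonomy slope shift reported in the table; the function name is illustrative.

```python
def effect_of_cbi(base_coefficient, slope_shift, cli, threshold=4.0):
    """Effect of the CBI measure on price depreciation under a slope-shift dummy."""
    return base_coefficient + (slope_shift if cli > threshold else 0.0)


AUTONOMY = -0.16563        # coefficient on central bank autonomy, table 3.2
AUTONOMY_SHIFT = 0.16620   # autonomy slope shift for CLI > 4, table 3.2

less_liberalized = effect_of_cbi(AUTONOMY, AUTONOMY_SHIFT, cli=3.0)
more_liberalized = effect_of_cbi(AUTONOMY, AUTONOMY_SHIFT, cli=5.0)
print(f"CLI <= 4: {less_liberalized:.5f}   CLI > 4: {more_liberalized:.5f}")
```

The two coefficients almost exactly cancel above the cut-off, which is the arithmetic behind the statement that the effect of CBI is nil for countries with CLI > 4.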

Table 3.3 Unscrambled Principal Components Analysis of CBI in Transition Economies

Variable | Coefficient | p-value
War dummy | 0.089 | 0.34
Cumulative liberalization index | -0.111* | 0.02
Internal liberalization | -0.026 | 0.96
Principal component 1 | 0.161 | 0.10
Principal component 2 | -0.115 | 0.27
Constant | 0.698 | 0.05

Unscrambled coefficients | Coefficient | p-value
Term of office | -0.071 | 0.12
Dismissal process | -0.039 | 0.39
Governor can hold another office | -0.008 | 0.61
Who formulates monetary policy | -0.093 | 0.19
Who has final authority | -0.073* | 0.04
Participation in budget process | -0.010 | 0.73
Statutory objectives of CB | -0.187* | 0.04

Adjusted R-square | 0.64
Standard error | 0.118

Note: Coefficients marked * are significant at the 95% confidence level. Dependent variable is inflation rate. See table 3.2 for more details.

Table 3.4 Classification of OECD Central Banks

Prototype | Government override? | Price stability objective? | Instrument independence | Exists 1989? | Exists 2007? | 1989 examples | 2007 examples
a | Yes | None | No | Yes | No | Belgium, Canada, France, United Kingdom, Japan, Korea, Mexico, Norway, New Zealand, Poland, Sweden |
b | Yes | None | Yes | No | No | |
c | Yes | Multiple | No | Yes | Yes | Australia, Hungary, Iceland, Spain | Japan
d | Yes | Multiple | Yes | No | Yes | | Australia
e | Yes | Sole | No | Yes | Yes | Finland, Greece, Ireland, Netherlands | Canada, Korea, Norway, New Zealand, United Kingdom
f | Yes | Sole | Yes | No | No | |
g | No | None | No | No | No | |
h | No | None | Yes | No | No | |
i | No | Multiple | No | Yes | No | Denmark |
j | No | Multiple | Yes | Yes | Yes | United States, Switzerland | United States, Switzerland
k | No | Sole | No | No | No | |
l | No | Sole | Yes | Yes | Yes | Germany | ECB and its membership (including associates), Mexico

Source: Year 1989 data from Cukierman (1992); 2007 by author, from BIS (2007) collection of central bank laws.

Notes

1. All of these are reviewed by Siklos (2002).
2. As Schaling (1995) notes, this is not a direct criterion applied but implied in the discussion of the “divorzio” of the Banca d’Italia from absorbing the excess supply of Treasury securities. See also Tabellini (1988).
3. Buiter (2006) refers to complete operational independence as equivalent to a lack of substantive accountability. There is no judgment or consequence that a central bank, acting as a delegate of authority from the people and/or the government, suffers when its actions are not desired by those principals. It is not surprising that truly operationally independent central banks have effectively no substantive accountability at all. Independence has to mean that those in charge of monetary policy cannot be fired except for incapacity or serious misconduct, and that financial remuneration and working conditions likewise cannot be used to reward or punish them (pp. 23–24).
4. I say this despite the fact that Alesina goes so far as to classify the Bank of Italy (BI) with a fractional number. That is clearly a judgment meant to indicate that he thought there was some difference between BI and other dependent central banks, but not enough to fit into the classifications warranting the next integer. The intent is nonetheless ordinal.
5. That is not to deny that at other times the NBU has bought debt or refused bids because the government would not accept the interest rate that the debt market would bear at that time.
6. That does not preclude, of course, the use of categorical or dummy variables in regression so that one can obtain conditional differences in means.
7. There are eight countries for which there are two subperiods after reform of the central bank law, so these means are for a set of 34 time periods of varying length. See CMN, table 3.4; the means I offer skip the first subperiod in every case.
8. Mongolia is excluded because CMN have no inflation data, and Poland after the second central bank law is excluded because there is no information on price liberalization.
9. One might wish to argue that the price liberalization measures should be included in the principal components analysis. It turns out that those data are mostly orthogonal to the central bank attributes, and it makes little difference whether they are included or excluded.
10. For the purposes of this chapter, the following countries are listed as inflation targeters as of 2004: Australia, Brazil, Canada, Chile, Colombia, Czech Republic, Hungary, Iceland, Israel, Korea, Mexico, New Zealand, Norway, Poland, South Africa, Sweden, Thailand, and the United Kingdom. I would also include the European Central Bank.
11. There is also a practical consideration. Using a classification scheme for consensus views with verbs like “set” or “ensure” is straightforward. Either price stability is the sole goal or it is not; either the CB has final authority over monetary policy or it does not. “Curtail” is a different matter. We can curtail without eliminating entirely, so deciding whether one has curtailed is a judgment call. This reintroduces the same arbitrariness that I have faulted in the CWN and GMT indices.
12. In terms of the CWN measure, they state price stability is a primary objective if the central bank’s score on the CWN table is greater than or equal to 0.4.
13. The Bank of England is stranger yet. It is told to pursue price stability and, “subject to that,” pursue policies to support government goals for economic growth and employment. It also has a memorandum of understanding with the government to provide for stability of the monetary system and the financial system (particularly regarding the payments system), and to provide oversight for the financial system more generally.
14. Again, in terms of the CWN measure, I would count only those central banks with values of 0.8 or 1 as holding the upper hand in conflicts.
15. On the Web site that complements this book, you can find a longer list of other central banks.
16. The dual mandate of the Swiss National Bank may be less known. Article 5, Section 1 states “The National Bank shall pursue a monetary policy serving the interests of the country as a whole. It shall ensure price stability. In so doing, it shall take due account of the development of the economy.” I am not interpreting the words “In so doing” as providing the same degree of subsidiarity in policy objectives as I have described elsewhere.
17. Another way to think of this is whether a central bank that has the upper hand in policy conflicts with the government and instrument independence is any less “weight conservative” in the Rogoff (1985) or Svensson (1997) sense than a central bank with those qualities and a stated sole goal for price stability. Such banks may nonetheless have the ability and incentive to smooth output or interest rate fluctuations.

References

Akhtar, M. A. 1995. Monetary Policy Goals and Central Bank Independence. Banca Nazionale del Lavoro Quarterly Review 195: 423–439.
Alesina, A. 1988. Macroeconomics and Politics. NBER Macroeconomics Annual 3: 13–52.
Alesina, A. and L. H. Summers. 1993. Central Bank Independence and Macroeconomic Performance: Some Comparative Evidence. Journal of Money, Credit and Banking 25 (2): 151–162.
Arnone, M., B. Laurens, and J. F. Segalotto. 2006a. The Measurement of Central Bank Autonomy: Survey of Models, Indicators and Empirical Evidence. IMF Working Paper 06/227.


———. 2006b. Measures of Central Bank Autonomy: Empirical Evidence for OECD, Developing and Emerging Market Economies. IMF Working Paper 06/228.
Arnone, M., B. Laurens, J. F. Segalotto, and M. Sommer. 2007. Central Bank Autonomy: Lessons from Global Trends. IMF Working Paper 07/88.
Bade, R. and M. Parkin. 1982. Central Bank Laws and Monetary Policy. University of Western Ontario (unpublished manuscript).
Banaian, K., R. C. K. Burdekin, and T. D. Willett. 1995. On the Political Economy of Central Bank Independence. In Monetarism and the Methodology of Economics: Essays in Honor of Thomas Mayer, ed. K. D. Hoover and S. M. Sheffrin, 178–197. Aldershot, UK: Edward Elgar.
———. 1998. Reconsidering the Principal Components of Central Bank Independence: The More the Merrier? Public Choice 97 (1–2): 1–12.
Banaian, K. and W. A. Luksetich. 2001. Central Bank Independence, Economic Freedom, and Inflation Rates. Economic Inquiry 39 (1): 149–161.
Buiter, W. 2006. Rethinking Inflation Targeting and Central Bank Independence. Background Paper for the Inaugural Lecture for the Chair of European Political Economy in the European Institute at the London School of Economics and Political Science, October 26, London, United Kingdom.
Burdekin, R. C. K. and T. D. Willett. 1991. Central Bank Reform: The Federal Reserve in International Perspective. Public Budgeting and Financial Management 3 (3): 531–551.
Cobham, D., S. Cosci, and M. Fabrizio. 2005. Informal Central Bank Independence: An Analysis for Three European Countries. Departmental Working Papers 217, Tor Vergata University, CEIS.
Cukierman, A. 1992. Central Bank Strategy, Credibility and Autonomy. Cambridge, MA: MIT Press.
Cukierman, A., G. P. Miller, and B. Neyapti. 2002. Central Bank Reform, Liberalization and Inflation in Transition Economies—An International Perspective. Journal of Monetary Economics 49 (2): 237–264.
Cukierman, A., and S. B. Webb. 1995. Political Influence on the Central Bank: International Evidence. The World Bank Economic Review 9 (3): 397–423.
Cukierman, A., S. B. Webb, and B. Neyapti. 1992. Measuring the Autonomy of Central Banks and Its Effects on Policy Outcomes. The World Bank Economic Review 6 (3): 353–398.
De Haan, J. and W. J. Kooi. 2000. Does Central Bank Autonomy Really Matter? New Evidence for Developing Countries Using a New Indicator. Journal of Banking and Finance 24 (4): 643–664.
Debelle, G. and S. Fischer. 1995. How Independent Should a Central Bank Be? In Goals, Guidelines and Constraints Facing Monetary Policymakers, ed. Jeffrey Fuhrer, 195–211. Boston: Federal Reserve Bank of Boston.
Dreher, A., J. De Haan, and J. E. Sturm. 2006. When Is a Central Bank Governor Fired? Evidence Based on a New Data Set. Working Paper 06–143. KOF Swiss Economic Institute, ETH Zurich.


Eijffinger, S. C. W. and J. De Haan. 1996. The Political Economy of Central Bank Independence. Princeton Studies in International Economics 19, International Economics Section, Department of Economics, Princeton University.
Eijffinger, S. C. W. and E. Schaling. 1993. Central Bank Independence in Twelve Industrial Economies. Banca Nazionale del Lavoro Quarterly Review 184: 49–89.
Ferguson, R. W. 2002. Should Financial Stability Be an Explicit Central Bank Objective? Paper Presented to an International Monetary Fund conference on Challenges to Central Banking from Globalized Financial Markets, September 17, in Washington, DC.
Forder, J. 1998. Central Bank Independence—Conceptual Clarifications and Interim Assessment. Oxford Economic Papers 50 (3): 307–334.
Fry, M. J., C. A. E. Goodhart, and A. Almeida. 1998. Central Banking in Developing Countries: Objectives, Activities and Independence. London: Routledge.
Grilli, V., D. Masciandaro, and G. Tabellini. 1991. Political and Monetary Institutions and Public Financial Policies in the Industrial Countries. Economic Policy 6 (13): 341–392.
International Monetary Fund. 2006. De Facto Classification of Exchange Rate Regimes and Monetary Framework. http://www.imf.org/external/np/mfd/er/2006/eng/0706.htm.
Levy-Yeyati, E. and F. Sturzenegger. 2003. To Float or to Fix: Evidence on the Impact of Exchange Rate Regimes on Growth. American Economic Review 93 (4): 1173–1193.
Mahadeva, L. and G. Sterne. 2000. Monetary Policy Frameworks in a Global Context. London: Routledge.
Meyer, L. 2001. Comparative Central Banking and the Politics of Monetary Policy. Remarks Given to the National Association of Business Economics Seminar on Monetary Policy and the Markets, May 21, in Washington, DC.
Oatley, T. 1999. Central Bank Independence and Inflation: Corporatism, Partisanship, and Alternative Indices of Central Bank Independence. Public Choice 98 (3–4): 399–413.
Posen, A. S. 1993. Why Central Bank Independence Does Not Cause Low Inflation: There Is No Institutional Fix for Politics. In Finance and the International Economy, ed. R. O’Brien, 40–65. New York: Oxford University Press.
Rogoff, K. 1985. The Optimal Degree of Commitment to an Intermediate Monetary Target. Quarterly Journal of Economics 100 (4): 1169–1190.
Schaling, E. 1995. Institutions and Monetary Policy: Credibility, Flexibility and Central Bank Independence. Aldershot, UK: Edward Elgar.
Siklos, P. 2002. The Changing Face of Central Banking: Evolutionary Trends since World War II. Cambridge: Cambridge University Press.
Svensson, L. E. O. 1997. Optimal Inflation Targets, “Conservative” Central Banks, and Linear Inflation Contracts. American Economic Review 87 (1): 98–114.


Tabellini, G. 1988. Monetary and Fiscal Coordination with a High Public Debt. In High Public Debt: The Italian Experience, ed. F. Giavazzi and L. Spaventa. Cambridge: Cambridge University Press.
Van der Merwe, E. 2004. Inflation Targeting in South Africa. Occasional Paper 19, July, South African Reserve Bank.


CHAPTER 4

Fiscal Indicators

John E. Anderson

1 Introduction

Interest in cross-country comparisons of fiscal conditions has heated up in recent years for two primary reasons. First, the transition of former centrally planned economies to market orientation in the 1990s brought with it a concern for fiscal reform and an examination of the fundamental institutions on which a market economy is built. Fundamental reform of each country’s fiscal system was necessary in order to facilitate economic transformation. Consequently, researchers began to construct various indices of fiscal reform and use those indices to analyze effects of fiscal reform. Second, with advances in globalization, increasing mobility of capital, and growing international trade flows, there is greater concern for examining the fiscal conditions across countries. Since both fiscal conditions and the quality of institutions can have direct impacts on economic growth and development, policy analysts have been intent on measuring and describing cross-country variations. As a result, academic researchers, think-tank policy analysts, and world financial institutions have devoted considerable effort to chronicling fiscal conditions and the quality of institutions across countries. Numerous indices of fiscal conditions and institutional quality have been developed and used in policy analysis.

In this chapter I focus on two types of indices: those measuring fiscal conditions in transition countries, and those more broadly measuring fiscal conditions across the world’s countries. Section 2 examines several indices specific to transition economies while section 3 covers broader measures applied to a wide range of countries. Finally, section 4 summarizes what we learn from these indices and provides guidance for the careful use of such indices.

2 Indicators of Fiscal Conditions and Reform in Transition Countries

2.1 Transition and Taxes

Fiscal reform in transition countries has involved fundamental restructuring of both the revenue and expenditure systems to facilitate the larger transition to market-oriented resource allocation in the economy. On the expenditure side, fiscal reform efforts have rationalized public sector responsibilities, introduced hard budgets and modern budgeting processes, and established treasury functions. On the revenue side, the focus of fiscal reform efforts has been on the development of a comprehensive tax code, the establishment of a destination-based consumption-type VAT, the implementation of a corporate income tax based on market-based net income, and the widespread elimination of exemptions and preferences. For an overview of the typical fiscal reforms recommended and implemented in transition countries, see Lorie (2003), Martinez and McNab (2000), Martinez-Vazquez and McNab (1997a), Summers and Baer (2003), Stepanyan (2003), and Tanzi and Zee (2000).

Mitra and Stern (2002) have analyzed the transition experience of CIS and CSB countries (Central and Eastern Europe and the Baltics) and compared their experiences to high-income OECD countries. They have identified opposing movements in key ratios often used to monitor fiscal reform. Both tax levels and the composition of tax revenue sources are considered in assessing progress in fiscal reform. Opposing effects arise, however, in two ways. First, there are opposing effects between the beginning of transition and the situation at the end of a decade of transition. Second, there are opposing effects in cross-section comparisons of transition countries after a decade of reform and highly developed industrial countries. For both reasons, Mitra and Stern suggest that there is a U-shaped temporal pattern in the ratio of tax revenues to GDP and in the shares of major taxes in tax revenue.

In the cross-section comparison of transition countries, there are several factors to consider. There is a loss of revenue from traditional profit, turnover, and payroll taxes due to the noncompetitive nature of state enterprises. Price liberalization, new hard budget constraints, and private competition combine to reduce the potential revenue generated by taxing these entities. Furthermore, the complexity of fiscal reform has limited the ability to quickly implement a broad-based low-rate tax structure that is effectively administered. The challenge has been that of instituting a new tax system that fosters compliance among new and restructured enterprises, before they are driven underground. For both of these reasons, raising revenue has been difficult for transition governments that formerly operated with a preemptive claim on the output of enterprises and the associated income generated and earned. Under the centralized systems before transition, the government exercised its claim to resources before citizens had access to the remainder. With transition and a less centralized system, however, the government has a diminished role and is forced to collect revenue in order to support spending.

Mitra and Stern identify several implications of this transition situation, including a reduction in the ratio of tax revenue to GDP (due to declining corporate income tax revenue), a reduction in the ratio of public expenditures to GDP (due to a macroeconomic need to reduce fiscal deficits to control inflation), a reduction in the importance of income taxes (due to the reduced corporate income tax revenue), a reduction in the importance of social insurance tax revenues in CIS countries, an increase in the share of individual income taxes, and an increase in the importance of indirect taxes such as VAT and excise taxes (reflecting the decline in direct taxes). With fiscal reform we expect a rise in the ratio of tax revenue to GDP, an increase in the share of direct taxes in tax revenue, an increase in the share of revenue from personal income taxes, a reduction in the share of revenue from domestic forms of indirect taxation, and a reduction in the role of trade taxes.
Bird and Banta (1999) provide a good overview of the many potential indicators that can be used to assess fiscal conditions in transition countries. They do not, however, discuss or develop any indices of fiscal conditions that combine multiple measures. Tanzi and Tsibouris (2000) discuss the expectation that progress in fiscal reform should result in improved revenue performance. They caution, however, that many of the reforms were recently implemented and have not yet been fully felt. Some reforms are revenue-reducing (such as the elimination of export taxes and excess wage taxes). Furthermore, many tax policy reforms have been hindered by problems in tax administration.

Concern over tax evasion has been a particularly vexing issue in many transition countries. In one such study of tax evasion Anderson and Carasciuc (2004) examined evidence from Moldova and found quite predictable effects, with greater measured tax evasion in sectors of the economy where audit frequencies were lower and/or where the real value of fines and penalties was lower.

Within this context, researchers have worked to develop measures of fiscal reform for use in several ways. First, the measures are needed in order to assess the extent of reform within a given transition country. Second, the measures are needed in order to assess the extent of reforms across transition countries. In the following section we examine several of the indices that have been developed and used in policy analysis of fiscal reform among transition countries.

2.2 Fiscal Reform Indicators

A number of analysts engaged in assessing the extent of fiscal reform during the first decade of transition. Consequently, we have available several indices of the extent of reforms. We may use these indices as measures of the breadth and depth of reforms in an attempt to determine whether reforms have had an impact on the business environment in which firms operate. Three sources of transition economy measurement are considered in what follows.

2.2.1 Martinez-Vazquez and McNab Reform Indices

Martinez-Vazquez and McNab (1997b, 2000) have conducted extensive reviews of the tax reforms in transition countries. As part of their investigation, they have created two reform measures: a cumulative reform index and an overall reform index in Martinez-Vazquez and McNab (1997a). In this section we examine those two indices of fiscal reform.

2.2.2 Cumulative Reform Index (CRI)

The CRI is constructed using data from 24 transition countries over the period 1989–1996. Martinez-Vazquez and McNab use six measures of the effectiveness of reform:

1. Timing of tax reform: the period of time from the start of the transitional process to the implementation of a tax reform program that included a modern VAT.
2. Preparation for tax reform: the average period of time allocated for preparation of legislation and preparation for implementation.
3. Stability of the tax system: frequency of changes in the tax laws since the initial reform program.
4. High tax rates: positive deviation of the maximum rates for the primary revenue sources from the average maximum rate for the primary revenue sources of all countries in transition.
5. Prevalence of tax holidays: significance of tax holidays and special treatments.
6. Complexity: number of enterprise profit tax brackets.

For each of the six measures, they examine the distribution of observations and assign a subjective score ranging from 0 to 3. A score of 0 indicates most effective reform while a score of 3 indicates least effective reform. By design, the CRI measure is thus a subjective index. While it is based on objective data in the form of a distribution of observations for each of the six factors listed above, the assignment of scores is ultimately subjective. Using the assigned scores from all six of these measures, they construct the CRI index for each country by summing the country’s six scores, where $X_{ij}$ denotes the score of country i on measure j:

\[ \mathrm{CRI}_i = \sum_{j=1}^{6} X_{ij} \tag{1} \]
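The summation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name and the example scores are hypothetical, and the only facts taken from the text are that there are six scores, each between 0 and 3, and that they are summed.

```python
def cumulative_reform_index(scores):
    """Sum six subjective reform scores, each between 0 (most effective)
    and 3 (least effective), as described in the text."""
    if len(scores) != 6:
        raise ValueError("the CRI uses exactly six component scores")
    if any(not 0 <= s <= 3 for s in scores):
        raise ValueError("each score must lie between 0 and 3")
    return sum(scores)


# Hypothetical scores for one country, in the order of the six measures
# listed above (timing, preparation, stability, high rates, holidays,
# complexity). These are illustrative values, not data from table 4.1.
example_scores = [1, 0, 2, 1, 0, 1]
cri = cumulative_reform_index(example_scores)
print(cri)  # 5
```

Because each score lies between 0 and 3, the sum can never fall below 0 or exceed 18, and a lower total indicates more effective reform.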

Hence, the CRI is bounded on the interval [0, 18]. Computed CRI index scores for the 24 transition countries, reported in table 4.1, range from a low score of 3 to a high score of 17. The mean score across all countries is 11.75. Using the CRI, Martinez-Vazquez and McNab categorize the 24 transition countries according to the degree of reform. They identify the most advanced reform countries as the Czech Republic, Estonia, Latvia, and Croatia. Their high intermediate reform countries include Slovak Republic, Hungary, Lithuania, Poland, Kazakhstan, and Slovenia. Their low intermediate reform countries include Bulgaria, Kyrgyz Republic, Turkmenistan, Ukraine, Albania, Romania, Russian Federation, and Tajikistan. Finally, they identify slow reform countries, including Georgia, Azerbaijan, Armenia, Uzbekistan, Moldova, and Belarus. Table 4.1 summarizes their results.

2.2.3 Overall Reform Index (ORI)

Martinez-Vazquez and McNab also construct an ORI by assigning an index value from 0 to 3 for countries in each of the 4 groups identified in table 4.1. Countries with the most advanced reforms measured by the CRI index are assigned an ORI index value of 0. These 4 countries have a mean CRI index value of 3.25. Countries judged to be high intermediate reformers, with mean CRI index values of 7, are assigned ORI index values of 1. Low intermediate reformers, with CRI index values averaging 11.75, are assigned an ORI index value of 2. Finally, the slow tax reformers with CRI index values averaging 14.5 are assigned an ORI index value of 3. Consequently, the ORI is an ordinal ranking that contains less information than the CRI from which it is derived. We would expect that in statistical analysis it would underperform the CRI measure in explanatory ability. Confirming this conjecture, Anderson (2005) reports two-stage estimates of tax bribe selection models in which CRI outperforms ORI as judged by a higher level of significance as an explanatory variable.

2.3 Index of Tax Policy Reform (ITPR)

Ebrill and Havrylyshyn (1999) conducted a study of fiscal reforms in CIS and Baltic countries for the IMF. Their TPR index of tax policy reform over the period 1992–1998 measured the degree of policy reform using a scale from 1, indicating very little appropriate market-oriented reform, to a score of 5, indicating a high degree of reform. Hence, the TPR index is bounded on the interval [1, 5]. Tanzi and Tsibouris (2000) describe the TPR measure as an “unavoidably subjective ranking” that reflects some of the fundamental pillars of tax policy reform. Factors included in unspecified ways are adoption of a comprehensive tax code or completely new laws for the major taxes, establishment of a VAT, implementation of a market-oriented profit-based tax, and elimination of exemptions and preferences. Given this description, we know that the TPR index for country i is a combination of these factors $X_{ij}$ with unknown weights $a_{ij}$ for each of the j factors (j = 1, ..., n):

\[ \mathrm{TPR}_i = \sum_{j=1}^{n} a_{ij} X_{ij} \tag{2} \]
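To see why the unknown weights matter, the weighted sum above can be sketched in Python. Everything in this example is hypothetical: the factor scores, the two weighting schemes, and the function name are invented for illustration, since the source gives neither the factor measurements nor the weights.

```python
def weighted_index(weights, factors):
    """Weighted sum of factor scores: the sum over j of a_j * X_j."""
    if len(weights) != len(factors):
        raise ValueError("need exactly one weight per factor")
    return sum(a * x for a, x in zip(weights, factors))


# Hypothetical factor scores on a 1-5 scale for two made-up countries, in the
# order: tax code, VAT, profit-based tax, elimination of exemptions.
country_a = [5, 4, 2, 2]
country_b = [2, 3, 4, 5]

# Two hypothetical weighting schemes; each sums to 1 so the index stays in [1, 5].
equal_weights = [0.25, 0.25, 0.25, 0.25]
code_heavy_weights = [0.4, 0.4, 0.1, 0.1]

a_equal = weighted_index(equal_weights, country_a)
b_equal = weighted_index(equal_weights, country_b)
a_heavy = weighted_index(code_heavy_weights, country_a)
b_heavy = weighted_index(code_heavy_weights, country_b)

# Under equal weights country B ranks higher; under the second scheme A does.
print(a_equal < b_equal, a_heavy > b_heavy)  # True True
```

With the same underlying factor scores, the ordering of the two countries reverses when the weights change, so a ranking of this kind cannot be reproduced or checked without knowing the weights.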

Table 4.2 reports the ITPR for CIS and Baltic countries. According to the ITPR, Turkmenistan reformed the least over the period 1992–1998 and the Baltic countries of Estonia, Latvia, and Lithuania reformed the most. While the ITPR rankings may accord with our sense of tax policy reform in these countries, the index has severe limitations. First, as a subjective ranking it tells us nothing about the measurable degree of tax policy reforms in these countries. Second, we have no idea of the precise factor measurements X and weights a for those measurements used in the construction of the index. Consequently, the TPR index is of limited value for conducting research on fiscal reform.

The indicators discussed in this section attempt to measure rather specific differences in fiscal conditions across transition countries. In the following section we turn to consider somewhat broader measures of conditions in transition economies where the focus is not narrowly on fiscal conditions. Rather, the context of institutions supporting the movement to market economies is the focus.

2.4 Index of the Capture Economy

Economic transition has been accompanied by state capture and influence in transition economies. A capture economy has emerged in many transition countries in which rent-generating advantages are sold by public officials to private firms. In order to measure and analyze this development Hellman, Jones, and Kaufmann (2003) have constructed an Index of the Capture Economy (ICE). They use several measures collected in the World Bank/EBRD 1999 Business Environment and Enterprise Performance Survey (BEEPS) to construct their index. Specifically, they use six measures from the BEEPS survey: (1) parliamentary legislation; (2) presidential decrees; (3) central bank; (4) criminal courts; (5) commercial courts; and (6) party finance. Each of these $X_{ij}$ measures of the capture components is expressed as the percentage of firms in the BEEPS survey that consider the respective form of state capture to have a significant impact on the firm. From these six measures, Hellman et al. (2003) compute the ICE by taking a simple unweighted arithmetic average of the measures:

\[ \mathrm{ICE}_i = \frac{1}{6} \sum_{j=1}^{6} X_{ij} \tag{3} \]
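The unweighted average above can be sketched in Python as follows. The function name and the six percentages are hypothetical and are not BEEPS data; the sketch only mirrors the averaging step described in the text.

```python
def capture_index(shares):
    """Unweighted mean of six capture components, each the percentage
    (0-100) of surveyed firms reporting a significant impact."""
    if len(shares) != 6:
        raise ValueError("the ICE averages exactly six components")
    if any(not 0 <= s <= 100 for s in shares):
        raise ValueError("each component is a percentage between 0 and 100")
    return sum(shares) / 6


# Hypothetical percentages for one country, in the order: parliamentary
# legislation, presidential decrees, central bank, criminal courts,
# commercial courts, party finance.
shares = [12, 8, 20, 6, 10, 16]
ice = capture_index(shares)
print(ice)  # 12.0
```

Because every component is a percentage, the average necessarily stays between 0 and 100.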

Theoretically, the ICE index is bounded on the interval [0, 100] since it is the mean of 6 percentage measures. The actual distribution of ICE scores for the 22 transition countries examined in the BEEPS data ranges from a low of 6 to a high of 41, with a mean of 20. Table 4.3 reports the ICE scores and the Hellman et al. (2003) categorization of each country into low and high categories of risk of capture. Hellman et al. (2003) use the computed ICE measures to analyze the determinants of state capture and influence using regression analysis. They find that influential and captor firms are strikingly different from one another. Influential firms are the typical state-owned enterprises left from the centralized socialist system. Those firms are large and have ready access to public officials. They began the transition process with clear property and contract rights. On the other hand, captor firms are the larger new entrants in the marketplace with no formal ownership ties to the state and less access to public officials. Captor firms began the transition process with insecure property and contract rights. Hellman et al. (2003) find that small firms are less likely to take part in both state capture and influence. This may be due to lack of access or to potential gains insufficient to justify such activities.

The issue of state capture is part of a larger set of concerns about a country’s institutions and the link between those institutions and the economic performance of the country. The literature examining the economic role of institutions has exploded in recent years. Two representative examples include Friedman et al. (1999) and Acemoglu and Johnson (2003).

3 Indicators of Fiscal Conditions across All Countries

More general measures of fiscal conditions across all world countries have also become increasingly popular in recent years. In this section I review two of the most commonly cited indices that appear in both the popular press and academic research: the Heritage Foundation Economic Freedom Index and the World Bank/EBRD Doing Business indicators. While these indices have quite distinct foci, they have common elements related to fiscal conditions across the world’s countries. In the following sections I examine the components of each index that relate to fiscal conditions.

3.1 Heritage Foundation Economic Freedom Index (EFI)

The Heritage Foundation publishes an annual EFI that has been popular since its inception in 1995.
The EFI is intended as a comprehensive index of economic freedom and is computed as a simple average of 10 individual freedom scores, each of which the Heritage Foundation views as vital to the development of personal and national prosperity. Heritage describes economic freedom as “that part of freedom that is concerned with the material autonomy of the individual in relation to the state and other organized groups.” Individuals are considered free by Heritage if they can fully control their labor and property.


This index has been commonly used by researchers in a wide variety of academic studies. For example, Carter (2007) used this index in explaining economic growth among postcommunist countries and Johnson et al. (2000) used the tax components of this index in their path-breaking studies of bribery and unofficial economic activity in postcommunist countries. Johnson et al. use the tax components of the index without mention of any concerns for the way the EFI is constructed. Their only editorial comment on the Heritage tax ratings occurs in a footnote where they say, “The Heritage Foundation’s tax ratings focus primarily on posted tax rates, rather than the way the tax system is administered or whether tax inspectors are corrupt.”

Two components of the EFI relate to the fiscal condition of the country: Freedom #3: Fiscal Freedom and Freedom #4: Freedom from Government. Fiscal Freedom (FF) is a measure of the burden of government from the revenue side. It includes measures of both the tax burden measured by the top individual and corporate tax rates and the overall amount of revenue collected as a share of GDP. Freedom from Government (FFG) is a measure of the burden of government on the expenditure side. It includes all government expenditures for government consumption, transfers and expenditures of state-owned enterprises. Table 4.4 reports the top 5, middle 5 and bottom 5 countries according to the 2007 Heritage FF and FFG indicators. Countries with top scores in the fiscal freedom component are primarily resource-rich countries with no income taxes. Those with top scores in the freedom from government component are primarily less-developed countries in Latin America.

3.1.1 Freedom #3: Fiscal Freedom

Fiscal freedom is the revenue-based measure that is composed of three equally weighted factors:

1. The top tax rate on individual income;
2. The top tax rate on corporate income;
3. Total tax revenue as a percentage of GDP.

With these three subcomponents, the fiscal freedom measure is clearly a combination of both marginal and average tax rates. The stated intent of Heritage is to reflect the overall tax burden on individuals and corporations. The top tax rates on individual and corporate income reflect marginal tax rates applied to incremental increases in income from both sources. These rates affect behavior on the margin and can be viewed as altering free choice if they are excessively high or if individuals and corporations are highly sensitive to these rates. On the other hand, total tax revenue as a percentage of GDP reflects an average tax rate for all revenue sources in the country. This is a measure of the overall tax burden. For most countries those tax sources include both individual and corporate income taxes, but more importantly VAT revenues and other nonincome tax revenues.

The countries at the top of the 2007 FF ranking include Kuwait, Qatar, United Arab Emirates, and Bahrain. Clearly, the FF indicator gives the highest scores to resource-rich countries that do not have income taxes. At the bottom of the 2007 FF indicator list we find Cuba, Belgium, Chad, Denmark, and Sweden.

The Fiscal Freedom measure for country i ($FF_i$) is constructed from three equally weighted factors, where each factor j is computed using a quadratic cost function formulation:

\[ FF_{ij} = 100 - 200 X_{ij}^2 \tag{4} \]

where $X_{ij}$ is component j in country i, expressed as a raw share between 0 and 1. $FF_i$ is then computed as the unweighted average of the three factors for each country i:

\[ FF_i = \frac{1}{3} \sum_{j=1}^{3} FF_{ij} \tag{5} \]
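The two formulas above can be sketched in Python. The function names are hypothetical; the constants 100 and 200 and the equal weighting come from the text, and the inputs (top rates of 35% and revenue of 25% of GDP) match the worked example that the text goes on to discuss.

```python
def factor_score(x):
    """Quadratic cost score for one component, where x is a share
    between 0 and 1 (for example, 0.35 for a 35% top rate)."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("the component must be a share between 0 and 1")
    return 100.0 - 200.0 * x ** 2


def fiscal_freedom(top_individual_rate, top_corporate_rate, revenue_share_of_gdp):
    """Unweighted average of the three factor scores."""
    components = (top_individual_rate, top_corporate_rate, revenue_share_of_gdp)
    return sum(factor_score(x) for x in components) / 3


# Top individual and corporate rates of 35%, total tax revenue of 25% of GDP.
ff = fiscal_freedom(0.35, 0.35, 0.25)
print(round(ff, 1))  # 79.5
```

A component of zero scores 100, and the penalty grows with the square of the component, so the last few percentage points of a high tax rate lower the score more than the first few.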

For example, consider a country with top individual and corporate income tax rates of 35% and total tax revenue of 25% of GDP. The fiscal freedom measure for this country would be computed as FF = (1/3)(75.5 + 75.5 + 87.5) = 79.5. Heritage justifies the use of a quadratic formulation in equation (4) using an excess burden rationale. With linear demand, the excess burden of a tax rises with the square of the tax rate. Of course, the parameter choice in equation (4) makes FF an arbitrarily scaled measure. Furthermore, it is important to note that the derivative of FF with respect to the component X is −400X, which is an arbitrary value for the marginal effect of a change in the component.

Fundamentally, the essential problem with the FF measure is that it combines both marginal and average tax rates. As a measure of overall tax burden, it would be better to simply use total revenue as a share of GDP. That would not capture the marginal tax rate effects, however. But including them in the FF results in an ill-defined measure of the fiscal burden.

A number of other problems come to mind in considering this FF component. First, the top individual and corporate tax rates are nominal rates, not effective rates. As a result they do not accurately reflect true economic burdens or overall incentive effects. Exemptions, deductions, and credits are applied in all tax systems, making the effective tax rate quite different from the nominal tax rate. Second, the top individual and corporate tax rates are equally weighted in computing this measure when in reality individual and corporate tax revenues are not equally important fiscal elements for the country, nor are they equally important to individual or corporate taxpayers. Third, VAT revenue is the largest component of government revenue in many countries, and the VAT tax rate is not included in the index. Fourth, there is no distinction between revenue collected at the central government and local government levels. Presumably, revenue collected at a more decentralized level should reflect a clearer association between the demand for public goods and the taxes used to fund their provision. Finally, the third measure is a linear combination of the first two measures. That is, total tax revenue is dependent upon both the top individual and corporate tax rates and is not an independent measure.

It would be best to compute the total welfare cost of each of the taxes in each country, summing them to obtain the total welfare cost of taxation in each country. That would require knowledge of the specific welfare costs of each of the taxes, however, which is difficult to come by. A second-best alternative would be to compute two distinct revenue-based measures of the burden of government: one measure simply based on total government revenue as a share of GDP and a separate measure using marginal tax rates.
Ideally, we would want to use the top personal and corporate tax rates to measure the marginal excess burden of increases in those rates for each country. That is very difficult, however, because it requires knowledge of a number of demand and supply elasticities, each of which is subject to measurement uncertainty. Tresch (2002) summarizes the consensus view on the efficiency cost of the U.S. personal income tax with the statement, “No consensus exists on the total or marginal dead-weight loss from the federal personal income tax.” If there is no consensus for the U.S. personal income tax, it would be very difficult to produce a revenue-based measure for all countries including all revenue sources.
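The excess-burden rationale that Heritage invokes for the quadratic form can be illustrated with a small Python sketch. It rests on textbook simplifying assumptions that are not taken from the chapter: a linear demand curve, constant marginal cost (a horizontal supply curve), and a per-unit tax small enough that demand stays positive. The function name, the demand slope, and the two tax levels are hypothetical.

```python
def excess_burden(tax, demand_slope=2.0):
    """Deadweight-loss triangle for a per-unit tax under linear demand and
    constant marginal cost: one half of the tax times the drop in quantity,
    where the quantity drop equals demand_slope * tax."""
    quantity_drop = demand_slope * tax
    return 0.5 * tax * quantity_drop


# Hypothetical tax levels: doubling the tax from 0.10 to 0.20 per unit.
low = excess_burden(0.10)
high = excess_burden(0.20)

# The ratio is approximately 4: doubling the tax quadruples the excess burden.
print(high / low)
```

Under these assumptions the loss is proportional to the square of the tax, which is the shape the quadratic scoring formula borrows. Outside these assumptions the relevant elasticities would have to be estimated, which is the difficulty the text describes.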


John E. Anderson

3.1.2 Freedom # 4: Freedom from Government

The EFI Freedom from government measure is comprised of two subcomponents:

1. Government expenditures in country i (GEi), expressed as a percentage of GDP;
2. Revenues generated by state-owned enterprises and property in country i (RSOEi) as a percentage of total government revenue.

Given the EFI context, lower measures for both GEi and RSOEi would imply greater freedom from government. The GEi measure is used as a component in judging freedom from government, assuming that a lower GEi measure for a country implies greater freedom from government. A smaller government sector, relative to the total output of the country’s economy, reflects a less obtrusive government. While this may be true, it is likely to be highly contextual as citizens of some countries want greater government involvement rather than less. The RSOEi measure is also highly contextual as historic precedents and local conditions dictate varying roles for state-owned enterprises across countries. Together, these two measures make an odd combination. The first measure, GEi, is government expenditure scaled as a percentage of GDP while the second measure, RSOEi, is a particular form of government revenue scaled as a percentage of total government revenue. Hence, the two relative amounts are being added where the standard of comparison is different for each measure. Heritage uses a quadratic cost function for GEi:

\[ GE_i = 100 - 3500\,X_{1i}^{2} \]

where X1i is government expenditure as a share of GDP in country i and the parameter 3500 has been chosen in order to calibrate the scores. Revenue from SOEs is computed as:

\[ RSOE_i = 100 - X_{2i} \]

where X2i is the percent of total revenue collected by SOEs. For example, Israel has an RSOEi score of 96.5, which is computed as 100 minus its 3.5% of SOE revenue collected. The Freedom from Government score for country i, FFGi, is computed as a combination of GEi and RSOEi, with GEi given a weight of (α₁ = 2/3) and RSOEi given a weight of (α₂ = 1/3), when such data is available.
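The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample expenditure share of 0.10 are hypothetical, while the parameter 3500, the weights 2/3 and 1/3, and the 3.5% SOE revenue figure for Israel are taken from the text.

```python
# Illustrative sketch of the Freedom from Government scoring described above.
# Function names and the sample expenditure share are hypothetical.

GE_SCALE = 3500.0        # calibration parameter quoted in the text
WEIGHT_GE = 2.0 / 3.0    # weight on the expenditure score
WEIGHT_RSOE = 1.0 / 3.0  # weight on the SOE revenue score


def ge_score(expenditure_share):
    """Quadratic cost function: 100 - 3500 * X1^2, with X1 a share of GDP."""
    return 100.0 - GE_SCALE * expenditure_share ** 2


def rsoe_score(soe_revenue_percent):
    """100 minus the percent of total government revenue collected by SOEs."""
    return 100.0 - soe_revenue_percent


def ffg_score(expenditure_share, soe_revenue_percent):
    """Weighted combination of the two subcomponent scores."""
    return (WEIGHT_GE * ge_score(expenditure_share)
            + WEIGHT_RSOE * rsoe_score(soe_revenue_percent))


if __name__ == "__main__":
    print(rsoe_score(3.5))       # Israel example from the text: 96.5
    print(ffg_score(0.10, 3.5))  # hypothetical expenditure share of 10% of GDP
```

Because the expenditure term enters as a square, equal increases in the expenditure share lower the score by growing amounts, whereas the SOE term falls one point for each percentage point of revenue. The two components therefore respond very differently to changes of the same size, which is one way to see why the combined score is hard to interpret.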

Fiscal Indicators


\[ FFG_i = \alpha_1 GE_i + \alpha_2 RSOE_i = \tfrac{2}{3}\left(100 - 3500\,X_{1i}^{2}\right) + \tfrac{1}{3}\left(100 - X_{2i}\right) \]

Obviously, the choice of parameters in these equations is arbitrary, as are the weights α₁ and α₂. Consequently, the FFGi measure is a very arbitrary index. Ideally, in measuring the burden of government expenditure programs we would like to use the concept of the marginal cost of public funds. Browning (1976) first popularized this notion and demonstrated its usefulness as a measure of the social opportunity cost of government spending.

3.2 World Bank Doing Business (DB) Indicators

The World Bank computes an “ease of doing business index” that ranks world economies from number 1 to number 155. Their index is computed as a ranking of the simple arithmetic mean of country percentile scores in each of 10 country rankings on component indicators. Using this index, they identify countries that are reforming in significant ways and illustrate how those reforms facilitate economic development. The components and subcomponents of the 2006 index are listed in table 4.5. Since the index is a simple arithmetic mean of country percentile scores in each of the 10 component areas, there is no need to statistically analyze the subcomponents of the index. The DB index for country i is simply the mean of the 10 indicators Xij (j = 1, . . . , 10):

\[ DB_i = \frac{1}{10}\sum_{j=1}^{10} X_{ij} \]
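The equal-weight construction can be sketched as follows. The function names and the three countries are hypothetical, and the sketch assumes that a lower percentile score means an easier business environment, so that the lowest mean receives rank 1.

```python
# Illustrative sketch of the equal-weight DB index: the mean of ten component
# percentile scores, followed by a ranking of countries on that mean.
# Country labels and scores below are hypothetical.

N_COMPONENTS = 10


def db_index(percentile_scores):
    """Simple arithmetic mean of the ten component percentile scores."""
    if len(percentile_scores) != N_COMPONENTS:
        raise ValueError("expected exactly ten component scores")
    return sum(percentile_scores) / N_COMPONENTS


def rank_countries(scores_by_country):
    """Rank countries by mean score; rank 1 is the lowest (assumed easiest)."""
    means = {c: db_index(s) for c, s in scores_by_country.items()}
    ordered = sorted(means, key=means.get)
    return {country: rank for rank, country in enumerate(ordered, start=1)}


if __name__ == "__main__":
    sample = {
        "A": [0.1] * 10,
        "B": [0.5] * 10,
        "C": [0.3] * 10,
    }
    print(rank_countries(sample))
```

Replacing the simple mean with a weighted mean would only require passing a vector of weights to `db_index`, which is where any economically motivated weighting would enter.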

Of course, the first question that comes to mind is why equal weights are used. The components of the indicator measure processes for inputs (including labor, physical, and financial capital) and outputs. One might expect the DB indicator to weight the components by their economic contribution to the firm (e.g., share of profit). For our purposes in analyzing indices of fiscal reform in this chapter, we will focus particularly on the paying taxes component of the World Bank index. The paying taxes component of the index is computed in a fairly rigorous manner. DB reports the tax that a prototypical medium-sized


company must pay or withhold in a year. It also reports the administrative burden involved in paying taxes, measured by the amount of time it takes to file taxes and the number of payments necessary. Financial statements and information on detailed assumptions about the prototypical firm are given to tax experts in each country. Those experts then compute the total taxes for the firm at all levels of government (including corporate income tax, personal income tax, value added tax, property tax, property transfer tax, dividend tax, capital gains tax, financial transactions tax, waste collection tax, and vehicle and road tax) and report the time required and the number of payments necessary. Using the World Bank “Doing Business in 2006” database, I regressed their measure of the ease of paying taxes on three measures of the tax system: first, the amount of time it takes for a representative company in each country to make all of the necessary tax payments, measured in hours (TIME); second, the number of tax payments required in a year for the representative firm (PAYMENTS); and third, the total tax rate (TTR), which includes all of the taxes paid by the firm in a year, expressed as a percent of the firm’s profit. Table 4.6 reports the results of two estimated regression models. The dependent variable EASE is measured as a percentile ranking, with countries ranked from easiest (first percentile) to most difficult (ninety-ninth percentile). Hence, the smaller the EASE measure the easier it is to pay taxes in a country and the greater the EASE measure the more difficult it is. Two models were estimated, with the first using TIME and PAYMENTS as explanatory variables and the second model adding TTR. In both models the estimated coefficient on TIME required to make tax payments is positive and significant, indicating that the more time it takes to pay taxes in a country, the more difficult is the taxpaying process in that country.
Similarly, in both models the PAYMENTS coefficient is positive and significant, indicating that the greater the number of payments required in a year, the more difficult the taxpaying process in the country. Finally, Model 2 adds the total tax rate (TTR) variable, whose estimated coefficient is also found to be positive and highly significant, indicating that the larger the total tax burden relative to firm profit the more difficult it is to pay taxes in a country. Note that the addition of the TTR variable, controlling for the overall level of taxation, does not affect the signs or relative magnitudes of the TIME and PAYMENTS variable coefficients. The explanatory variables are jointly significant in both models and explain 63% of the variation in EASE in Model 1 and 72% in Model 2. Consequently, these results indicate that the DB index measure for ease of paying taxes is inversely


related to the amount of time spent in completing tax forms, the number of tax payments required, and the total tax rate. As such, it is a useful indicator of the complexity of the tax code and tax administration systems across countries. Incidentally, it should be noted that analysis of this data also reveals that countries imposing higher tax burdens on firms do not make it easy for those firms to pay taxes. A simple regression of the TTR on a constant and the EASE indicator provides an estimated coefficient for the EASE variable of 0.97, which is significantly different from 0, but not from 1. Hence, we cannot reject the hypothesis that the tax burden is higher despite the difficulty of imposing taxes on firms. Countries in the DB database are not extracting greater taxes by making it easier for firms to pay.

4 Summary and Conclusions

What do these indicators tell us? And how shall we use them? Our review of fiscal indicators has covered both indicators used to assess progress in transition countries and indicators that more generally inform us about fiscal conditions across the world’s countries. We consider lessons learned from both classes of indicators.

4.1 Fiscal Reform Indicators

We have analyzed the Martinez-Vazquez and McNab CRI and ORI, the ITPR, and the Hellman et al. (2003) ICE. These indicators are all subjective, although they are loosely based on objective data. In each case, the indicator is constructed using some objective data, but the choice of what data is used and the weighting scheme employed in constructing the index involve subjective judgments. That being said, however, it is essential that the data sources and the weights be explicit so that researchers using these indices know what goes into each index. In some cases, the indices are not independent. The Martinez-Vazquez and McNab ORI and CRI indices are a clear example. The ORI measure is constructed by grouping countries according to their CRI measures.
Hence, ORI and CRI are correlated and should not be used simultaneously in any regression analysis. Furthermore, since there is greater information content in the CRI measure, it dominates the ORI measure for analytic purposes. In other cases, the indices are not transparent in the sense that the weights applied to the objectively measurable factors used in construction


of the indices are not made explicit. This is the case with the Ebrill and Havrylyshyn ITPR measure. The broader ICE developed by Hellman et al. (2003) measures a wide range of indicators of the strength of state capture and influence in transition economies. This index combines six measures of state capture, equally weighting the measures. This approach is rather ad hoc and would be improved by nonequal weighting where the weights are determined through empirical research on the determinants of state capture.

4.2 Fiscal Conditions Indicators

Indicators of fiscal conditions across a wide range of countries have become increasingly popular in recent years. In this chapter we have examined aspects of the Heritage Foundation EFI and the World Bank/EBRD DB indicators. Within the EFI, we examined the FF and FFG measures. The FF measure is more sophisticated than most, as it is designed to be nonlinear based on our knowledge that the excess burden of taxation rises with the square of the tax rate. Even so, parameter choices in the FF equation are arbitrary and therefore the scaling of this measure is arbitrary. Furthermore, this measure combines average and marginal tax rates, which is not theoretically consistent. The FFG measure is based on two measures: (1) government expenditures as a percentage of GDP and (2) revenues generated by state-owned enterprises and property as a share of government revenues. Together, these two measures make an odd combination, as they have completely different reference points (GDP and total government revenues). Furthermore, the weights used in combining these two measures (2/3, 1/3) are completely arbitrary.

4.3 Suggestions for Rationalizing Fiscal Indicators

Depending on the particular concerns being addressed, there are several theory-based measures and indices that can be used as a starting point in the development of fiscal indicators.
If the policy concern is the overall tax burden, then simple computations of total tax revenue as a share of GDP will generally suffice. On


the expenditure side, the ratio of total government expenditures to GDP reflects the burden of government spending programs in the aggregate. Of course, these measures are only approximate indicators of the size of the revenue and expenditure burdens placed on the economy by governments. Yet, they capture the overall size of the public sector relative to the output of the economy and have the advantage of being readily understood and compared with other countries. If the policy concern is the disincentive effects of high marginal tax rates, then an indicator that is based on the theory of excess burden is required. That requires, at the least, a nonlinear (quadratic) formulation. Beyond that, information on specific tax rates, and elasticities of demand and supply would be useful to refine the measure. If the policy concern is the marginal cost of public funding, the measure to use is the marginal cost of funds. This measure computes the marginal cost of an additional dollar of public expenditure program spending, including the distortionary cost of raising the public funds to finance the expenditure. If the policy concern is the degree of progressivity or regressivity of a tax system, the natural index to use is the Suits (1977) index. The Suits index is based on the concept of the Gini coefficient that is used to measure the degree of income inequality. Suits modified the Gini coefficient concept in order to measure the degree of inequality in tax burdens compared to income, allowing for both progressive and regressive tax systems. That is, in the Suits index context the equivalent of the Lorenz curve can bow either below or above the line of perfect equality, reflecting progressive or regressive taxes, respectively. While the Suits index is most often used in the public finance literature to measure the progressivity of a single tax, it can be applied to an entire tax system provided that the appropriate data is available for its computation.
Indeed, in the original Suits (1977) article the author used data on six major tax revenue sources at both the federal and state/local levels to illustrate computation of the index. The necessary data are simply the amounts of income and tax revenue collected, arranged by quantiles. The finer the quantiles available, the more precisely the Suits index may be computed. For purposes of illustration Suits (1977) simply used deciles. In order to analyze whether significant changes take place in the degree of progressivity of a tax system it is necessary to compute confidence intervals using a method such as that developed in Anderson, Roy, and Shoemaker (2003).
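The computation from grouped data can be sketched as follows. This is a minimal illustration rather than a reproduction of Suits's own calculations: it assumes that income and tax totals are supplied per quantile group, ordered from the poorest group to the richest, and it approximates the area under the tax concentration curve with the trapezoid rule. The function name and the four-group sample data are hypothetical.

```python
# Illustrative sketch of a Suits-type index from grouped data.
# Cumulative tax share is plotted against cumulative income share, and the
# index is 1 - L/K, where L is the area under that curve and K = 0.5 is the
# area under the diagonal. The sample figures below are hypothetical.

def suits_index(incomes, taxes):
    """incomes, taxes: group totals ordered from poorest to richest group."""
    if len(incomes) != len(taxes) or not incomes:
        raise ValueError("incomes and taxes must be non-empty and equal length")
    total_income = float(sum(incomes))
    total_tax = float(sum(taxes))
    cum_income = 0.0
    cum_tax = 0.0
    area = 0.0
    for income, tax in zip(incomes, taxes):
        next_income = cum_income + income / total_income
        next_tax = cum_tax + tax / total_tax
        # Trapezoid between consecutive points on the concentration curve.
        area += 0.5 * (cum_tax + next_tax) * (next_income - cum_income)
        cum_income, cum_tax = next_income, next_tax
    return 1.0 - area / 0.5


if __name__ == "__main__":
    incomes = [10, 20, 30, 40]
    print(suits_index(incomes, [1, 2, 3, 4]))   # proportional tax: close to 0
    print(suits_index(incomes, [0, 0, 0, 10]))  # tax on top group only: positive
    print(suits_index(incomes, [5, 5, 5, 5]))   # equal head tax: negative
```

With only a handful of groups the trapezoid approximation is coarse, which is the practical content of the remark that finer quantiles yield a more precise index.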

Appendix

Table 4.1 CRI and ORI Indices of Fiscal Reform

Country                             CRI      ORI

Advanced Tax Reformers
Czech Republic                      3        0
Estonia                             3        0
Latvia                              4        0
Croatia                             3        0
Mean                                3.25     0.00

High Intermediate Tax Reformers
Slovak Republic                     5        1
Hungary                             7        1
Lithuania                           7        1
Poland                              7        1
Kazakhstan                          8        1
Slovenia                            8        1
Mean                                7.00     1.00

Low Intermediate Tax Reformers
Bulgaria                            10       2
Kyrgyz Republic                     11       2
Turkmenistan                        11       2
Albania                             12       2
Romania                             13       2
Russian Federation                  13       2
Tajikistan                          13       2
Mean                                11.75    2.00

Slow Tax Reformers
Georgia                             13       3
Azerbaijan                          14       3
Armenia                             14       3
Uzbekistan                          14       3
Moldova                             15       3
Belarus                             17       3
Mean                                14.5     3.00

Source: Martinez-Vazquez and McNab (1997a).

Table 4.2 Index of Tax Policy Reform

Country                 TPR Index
Armenia                 4
Azerbaijan              3
Belarus                 2
Estonia                 5
Georgia                 4
Kazakhstan              4
Kyrgyz Republic         3
Latvia                  5
Lithuania               5
Moldova                 4
Russia                  2
Tajikistan              3
Turkmenistan            1
Ukraine                 3
Uzbekistan              2

Source: Ebrill and Havrylyshyn (1999).

Table 4.3 Index of the Capture Economy (ICE), 1999

Country                 ICE     Capture Risk Class
Albania                 16      Low
Armenia                 7       Low
Azerbaijan              41      High
Belarus                 8       Low
Bulgaria                28      High
Croatia                 27      High
Czech Republic          11      Low
Estonia                 10      Low
Georgia                 24      High
Hungary                 12      Low
Kazakhstan              30      High
Kyrgyzstan              30      High
Latvia                  30      High
Lithuania               11      Low
Moldova                 37      High
Poland                  12      Low
Romania                 21      High
Russia                  32      High
Slovak Republic         24      High
Slovenia                7       Low
Ukraine                 32      High
Uzbekistan              6       Low
Mean                    20

Source: Hellman et al. (2003).

Table 4.4 Heritage Foundation Fiscal Freedom (FF) and Freedom from Government (FFG) Scores, 2007 Fiscal Freedom (FF)

Freedom from Government (FFG)

Top 5 UAE, Qatar, Kuwait (tie) Bahrain and Saudi Arabia (tie) Oman Bahamas Paraguay

Top 5 99.9 99.6 99.0 98.3 97.8

Guatemala Haiti El Salvador Guinea Costa Rica

96.4 95.2 95.1 92.4 92.3

84.8 84.7 84.0 84.0 83.7

Middle 5 Romania Bolivia and Fiji (tie) Egypt Swaziland Ireland

74.9 74.3 73.6 73.3 73.1

62.8 62.2 57.7 55.2 53.6

Denmark France Sweden Libya Cuba

Middle 5 India and Iran (tie) Taiwan Philippines Canada Mauritania Bottom 5 Cuba Belgium Chad Denmark Sweden

Bottom 5 32.1 32.0 31.5 23.5 10.0

Source: Heritage Foundation (2007), data available at http://www.heritage.org/index Note: Results are reported in this table for countries with complete data on these two factors.

Table 4.5 Doing Business (DB) Indicator Components

Doing Business Component      Subcomponents
Starting a business           Procedures, time, cost, and minimum capital to open a new business
Dealing with licenses         Procedures, time, and cost of business inspections and licensing (construction industry)
Hiring and firing workers     Difficulty of hiring index, rigidity of hours index, difficulty of firing index, hiring cost and firing cost
Registering property          Procedures, time, and cost to register commercial real estate
Getting credit                Strength of legal rights index, depth of credit information index
Protecting investors          Indices on the extent of disclosure, extent of director liability, and ease of shareholder suits
Paying taxes                  Number of taxes paid, hours per year spent in preparing tax returns, and total tax payable as a share of gross profit
Trading across borders        Number of documents, number of signatures, and time necessary to export and import
Enforcing contracts           Procedures, time, and cost to enforce a debt contract
Closing a business            Time and cost to close down a business and recovery rate

Table 4.6 Regression Analysis of the Doing Business Ease of Paying Taxes Indicator

Variable        Coefficient (Model 1)     Coefficient (Model 2)
constant        0.2092a (0.0268)          0.1531a (0.0298)
TIME            0.0003a (5.30E-05)        0.0002a (4.88E-05)
PAYMENTS        0.0057a (0.0006)          0.0050a (0.0006)
TTR             not included              0.1581a (0.0285)
R²              0.6304                    0.7192
F               148.5300a                 148.6625a

Note: Standard errors are reported in parentheses. a denotes significance at the 1% level or less.
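To see what the estimates in Table 4.6 imply, the fitted equations can be evaluated for a given firm profile. The sketch below is illustrative only. The coefficients are those reported in the table, but the units are assumptions inferred from the coefficient magnitudes (EASE and TTR as fractions between 0 and 1, TIME in hours, PAYMENTS as a count per year), and the firm profile of 300 hours, 30 payments, and a 50% total tax rate is hypothetical.

```python
# Illustrative sketch: fitted EASE values implied by the Table 4.6 estimates.
# Units are assumed (EASE and TTR as fractions, TIME in hours, PAYMENTS as a
# yearly count); the example firm profile is hypothetical.

MODEL_1 = {"constant": 0.2092, "TIME": 0.0003, "PAYMENTS": 0.0057}
MODEL_2 = {"constant": 0.1531, "TIME": 0.0002, "PAYMENTS": 0.0050, "TTR": 0.1581}


def predicted_ease(model, **regressors):
    """Constant plus the sum of coefficient times regressor value."""
    value = model["constant"]
    for name, coefficient in model.items():
        if name == "constant":
            continue
        value += coefficient * regressors[name]
    return value


if __name__ == "__main__":
    print(predicted_ease(MODEL_1, TIME=300, PAYMENTS=30))
    print(predicted_ease(MODEL_2, TIME=300, PAYMENTS=30, TTR=0.5))
```

Under these assumptions, cutting the number of yearly payments by ten lowers the fitted Model 2 value by 0.05, that is, by about five percentile points toward the easier end of the ranking.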


References

Acemoglu, D. and S. Johnson. 2003. Unbundling Institutions. Journal of Political Economy 113 (5): 949–995.
Anderson, J. E. 2005. Fiscal Reform and Its Firm-Level Effects in Eastern Europe and Central Asia. Conference paper presented at the ACES/ASSA meetings, Boston, MA.
Anderson, J. E. and L. Carasciuc. 2004. Tax Evasion in a Transition Economy: Theory and Empirical Evidence from the Former Soviet Union Republic of Moldova. Progress in Economics Research 8, Nova Science Publishers.
Anderson, J. E., A. G. Roy, and P. Shoemaker. 2003. Confidence Intervals for the Suits Index. National Tax Journal 61 (1): 81–90.
Bird, R. M. and S. M. Banta. 1999. Fiscal Sustainability and Fiscal Indicators in Transitional Countries. USAID conference paper, Istanbul, Turkey, Barents Group.
Browning, E. K. 1976. The Marginal Cost of Public Funds. Journal of Political Economy 84 (2): 283–298.
Carter, J. K. 2007. After the Fall: Globalizing the Remnants of the Communist Bloc. Federal Reserve Bank of Dallas Economic Letter 2 (2): 1–8.
Ebrill, L. and O. Havrylyshyn. 1999. Reforms of Tax Policy and Tax Administration in the CIS Countries and the Baltics. Washington, DC: International Monetary Fund.
Friedman, E., S. H. Johnson, D. Kaufmann, and P. Zoido. 1999. Dodging the Grabbing Hand: The Determinants of Unofficial Activity in 69 Countries. Conference paper, The Nobel Symposium in Economics: The Economics of Transition, Stockholm, September 1999. http://ssrn.com/abstract=194628.
Hellman, J. S., G. Jones, and D. Kaufmann. 2003. Seize the State, Seize the Day: State Capture and Influence in Transition Countries. Journal of Comparative Economics 31: 751–773.
Heritage Foundation. 2007. Index of Economic Freedom. Washington, DC.
International Bank for Reconstruction and Development (World Bank). 2006. Doing Business in 2006: Creating Jobs. Washington, DC: World Bank.
Johnson, S., D. Kaufmann, J. McMillan, and C. Woodruff. 2000. Why Do Firms Hide? Bribes and Unofficial Activity after Communism. Journal of Public Economics 76 (3): 495–520.
Lorie, H. 2003. Priorities for Further Fiscal Reforms in the Commonwealth of Independent States. International Monetary Fund Working Paper WP/03/209. Washington, DC.
Martinez-Vazquez, J. and R. McNab. 1997a. Tax Reform in Transition Economies: Experience and Lessons. Georgia State University International Studies Program Working Paper 97–6.
Martinez-Vazquez, J. and R. McNab. 1997b. Tax Systems in Transition Economies. Georgia State University International Studies Program Working Paper 97–1.
Martinez-Vazquez, J. and R. McNab. 2000. The Tax Reform Experiment in Transitional Countries. National Tax Journal 53 (2): 273–298.
Mitra, P. and N. Stern. 2002. Tax Systems in Transition. World Bank Working Paper 25063. Washington, DC.
Stepanyan, V. 2003. Reforming Tax Systems: Experience of the Baltics, Russia, and Other Countries of the Former Soviet Union. International Monetary Fund WP/03/173.
Suits, D. B. 1977. Measurement of Tax Progressivity. American Economic Review 67 (4): 747–752.
Summers, V. and K. Baer. 2003. Revenue Policy and Administration in the CIS-7: Recent Trends and Future Challenges. Washington, DC: International Monetary Fund.
Tanzi, V. and G. Tsibouris. 2000. Fiscal Reform over Ten Years of Transition. International Monetary Fund WP/00/113.
Tanzi, V. and H. H. Zee. 2000. Tax Policy for Emerging Markets: Developing Countries. International Monetary Fund WP/00/35.
Tresch, R. W. 2002. Public Finance: A Normative Theory, 2nd ed. New York: Academic Press.


CHAPTER 5

A New and Better Measure of Capital Controls

Pariyate Potchamanawong, Arthur T. Denzau, Sunil Rongala, Joshua C. Walton, and Thomas D. Willett

1 Introduction

There has long been a substantial controversy about the role of capital controls. The designers of the Bretton Woods postwar international monetary system anticipated that capital controls would be a permanent feature of the system. Over time, however, views of many economists and officials changed with capital controls becoming seen as having greater costs and freedom of capital flows as having greater benefits than before. These changes in view resulted in major shifts toward the liberalization of capital accounts, first in the industrial countries and then in many developing countries. With the recent rash of currency crises in emerging market countries in the 1990s, considerable support for limiting the freedom of international capital flows reemerged.1 Many blamed capital mobility, either directly or indirectly, rather than national policies, for the crises. During the 1997 crises, India and China were largely left untouched, and some argued this was because they had substantial controls; Willett et al. (2004), however, present estimates that these countries’ fundamentals were sufficiently strong that they would not have faced substantial speculative attacks even in the absence of controls. Scholars such as Leblang (2001) and Glick and Hutchison (2000b) conducted empirical studies that found that countries with capital controls are more prone to currency attacks. Other studies such as Quinn (1997) and Rodrik (1998) have looked at


Potchamanawong, Denzau, Rongala et al.

the relationship between capital controls and economic growth and reached quite different conclusions, at least in part because they used different measures of capital controls. In this chapter, we offer a brief survey of a wide variety of measures of controls currently in use, and introduce a new and improved measure.2

2 Applications of Capital Control Indices

One of the most prominent questions regarding the effects of capital controls is their relationship to economic growth. There are competing and opposite theories regarding the relationship between capital mobility and economic growth. On the one hand, capital mobility allows a country with limited capital resources to borrow from abroad, and thus be able to smooth individuals’ and firms’ consumption, leading to higher growth. On the other hand, increased capital inflows put pressure on domestic asset prices and on exchange rate appreciation, which could lead to current account deficits and eventually slow growth. To date, the evidence from empirical studies of the relationship between the imposition of capital controls and economic performance has been mixed, with little conclusive evidence supporting either a positive or a negative relationship. Capital controls are also important objects of study with respect to trade; as with growth above, competing theories exist. One might reasonably expect that capital controls would be negatively associated with trade, as the lifting of controls would allow greater inflows of capital, some of which would presumably be used to finance export production. That being said, it may also be the case that, as capital controls are imposed, those who wish to participate in the international capital market may seek means by which the controls can be evaded. Trade may be one of those means, through such tactics as misinvoicing or the manipulation of payment timing.
Capital account liberalization is a critical subject for researchers studying the sequencing of liberalization policies, due to the potential for such liberalization to have dramatic effects—both positive and negative—on the domestic economy. A prominent research question in this area is whether or not the goods market must be liberalized before similar liberalization in the financial market can occur. The most generally held hypothesis is that trade openness does in fact precede financial openness, and recent studies have supported this finding, though in some cases joint causality is found.3 There is also some disagreement over whether capital controls help to protect countries from currency and financial crises, or whether they

New Measures of Capital Controls


invite such crises. Those arguing the former point out that controls serve to prevent volatile capital flows from destabilizing the economy. Those arguing the latter posit that the imposition of capital controls may itself send a signal to investors that the country is concerned that its economic fundamentals are in bad enough shape that it is worthwhile to prepare for a crisis.

3 Technical Survey of Capital Controls Indices

Evaluating the accuracy of capital control indices is difficult, for reasons ranging from data availability to coding methodology. This is reflected in the fact that a number of measures purportedly measuring capital controls show very different results. The capital account is not just one single entity but consists of many components. The term “capital controls” can be applied to a variety of policies ranging from registration to taxation to multiple exchange rates to complete prohibition. Most countries do not place controls on the entire capital account but rather on particular components. Until recently, most empirical studies of the effects of capital controls have used simple 0–1 binary measures based on the information reported in the IMF’s Annual Report on Exchange Arrangements and Exchange Restrictions (AREAER).4 If any types of capital controls were reported then the proxy for controls was coded with a value of 1. Clearly this was not a very satisfactory measure, as there are many different types of capital flows and many different types of controls. Furthermore, controls vary widely in the severity of their restrictions, a characteristic that we shall refer to in this chapter as intensity. Many of the measures to be discussed in this chapter fail to adequately capture intensity, with binary measures being the most egregious of these.
With binary measures of the type described, Chile’s moderate tax on inflows of financial capital is treated the same as North Korea’s stringent controls on both inflows and outflows of most types of capital. Empirical studies using such crude measures cannot be expected to shed much useful light on the various controversies about the effects of different types of capital flows. Scholars and officials studying issues concerned with capital inflows and outflows have developed strong evidence that not all types of capital flows behave in the same way or have the same types of effects. For example, direct investment flows typically display far less volatility during crises than do bank and portfolio flows (see the analysis and references in Sula and Willett 2007). Nor would we necessarily expect


controls on capital inflows and those on outflows to have the same effects on the probabilities of currency and financial crises. In general, economists tend to be much more critical of controls on capital outflows than of controls on inflows, especially when the latter are based on prudential regulatory grounds and concerns with the potential disruptive effects on recipient developing countries of sudden stops in international financial flows (on these issues see Calvo 2003, etc.). Economists have pointed to good reasons why capital controls might be positively associated with currency crises (see Bartolini and Drazen 1997; Wihlborg and Willett 1997). The prevalence of controls may be a signal of bad versus good economic policies. Thus the finding from studies such as Leblang (1997) and Glick and Hutchison (2000b) of positive associations between their measures of controls and the occurrence of currency crises is not surprising. A closer look at the evidence, however, suggests some problems with this analysis. Following the logic of the signaling hypothesis, we would not expect capital control countries to be major recipients of financial capital inflows since the controls would be signaling bad economic policies. Both of the previously mentioned studies used binary, that is, 0–1 measures of capital controls that classified the Asian crisis countries as having controls. This can help explain the overall findings of positive correlations between controls and crises, but it does not fit the fact that prior to 1997 the Asian crisis countries had enjoyed large financial inflows based in part on perceptions that they were following good economic policies. The coding methods for these variables also do not coincide with the general perception that the early and mid-1990s had been a period of substantial capital account liberalization in most of these countries.
Leblang (1997) and Glick and Hutchison (2000b) were not wrong in treating these countries as still having some level of capital controls, but such a 0–1 measure cannot hope to be very useful in helping to understand most of the relevant issues involved in the debate about capital controls. Nitithanprapas, Rongala, and Willett (2002) found that the countries that were affected by the Asian crisis had relatively open capital accounts according to most observers, but were classified by most capital control measures as being largely closed. The high levels of capital inflows into these countries before the crisis suggest that the anecdotal evidence cited by Nitithanprapas et al. was more accurate than the major capital controls indices. It also suggests the importance of more systematically distinguishing between controls on capital inflows and on capital outflows. Rongala (2003) offers a detailed qualitative evaluation of the level of controls in seven Asian countries and concludes


that the capital control indices do tend to overstate, often substantially, the level of controls in these countries. Fortunately, in recent years a number of new measures of capital controls have become available that improve in various ways on the old 0–1 measures. These give finer gradations of the degree of restrictiveness of capital controls, typically either by coding the extensiveness of controls across different types of capital flows (which we characterize in this chapter as the “breadth” of controls), or by coding the stringency of the controls (which we characterize as “intensity” of controls). In the following sections we briefly discuss these new measures and then present a still newer (and, we believe, superior) measure that combines both the breadth and intensity dimensions and also separately codes restrictions on capital inflows and outflows. Until recently, the primary capital controls measures used were “binary measures,” which are the 0–1 dummy variables mentioned above that indicate the existence or nonexistence of any controls on capital flows and transactions. A notable variation of this type of measure is the Glick-Hutchison (2000b) measure. Glick and Hutchison’s study focused on a panel dataset of 69 developing countries over the 1975–1997 period. Prior to 1996, when the IMF adopted a new AREAER format, Glick and Hutchison’s variable takes the standard 0–1 format capturing the existence of any controls on any form of capital flow. Following the IMF format change, the authors altered their coding to capture the existence of controls on more than 5 of the 13 AREAER capital transactions types. The control index thus still takes a binary form, but the pre-1996 and post-1996 values have different meanings: the former index is based on the imposition of any capital controls while the latter is based on the number of types of capital transactions restricted.
Counting how many capital transactions are restricted does not provide an accurate picture of capital controls since some countries might have the same number of restrictions but on different capital transactions; the measure also treats countries with restrictions on as few as 5 of 13 capital categories as being the equivalent of fully closed capital accounts. The Glick-Hutchison (2000b) measure was an interesting wrinkle on the heretofore traditional binary measures, replacing the previous “none” versus “any” capital controls dichotomy with a more intellectually intuitive “not many” versus “a substantial number” dichotomy. For all of its appeal, however, the 0–1 construction of the measure makes it no less blunt a tool than its predecessors. Fortunately, researchers now have an array of more sensitive tools at their disposal.
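The two-regime coding rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Glick and Hutchison's own procedure: the function name, its parameter names, and the treatment of 1996 as the first year coded under the new AREAER format are choices made for the example.

```python
def glick_hutchison_dummy(year, n_restricted, threshold=5, format_change_year=1996):
    """Return 1 (controls) or 0 (no controls) under the two coding regimes.

    year: observation year.
    n_restricted: number of restricted capital transaction types; under the
        old AREAER format any positive number simply means "some control".
    threshold, format_change_year: hypothetical parameter names. The values
        follow the text (more than 5 of 13 types; format change in 1996),
        but treating 1996 itself as a new-format year is an assumption.
    """
    if year < format_change_year:
        # Old format: any control at all is coded as 1.
        return 1 if n_restricted > 0 else 0
    # New format: coded as 1 only when more than `threshold` types are restricted.
    return 1 if n_restricted > threshold else 0


# Under the new format, 6 restricted types and 13 restricted types get the same score.
print(glick_hutchison_dummy(1997, 6))   # 1
print(glick_hutchison_dummy(1997, 13))  # 1
print(glick_hutchison_dummy(1997, 5))   # 0
print(glick_hutchison_dummy(1990, 1))   # 1
```

The last two calls show why pre-1996 and post-1996 values are not comparable: a single restriction yields a 1 in 1990, while five restrictions yield a 0 in 1997.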


Potchamanawong, Denzau, Rongala et al.

4 New Measures

4.1 Disaggregated Measures of the Extent of Controls

These variables are typically constructed to capture how many types of capital flows are “controlled” (i.e., at all) or “liberalized.” Where binary measures typically capture the existence of any controls on capital, disaggregated measures look at the number of controlled capital flow types. These measures therefore generally are presented as either counts or ratios of controlled capital flow types. While, relative to simple binary measures, these measures provide a clearer picture of the breadth of the capital controls imposed by a given country, they cannot capture the intensity of those restrictions, nor can they address incremental changes in controls within the flow type.

4.1.1 Brune et al. (2001)

Brune et al.’s (2001) Capital Account Openness Index (CAOI) is created by summing the number of flow types not subject to controls from a selection of nine categories of current and capital flow types. The overall index has a range of possible scores from 0 (fully closed) to 9 (fully open) and is available for 173 countries over the period 1973–1999. Although the CAOI includes the composition of controls on capital inflows and outflows, the authors analyze the index as a whole entity without considering the different impacts between controls on inflows and outflows. An additional drawback of simply summing the scores in each category is that the missing values in particular categories are scored as zeroes, implying controls in cases where controls may not actually be imposed. This could cause the CAOI to be biased toward greater capital account restriction.

4.1.2 Johnston-Tamirisa (1998)

This method is the most disaggregated measure of capital control since it combines all the classifications (including all the subcategories) of the IMF’s AREAER, but unfortunately it is only available for one year.
It also distinguishes capital inflows from capital outflows and further between the different types of transactions by assigning zeros or ones to the presence of controls in each subsection (i.e., purchases and sales locally by nonresidents, purchases and sales abroad by residents, to residents from nonresidents, etc.) within the main 13 capital transaction categories. The data is available only for 1996 and includes 45 developing and transition countries.
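The missing-value problem noted for additive indices such as the CAOI can be made concrete with a small sketch. The category coding (1 = open, 0 = controlled, None = missing) and both function names are assumptions for illustration, and the alternative "share of observed categories" is shown only as a contrast; it is not a method attributed to Brune et al.

```python
def additive_openness(categories):
    """Sum of open categories, with missing entries (None) scored as 0."""
    return sum(1 if c == 1 else 0 for c in categories)


def open_share(categories):
    """Share of open categories among those actually observed."""
    observed = [c for c in categories if c is not None]
    if not observed:
        return None  # nothing observed, so no score can be computed
    return sum(observed) / len(observed)


# Nine hypothetical categories: six open, one controlled, two missing.
example = [1, 1, 1, 1, 1, 1, 0, None, None]

print(additive_openness(example))     # 6
print(round(open_share(example), 3))  # 0.857
```

On the additive scale the country scores 6 of 9, as if three categories were controlled, although only one control was actually observed.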


4.1.3 Rossi (1999)

Rossi’s measure covers 15 developing economies over the period of 1990–1997; these countries include Argentina, Brazil, Chile, Colombia, India, Indonesia, Israel, Korea, Malaysia, Mexico, Peru, Philippines, South Africa, Thailand, and Venezuela. Rossi uses Johnston and Tamirisa’s capital control measurement as a starting point to measure both inflow and outflow capital transactions. Two indices of capital controls are calculated. After a slight modification to account for an alternative classification of some of the items in the capital account, the 1997 index is then coded back to 1990 using an algorithm that mimics the main episodes in the process of capital account liberalization, based on AREAER information.

4.1.4 Miniane (2004)

Miniane’s (2004) capital restriction index is created by extending the 13 disaggregated capital transaction categories reported in the AREAER back from 2000 to 1983 for 34 developed and developing countries. Once again, the measure for a category takes a value of 1 if at least one restriction exists on that type of capital transaction, and takes a 0 otherwise.5 He then adds a dummy variable for the existence of dual or multiple exchange rates, and takes the average of the values across all the variables to create the capital restriction index. The Miniane measure does not distinguish between inflow and outflow transactions. There are some drawbacks of this method worth mentioning. First, as mentioned before, the binary coding method cannot capture the intensity of controls in each category; it can only capture changes from either having or not having controls to the opposite. Second, the details of regulation on capital market securities, money market instruments, collective investment securities, and derivatives and other instruments, for the previous format, are not stated.

4.1.5 Mody-Murshid (2005)

The data cover 60 countries and the period of 1979–1999.
The measurement of financial openness consists of four proxies for government restrictions that impact capital mobility. These four measures, which are available in the IMF’s AREAER, comprise (1) the openness of the capital account; (2) the openness of the current account; (3) the stringency of requirements for the repatriation and/or surrender of export proceeds; and (4) the existence of multiple exchange rates for capital


account transactions. For each of these 4 factors, a 1 indicates a relatively open regime and a 0 otherwise. Then the value of the index is simply the sum of these four restrictions measures. As a result, the values of the index range between 0 and 4, where a 0 indicates that a country has closed capital and current accounts, places restrictions on its export receipts, and operates a system of multiple exchange rates; the value of 4 indicates a completely open regime. However, a problem might arise from simply summing four categories together when missing values are present since they will be counted as 0 as well.

4.2 Measures of the Intensity of Controls

This type of measure seeks to capture not just the existence of controls on some/any types of flows, but also the intensity of the controls. The measures described in the previous section all seek to measure, in a variety of ways, the existence of controls on various types of capital flows. That having been said, most of them combine in some fashion binary variables at the flow type-level. As such, if a country should change the stringency of controls on any type of capital transaction, the change will not be captured by the previously discussed variables unless the change involved a movement to or from complete liberalization of that particular type of flow. In order to address this problem, some researchers have created measures focusing primarily on the intensity of controls, rather than the mere existence of controls, on particular types of flows.

4.2.1 Quinn (1997)

Dennis Quinn (1997) made the first attempt to include the level of intensity in a capital restriction index. Quinn’s openness index is the combination of international agreement (0–2), current account transactions (0–8), and capital account transactions (0–4). The score ranges from 0 (fully controlled) to 14 (fully liberalized).
The rules of coding for current and capital account transactions are identical, and are based on government approval and taxation requirements on the transactions. Quinn apparently duplicates the coding rule from a similar measure of current account openness,6 which relies on tax information to distinguish between values of 1 (heavily taxed) and 1.5 (taxed). However, since taxes on capital transactions normally are not reported by countries, the coders had to use their individual judgments or find another source of information to replace the tax information. Quinn’s criteria are well-suited to the current account category but not to the


capital account side due to this lack of information on the taxation of capital transactions. Quinn’s capital restriction measurement is coded as follows:

0 Approval of transaction is required but rarely given, and surrender of receipts is required;
0.5 Approval is required and sometimes given;
1 Approval is required and frequently given; or if approval isn’t required but capital is heavily taxed;
1.5 Approval isn’t required but capital is lightly or moderately taxed;
2 Capital transactions are not restricted.

Quinn’s measure is based on an earlier version of the AREAER consisting of only two sections: Capital Receipts and Capital Payments; the coding rules are based on the aggregated information of the capital transactions. These rules are difficult to apply to the new post-1996 disaggregated AREAER format, since each subcategory of capital transaction may not be subject to the same policy; that is, a country might put more controls on money market transactions than portfolio investment transactions. As a result, it is difficult for a coder to decide what value to assign to particular levels of restrictions; this can in turn lead to inconsistencies in the dataset. Quinn partially ameliorates this problem by coding the index twice using two independent coders and cross-checking their findings.

Following Quinn’s methodology, Van Den Handel (2002) increased the sensitivity of the measure to control for intensity by separately scoring inflow and outflow transactions and recombining them in the overall index. The new measure covers 48 countries from 1996 to 1999. The scale of capital control ranges from 0 (fully liberalized) to 4 (fully restricted). The rules of coding were as follows:

0 No restrictions
0.5 Taxes
1.0 Approval
1.3 Approval and taxes
1.5 Repatriation required or quantitative restrictions on institutional investors or on foreign participation
1.7 Unspecified combination of restrictions, or a combination of approvals, quantitative restrictions, or blocked flows
1.9 Repatriation and graduated surrender requirements
2.0 100% surrender required
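The Van Den Handel coding rules can be expressed as a simple lookup. The sketch below is illustrative only: the rule labels and function name are invented for the example, and recombining the inflow and outflow scores by simple addition is an assumption, chosen because it is consistent with component scores of 0 to 2 and an overall scale of 0 to 4.

```python
# Score per coding rule, taken from the list above; the keys are hypothetical labels.
VAN_DEN_HANDEL_SCORES = {
    "none": 0.0,
    "taxes": 0.5,
    "approval": 1.0,
    "approval_and_taxes": 1.3,
    "repatriation_or_quantitative": 1.5,
    "combination": 1.7,
    "repatriation_and_graduated_surrender": 1.9,
    "full_surrender": 2.0,
}


def overall_score(inflow_rule, outflow_rule):
    """Add the inflow and outflow scores (assumed recombination rule)."""
    return VAN_DEN_HANDEL_SCORES[inflow_rule] + VAN_DEN_HANDEL_SCORES[outflow_rule]


print(overall_score("none", "none"))                      # 0.0
print(overall_score("taxes", "approval"))                 # 1.5
print(overall_score("full_surrender", "full_surrender"))  # 4.0
```

Writing the rules as a table makes the arbitrariness of the weights easy to see: nothing in the data says that approval plus taxes should score 1.3 rather than, say, 1.4.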


Both the original Quinn measure and the Van Den Handel revision provide reasonably good approaches toward measuring intensity. It should be noted, however, that the overall Quinn measure may not be the most appropriate measure of capital account restrictions per se, as it is heavily skewed toward the financing of current account transactions; 8 of the 12 components of the overall index reflect restrictions on financing the current account. It should also be noted that both the Quinn and Van Den Handel measures use essentially arbitrary coding methods for quantifying the relative intensities of the restrictions, though this may perhaps be forgiven due to the lack of theoretical or empirical bases for determining exactly how the weightings should be done.

4.3 Composite Measures

4.3.1 Chinn-Ito (2002)

Chinn and Ito (2002) created an openness index using principal component analysis of four main categories of controls: multiple exchange rates, restrictions on the current account, the share of a 5-year period in which capital account restrictions were in place, and surrender of export proceeds. The index is the score of the first principal component, which has a mean of 0; higher values indicate more openness. One advantage of the Chinn-Ito index is its broad coverage: it is available for 105 countries for the period 1970–1997. Chinn and Ito also seek to address the overall intensity of the capital account restrictions through the consideration of four different types of restrictions in their analysis. While this is not an unreasonable approach, it is questionable whether the flow type binary variables used in the analysis are sensitive enough to generate meaningful results.
It is conceivable that countries may regularly adjust the stringencies of flow type controls, but short of moving to or from a complete absence of controls on a given flow type, the binary variable for that flow type registers no change. Because of this, the Chinn-Ito measure is agnostic to the intensity of restrictions on particular flow types; its approach to intensity relies on the number of types of flows restricted, though in a way that is different from the disaggregated measures described earlier.

4.3.2 Edwards (2005)

Edwards (2005) created a capital mobility index by combining information from Quinn (2003) and Mody and Murshid (2005). The index covers 163 countries over 1970–2000 and has a scale from 0 to 100, where higher numbers denote a higher degree of capital mobility; a score of 100 denotes absolutely free capital mobility. Missing values of the new index are imputed based on the following inputs: the two original indices (Quinn; Mody and Murshid), their lagged values, openness as measured by import tariff collections over imports, the extent of trade openness measured as imports plus exports over GDP, and GDP per capita. Finally, country-specific data is used to revise and to refine the control measure created by the imputation procedure. This measure has the largest country sample coverage and longest range of year coverage.

4.4 Behavioral Measures

We refer to measures that seek to indirectly measure the existence/intensity of capital controls by looking at the observed capital flows or stocks themselves, or at other similar economic data, as “behavioral” measures. While such measures may enable the researcher to avoid the weighting problems faced by nearly all of the aforementioned measures, these measures may be less useful in econometric studies that seek to use the measures in conjunction with other variables that may have been used previously in the construction of the controls indices themselves.

4.4.1 Aizenman (2004)

Aizenman and Noy (2004) proxy for financial openness using the sum of total capital outflows and inflows as a percentage of GDP. The capital flows assessed include FDI, portfolio, and other investment capital. Their dataset covers 83 developing and Organization for Economic Cooperation and Development (OECD) countries from 1982 to 1998. They create this measure alongside an analogous measure of trade openness, and test for feedback between the two.

4.4.2 LMF

Edwards (2006) utilizes the data on international assets positions from Lane and Milesi-Ferretti (2006). He computes the sum of total external assets plus total external liabilities as a proportion of GDP as a proxy for capital controls.
The data covers 147 countries from 1970 to 2004. A high value denotes that the country is integrated to world financial markets (i.e., fewer controls on capital flows). This measure should perhaps be considered as a measure of capital mobility, rather than capital restrictions. However, it is included in this study for sensitivity analysis purposes.
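Both behavioral proxies in this section are simple ratios, which a short sketch makes explicit. The function names and the input figures are hypothetical; the inputs are assumed to be gross magnitudes measured in the same currency units as GDP.

```python
def flow_openness(gross_inflows, gross_outflows, gdp):
    """Flow-based proxy: capital inflows plus outflows as a percentage of GDP."""
    return 100.0 * (gross_inflows + gross_outflows) / gdp


def stock_openness(external_assets, external_liabilities, gdp):
    """Stock-based proxy: external assets plus liabilities as a proportion of GDP."""
    return (external_assets + external_liabilities) / gdp


# Made-up figures for a single country-year, for illustration only.
print(flow_openness(30.0, 20.0, 500.0))     # 10.0
print(stock_openness(150.0, 250.0, 200.0))  # 2.0
```

Because both ratios are built from observed flows and stocks, they move with anything that moves capital, not only with policy, which is why they are better read as measures of capital mobility than of restrictions.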


4.5 Subjective Measures

While all measures of capital controls involve subjective elements to varying degrees, there are some measures for which the coding methodology turns heavily on the judgment of the coding researcher. One example of such a measure is the inflow controls proxy used in a 1999 study by Reinhart and Montiel; beyond the existence of capital controls, their measure also depends on whether or not a country also imposed restrictions on the foreign indebtedness of domestic institutions in a form considered by the researchers as “in excess of commonly used prudential measures.” Most of these measures are study-specific, which is to say that they are constructed specifically for use in a particular study. In such cases, the measures are perhaps of limited use to those seeking to consider capital controls more generally.

4.6 Event Study Measures

Some researchers, most notably Peter Blair Henry (2000, 2006), have analyzed liberalization from an event study perspective, and have therefore found variables identifying the dates of liberalization episodes to be of more use than variables that look at the extent of controls imposed or lifted. Henry, for example, creates measures of both capital account and stock market liberalizations; he identifies liberalization dates from policy announcements or, failing that, the establishment of the first closed-end country fund.

4.7 A New Combined Measure of Extensiveness and Intensity

As available data has increased in recent years, it has become possible to construct new, separate indices for outflow and inflow capital transactions. Following the methodologies of Quinn, Johnston and Tamirisa, and Miniane, these new indices could more usefully describe the effects of restrictions, since each type of control must by nature have different effects on different issues as it is being enforced.
While one might think that capital inflows are generally preferable to outflows since they bring in capital for developing a country’s economy and lead to a higher standard of living for its people, this is only one side of the coin; it leaves out the potential for macroeconomic instability as a result of overwhelming capital surges followed by sudden reversals. Such reversals could wipe out accumulated wealth within a short period of time when capital flows out of the country, as some Asian and Latin American countries experienced.


A new measure constructed by Potchamanawong (2007) combines both breadth and intensity of controls in a single index. The Potchamanawong measure draws on the new disaggregated AREAER reporting to construct a measure that takes into account both the existence of controls on particular types of capital flows and the intensity of said controls. Separate indices are constructed for controls on inflows and outflows. The coding rules for this measure are as follows: the value in each capital transaction (except dual/multiple exchange rate arrangement) ranges from 0 to 1, with 0.25 intervals, with a higher value representing a higher degree of restriction.7

0 Capital transaction is allowed freely (no restrictions); possibly requiring reporting or notification of authorities after transactions take place;
0.25 Prior approval is not required, but requires supporting evidence or registration. Transactions are required to be made through authorized banks or exchange houses;
0.5 Prior approval is not required, but quantitative restrictions exist, that is, limited ownership, or limited transfer amounts per period of time;
0.75 Prior approval is required before engaging in any transaction and is approved on a case-by-case basis;8
1.0 Flow type is not allowed or transaction is not permitted.

The measure is intended to reflect the costs an individual or firm has to bear in dealing with government agencies when trying to conduct capital transactions. These would be costs related to paperwork, gaining the necessary approvals before the transactions take place, and quantitative restrictions of capital transactions to individuals and enterprises. These processes and requirements could discourage and slow capital mobility for both inflows and outflows.

4.8 Constructing Separate Measures for Controls on Inflows and Outflows

Besides capturing both the breadth and intensity of controls, this measure also separately codes controls on inflows and outflows, allowing for more versatility of use in research studies. However, due to limited availability of disaggregated information on capital transactions within AREAER, the indices could not be constructed for periods prior to 1995. The inflow and outflow measures do tend to be highly correlated, but not so much that we cannot test for whether the inflow and outflow


measures have different effects.9 And indeed they do, at least on the probability of currency crises. In a test of the new measures, Potchamanawong analyzed the association of his new inflow and outflow measures with currency crises for 26 developing and emerging market countries over the period 1995–2004. He found that controls on capital inflows tended to decrease the probability of crises, while controls on outflows tended to increase the likelihood of crises. This lends a measure of support both to the signaling hypothesis and to the hypothesis that prudential regulation of inflows can help to prevent the onset of crises by limiting a country’s exposure to potentially volatile capital flows.

5 Comparison of Measure Results

While most of the commonly used measures of capital controls are constructed from information from the same source (the IMF’s AREAER, as shown in table 5.1), the calculated values of the measures are highly sensitive to the specific information used and the calculation method. Different measures can vary substantially in their assessments of the levels and/or intensity of controls for a given country at a given time. Because of this, we find considerable differences across the measures. Table 5.2 provides a sampling of reported capital control measures for a selection of Asian crisis countries in 1996, the year just prior to the onset of the Asian currency crises.10

As illustrated in figures 5.1 through 5.4, we also see that there are substantial differences in the patterns of behavior shown by the different measures. For instance, in the case of Malaysia, the Potchamanawong index shows that the government had been trying to generally loosen restrictions after the crisis hit the country, especially controls on capital inflows; anecdotal evidence lends support to this. It also indicates that Malaysia raised controls on capital outflows in 1998, right after the crisis.
Miniane’s index does not reflect this situation, showing instead a flat level of controls during the crisis period. The Chinn-Ito control index shows that restrictions on capital flows had been increasing since 1996 and peaked in 2000. The new Potchamanawong measure reflects the actual situation quite well, in comparison to the other measures. Similarly, the Potchamanawong indices show that Korea reduced its controls on capital flows significantly right after undergoing the crisis. Neither Miniane nor Chinn-Ito reveals a significant reduction of capital control restrictions by Korea until 2001. The Potchamanawong indices reveal a downward slope of capital restrictions during 1995–2004. This pattern is different from the other indices, which are almost flat lines over time. In our final example, the Potchamanawong inflow and outflow


indicators also illustrate that India has been gradually liberalizing its capital account since 1997. The liberalizing trend illustrated by the new measures fits the Reserve Bank of India’s intention of achieving full capital account convertibility, as shown in the report of the Committee on Capital Account Convertibility by the Reserve Bank of India (Tarapore Report) in 1997 (Kletzer 2004). However, the Edwards and Chinn-Ito measures show a sharp drop in 2000 with restrictions returning to the initial levels a couple of years later, while Miniane’s measure does not capture the changes in the controls of India at all.

6 Conclusion

Since we have no unambiguous “true” measures of the severity of capital controls, there is room for experts to differ about which of the various measures offer the best picture of actual developments. Based on our review of the cases discussed above, we believe that the Claremont-Potchamanawong measure does the best job for the post-1995 period, but we are hardly unbiased evaluators. We invite other researchers to make their own judgments.11

Appendix

Figure 5.1 Malaysia

Figure 5.2 Korea

Figure 5.3 India

Figure 5.4 Mexico

[Figures not reproduced. Each plots the series Incontrol, Outcontrol, Miniane, N_Chinn-Ito, Edwards, and LMF by year, 1995–2004, on a vertical scale from 0 to 1.]

Table 5.1 IMF AREAER Components

Restriction Type: Description of AREAER Category (Selected)

Controls on capital market securities: Covers shares or other securities of a participating nature, and bonds and other securities with an original maturity of more than one year.

Controls on money market instruments: Covers securities with an original maturity of one year or less, such as certificates of deposit, Treasury bills, and so forth.

Controls on collective investment securities: Covers share certificates or any evidence of investor interest in an institution for collective investment, such as mutual funds.

Controls on derivatives and other instruments: Refers to operations in other negotiable instruments and nonsecuritized claims not covered under the previous three items.

Controls on commercial credits: Covers operations directly linked to international trade transactions.

Controls on financial credits: Covers credits other than commercial credits.

Controls on guarantees, sureties, and financial backup facilities: Covers securities pledged for payment of a contract, such as warrants, letters of credit, and so on.

Controls on direct investments: Covers creation or extension of a wholly owned enterprise, subsidiary, or branch and the acquisition of full or partial ownership of a new or existing enterprise that results in effective influence over the operations of the enterprise.

Controls on repatriation of profits or liquidation of direct investment

Controls on real estate transactions: Covers the acquisition of real estate not associated with direct investment, that is, investments of a purely financial nature in real estate or the acquisition of real estate for personal use.

Controls on personal capital movements

Provisions specific to commercial banks and other credit institutions: Regulations that are specific to these institutions, such as monetary and prudential controls.

Provisions specific to institutional investors: For example, a limit on the share of the institution’s portfolio that may be held in foreign assets.

Multiple exchange rate arrangements

Table 5.2 Capital Control Indices in 1996 by Country

Country       Potchamanawong  Potchamanawong  Chinn-Ito  Miniane  IMF Matrix  Quinn  Johnston  Edwards  LMF   Glick-Hutchison
              Inflows         Outflows
China         0.54            0.68            0.85       0.77     0.83        0.63   0.73      0.625    0.98  –
Hong Kong     0.04            0.04            0.00       0.08     0.08        0.00   –         0        0.11  0
India         0.63            0.64            0.85       0.92     1.00        0.50   0.87      0.75     0.99  1.00
Indonesia     0.50            0.50            0.06       0.85     0.92        0.25   0.50      0.375    0.96  1.00
Korea         0.25            0.52            0.85       0.85     0.92        0.38   0.70      0.375    0.98  1.00
Malaysia      0.46            0.43            0.36       0.85     0.92        0.38   –         0.5      0.90  1.00
Philippines   0.37            0.43            0.55       0.85     0.92        0.25   0.47      0.25     0.95  1.00
Singapore     0.12            0.11            0.00       0.23     0.25        0.00   –         0        0.69  0
Thailand      0.31            0.48            0.62       0.69     0.75        0.63   0.63      0.5      0.94  1.00

Note: Dual Exchange Rate Category is included in Potchamanawong Inflow and Outflow measures.
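The Potchamanawong inflow and outflow scores reported in table 5.2 rest on the five-step coding rules of section 4.7. A minimal sketch of how such scores could be assembled follows. The rule labels and function name are hypothetical, and the aggregation step is an assumption: the chapter does not state how category scores are combined, so the sketch takes an unweighted average across categories and treats the dual/multiple exchange rate arrangement as one further category scored 0.75 when present (per note 8).

```python
# Intensity score per coding rule (section 4.7); the keys are hypothetical labels.
INTENSITY = {
    "free": 0.0,                # allowed freely, at most ex post reporting
    "registration": 0.25,       # evidence, registration, or authorized channels
    "quantitative_limit": 0.5,  # ownership or transfer-amount limits
    "prior_approval": 0.75,     # case-by-case approval required
    "prohibited": 1.0,          # flow type not allowed
}
DUAL_RATE_SCORE = 0.75  # value assigned to dual/multiple exchange rates (note 8)


def direction_index(category_rules, dual_exchange_rate=False):
    """Unweighted average of category scores for one direction (ASSUMED aggregation)."""
    scores = [INTENSITY[rule] for rule in category_rules]
    # Assumption: the exchange rate arrangement enters as one extra category.
    scores.append(DUAL_RATE_SCORE if dual_exchange_rate else 0.0)
    return sum(scores) / len(scores)


# Made-up codings for three transaction categories, for illustration only.
inflow_rules = ["free", "registration", "quantitative_limit"]
outflow_rules = ["prior_approval", "prohibited", "quantitative_limit"]

print(direction_index(inflow_rules))                            # 0.1875
print(direction_index(outflow_rules, dual_exchange_rate=True))  # 0.75
```

Unlike a binary count, the index moves when a country relaxes a prohibition to case-by-case approval, even though a control remains in place in both cases.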


Notes

1. For examples of these debates and discussion of the evolution of views on capital controls see Abdelal (2007), Eichengreen et al. (1999), and Ries and Sweeney (1997).
2. For a more detailed survey and evaluation see Potchamanawong et al. (forthcoming).
3. See, for example, Aizenman and Noy (2004).
4. A summary of the components of capital account restrictions recorded in AREAER is given in table 5.1.
5. However, Miniane omits the control on personal capital movement category because of a lack of consistent information in past editions of the AREAER.
6. The decision rules for goods and invisibles payments and receipts are as follows. If all receipts or payments are necessarily surrendered or blocked, then X = 0. If transfers require approval (unless automatic), then X ≤ 1. If transfers require approval (usually automatic) and are heavily taxed, then X = 1. If transfers are effected through the market mechanism and taxed, then X ≥ 1. The degree of taxation determines Y, where X = 1 + Y. If transfers are free, then X = 2 (Quinn 1997, p. 544).
7. As stated, the Potchamanawong measure is based on the methodologies of Quinn and Miniane, and as such retains the drawback of Quinn’s measure that the coding schema is determined in an essentially arbitrary fashion. It may be that more optimal weighting rules exist, but the means by which the rules may be determined is far from clear.
8. The existence of dual/multiple exchange rate arrangements is also given a value of 0.75.
9. While unable to “cure” the collinearity problem, Potchamanawong (2007) uses joint confidence regions to statistically separate out the different inflow and outflow effects. In general, the results of the analysis provide evidence of distinct effects of the inflow and outflow measures.
10. The values reported in table 5.2 have been standardized to vary between bounds of 0 and 1, and have been converted where necessary so that values closer to one reflect higher reported levels of capital controls.
11. The Claremont-Potchamanawong measure for 26 countries from 1995 to 2004 is available free of charge from the Claremont Graduate University Web site.
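The standardization described in note 10 can be sketched as a rescaling to the unit interval followed, where needed, by a reversal of direction. The function name, its parameters, and the input values below are illustrative assumptions; the exact procedure used for each measure is not spelled out in the chapter, and measures without fixed bounds (such as a principal component score) would need sample-based bounds.

```python
def to_restriction_scale(value, lowest, highest, higher_means_open):
    """Rescale `value` from [lowest, highest] to [0, 1], where 1 = most controlled."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    scaled = (value - lowest) / (highest - lowest)
    # Openness scales are flipped so that larger values mean more controls.
    return 1.0 - scaled if higher_means_open else scaled


# Made-up inputs on the native scales discussed in the chapter.
print(to_restriction_scale(37.5, 0, 100, higher_means_open=True))  # 0.625 (0-100 mobility scale)
print(to_restriction_scale(3, 0, 4, higher_means_open=True))       # 0.25  (0-4 openness sum)
print(to_restriction_scale(1.5, 0, 2, higher_means_open=False))    # 0.75  (0-2 restriction score)
```

Rescaling makes the measures comparable in range only; a 0.5 on one index still need not mean the same degree of restriction as a 0.5 on another.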

References

Abdelal, R. 2007. Capital Rules: The Construction of Global Finance. Cambridge, MA: Harvard University Press.
Aizenman, J. and I. Noy. 2004. On the Two Way Feedback between Financial and Trade Openness. NBER Working Paper 10496.


Bartolini, L. and A. Drazen. 1997. Capital Account Liberalization as a Signal. American Economic Review 87 (1): 138–154.
Brune, N., A. Guisinger, J. Sorens, and G. Garrett. 2001. The Political Economy of Capital Account Liberalization. Paper presented at 2001 Annual Meetings of the American Political Science Association, San Francisco, California.
Calvo, G. A. 2003. Explaining Sudden Stop, Growth Collapse, and BOP Crisis: The Case of Distortionary Output Taxes. IMF Staff Papers 50, 1–20.
Chinn, M. D. and H. Ito. 2002. Capital Account Liberalization, Institutions and Financial Development: Cross Country Evidence. NBER Working Paper 8967.
Edwards, S. 2005. Capital Controls, Sudden Stops and Current Account Reversals. NBER Working Paper 11170.
Eichengreen, B., M. Mussa, G. Dell’Ariccia, E. Detragiache, G. M. Milesi-Ferretti, and A. Tweedie. 1999. Liberalizing Capital Movements: Some Analytical Issues. IMF Economic Issue No. 17.
Glick, R. and M. Hutchison. 2000a. Capital Controls and Exchange Rate Instability in Developing Economies. San Francisco: Center for Pacific Basin Monetary and Economic Studies, Economic Research Department, Federal Reserve Bank of San Francisco.
Glick, R. and M. Hutchison. 2000b. Stopping “Hot Money” or Signaling Bad Policy? Capital Controls and the Onset of Currency Crisis. EPRU Working Paper Series 00–14, Economic Policy Research Unit (EPRU), University of Copenhagen, Department of Economics (formerly Institute of Economics).
Henry, P. B. 2000. Stock Market Liberalization, Economic Reform, and Emerging Market Equity Prices. Journal of Finance 55 (2): 529–564.
Henry, P. B. 2006. Capital Account Liberalization: Theory, Evidence, and Speculation. NBER Working Paper 12698.
International Monetary Fund. Various years. Annual Report on Exchange Arrangements and Exchange Restrictions. Washington, DC: International Monetary Fund.
Johnston, B. R. and N. T. Tamirisa. 1998. Why Do Countries Use Capital Controls? IMF Working Paper 98–181.
Kletzer, K. M. 2004. Liberalizing Capital Flows in India: Financial Repression, Macroeconomic Policy and Gradual Reforms. In NCAER/Brookings India Policy Forum, 227–263. Washington, DC: Brookings Institution.
Leblang, D. A. 1997. Domestic and Systemic Determinants of Capital Controls in the Developed and Developing World. International Studies Quarterly 41: 435–454.
Leblang, D. A. 2001. To Devalue or to Defend: The Political Economy of Exchange Rate Policy. Manuscript Collection, Department of Political Science, University of Colorado, Boulder, CO.
Miniane, J. 2004. A New Set of Measures on Capital Account Restrictions. IMF Staff Papers 51: 276–308.
Mody, A. and A. P. Murshid. 2005. Growing Up with Capital Flows. Journal of International Economics 65: 249–266.

102

●

Potchamanawong, Denzau, Rongala et al.

Potchamanawong, P. 2007. A New Measure of Capital Controls and Its Relation to Currency Crises. Unpublished Ph.D. dissertation, Claremont Graduate University. Potchamanawong, P., J. C. Walton, and T. D. Willett. Forthcoming. Measures of Capital Controls and How to Improve Them. Forthcoming in International Interactions. Quinn, D. P. 1997. The Correlates of Change in International Financial Regulation. American Political Science Review 91: 531–551. ———. 2003. Capital Account Liberalization and Financial Globalization, 1890–1999: A Synoptic View. International Journal of Finance and Economics 8: 189–204. Reserve Bank of India. 1997. Report of the Committee on Capital Account Convertibility. Chairman, S. S. S. Tarapore. Electronic document. Available at http://rbidocs.rbi.org.in/rdocs/PublicationReport/Pdfs/14029.pdf. Ries, C. P. and R. J. Sweeney. 1997. Capital Controls in Emerging Economies. Boulder, CO: Westview. Rodrik, D. 1998. Who Needs Capital-Account Convertibility? In Should the IMF Pursue Capital-Account Convertibility? Princeton Essays in International Economics 207, ed. S. Fischer, R. N. Cooper, R. Dornbusch, P. M. Garber, C. Massad, J. J. Polak, and D. Rodrik, 55–65. International Economics Section, Department of Economics Princeton University. Rongala, S. 2003. Capital Controls and the Asian Currency Crisis. Unpublished Ph.D. dissertation, Claremont Graduate University. Rossi, M. 1999. Financial Fragility and Economic Performance in Developing Economies: Do Capital Controls, Prudential Regulation and Supervision Matter? IMF Working Paper 99–66. Sula, O. and T. D. Willett. 2007. Measuring the Reversibility of Different Types of Capital Flows. Working Paper 2006–08, Claremont Working Papers Series in Economics. Electronic document. Available at http://www.cgu.edu/ pages/1381.asp. Van Den Handel, C. 2002. The Quinn Openness Scale and the Magnitude of Restrictions on Current and Capital Accounts. Claremont mimeo. Wihlborg, C. and T. D Willett. 1997. 
Capital Account Liberalization and Policy Incentives: An Endogenous Policy View. In Capital Controls in Emerging Economies, ed. C. P. Ries and R. J. Sweeney, 111–136. Boulder, CO: Westview. Willett, T. D., A. Angkinand, and E. M. P. Chiu. Forthcoming. Testing the Unstable Middle and Two Corners Hypotheses. Forthcoming in Open Economies Review. Willett, T. D., E. Nitithanprapas, I. Nitithanprapas, and S. Rongala. 2004. The Asian Crises Reexamined. Asian Economic Papers 3 (3): 32–87.

CHAPTER 6

Measuring Welfare

Bryan Roberts

1 Measuring Welfare: An Overview

Welfare measures are arguably the most important members of the political economy indicator family. They are used routinely to craft economic and social policies and to evaluate the success or failure of communities and societies. They are also beset with theoretical and methodological controversies and challenges. This chapter reviews the basic approaches that have been taken to both conceptualizing and measuring welfare experienced at the level of the individual and the nation-state: real income, extended national accounts, composite social indicators (the physical quality of life and human development), subjective well-being, and objective happiness.1

Welfare has traditionally been viewed as being determined by the levels of various inputs (consumption of goods, services, and leisure) and preferences over these inputs. Real income and extended national accounts are exemplars of this tradition. Composite social indicators first emerged in the 1970s and generally seek to measure welfare outcomes as opposed to inputs. Sen’s capabilities approach appeared in the 1980s. In the 1990s and 2000s, attention has increasingly been paid to subjective well-being and objective happiness, measures that attempt to directly evaluate individuals’ well-being through responses to questions about its level.2

During this review, four paradoxes are identified that raise interesting and troubling questions about these indicators. A concluding section summarizes the dramatic differences in what these indicators imply about actual welfare growth and degree of inequality in its distribution across countries. These paradoxes and differences help explain why we
continue to lack a welfare measure that enjoys widespread consensus. Some thoughts are offered on how progress might be achieved.

2 Real Income and Classical Welfare Theory

The effort to empirically measure welfare arguably began with the attempt to measure a key input into the determination of welfare: income. At the level of the individual, income is a measure of the person’s ability to purchase consumption today or, through savings, tomorrow. At the nation-state level, the appropriate income measure is national income, which is a measure of all income paid to factors of production in an economy. National income must also equal the total value of goods and services produced by an economy, and the national accounts describe both the production of goods and services by sector and the allocation of goods and services to various uses (consumption and investment).

After sporadic efforts to estimate national income dating back to the seventeenth century, national accounts were developed in their modern form during the 1920s–1940s under the leadership of Simon Kuznets in the United States and Colin Clark, James Meade, and Richard Stone in the United Kingdom. Kuznets presented the first set of national accounts to the U.S. government in 1937, and the first official U.S. national accounts were published in 1947. By the 1950s, many developed and developing countries were estimating national income, and this movement was supported by international organizations such as the IMF and United Nations, which first published key guidelines on developing national income accounts in 1953.3

Income measures at the individual or national level aggregate together many different types of goods and services using market prices as weights. National income also aggregates together income flows of individual units into one “social” income flow.
The challenges involved in comprehensively measuring and aggregating the output of goods and services of a national economy, and interpreting income measures in welfare terms at the individual and national levels, have preoccupied economists and national income statisticians for decades. We review theoretical and measurement issues involved in using income as a welfare measure in turn.

2.1 Real Income and Welfare

The use of income measures to proxy for welfare is traditionally justified by appeal to the neoclassical theory of production, consumption,
and general equilibrium. This theory assumes that individual households and firms maximize utility and profit under budget and production constraints in a competitive environment. Households are assumed to have preferences over inputs determining the level of welfare (e.g., goods, services, and leisure time) that satisfy certain mathematical properties, and these preferences imply the existence of a utility function that ranks outcomes according to an arbitrary ordinal scale. Neoclassical theory provides a basis for evaluating how welfare changes at the individual level when prices and income change.4

How to aggregate together the components of a welfare indicator is a problem of fundamental importance that will be encountered repeatedly in this chapter. An important strength of the income measure is that prices can be justified as aggregation weights on the basis of neoclassical theory: income might be the only aggregated empirical measure in the social sciences that has theoretically justified aggregation weights. However, that same theoretical apparatus also reveals serious limitations in using real income as a welfare measure.

Using real income as a measure of welfare at the national level introduces particularly difficult philosophical and methodological challenges. Many argue that it is illegitimate to make interpersonal welfare comparisons because utility is an arbitrary ordinal measure.5 Neoclassical theory can provide a basis for aggregation of individual preferences into community preferences, but this aggregation cannot serve to measure the level of social welfare.6

Sen (1979) comprehensively reviews the issues involved in using real income as a welfare measure at the individual and national levels. The “named good” approach, in which quantities of the same commodity consumed by different individuals are treated as distinct goods, has yielded useful practical results.
Sen shows that if one is willing to make the distributional judgment of rank order weighting, which places a welfare valuation on consumption of a commodity that varies inversely with the rank of an individual in the income distribution, then the product e*(1-G), where e is per capita income and G is the Gini coefficient for income, can be used to make social welfare comparisons.7

2.2 Comprehensive Measurement and Aggregation of Goods and Services

In addition to the theoretical issues of welfare economics, limitations of the national accounts as an empirical measure of total production of goods and services have been the subject of much debate and
research since the 1930s.8 National income generally does not include any economic outcome presumably affecting welfare that is not traded in a legal market.9 These outcomes include the value of production by households, volunteers, and the informal/illegal economy, and monetized values of positive and negative externalities. The cost of measures that mitigate negative externalities such as crime and pollution is, however, included. Goods and services produced by government are not traded in markets and must be valued at input cost rather than at market prices. The national accounts measure flows at one point in time and do not capture whether these flows are sustainable. Methods such as hedonic price regression are often used to control for quality change in multi-attribute goods and services, but correction for quality change continues to pose theoretical and practical challenges.

Special complications arise when the level of material welfare is compared across countries using national account measures. Developed and developing countries differ structurally, and these structural divergences can cause significant bias when using national income measures to compare country-level welfare outcomes. Kuznets first noted as early as 1947 that developing countries will typically have a greater proportion of economic activity located in the household and informal sectors that are not captured in national accounts, and this structural difference will cause national income measures to overstate the true difference in availability of goods and services between these countries.10 The relative price of nontraded goods and services also differs systematically across countries at different real income levels, and the use of commercial exchange rates to convert national income levels into a common currency metric understates the true purchasing power of income in poorer countries. Purchasing power parity exchange rates are now widely used to correct for this problem.
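As a rough illustration of the exchange-rate point above, the following sketch converts one country's per capita income into dollars twice, once at a commercial exchange rate and once at a purchasing power parity rate. The function name `to_dollars` and every figure are hypothetical, chosen only to show the direction of the bias, not taken from any data source.

```python
def to_dollars(income_local: float, local_units_per_dollar: float) -> float:
    """Convert local-currency income into dollars at the given exchange rate."""
    return income_local / local_units_per_dollar


# Hypothetical values for a low-income country (assumptions, not data).
income_local = 120_000.0   # per capita income in local currency units
market_rate = 60.0         # commercial exchange rate: local units per dollar
ppp_rate = 20.0            # PPP rate: local units that buy what one dollar buys

income_market = to_dollars(income_local, market_rate)
income_ppp = to_dollars(income_local, ppp_rate)

# Nontraded goods and services are relatively cheap in the poorer country, so
# the PPP rate lies below the commercial rate and PPP income is the larger figure.
understatement = income_ppp / income_market

print(f"Income at the commercial rate: ${income_market:,.0f}")
print(f"Income at the PPP rate:        ${income_ppp:,.0f}")
print(f"PPP income is {understatement:.1f} times the commercial-rate figure")
```

The size of the gap depends entirely on the assumed rates; the sketch shows only why a commercial-rate comparison would understate purchasing power when nontraded goods are cheaper.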
Because of both theoretical and practical limitations associated with the national income measure, its creators never intended for it to be used as a welfare measure. The impetus for development of the national accounts was the Great Depression, emergence of Keynesian macroeconomic theory, and the need for planning during World War II, and monitoring and policymaking needs drove interest in the national accounts, not the desire to measure welfare outcomes.11 Nonetheless, per capita GDP and other income measures are now routinely used explicitly or implicitly as proxies for the level of welfare. Cross-country comparisons emphasize stark differences in per capita GDP. The number of people living on less than $2 per day, an income measure, is now
cited as a standard measure of the level of world poverty. This leads us to our first welfare-measure paradox:

Paradox 1: If real income is the preferred measure of welfare, then all of the many reasons why it cannot serve as a measure of welfare must be ignored.

This paradox could be resolved if an alternative measure that better captured welfare outcomes existed. Are there any?

3 Extended National Accounts

The desire to extend the national accounts to incorporate factors influencing welfare other than market-mediated production and consumption was present from the beginning. In his pioneering work on U.S. national income, Kuznets also estimated the value of nonwork (leisure) hours, and he had a major disagreement in the late 1940s with the U.S. Department of Commerce over their refusal to incorporate an estimate of household production and move the national income measure closer to a welfare measure.12 In spite of Kuznets’ urgings, it was not until the early 1970s that determined attempts to extend the national accounts began to be made, and these were confined almost exclusively to academic as opposed to governmental efforts.13

Extensions of national accounts take as a basic principle that the best approach to measuring welfare is to alter and extend the national account framework rather than abandon it. The first major effort to extend the accounts into a better welfare measure for the United States was made by Nordhaus and Tobin (1972). Eisner (1988) reviews in detail all national account extension efforts made for the United States through the 1980s. We examine here Nordhaus and Tobin’s effort in some detail and then summarize briefly the other extensions.

3.1 The Measure of Economic Welfare

William Nordhaus and James Tobin of Yale University developed in 1971 the measure of economic welfare (hereafter, MEW), which seeks to measure total household consumption and reorganizes and adds to the national accounts.
MEW includes household consumption with the exception of private purchases of education and health care, which are considered to be investment items.14 Government expenditures are classified into components that contribute to household consumption, which is included in MEW, and components that contribute to intermediate production inputs, capital accumulation, and purchase
of “regrettable necessities,” which are not. “Regrettable necessities” includes spending on defense and foreign affairs. MEW then develops estimates of three components that are not found in the national accounts: the value of household production of goods and services, leisure, and the value of disamenities of urbanization. The estimate of household production is based on an estimate of the amount of time in hours devoted to work and nonwork activities,15 and valuation of that time. The valuation of time is the single most important issue in the construction of MEW, because the value of household production is estimated to be greater than real (market) income. It is also important to note that determination of the value of time is also determination of the aggregation weight for household production in MEW. Nordhaus and Tobin appeal to optimal allocation of time at the margin to identify a value/weight, but even after making this assumption, they must decide whether to use the price of consumption or the wage as the opportunity cost of time.16 The value of disamenities of urbanization attempts to measure the impacts of pollution, congestion, and crime. Table 6.1 below gives values of per capita NNP, MEW, and several of MEW’s components for Nordhaus and Tobin’s preferred MEW values. The level of MEW is roughly twice as large as official national income, because of the large size of the estimated value of household production and leisure (roughly 50% and 100% of official national income respectively). NNP and MEW grew at the same rate in the immediate prewar era, but MEW grew at a much slower rate in the two decades following World War II because there was no growth in the monetary value of leisure. The low rate of growth in leisure is presumably due to the shift of women into the labor market.17 Given the significant difference between long-run growth rates in national income and MEW, it is odd that MEW received relatively little attention after the 1970s. 
Interest may have been inhibited by methodological issues that introduced significant uncertainty in the estimates, and controversy surrounding some of the assumptions (e.g., exclusion of defense expenditures).18

3.2 Other Extended Account Measures

The late Robert Eisner devoted much of his professional career to extending the U.S. national accounts to improve measurement of welfare, productivity, stocks of capital, and other key economic variables. His 1988 survey article comprehensively reviews the reasons why the accounts need to be modified and extended, and the practical
challenges that arise when attempting this.19 He then reviews results from several efforts to extend the accounts: Nordhaus and Tobin’s MEW, Zolotas’ “economic aspects of welfare,” Jorgenson and Fraumeni’s “full gross private domestic product,” Kendrick’s “adjusted GNP,” the Ruggles’ “integrated economic accounts,” and Eisner’s “total incomes system of accounts.” These efforts modify the official accounts in different ways and often differ in key assumptions (e.g., how to value household time). Table 6.2 below summarizes the long-run average annual growth rate in per capita income for four of these extended account alternatives and compares them to growth in official per capita aggregate GNP.20 Long-run growth in official GNP is significantly higher than growth in all of the extended account measures. A basic conclusion is that the level of economic welfare as estimated by extended accounts is much greater than that of the official accounts because the official accounts exclude household production and leisure,21 but the growth in welfare is significantly less than that suggested by the official accounts.

Given the degree of difference in levels and growth rates, it is surprising that there has been so little governmental interest in providing a public institutional home for extended account measures, particularly when the option of constructing satellite accounts is available. Even though Eisner (1988) concluded that the academic efforts to extend the accounts “have approached the limits of what is feasible with essentially private research” and urged that “the major resources of the government be put to the task,” little has happened in intervening years. Is government reluctance due to budget constraints, methodological concerns, ideological factors, and/or other issues?22 Better understanding of this issue is a key task for the field of political economy indicator research.
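The basic conclusion of this section, a higher level but slower growth once nonmarket time is valued, can be illustrated with a stylized sketch. It is not the procedure used by Nordhaus and Tobin or by any of the other studies: the function names, the single imputed hourly value, and all figures are hypothetical assumptions.

```python
def average_annual_growth(start: float, end: float, years: int) -> float:
    """Compound average annual growth rate between two levels."""
    return (end / start) ** (1.0 / years) - 1.0


def extended_income(market_income: float, nonmarket_hours: float,
                    value_per_hour: float) -> float:
    """Market income plus an imputed value for household production and leisure."""
    return market_income + nonmarket_hours * value_per_hour


# Hypothetical per capita figures (assumptions, not estimates from any study).
years = 36
market_start, market_end = 100.0, 200.0   # market income doubles over the period
nonmarket_hours = 3_000.0                 # annual hours of household work and leisure
value_per_hour = 0.05                     # imputed value of an hour, held constant

ext_start = extended_income(market_start, nonmarket_hours, value_per_hour)
ext_end = extended_income(market_end, nonmarket_hours, value_per_hour)

growth_market = average_annual_growth(market_start, market_end, years)
growth_extended = average_annual_growth(ext_start, ext_end, years)

print(f"Levels at start: market {market_start:.0f}, extended {ext_start:.0f}")
print(f"Average annual growth: market {growth_market:.2%}, extended {growth_extended:.2%}")
```

Because the imputed hourly value acts as the aggregation weight on nonmarket time, changing `value_per_hour` changes both the level of the extended measure and how far its growth rate falls below that of market income.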
4 Composite Indicators and Human Development

A very different approach to measuring welfare began to take shape in parallel with the early efforts on extending the national accounts. This approach was based on the idea that adjusting national income to obtain a composite monetary indicator of welfare is so riddled with difficulties and ambiguities that it should be abandoned in favor of using social indicators that more directly and unambiguously reflect welfare outcomes. As the movement evolved, a new theoretical basis for such indicators was outlined by Sen (1985) that focuses on measuring capabilities. We review here two of the most prominent measures that emerged from this movement, the physical quality of life indicator and
human development.23 As in the case of extending the national accounts, the social indicator movement had to grapple with two basic questions: what individual variables will a measure include, and what methodology is used to aggregate them together?

4.1 The Physical Quality of Life Indicator (PQLI)

The physical quality of life indicator was published by Morris David Morris of the University of Washington in 1979.24 The PQLI resulted from concerns of development economists about the ability of national income to capture improvement in the areas of education, health care, and other basic human needs. Improvement in basic human needs in poor countries to some degree reflects achieving growth with equity. PQLI would also identify countries that were striking exceptions to the positive relationship between real income and basic human need achievements. Morris states that “The PQLI has very limited objectives. It does not try to measure all ‘development’; nor does it measure freedom, justice, security, or other intangible goods. It does, however, attempt to measure how well societies satisfy specific life-serving social characteristics.”25 However, Morris clearly considers the PQLI to be a welfare measure, albeit a limited one for which he makes humble claims.26

PQLI aggregates together three indicators: infant mortality, life expectancy, and the adult literacy rate. The rationale for choosing these three indicators is based on clear criteria that are carefully described. Country values for each indicator are converted into indices that are scaled from 0 to 100, where 0 is an explicitly defined worst-case performance and 100 an explicitly defined best-case performance. PQLI indices measure how far a country is from an upper bound ideal value. These three indices are aggregated together using arbitrary weights of 1/3 for each index. Morris carefully discusses the weighting issue and shows that the PQLI is relatively insensitive to alternative weighting schemes.27 He also states that “As long as the chosen system of weights remains constant, the index remains unimpaired. Determining some absolutely true ranking of countries is not the point of the index; moreover, the index is not measuring a race in which there is some importance to the particular placing of individual countries in relation to one another. Rather, the index shows where a country is placed in relation to the ultimate objective (i.e., a PQLI of 100) as well as how well a country is making progress toward that end.”28
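The scaling and weighting steps described above can be sketched as follows. This is a minimal illustration of the 0 to 100 scaling and equal weighting, not Morris's own computation: the worst-case and best-case bounds in `BOUNDS`, the function names, and the country values are hypothetical placeholders.

```python
def scaled_index(value: float, worst: float, best: float) -> float:
    """Map a raw value onto a scale from 0 (worst case) to 100 (best case).

    The same formula handles indicators where lower is better (infant
    mortality): there worst > best, which flips the direction of the scale.
    """
    return 100.0 * (value - worst) / (best - worst)


# Hypothetical (worst, best) bounds; assumptions, not Morris's published values.
BOUNDS = {
    "infant_mortality": (230.0, 10.0),   # deaths per 1,000 live births
    "life_expectancy": (40.0, 80.0),     # years
    "literacy": (0.0, 100.0),            # percent of adults
}


def pqli_like(raw, weights=None):
    """Weighted average of the three scaled indices (equal weights by default)."""
    if weights is None:
        weights = {name: 1.0 / len(BOUNDS) for name in BOUNDS}
    return sum(weights[name] * scaled_index(raw[name], *BOUNDS[name])
               for name in BOUNDS)


# A hypothetical country.
country = {"infant_mortality": 60.0, "life_expectancy": 62.0, "literacy": 70.0}

equal_weight_score = pqli_like(country)
alternative_score = pqli_like(
    country,
    weights={"infant_mortality": 0.5, "life_expectancy": 0.25, "literacy": 0.25},
)

print(f"Equal weights:       {equal_weight_score:.1f}")
print(f"Alternative weights: {alternative_score:.1f}")
```

Comparing the two scores for a given country is one simple way to probe how sensitive such an index is to the weighting scheme, in the spirit of the check Morris reports.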

One key property of the PQLI is that it aggregates three indicators each of which is bounded from below and above and is concave with respect to income. Adult literacy and infant mortality clearly have upper bounds of 100%, and life expectancy may have a natural maximum that cannot be exceeded.29 Unlike the other welfare indicators examined so far, PQLI has an upper limit value of 100 that cannot be exceeded. PQLI and all of its components are also concave with respect to real income: these indicators rise with real income but at a diminishing rate. This concavity property has very important implications for evaluating welfare and its distribution across countries that will be discussed below.

4.2 Human Development

In a series of publications in the 1970s and early 1980s, Amartya Sen developed the capabilities approach to understanding poverty, in which commodities are only a means to the end of being able to engage in desired activities that depend not only on command over commodities but also on a host of environmental and personal factors.30 As outlined in Sen (1985), commodities and their characteristics are one input into the determination of capabilities, which in turn is an input into the determination of the level of a person’s happiness. The United Nations subsequently collaborated closely with Sen to develop an empirical measure of capabilities, and the human development index resulted (HDI hereafter).31 The HDI is intended to measure capabilities at the nation-state level. Technically, the HDI is very closely related to the PQLI.
Immediately after the appearance of PQLI in the late 1970s, Rati Ram of Illinois State University developed an index that combined the PQLI with per capita GNP using aggregation weights derived from principal components analysis.32 Technically, the HDI is a direct descendant of the PQLI and Ram’s subsequent work: it aggregates together life expectancy, the adult literacy rate, the school enrollment rate, and the utility of income, thus dropping one of PQLI/Ram’s indicators (infant mortality) and adding a new one (school enrollment rate).33 As in the case of the PQLI, index values of the four indicators are created using explicit worst and best case outcomes. Also as in the case of the PQLI, aggregation weights are set arbitrarily, at 1/3 for life expectancy and utility of income and 1/6 for the adult literacy rate and school enrollment rate. Utility of income is set at the natural log value of per capita GDP converted into internationally comparable PPP dollars. The log function is the most common specification of the utility
function in the macroeconomics literature and is a natural choice to capture diminishing marginal utility with respect to income. Although the HDI is a cardinal measure, in public presentations and media discussions, the UN focuses on ordinal country rankings derived from the cardinal values. The UN has also not been shy in assertively offering the HDI as a preferred welfare measure to national income.

The HDI has been criticized because it is arbitrary both in the choice of development indicators that it aggregates together and the weights used to aggregate them.34 Any welfare indicator will necessarily face these difficult challenges, including national income (which of course was never intended to be a welfare measure). Whether the HDI spans an appropriate set of capability indicators and how to identify nonarbitrary aggregation weights are open issues.

There is one aspect of the PQLI and HDI indicators that has received very little attention in the literature. Any welfare indicator that is concave with respect to income will be more equally distributed than income. Both PQLI and HDI are highly concave with respect to income, as figure 6.1 below shows in the case of HDI. The distribution implications of this concavity are dramatic. Table 6.3 gives values for the ratio of country maxima to minima and also the Gini coefficient for GDP, HDI, and PQLI. The difference in the degrees of inequality in welfare distribution across countries is striking. According to the Gini coefficient, human development is distributed roughly five times more equally than GDP.35 In fact, it is almost never the case that a Gini coefficient as low as 0.1 is encountered in empirical analysis of inequality.36 This leads us to our second welfare-measure paradox:

Paradox 2: If the UN’s human development is the preferred welfare measure, then inequality in the distribution of welfare across countries is empirically negligible.

It is important to realize just how provocative this paradox is.
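The mechanics behind Paradox 2 can be sketched in a few lines: build an HDI-style composite with the weights described in the text (1/3, 1/3, 1/6, 1/6) and a log-income component, then compare its Gini coefficient across countries with the Gini coefficient of income itself. The worst-case and best-case bounds, the function names, and the five countries are hypothetical assumptions, and the Gini here is a simple unweighted cross-country calculation rather than the computation behind Table 6.3.

```python
import math


def gini(values):
    """Unweighted Gini coefficient of a list of positive values."""
    xs = sorted(values)
    n = len(xs)
    rank_weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * rank_weighted / (n * sum(xs)) - (n + 1.0) / n


def scaled(value, worst, best):
    """Map a raw value onto 0 (worst case) .. 1 (best case)."""
    return (value - worst) / (best - worst)


def hdi_like(life_expectancy, literacy, enrollment, gdp_ppp):
    """HDI-style composite; all bounds below are illustrative assumptions."""
    income_utility = scaled(math.log(gdp_ppp), math.log(100.0), math.log(40_000.0))
    return (scaled(life_expectancy, 25.0, 85.0) / 3.0
            + scaled(literacy, 0.0, 100.0) / 6.0
            + scaled(enrollment, 0.0, 100.0) / 6.0
            + income_utility / 3.0)


# Hypothetical countries: (life expectancy, literacy %, enrollment %, GDP per capita in PPP $).
countries = [
    (45.0, 40.0, 35.0, 600.0),
    (55.0, 60.0, 50.0, 1_500.0),
    (65.0, 80.0, 65.0, 4_000.0),
    (72.0, 92.0, 78.0, 10_000.0),
    (79.0, 99.0, 92.0, 30_000.0),
]

gini_income = gini([c[3] for c in countries])
gini_composite = gini([hdi_like(*c) for c in countries])

print(f"Gini of income:        {gini_income:.3f}")
print(f"Gini of the composite: {gini_composite:.3f}")
```

Every component of the composite is bounded and the income term is logarithmic, so a fifty-fold income gap between the poorest and richest hypothetical country shrinks to a much smaller gap in the composite, which is the concavity effect at issue.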
Concern about distribution of welfare across countries is the basic motivating factor for foreign aid and existence of the World Bank, regional development banks, and many of the UN’s agencies. If inequality in the distribution of welfare across countries is insignificant, then the key motivation for these institutions and policies is misguided. The social indicator movement has met with some success and has apparently enjoyed a resurgence in recent years. Human development is of course a highly publicized measure that enjoys a solid institutional home in the United Nations. According to one recent survey, “the new
social indicators movement is both more modest and more dispersed, and is based on more diverse motivations and actors.”37

5 Direct Measures of Welfare: Subjective Well-Being

National income and social indicators measure welfare indirectly by assessing actual outcomes that presumably reflect the level of well-being. An entirely different approach can be taken by attempting to measure a person’s welfare level directly. Our understanding of how physical processes in the human brain determine levels of pleasure, pain and well-being, and our ability to measure them, are not yet advanced enough to create a welfare measure based on physical neurological measurements.38 However, psychologists and sociologists have administered surveys over several decades that ask people to evaluate their happiness or level of satisfaction with life in general. This measure, known as subjective well-being (hereafter SWB), ranges from a lower bound (0 or 1) to an upper bound (3, 5, 7, or 10): the lower bound is typically associated with the words “generally unhappy” or “very unhappy,” and the upper bound with “generally happy” or “very happy.”

SWB surveys have been administered since the 1940s. Economists first took note of SWB upon publication of Easterlin’s classic article in 1974. However, until recently, the economics profession was generally skeptical that responses to SWB questions provide meaningful information on welfare.39 Over the past decade, the level of interest of economists in SWB has risen dramatically, and many articles have been published in the area now known as “happiness economics.”40 Proposals have been made to supplement or replace national income measures with “gross national happiness.”

It is not entirely clear what, exactly, answers to SWB questions measure.
Answers to SWB questions are inherently subjective, and respondents will assign individualized interpretations to what various points of the SWB question scale represent.41 SWB avoids entirely the selection and aggregation challenges faced by other welfare indicators. In fact, the happiness literature turns this approach on its head by regressing SWB on factors presumed to influence SWB using cross-sectional or panel datasets. These factors typically include income, gender, marital status, nationality, education, age, and employment status.42

Many interesting findings have emerged from this literature. Perhaps the most striking finding is that changes in any of the factors that affect SWB have only temporary effects that dissipate within a few years. This phenomenon has been termed the “hedonic treadmill” or
hedonic adaptation and has motivated a model of behavior in which individuals seek out positive yet temporary shocks to utility that do not result in any permanent improvement.43 Some studies suggest that increased inequality in income and status appears to have a permanent negative impact on SWB.44 Other interesting findings include (among others) significant and stable differences in national average SWB, estimates of the marginal utility of income, the cost of a recession and inflation in utility terms, impacts of catastrophic events such as terrorism, and disutility of airport noise.45

One of the most striking findings of the SWB literature is the Easterlin paradox. National average SWB can be constructed by averaging the responses of those surveyed in a particular country and year. When this is graphed over time, although national SWB fluctuates slightly from year to year, there is essentially no long-run trend in its level over decades.46 Figure 6.2 below shows that in the United States during 1947–2006, average SWB has remained unchanged even though real income has more than tripled.47 An appendix explores in more detail whether there is any trend in average national SWB over time, and the relation of SWB to income, using cross-country data. It is shown that there is no evidence that average nation-state happiness levels have trended significantly over time in the postwar era in either developed or developing countries, and that the relationship with real income is weak or nonexistent. Although more sophisticated econometric analysis does need to be done, the Easterlin paradox is so far strongly supported by available empirical evidence.
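The construction of national average SWB, and a first check for a long-run trend, can be sketched as follows. The survey responses are invented purely to show the mechanics on a 1 to 3 scale, and the helper names `national_average_swb` and `trend_slope` are hypothetical; a real analysis would use actual survey microdata and more careful econometrics.

```python
from collections import defaultdict
from statistics import mean


def national_average_swb(responses):
    """Average SWB by (country, year) from (country, year, score) records."""
    grouped = defaultdict(list)
    for country, year, score in responses:
        grouped[(country, year)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


def trend_slope(years, values):
    """Ordinary least squares slope of values on years (change per year)."""
    year_bar, value_bar = mean(years), mean(values)
    numerator = sum((y - year_bar) * (v - value_bar) for y, v in zip(years, values))
    denominator = sum((y - year_bar) ** 2 for y in years)
    return numerator / denominator


# Invented responses for one hypothetical country (assumptions, not survey data).
responses = [
    ("A", 1975, 2.0), ("A", 1975, 3.0), ("A", 1975, 1.0), ("A", 1975, 2.0),
    ("A", 1990, 2.0), ("A", 1990, 3.0), ("A", 1990, 2.0), ("A", 1990, 2.0),
    ("A", 2005, 3.0), ("A", 2005, 1.0), ("A", 2005, 2.0), ("A", 2005, 2.0),
]

averages = national_average_swb(responses)
years = sorted(year for (country, year) in averages if country == "A")
series = [averages[("A", year)] for year in years]
slope = trend_slope(years, series)

for year, value in zip(years, series):
    print(f"{year}: average SWB {value:.2f}")
print(f"Estimated trend per year: {slope:.4f}")
```

In this invented series the average moves slightly between survey years while the fitted slope is essentially zero, which is the pattern the Easterlin paradox describes; the same two steps applied to real data would show whether that pattern holds.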
Although SWB surveys only go back to the 1940s, it is not implausible that similar national average SWB values would have resulted if surveys had been carried out in 1900, 1850, or 1800.48 We are confronted with a startling implication, which is a strong version of Easterlin’s paradox:

Paradox 3: If subjective well-being is the preferred measure of welfare, then the Industrial Revolution was irrelevant for human well-being.

The Easterlin paradox is an important obstacle to widespread acceptance of SWB as a welfare measure. Some would accept that there has been no long-run trend in SWB over decades that have witnessed rapid economic growth and technological change. Others find it difficult to accept that SWB is a sound measure of welfare if it is uncorrelated over the longer run with any variable that a priori would be expected to impact welfare.49 The Easterlin paradox clearly raises some deep issues that have not been fully resolved. Explaining this paradox, and


rationalizing how income is positively correlated with SWB in cross sections of individuals and countries but not with the average level of SWB in a country across time, is arguably the greatest challenge in SWB research today.

There are other interesting questions and contradictions raised by the SWB literature. First, SWB is bounded between lower and upper limits, and the amount of well-being that a population can experience according to this measure is bounded by definition. If every member of the population were to report maximal happiness, then “gross national happiness” would equal 10*n, where n is the population. Is it plausible that welfare is bounded from above?50

Second, the finding that income and status inequality has permanent negative impacts on SWB raises an interesting policy conundrum. The logical policy implication of this finding is that in order to maximize societal welfare, income and status inequality should be eliminated. However, historical experience suggests that in complex industrial societies, such policies will result in poor incentives and economic stagnation. This should not matter in the longer run, as SWB research suggests that income shocks have only transitory impacts on welfare. However, the defining historical experience of the late twentieth century was the collapse of the Soviet Union and the abandonment of the economic system, in place for 70 years, that increased equality at the expense of efficiency. How can this contradiction between policy implications and historical experience be resolved?

Finally, the use of average SWB as a welfare measure has major implications for cross-country welfare comparisons and related policies. As in the case of human development, SWB is much more equitably distributed across countries than income.
Calculated from data on the average level of life satisfaction on a scale of 1–10 for 1994, the Gini coefficient for SWB is 0.03.51 This is essentially 0, so that there is no inequality in the distribution of SWB across countries.

Results from happiness functions might be useful for the construction of composite welfare indicators, which have struggled with the issue of identifying appropriate rates of transformation and aggregation weights. National income and extended national accounts generally use market prices as weights, but indicators seeking to incorporate factors that are not traded in markets have often had little choice but to set weights arbitrarily. Can the coefficients on explanatory variables in happiness function estimations be regarded as rates of transformation and be used as aggregation weights in the construction of traditional (composite) welfare indicators? SWB studies are estimating potentially useful coefficients. To cite only one example, Blanchflower and Oswald


(2000) estimate a happiness function whose coefficients suggest that the annual value of being in a married state is $100,000. Could this estimate legitimately be used in welfare and policy analysis? Results from happiness functions might be particularly useful for evaluating in welfare terms the impact of political and social variables whose impacts are difficult or impossible to monetize and include in an income indicator.

6 Direct Measures of Welfare: Objective Happiness

The most recent major development in welfare measurement is the attempt to measure “objective happiness.” Rather than evaluate happiness with life generally, Daniel Kahneman proposed in the 1990s to record subjective evaluations of a particular experience at frequent intervals and to then integrate these momentary evaluations into an overall evaluation of the experience. Evidence from psychological experiments suggests that people remember the overall utility of an experience as the average of the peak- and end-level of pain or pleasure as opposed to the integral over the experience’s duration. Kahneman’s proposal would correct for this peak-end cognitive bias, as well as other biases related to SWB. Kahneman and his colleagues have recently proposed the construction of “national well-being accounts” using objective happiness measures.52

The latest objective happiness research has focused on measuring satisfaction with daily life experiences such as commuting, working, eating, taking care of children, and so on. Objective happiness has been operationalized using two different techniques, the experience sampling and day reconstruction methods.53 Kahneman et al. (2004a) present results from a survey using the day reconstruction method for 909 employed Texan women.54 Interesting findings are emerging on how people evaluate their satisfaction level with different activities.
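The difference between a remembered (peak-end) evaluation and an integrated evaluation of momentary ratings can be illustrated with a minimal sketch. The ratings below are hypothetical, the function names are illustrative, and the integral is approximated by a simple sum over equally spaced reports; none of this is taken from Kahneman's studies.

```python
def peak_end_score(ratings):
    """Remembered evaluation under the peak-end rule: the average of the
    most intense momentary rating and the final momentary rating."""
    peak = max(ratings, key=abs)  # most intense moment, pleasant or painful
    return (peak + ratings[-1]) / 2


def integrated_score(ratings, interval=1.0):
    """Integrated evaluation: momentary ratings summed over the duration
    (a rectangle-rule approximation with equally spaced reports)."""
    return sum(r * interval for r in ratings)


# Hypothetical momentary ratings, one per minute (negative = unpleasant).
short_episode = [-2, -5, -8]
long_episode = [-2, -5, -8, -4, -1]  # same start, followed by a milder tail

for name, episode in (("short", short_episode), ("long", long_episode)):
    print(name, peak_end_score(episode), integrated_score(episode))
```

In this made-up example the longer episode is remembered as less unpleasant under the peak-end rule even though its integrated discomfort is larger, which is the kind of discrepancy the integration approach is meant to correct.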
For example, survey respondents ranked taking care of their children as one of the most unpleasant experiences, which contrasts sharply with what people report when they are asked to evaluate activities in a general sense rather than based on specific experiences the day before. As in the case of SWB, it seems intuitively plausible that results from the objective happiness literature may prove to be valuable for identifying rates of transformation and informing cost-benefit and policy analysis.55 Objective happiness proponents have gone further and proposed constructing a “national well-being account” on the basis of objective happiness measures. Such accounts would be built by surveying


a representative sample of individuals, determining the welfare outcome of specific experiences through collecting information on the net affect of these experiences, aggregating these experiences for an individual, and aggregating individual experiences into a societal measure. Some individual experiences would be negative (painful), and others positive (pleasurable). The example that Kahneman first used to illustrate objective happiness was the real-time reporting of the level of (dis)satisfaction of patients undergoing colonoscopies.56 A colonoscopy is clearly a negative experience for a patient during the time that the procedure is carried out, and it would presumably enter national well-being accounts as a negative item. However, we are immediately confronted with an obvious question: if objective happiness reflects the total welfare obtained from an experience, why would anyone ever voluntarily choose to undergo a colonoscopy? This leads us to our fourth welfare measure paradox:

Paradox 4: If objective happiness is the preferred measure of welfare, then we cannot explain why people voluntarily choose to engage in unpleasant activities.

Medical patients not only choose to undergo colonoscopies, they actually pay for the (anti)pleasure. A colonoscopy must bring benefits to a patient that are not reflected in the objective happiness measure, and these are primarily avoiding the pain of advanced colon cancer and extending the length of one’s life. The price of a colonoscopy will (under the right conditions) reflect the value of these medical benefits. Of course, the pain involved in the procedure will be factored into its price. If a patient has a choice between a more-painful and less-painful procedure, he/she would presumably be willing to pay more for the less-painful procedure. Objective happiness in this case is equivalent to one variable in a hedonic price regression.
Market prices are capable of summarizing a great deal of information about costs and benefits that reflect why people make the choices that they actually do.

7 Concluding Observations and Questions for the Future

After over a century of welfare economics, we remain far from having any consensus on the best way to empirically measure welfare at the individual or societal level. Important scholars of welfare economics and national income accounting such as John Hicks, Edward Denison, and Amartya Sen have expressed pessimism about the feasibility of measuring welfare with a single index, and developments over the past


half century are not particularly encouraging for those who disagree.57 As the remarkable range of measures reviewed here and summarized in table 6.4 below attests, however, this has not kept social scientists from trying.

What has this review revealed? Social scientists, analysts, politicians, journalists, and other observers share a common interest in understanding three things: the average level of welfare, its growth over time, and inequality in the distribution of welfare across individuals and countries. Although we cannot compare the levels of the measures reviewed in this chapter, table 6.4 summarizes what has been discovered about growth in the measures for a particular country (the United States) and distribution across countries of the world. National income has the highest rate of growth and degree of inequality, both of which decline substantially as we move to measures based on extended national accounts and social indicators.58 Subjective well-being is an extreme situation in which there is no growth and cross-country inequality is essentially completely absent. These four indicators paint very different pictures. Do we live in a world of high growth and high inequality, low growth and low inequality, or no growth and no inequality?

It has also been shown that in spite of decades of effort by social scientists, welfare measures continue to suffer from the same set of challenges and problems that other political economy indices face. The concept of what is being measured, utility, is capable of being influenced by so many different factors that it can be difficult to see how to proceed empirically. Aggregation has posed challenges that have been met with varying degrees of success.59 Many indicators used empirically to assess welfare are bounded from above and establish an arbitrary limit on the well-being that an individual and society can experience.
As Hicks, Denison, and others have argued, it might be the case that the search for a single welfare indicator is quixotic and ultimately fruitless. However, given the strong human tendency to simplify complex information sets in order to make judgments and decisions, it is unlikely that the search for a single welfare index will end anytime soon. Rather than call for an end to attempting what cannot be done, it is more constructive and useful to advance understanding of the difficulties involved in measuring welfare, the range of measures that are available and their strengths and limitations, and how progress might be made. In that spirit, a set of empirical welfare measures for the United States that describes developments over time periods starting at least as far back as 1929 is being made available at this book’s Web site so that readers can make their own evaluations and draw their own conclusions.


This chapter must end with a series of questions. The most important one is, can a concept of well-being be identified that is psychologically and philosophically compelling and also capable of being measured empirically? For decades, mainstream economics answered this question with an emphatic “no.” Over the past decade, there has been renewed interest in getting to “yes” as reflected by the growing prominence of Sen’s capabilities theory, the human development measure, and the subjective well-being and objective happiness literatures.60 The mainstream consensus has not yet been overturned, but a ferment is brewing.

How can the aggregation challenge be met? Economists have been adroit at identifying marginal rates of substitution and transformation that can be used as aggregation weights. What clever methods could be employed to estimate weights for welfare influences other than production and consumption of goods and services? Could coefficients from happiness regressions be used? Are there any actual choices that can be observed that would shed light on such trade-offs?61

What is the nature of the “political economy of political economy indicators”? Creating welfare measures requires resources, and the provision of such measures is impacted by supply and demand factors. Production of the national accounts is supported by a large infrastructure of public- and private-sector human capital.62 Other welfare measures have received much less institutionalized support, or none at all. What conditions are required in order for a particular approach to measuring welfare to obtain the kind of ongoing support necessary for it to have any impact on policymaking and the marketplace of ideas? Under what conditions does a particular informational measure become viewed as an essential public good and provided by state institutions? How can differences in the structure of information provided by national governments be explained?
Finally, how can political-economic variables be incorporated into welfare measures? Scholars have often argued that to evaluate the welfare of a community, information other than that related to the utility of income must be incorporated in order to have an accurate picture. The level of political freedom, security of property rights, distributive justice, and other political-economic variables that many chapters of this book touch on have been mentioned in this context.63 Considerable analysis has been done by economists relating these variables to the level of national income, and research has also been done evaluating their relation to subjective well-being. However, until there is some consensus on what constitutes a proper approach to measuring welfare,


it is not clear how these variables can best be incorporated into welfare indices.

Appendix: The Long-Run Trend in Subjective Well-Being and Its Relationship to Income

The existence or absence of a long-run relationship between subjective well-being (SWB) and income is an issue of central importance to happiness economics. Easterlin has argued that there is no long-run relationship, and also that SWB has not tended to rise over time in the postwar era. Easterlin’s paradox, and the strong version that the Industrial Revolution has been irrelevant to human well-being, cannot be sustained if a positive long-run relationship between SWB and income exists. Despite the importance of this question, there has been relatively little formal investigation of the relationship between average SWB and income in the long run, and of whether SWB has trended up or down significantly over long time periods.64 This appendix presents findings based on regression analysis of national average SWB.

Surveys asking individuals SWB questions have been conducted since 1946. The World Database of Happiness provides comprehensive data on the average numeric value of responses to SWB questions for each of these surveys.65 Several important issues confront using SWB data in time series research. First, there are many missing observations. Only one country (the United States) comes close to having a fully populated annual time series on average SWB from the late 1940s to now. Some European countries implemented SWB surveys in the late 1940s and in the mid-1960s, but it wasn’t until the 1970s that many observations appear for these countries. The data for the developing world are even more sparse. Second, several different kinds of SWB questions are asked in surveys.
Individuals are asked about their level of happiness (e.g., “In general, how happy would you say you are?”), life satisfaction (e.g., “Overall, how satisfied are you with your present life?”), best-worst possible life (e.g., “Where do you stand on the ladder of the worst and best possible lives?”), and some others. The vast majority of SWB surveys ask about happiness and/or life satisfaction. Although researchers often use the answers to these questions interchangeably and implicitly assume that the answers relate to an identical concept of SWB, it is not clear that this is valid. Responses to questions on how happy one is and how satisfied one is with one’s life might differ.


Third, different numerical scales are used in various surveys, ranging from 2-step to 11-step scales. Researchers have often converted the values of these surveys into a standard 10-step scale using mathematical procedures. The basic reason for both scale standardization and using happiness and life satisfaction responses interchangeably is to make the most of a sparse dataset. However, doing this risks introducing error into the time series; for example, scales with an odd number of steps give respondents a natural mid-point value to choose, whereas scales with an even number of steps do not.66 The approach taken here is to focus only on average values of responses to happiness questions: analysis of life satisfaction responses awaits future research. Scale standardization is also not performed. For a particular country, if time series (however sparse) are available for a 3-step happiness question and for a 4-step happiness question, they are not standardized and merged but used as separate time series.67 The rule followed here is that there must be at least two observations in a time series, and they must be separated by at least nine years, as the focus is on longer-run trends. To control for the problem of noncomparability of responses to questions having different numbers of steps, each time series is converted into an index in which the initial value of the series is set at 1. The same is done for real per capita GDP, the income measure used here.68 The time series are then stacked, and stacked happiness indices are regressed on stacked real income indices. Table 6.5 below helps clarify the approach and presents data on happiness and real income for South Korea. SWB surveys were done in Korea that asked 4-step and 5-step happiness questions, and average response values are given in the table. The sparseness of the data is apparent; it is also clear that the sparse data span long time periods of 20+ years.
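The indexing and stacking steps just described can be sketched as follows, using the South Korea values reported in table 6.5. Several things here are assumptions rather than the author's procedure: the pairing of survey years with values (read off the flattened table), the variable and function names, and the reading of "separated by at least nine years" as the span between the first and last observation of a series.

```python
# (year, average happiness, real per capita income), as read from table 6.5.
four_step = [(1981, 2.41, 3808), (1990, 2.86, 7481),
             (1996, 3.00, 10978), (2001, 2.96, 12689)]
five_step = [(1979, 3.38, 3756), (2003, 3.49, 13849),
             (2004, 3.40, 14434), (2006, 3.53, 15681)]

MIN_SPAN_YEARS = 9  # assumed interpretation of the nine-year rule


def qualifies(series):
    """At least two observations, spanning at least MIN_SPAN_YEARS."""
    years = [year for year, _, _ in series]
    return len(series) >= 2 and max(years) - min(years) >= MIN_SPAN_YEARS


def to_indices(series):
    """Divide each value by the first observation of its own series, so both
    happiness and income start at 1 for that particular happiness series."""
    _, happiness_0, income_0 = series[0]
    return [(h / happiness_0, inc / income_0) for _, h, inc in series]


# Stack the separate series into one happiness column and one income column.
stacked = []
for series in (four_step, five_step):
    if qualifies(series):
        stacked.extend(to_indices(series))

happiness_index = [round(h, 2) for h, _ in stacked]
income_index = [round(inc, 2) for _, inc in stacked]
print(happiness_index)
print(income_index)
```

Note that the income index is rebuilt for each happiness series, so the same country contributes two income series with different base years.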
In the analysis done here, the 4- and 5-step responses are not converted into a standardized 10-step scale and merged, but kept separate. Indices are created of both the happiness and real income variables: note that a real income time series specific to a particular happiness time series is created. These indices are then stacked so that there is one happiness column and one real income column. This is done for all countries having suitable data, and one happiness column that can be regressed against one income column is obtained.69 OLS regression results are presented in table 6.6 below for the full sample of all countries, and groups of countries: those having very long time series starting in the 1940s, west European countries, transition European countries, and developing countries. Separate regressions were run for the following independent variables: linear time trend,


indexed level of real income, and the natural log of indexed real income. In no case is there a statistically significant coefficient on a linear time trend variable, confirming the Easterlin hypothesis that happiness has not risen in the postwar era. Although happiness may have risen in some countries, it fell in others, and the average result is no significant trend. This holds true for all countries together and the subgroups evaluated separately.

Results for the income regressions are intriguing. For the full sample and for long time series countries, happiness is not significantly related to either income or log of income. For western European countries, there is a significant negative relationship between SWB and income. This negative relationship is all the more surprising given an insignificant time trend coefficient for these countries, as log income has trended up linearly in these countries. For transition countries during 1990–2006, there is a significant positive relationship between happiness and log income, confirming the finding of Frijters, Shields, and Haisken-DeNew (2004) of a positive relationship between life satisfaction and income in East Germany.70 There is no significant relationship between happiness and income for nontransition developing countries for which data are available.

The results presented here represent a preliminary first step in analyzing available SWB data over the long run in a strictly macroeconomic framework in which all data are national-average variables.71 They confirm Easterlin’s paradox as a robust result: happiness has generally not trended up over time in either the developed or developing worlds. However, more sophisticated econometric research is needed that confirms formally that SWB time series are integrated of order 0 and evaluates relationships between SWB and macroeconomic variables such as real income, unemployment and inflation using appropriate cointegration and error-correction modeling techniques.
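A bivariate OLS fit of the kind reported in table 6.6 (intercept, slope, t statistic, R2) can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the stacked South Korea indices from table 6.5 alone, so it does not reproduce any row of table 6.6, and the function and variable names are hypothetical.

```python
import math


def simple_ols(x, y):
    """OLS of y on a constant and x; returns intercept, slope, the t statistic
    of the slope, and R-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    se_slope = math.sqrt(ssr / (n - 2) / sxx)  # conventional standard error
    return {"intercept": intercept, "slope": slope,
            "t_slope": slope / se_slope, "r2": 1 - ssr / sst}


# Stacked indices for South Korea (4-step series, then 5-step series).
happiness = [1.00, 1.19, 1.24, 1.23, 1.00, 1.03, 1.01, 1.04]
income = [1.00, 1.96, 2.88, 3.33, 1.00, 3.69, 3.84, 4.17]

income_fit = simple_ols(income, happiness)
log_income_fit = simple_ols([math.log(v) for v in income], happiness)

for label, fit in (("income", income_fit), ("ln income", log_income_fit)):
    print(label, {k: round(v, 3) for k, v in fit.items()})
```

A time trend regression would use the same function with a trend variable in place of income; how the trend is defined for stacked series is not stated in the text, so it is left out here.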

Figure 6.1 UNDP HDI versus Per Capita Income (vertical axis: UNDP HDI value; horizontal axis: per capita GDP, 2002, PPP$)

Figure 6.2 Income and Happiness in the United States (axes: real GDP per capita, 1996 $; average happiness, 1–10 scale; years 1940–2006)

Table 6.1 Per Capita National Income and MEW (in 1958 Dollars)

                                                          Average Annual Growth Rate (%)
                              1929      1947      1965    1929–1947   1947–1965   1929–1965
GNP                         $1,672    $2,142    $3,175       1.3         2.1         1.7
NNP                         $1,507    $2,015    $2,894       1.5         1.9         1.8
MEW (sustainable)           $4,463    $5,934    $6,378       1.5         0.4         1.0
Leisure (a)                 $2,787    $3,227    $3,221       0.8         0.0         0.4
Nonmarket production (b)      $704    $1,103    $1,518       2.4         1.7         2.1
Disamenities                 −$103     −$132     −$178       1.3         1.6         1.5
Share in MEW
  Leisure                      62%       54%       51%
  Nonmarket production         16%       19%       24%

Source: Nordhaus and Tobin (1972), p. 55.
(a) Valued at the wage rate.
(b) Household production of goods and services, valued at the price of (aggregate) consumption.
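The arithmetic behind table 6.1 can be checked with a short sketch. The compound-growth formula is an assumption about how the published rates were computed; rates recomputed from the rounded dollar figures can differ from the published ones by about 0.1 percentage point. The shares of leisure and nonmarket production in MEW are simple ratios. Function and variable names are illustrative.

```python
def avg_annual_growth(start_value, end_value, years):
    """Compound average annual growth rate, in percent (assumed formula)."""
    return ((end_value / start_value) ** (1.0 / years) - 1.0) * 100.0


# Per capita values in 1958 dollars, copied from table 6.1.
levels = {
    "GNP": {1929: 1672, 1947: 2142, 1965: 3175},
    "MEW (sustainable)": {1929: 4463, 1947: 5934, 1965: 6378},
    "Leisure": {1929: 2787, 1947: 3227, 1965: 3221},
    "Nonmarket production": {1929: 704, 1947: 1103, 1965: 1518},
}


def share_of_mew(component, year):
    """Component as a percentage of MEW in the given year, to a whole percent."""
    return round(100.0 * levels[component][year] / levels["MEW (sustainable)"][year])


for year in (1929, 1947, 1965):
    print(year, share_of_mew("Leisure", year), share_of_mew("Nonmarket production", year))

for name in ("GNP", "MEW (sustainable)"):
    series = levels[name]
    for start, end in ((1929, 1947), (1947, 1965), (1929, 1965)):
        rate = avg_annual_growth(series[start], series[end], end - start)
        print(name, start, end, round(rate, 2))
```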

Table 6.2 Official and Extended Account Long-Run Growth Rates

Average Annual Growth of Per Capita Income

Measure               Extended-Account Alternative (%)   Official GNP (%)   Time Period
Nordhaus-Tobin MEW                  0.4                        2.2           1947–1965
Zolotas EAW                         0.8                        2.3           1950–1977
Kendrick GNP                        0.9                        2.5           1948–1973
Eisner TISA                         1.3                        2.1           1946–1981

Sources: Extended-account growth rates calculated from data in table 6.5 in Eisner (1988), p. 1673. Official GNP growth rates calculated from data taken from U.S. Department of Commerce, Bureau of Economic Analysis, gross national product quantity index values.

Table 6.3 Inequality in Distribution of Welfare Measures across Countries

                                GDP (2000) (a)   HDI (2000)   PQLI (1970–1975)
Maximum                            $40,000          0.96            97
Minimum                               $478          0.27            12
Ratio of maximum to minimum           83.7          3.5             8.1
Gini coefficient                     0.519         0.100            NA

(a) Per capita GDP in PPP$.
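For readers who want to see how a cross-country Gini coefficient of the kind shown in table 6.3 is computed, here is a minimal sketch. It assumes an unweighted calculation in which each country counts once; whether the chapter's figures are population-weighted is not stated. The two example lists are hypothetical and are not the chapter's data.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    where x is sorted in ascending order and i runs from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    rank_weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * rank_weighted / (n * total) - (n + 1.0) / n


# Hypothetical country averages, for illustration only.
life_satisfaction = [5.5, 6.0, 6.5, 7.0, 7.5]       # scores on a 1-10 scale
per_capita_income = [500, 2000, 8000, 20000, 40000]  # PPP dollars

print(round(gini(life_satisfaction), 3))
print(round(gini(per_capita_income), 3))
```

A measure confined to a narrow bounded scale will tend to show a much lower Gini coefficient than income, which ranges over orders of magnitude, so the two figures are not directly comparable as evidence about inequality in welfare.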

Table 6.4 Empirical “Macroeconomic” Characteristics of Welfare Measures

                                  Long-Run Annual U.S. Growth Rate (%)   Gini Coefficient: Across Countries
Official national income (a)                     2.3                                  0.52
Extended national accounts (a)                   0.8                                  NA
Human development (b)                            0.3                                  0.10
Subjective well-being (c)                        ~0                                   0.03

(a) Average of long-run growth rates in table 6.2.
(b) Growth over 1975–2005. Data taken from the UN HDR database.
(c) For Gini coefficient, see footnote 51 and related text.

Table 6.5 Happiness and Income Data: South Korea

        Absolute Values                                               Indices
Year    4-Step Happiness  Real Income  5-Step Happiness  Real Income  4-Step Happiness  Real Income  5-Step Happiness  Real Income
1979                                        3.38            3,756                                         1.00            1.00
1981         2.41            3,808                                         1.00            1.00
1990         2.86            7,481                                         1.19            1.96
1996         3.00           10,978                                         1.24            2.88
2001         2.96           12,689                                         1.23            3.33
2003                                        3.49           13,849                                         1.03            3.69
2004                                        3.40           14,434                                         1.01            3.84
2006                                        3.53           15,681                                         1.04            4.17

Sources: Veenhoven (2007).

Table 6.6 Regression Results

                                Intercept      Time Trend    Real Income (f)   ln (Real Income) (f)   R2     Observations
All countries (a)               1.01 (188.4)   0.00 (0.0)                                             0.00   383
                                1.01 (115.2)                 −0.002 (−0.3)                            0.00   349
                                1.01 (225.8)                                   −0.01 (−0.6)           0.00   349
Countries with 1940s data (b)   1.01 (64.8)    0.00 (0.0)                                             0.00   115
                                0.97 (50.1)                  0.01 (1.0)                               0.01    87
                                0.98 (78.4)                                    0.00 (0.27)            0.00    87
West European countries (c)     1.01 (117.9)   0.00 (0.5)                                             0.00   190
                                1.09 (56.1)                  −0.06* (−4.6)                            0.12   156
                                1.03 (135.2)                                   −0.09* (−4.6)          0.12   156
Transition countries (d)        0.99 (70.7)    0.00 (0.9)                                             0.01    63
                                0.95 (23.9)                  0.05 (1.3)                               0.03    56
                                1.00 (113.8)                                   0.06* (1.7)            0.05    56
Developing countries (e)        1.02 (48.1)    0.00 (0.4)                                             0.00    50
                                1.00 (41.6)                  0.02 (1.2)                               0.03    50
                                1.02 (71.7)                                    0.04 (1.3)             0.04    50

Note: OLS regressions. T statistic in parentheses. Slope coefficients significant at the 10% level are marked with an asterisk.
(a) Argentina (5), Australia (4), Austria (3), Belarus (4), Belgium (13), Brazil (4), Bulgaria (3), Canada (7), Chile (4), China (3), Czech Republic (2), Denmark (14), East Germany (6), West Germany (21), Estonia (3), Finland (5), France (16), Great Britain (20), Hungary (5), Iceland (3), India (3), Ireland (16), Israel (2), Italy (16), Japan (8), South Korea (8), Latvia (3), Lithuania (3), Luxembourg (10), Mexico (5), Netherlands (25), Nigeria (3), Norway (5), Poland (14), Portugal (2), Romania (4), Russia (8), Singapore (3), Slovakia (3), Slovenia (4), South Africa (8), Spain (5), Sweden (8), Switzerland (5), Turkey (4), United States (56). Number of annual observations available for each country in parentheses.
(b) Australia, Canada, France, Great Britain, Netherlands, United States (time trend); Australia, Canada, Great Britain, United States (real income, ln real income).
(c) Austria, Belgium, Denmark, France, Great Britain, Ireland, Italy, Luxembourg, Netherlands, Norway, Spain, Sweden, Switzerland.
(d) Belarus, Bulgaria, Czech Republic, Estonia, Hungary, Latvia, Lithuania, Poland, Romania, Russia, Slovakia, Slovenia. East Germany included only in time trend regression. Data series for transition countries start in 1989 or 1990, except Hungary (which has an observation for 1981).
(e) Argentina, Brazil, Chile, China, India, South Korea, Mexico, Nigeria, Singapore, South Africa, Turkey.
(f) East Germany and West Germany not included due to lack of real income series.


Notes 1. Throughout this chapter, the terms welfare, well-being, utility, and happiness will be used interchangeably. 2. It should be noted that enormous literatures are available on each approach to welfare measurement reviewed here, and a comprehensive bibliography could easily fill a book. The references provided here are usually the “tips of icebergs” and provide extensive bibliographies of the many studies that have been written to establish, develop and refine a particular approach to measuring welfare. 3. Studenski (1958) and Vanoli (2005) provide comprehensive reviews of the history of the development of national accounts. In addition to the well-known contributions of Kuznets and Clark, Meade, and Stone, the contributions in the 1930s of Clark Warburton, Edward Denison, Milton Gilbert, and George Jaszi of the United States, Jan Tinbergen of the Netherlands, Erik Lindahl of Sweden, Ragnar Frisch of the Netherlands, and Viggo Kampmann of Denmark should also be noted (see Vanoli [2005, pp. 17–20] and Eisner [1988, p. 1612]). 4. Classical welfare analysis is quite humble in its claims on the possibility of empirically measuring welfare at the individual level. Neoclassical utility functions only provide ordinal rankings of outcomes, and specific numeric values of levels of well-being associated with particular outcomes are arbitrary and unobservable. The most important prediction of the classical approach is that (compensated) demand curves slope downward, so that when the explicit or implicit price of an outcome increases, a person will want less of it. Although the theory is silent on absolute levels of well-being, it can be used to assess how utility changes if a person experiences a small change in an outcome. Introductions to the classical approach to welfare economics can be found in microeconomics textbooks such as Nicholson (1992) and Varian (1992). 
Stigler (1950) reviews the history of development of utility theory, which is what is of most interest to this essay (firm welfare, i.e., profits, is a component of income that is distributed to current or future households). Sen (1979) provides a comprehensive and accessible survey of the issues involved in using real income as a measure of welfare at the individual and social level, and an exhaustive set of references to the vast literatures related to these issues. Sen (1982) offers a sophisticated set of essays that reviews (and often challenges) some of the key conclusions of modern welfare economics. McFadden (2005) reviews key developments in augmenting the classical approach since the 1950s, including nonlinear budget constraints, unobserved preference heterogeneity, hedonic attributes, 0 and lumpy purchases, household production, and stated preference methods (see pp. 1–11). 5. Sen (1982), Chapter 12 reviews the interpersonal welfare comparison issue. 6. Sen (1979), pp. 27–29. It should also be noted that other aggregation controversies have arisen in economics, for example, the “Cambridge capital controversy” (see Cohen and Harcourt 2003).

Measuring Welfare

●

129

7. See Sen (1979), pp. 30–33, for a discussion of this approach as well as objections to it. 8. See Eisner (1988) and Vanoli (2005) for comprehensive reviews. 9. With the notable exception in the United States of owner-occupied housing services. 10. See Easterlin (1988), p. 1613. It is important today to know the details about how national accounts are compiled for particular countries, because some countries attempt to include some production of the household and informal sectors in their national accounts. 11. To quote a leading student and practitioner of national income accounting, “National accountants, with the exception of Kuznets, always clearly indicated that their measurement of consumption, and of capital formation, the base for future consumption, did not intend to estimate the level of or the change in the standard of living, and even less to estimate welfare, which depends on many other factors. . . . National accounting provided a complete set of detailed and aggregated measures, for the use of analysts interested in macroeconomic equilibrium or in the study of households, in the relationship between consumption and total product or between consumption and saving, etc. Users were free to go further and look for interpretations of the changes in national accounts, as specialists in climatology do with meteorological observations” (Vanoli [2005], p. 296). 12. See Kuznets (1952), pp. 63–69 for an estimate of the value of leisure in the United States during 1890–1948. See Kuznets (1941) for a discussion of the issue of excluding household production from the national accounts. 13. In the 1950s and 1960s, economists were generally pessimistic that welfare could be measured. See, for example, Report on International Definition and Measurement of Standards and Levels of Living (New York: United Nations, 1954) and International Definition and Measurement of Levels of Living: An Interim Guide (New York: United Nations, 1961). 14. 
Classifying these expenses as investment neglects the fact that people get pleasure from being educated and healthy. It should also be noted that MEW replaces the value of consumer durables purchases with an estimate of the flow of services from consumer durables, making this component equivalent to the treatment of housing. 15. Taken from a 1954 time-use survey. 16. See Nordhaus and Tobin (1972), pp. 38–49. Robin Matthews’ discussion of MEW on pp. 87–89 of that reference is an accessible and illuminating discussion of the issue of household time valuation. Gronau (1973) is another important early contribution on this issue. 17. Recent work by Statistics Canada suggests that in that country since 1970, long-run growth in per capita GDP has significantly exceeded that of MEW for the same reason. See Messinger (1997). 18. The most important methodological issue for the MEW exercise is how to value time. Nordhaus and Tobin present a range of estimates of the monetary value of household production that reflects uncertainty on how to value time, and MEW growth rates are sensitive to what assumption is made. Analysts may have felt that the degree of uncertainty present in the estimates was simply too great, particularly public sector experts responsible for providing official estimates to the public. 19. See also Vanoli (2005), Chapter 7 for a useful review and references to extended account efforts outside the United States. 20. No constant-price (real) data are available for Jorgenson-Fraumeni, and the Ruggles’ measure is available only for 1969–1980, which is a relatively short time period. 21. In addition to the results shown for MEW and official national income in table 6.1, see Eisner (1988) for other estimates of household production and leisure. Based on a review of studies through the early 2000s, Vanoli (2005, Chapter 7, Box 50, p. 287) concludes that including household production alone would increase official national income by 50–200%. 22. It would be straightforward to present extended accounts in the framework of satellite accounts, which have already been established by the U.S. government for the travel and tourism and R&D sectors. The United Nations’ 1993 System of National Accounts includes household production of goods and services (excluding own-use services) in the central accounts, and encourages development of satellite accounts. See the UN’s Handbook of National Accounting series, volumes 1 and 2, for extensive treatments. 23. See Morris (1979), Sen (1991), and Vanoli (2005, Chapter 7, pp. 292–294) for reviews of the social indicator movement. 24. See Morris (1979). 25. Ibid., p. 4. Morris also notes that Kuznets suggested an indicator like the PQLI in order to measure welfare across countries at different income levels as early as 1947 (see p. 8). See also Lipton and Ravallion (1995) for a comprehensive survey on poverty that includes a review of basic needs indicators and the PQLI. 26. See Morris (1979), pp. 15–18, 34–35, and 94–95. 27. Subsequent work by Ram (1982) derived aggregation weights for the PQLI using principal components analysis. 28. Morris (1979), p. 48. 29. It does seem that human beings cannot live past the age of roughly 120 years. Although statistics might be inadequate to fully test this proposition, it may be true that the upper bound of 120 years has not changed much in the past 500 years, but the proportion of people managing to live past various lower values has been rising. If this is so, then life expectancy is essentially a bounded indicator. Technology may eventually be able to alter the 120-year upper bound, but it does not seem to have had much success to date. 30. See Sen (1985) and (1999). Also see Lipton and Ravallion (1995).


31. The human development measure is described in detail in the UN Web page http://hdr.undp.org/en/statistics/indices/hdi/. For a detailed description of the HDI and references to relevant early studies, see Anand and Sen (1994). 32. See Ram (1982). 33. The original PQLI did not include per capita income, but Ram’s (1982) modification did. The UN argues that the conceptual underpinnings of PQLI and HDI are different, in that the former measures achievements and the latter measures capabilities. The indices are, however, quite similar technically. 34. See Anand (1991), Kanbur (1990), Kelley (1991), McGillivray and White (1993), Srinivasan (1994), and Lipton and Ravallion (1995). 35. Ram (1992) was the first to show the dramatic difference in inequality measures for GDP and HDI, although he did not discuss the fact that the result is due to concavity with respect to income. 36. The across-country Gini value for human development is much less than the Gini value for the distribution of household income within countries that are regarded as having a high degree of income equality. For example, the Gini coefficient for household income distribution in Scandinavian countries is typically between 0.2 and 0.3. 37. Bernard Perret, “Social Indicators, State of the Art and Perspectives,” Council for Employment, Income and Social Cohesion, January 2002, as quoted in Vanoli (2005), p. 294. 38. See Kahneman et al. (1999) for various chapters on biological aspects of pain and pleasure. Neuroeconomics is a recently emerged field of study that is concerned with this nexus. 39. Important reasons for this reluctance are the neoclassical belief that cardinal values cannot be assigned to utility levels, and concern about the ability to obtain accurate information from surveys. Interestingly, one of the most important modern scholars of welfare economics, Amartya Sen, showed considerable sympathy to subjective assessments in his writings in the 1970s and 1980s (see Chapter 2, pp. 71–72 in Sen [1982] and Sen [1985], p. 29), but in recent years he has evinced skepticism, apparently because of difficulty in reconciling the hedonic treadmill phenomenon with differences in objective circumstances (see Sen 2002). 40. A selective set of books and articles that review key results in the SWB literature and provide references to the many studies that cannot be cited individually here includes Kahneman et al. (1999), Easterlin (2001), Frey and Stutzer (2002), Layard (2005), Di Tella and MacCulloch (2006), Kahneman and Krueger (2006), and Deaton (2007). 41. For an involved discussion of the cognitive processes involved in answering an SWB question, and a model of the judgmental process, see the chapter by Schwarz and Strack in Kahneman et al. (1999). 42. The estimation of “happiness functions” is not without controversy. Bertrand and Mullainathan (2001) challenge the legitimacy of estimating regressions such as happiness functions. They argue that measurement error due to cognitive effects impacts people’s responses to subjective questions such as SWB, and that this measurement error is correlated with explanatory variables and is particularly problematic when subjective responses are the dependent variables. See the chapter by Schwarz and Strack in Kahneman et al. (1999) for an in-depth discussion of the cognitive processes that affect answers to SWB questions. 43. For in-depth discussions of the “hedonic treadmill” phenomenon, see the chapter by Frederick and Loewenstein in Kahneman et al. (1999), Easterlin (2001), Kahneman and Krueger (2006), and Clark, Frijters, and Shields (2007). A recent contribution to empirical understanding of hedonic adaptation is Di Tella, Haisken-DeNew, and MacCulloch (2007), who find using German panel data that people fully adapt to income shocks within four years, but not to status shocks (e.g., job prestige). Rayo and Becker (2007) develop a theoretical model of utility optimization with hedonic adaptation and peer comparisons. 44. See Di Tella et al. (2007). 45. See the chapter by Diener and Suh in Kahneman et al. (1999), Diener and Suh (2000), and Deaton (2007) on differences in national average SWB. See Layard, Mayraz, and Nickell (2007) on estimating the marginal utility of income. See Di Tella, MacCulloch, and Oswald (2003) on the utility costs of recession and inflation. On SWB and terrorism, see Frey, Luechinger, and Stutzer (2007), who find a very large impact of terrorism on SWB in France, the UK, and Ireland during 1973–1998, and Zussman, Zussman, and Romanov (2007), who find that terrorist events in Israel during 2002–2004 had no significant impact on SWB. On airport noise, see van Praag and Baarsma (2004). 46. This was first noted by Easterlin (1974) and has come to be called the Easterlin paradox. 47. The data source for U.S. average national happiness is calculations made by the author using data from Veenhoven (2007). Data on U.S. real per capita GDP are from the U.S. Bureau of the Census. 48. As long as cognitive understanding of the meaning of SWB questions remained reasonably stable. 49. One recent critical review argues that “One could conclude from the lack of correlation over time between aggregate happiness and almost any other socio-economic variable of interest one of two things. Either that attempting to improve the human lot through economic or social policy is futile, or that happiness data over time is an extremely insensitive measure of welfare. The evidence points to the latter” (Johns and Ormerod 2007, p. 13). Further evidence of the insensitivity of SWB is provided by Smith (2007), who uses panel data from a health and retirement study and finds that SWB values reported by the elderly are not systematically impacted by health shocks and do not change up until the time of death.


50. Note that the PQLI and human development measures are also bounded from above, because their component indicators have natural lower and upper limits. 51. Average national values are given in the Diener and Suh chapter in Kahneman et al. (1999), p. 436. SWB data are available for a subset of countries that accounted for 66% of the world’s population in 1994. 52. See Kahneman’s chapter in Kahneman et al. (1999). Kahneman et al. (2004b) describe how national happiness accounts could be developed. 53. The experience sampling method corresponds to Kahneman’s original proposal and measures satisfaction in real time. The day reconstruction method requires respondents to reconstruct from memory their experiences in the previous day and describe the intensity with which different emotions and affects were present in that experience. 54. See also Kahneman and Krueger (2006). 55. Results from the recent survey research might be particularly relevant for evaluating trade-offs between different uses of time. 56. See Kahneman (1999), p. 4–5. 57. Hicks wrote in 1975 that “We have indexes of production; we do not have—and it is clear that we cannot have—an index of welfare” (cited in Vanoli [2005], p. 279). Edward Denison wrote in 1971 that “It would be enormously convenient to have a single, generally accepted index of the economic and social welfare of the people of the United States. At a glance it would tell us how much better or worse off we have become each year and each decade. We could judge the desirability of any proposed action by asking whether it would raise or lower this index. Some recent discussion seems almost to imply that such an index could be constructed. Articles in the popular press even criticize GNP because it is not such a complete index of welfare, on the one hand ignoring the fact that it was never intended to be such an index, and on the other, suggesting that with appropriate changes it could be converted to one. 
A single, generally acceptable index of welfare cannot be constructed” (p. 13). Sen wrote in 1985 that “Given the variety of contexts in which the assessment of interest is relevant, it is quite unlikely that we shall get some one measure of interest that is superior to all others and applicable in all contexts” (Sen 1985, p. 4). 58. Although extended national account data across countries are not available, these measures are presumably more equitably distributed across countries than national income if household and informal production are relatively more important in poorer countries, as Kuznets and others have argued. 59. National income has met the aggregation challenge through the use of market prices as weights, which has been theoretically justified by neoclassical welfare economics. Extended national accounts have had to identify an appropriate price with which to weight household production and leisure, and this introduces substantial uncertainty into estimates. Social indicator indices have generally used arbitrary weights that have no theoretical or empirical basis. SWB avoids the issue altogether. 60. Some scholars consider that Sen’s capabilities approach is the current dominant approach for measuring welfare (Deaton 2007, pp. 30–31). It has been shown here that the empirical index that has been used to measure capabilities, human development, raises a troubling paradox that derives from the fact that human development’s components are bounded from above and highly concave with respect to income. 61. Migration decisions come to mind, although many barriers exist to free movement that would complicate estimation and inference. 62. Public sector human capital refers primarily to academic experts but also includes experts in national government offices and international institutions (in particular, the IMF and the United Nations). 63. The private sector also produces journals dedicated to measuring economic activity, for example, the Review of Income and Wealth. 64. See, for example, Sen (1982, pp. 425–428; 1985). 65. Most studies on the relationship between happiness and income have focused on cross-sectional or panel data analyses that estimate the relationship across individuals at a particular point in time. Formal analysis of the longer-run relationship between income and happiness is less common. Frijters, Shields, and Haisken-DeNew (2004) find a positive and significant correlation between life satisfaction and income in East Germany in the decade after reunification. For overviews of cross-section and panel estimations of happiness functions and also the Easterlin paradox and related evidence, see Clark et al. (2007), Layard et al. (2007), and Di Tella and MacCulloch (2006). 66. All data used in this analysis are taken from Veenhoven (2007). 67. Another issue is that each survey is treated as being equivalent in terms of quality. The many SWB surveys carried out through the years may differ with respect to their sampling methodologies, sample size, and other factors. This introduces measurement error into response values. There are instances in which several surveys asking happiness questions with the same step scale were carried out in a given country in a given year, particularly for European countries and the United States. The average of the surveys’ average response values is taken and used as a single observation here. 68. Per capita GDP in local currency units at constant prices for the period 1960–2006 is taken from the World Bank’s World Development Indicators database. Per capita GDP for Australia, Canada, the United Kingdom, and the United States in the 1940s was constructed from data taken from official national sources and linked to the World Bank data using an index approach. 69. A panel data estimation approach is not taken here and is a task for future research. Implementing panel estimation might be complicated by the sparseness of the available dataset.
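
The rule described above, under which several same-scale surveys for one country and year are collapsed into a single observation by averaging their average response values, can be illustrated with a minimal sketch. The function name, the record layout, and the numbers below are hypothetical and are not taken from the dataset used in the chapter; the sketch also assumes that all surveys being combined share the same step scale and are weighted equally.

```python
from collections import defaultdict
from statistics import mean


def collapse_surveys(records):
    """Average the survey-level mean responses for each (country, year).

    `records` is an iterable of (country, year, survey_mean) tuples.
    Assumption: every survey in a group uses the same step scale and
    receives equal weight, whatever its sample size or quality.
    """
    grouped = defaultdict(list)
    for country, year, survey_mean in records:
        grouped[(country, year)].append(survey_mean)
    # One observation per country-year: the mean of the survey means.
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical illustration: two surveys for one country-year, one for another.
surveys = [
    ("US", 1972, 2.20),
    ("US", 1972, 2.30),
    ("FR", 1975, 2.00),
]
observations = collapse_surveys(surveys)
```

Equal weighting is what the note describes; weighting by sample size would be a different design choice and would require information on each survey's sample.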


70. Note that East Germany was not included here in the happiness-income regressions, so results here suggest that Frijters et al.’s (2004) finding applies to other transition countries. The transition experience is of course an unusual one. In initial years, societies were subject to profound shocks in all areas of political, social, and economic life that moved them far from any normal steady state. After the early 1990s, recovery and movement toward a more normal growth steady state began. Using developments in transition countries to explicitly or implicitly characterize longer-run trends and relationships in more stable developed and developing countries is questionable. 71. Di Tella et al. (2003) estimate happiness functions on individual-level data from European and U.S. surveys that include macroeconomic variables as explanatory variables, including real per capita GDP, unemployment, and inflation. They also include yearly time trends and find that European happiness had a significant negative trend during 1975–1992. They do not try to explain this result but leave it as an important question for future research.

References

Anand, S. 1991. Poverty and Human Development in Asia and the Pacific in Poverty Alleviation in Asia and the Pacific. New York: United Nations Development Program. Anand, S. and A. Sen. 1994. Human Development Index: Methodology and Measurement. Human Development Report Office Occasional Paper No. 12. Bertrand, M. and S. Mullainathan. 2001. Do People Mean What They Say? Implications for Subjective Survey Data. American Economic Review (Papers and Proceedings) 91 (2): 67–72. Blanchflower, D. and A. Oswald. 2000. Well-Being over Time in Britain and the USA. NBER Working Paper No. 7487. Clark, A., P. Frijters, and M. Shields. 2007. Relative Income, Happiness and Utility: An Explanation for the Easterlin Paradox and Other Puzzles. IZA Discussion Paper No. 2840, June. Cohen, A. and G. C. Harcourt. 2003. Whatever Happened to the Cambridge Capital Theory Controversies? Journal of Economic Perspectives 17 (1): 199–214. Deaton, A. 2007. Income, Aging, Health and Wellbeing around the World: Evidence from the Gallup World Poll. NBER Working Paper No. 13317. Denison, E. 1971. Welfare Measurement and the GNP. Survey of Current Business 51 (1): 13–16. Di Tella, R., J. Haisken-DeNew, and R. MacCulloch. 2007. Happiness Adaptation to Income and to Status in an Individual Panel. NBER Working Paper 13159. Di Tella, R. and R. MacCulloch. 2006. Some Uses of Happiness Data in Economics. Journal of Economic Perspectives 20 (1): 25–46.


Di Tella, R., R. MacCulloch, and A. Oswald. 2003. The Macroeconomics of Happiness. Review of Economics and Statistics 85 (4): 809–827. Diener, E. and E. M. Suh. 2000. Culture and Subjective Well-Being. Cambridge, MA: MIT Press. Easterlin, R. 1974. Does Economic Growth Improve the Human Lot? Some Empirical Evidence. In Nations and Households in Economic Growth: Essays in Honor of Moses Abramovitz, ed. P. A. David and M. W. Reder, 98–125. New York: Academic Press. ———. 2001. Income and Happiness: Towards a Unified Theory. The Economic Journal 111 (473): 465–484. Eisner, R. 1988. Extended Accounts for National Income and Product. The Journal of Economic Literature 26 (4): 1611–1684. Frey, B., S. Luechinger, and A. Stutzer. 2007. Calculating Tragedy: Assessing the Costs of Terrorism. Journal of Economic Surveys 21 (1): 1–24. Frey, B. and A. Stutzer. 2002. What Can Economists Learn from Happiness Research? Journal of Economic Literature 40 (2): 402–435. Frijters, P., M. Shields, and J. P. Haisken-DeNew. 2004. Money Does Matter! Evidence from Increasing Real Incomes in Germany Following Reunification. American Economic Review 94 (3): 730–741. Gronau, R. 1973. The Measurement of Output of the Nonmarket Sector: The Evaluation of Housewives’ Time. In The Measurement of Economic and Social Performance, ed. Milton Moss, Studies in Income and Wealth, vol. 38, National Bureau of Economic Research. New York: Columbia University Press. Johns, H. and P. Ormerod. 2007. Happiness, Economics and Public Policy. London: Institute of Economic Affairs. Kahneman, D., E. Diener, and N. Schwarz (eds.). 1999. Well-Being: The Foundations of Hedonic Psychology. New York: Russell Sage Foundation. Kahneman, D. and A. B. Krueger. 2006. Developments in the Measurement of Subjective Well-Being. Journal of Economic Perspectives 20 (1): 3–24. Kahneman, D., A. B. Krueger, D. A. Schkade, N. Schwarz, and A. A. Stone. 2004a. 
A Survey Method for Characterizing Daily Life Experience: The Day Reconstruction Method. Science 306 (5702): 1776–1780. ———. 2004b. Toward National Well-Being Accounts. American Economic Review (Papers and Proceedings) 94 (2): 429–434. Kanbur, R. 1990. Poverty and Development: The Human Development Report and the World Development Report. World Bank WPS 618. Kelley, A. C. 1991. The Human Development Index: Handle with Care. Population and Development Review 17 (2): 315–324. Kuznets, S. 1941. National Income and Its Composition, 1919–1938. New York: National Bureau of Economic Research. ———. 1952. Long-Term Changes in the National Income of the United States of America since 1870. Review of Income and Wealth Series 2. Layard, R. 2005. Happiness: Lessons from a New Science. New York: Penguin Press.


Layard, R., G. Mayraz, and S. Nickell. 2007. The Marginal Utility of Income. CEP Discussion Paper No. 784, March. Lipton, M. and M. Ravallion. 1995. Poverty and Policy. In Handbook of Development Economics, vol. 3. Amsterdam: Elsevier Science B.V. McFadden, D. 2005. The New Science of Pleasure: Consumer Behavior and the Measurement of Well-Being. Frisch Lecture, Econometric Society World Congress, August 20. McGillivray, M. and H. White. 1993. Measuring Development? The UNDP’s Human Development Index. Journal of International Development 5 (2): 183–192. Messinger, H. 1997. Measuring Sustainable Welfare: Looking Beyond GDP. Presentation at the annual meeting of the Canadian Economics Association. http://www.csls.ca/misc/cea9731.pdf Morris, D. M. 1979. Measuring the Condition of the World’s Poor. New York: Pergamon Press. Nicholson, W. 1992. Microeconomic Theory: Basic Principles and Extensions. Orlando: Dryden Press. Nordhaus, W. and J. Tobin. 1972. Is Growth Obsolete? Economic Growth, National Bureau of Economic Research, Fiftieth Anniversary Colloquium. New York: Columbia University Press. Ram, R. 1982. Composite Indices of Physical Quality of Life, Basic Needs Fulfillment, and Income: A “Principal Component” Representation. Journal of Development Economics 11 (2): 227–247. ———. 1992. International Inequalities in Human Development and Real Income. Economics Letters 38 (3): 351–354. Rayo, L. and G. S. Becker. 2007. Evolutionary Efficiency and Happiness. Journal of Political Economy 115 (2): 302–337. Sen, A. 1979. The Welfare Basis of Real Income Comparisons: A Survey. Journal of Economic Literature 17 (1): 1–45. ———. 1982. Choice, Welfare and Measurement. Cambridge, MA: MIT Press. ———. 1985. Commodities and Capabilities. Oxford: Oxford University Press. ———. 1999. Development as Freedom. New York: Alfred A. Knopf. ———. 2002. Health: Perception versus Observation. British Medical Journal 324 (7342): 860–861. Smith, V. K. 2007. 
Reflections on the Literature #3 (unpublished manuscript). Srinivasan, T. N. 1994. Human Development: A New Paradigm or Reinvention of the Wheel? American Economic Review (Papers and Proceedings) 84 (2): 238–243. Stigler, G. 1950. The Development of Utility Theory I and II. Journal of Political Economy 58 (4 and 5): 307–327 and 373–396. Studenski, P. 1958. The Income of Nations. New York: New York University Press. van Praag, B. M. S. and B. E. Baarsma. 2004. Using Happiness Surveys to Value Intangibles: The Case of Airport Noise. Tinbergen Institute Discussion Paper 04–24/3.


Vanoli, A. 2005. A History of National Accounting. Amsterdam: IOS Press. Varian, H. 1992. Microeconomic Analysis. New York: W. W. Norton. Veenhoven, R. 2007. World Database of Happiness, Erasmus University Rotterdam. http://worlddatabaseofhappiness.eur.nl Zussman, A., N. Zussman, and D. Romanov. 2007. Does Terrorism Demoralize? Evidence from Israel. Hebrew University of Jerusalem. Working Paper.

CHAPTER 7

Why and How to Move from Capturing Perception of to Quantifying Corruption?

Omer Gokcekus and Justin Myzie

1 Introduction

Corruption undermines economic development and damages social stability.1 The current literature on corruption consists of empirical cross-country analyses and surveys demonstrating the harmful macrolevel impact of perceived corruption on economic, political, and social outcomes in a country.2 Corruption impedes income growth of the poor (Gupta and Alonso-Terme 2002; You and Khagram 2005) and reduces productivity within a country (Lambsdorff 2003b). Outside investors’ perception of corruption may also discourage FDI and other capital inflows (Mauro 1997; Lambsdorff 2003a; Wei 2000).3 The course of privatization in highly corrupt countries tends to be less efficient than in countries where corruption is controlled; the inefficiency fosters an environment conducive to monopolies (Bjorvatn and Soreide 2005).

Researchers have also identified macrolevel characteristics of countries with corruption. Corruption is prevalent in countries that have a large public sector (Mauro 2002);4 have poor governance (Rose-Ackerman 2004); have a low level and a low quality of openness (Gokcekus and Knoerich 2006); and tend to lack political and civil rights (Harms and Ursprung 2002). An unstable legal administration and unstable markets support the growth of corruption (Lambert-Mogiliansky 2002).

Klitgaard (2000) identifies stages in anticorruption efforts. Susan Rose-Ackerman (2004) proposes different options for addressing


corruption. Schulze and Frank (2003) find that corruption decreases as salary increases.

Eradicating corruption is a noble cause, and as Klitgaard, Rose-Ackerman, and others discuss, we have a fairly good idea of how to fix the problem: mainly by increasing monitoring and punishment and by installing appropriate incentives. With some exaggeration, we may claim that we know everything about corruption: what causes it, what it does to economic growth and poverty, and so on. Yet the following two questions remain unanswered; indeed, they are not even asked explicitly: At what cost should we eradicate corruption? Should we care about the cost and benefit of a reform aiming to lower, if not eradicate, corruption?

Economic thinking dictates that for every decision we consider both cost and benefit, and that we do so at the margin. If unlimited resources were available, eradicating corruption would always be a worthwhile project. Unfortunately, that is not the case, and therefore, like any other economic decision, what to do about corruption should boil down to the marginal cost and marginal benefit of reducing it. We know exactly the cost of particular public sector reform projects, that is, the cost of an attempt to reduce corruption; this type of information is available, for instance, at the project Web sites of the World Bank and other institutions. Yet there is nothing on the potential benefits of reducing corruption, because corruption is described as something “perceived” rather than in quantifiable terms, for example, in dollars. Accordingly, if a reform targets a 20% reduction in corruption, we know nothing about the marginal benefit, because we do not know the level of corruption. Given the recent emphasis on, and the amount of resources devoted to, eradicating corruption, this is clearly a disturbing situation.

In this chapter, we review the sources that attempt to quantify corruption. Before introducing these ideas, we first provide a definition of corruption so that we know exactly what we need to quantify; second, we discuss the existing corruption indicators to understand how we are currently capturing corruption. After these two sections, we briefly discuss how we should measure corruption by examining a few recent attempts at it. Our concluding remarks are in the last section of the chapter.

2 What Is Corruption?

Shleifer and Vishny (1993) call the phenomenon “government corruption,” defined as “the sale by government officials of government


property for personal gain” (p. 599). These goods are neither demanded nor sold for their own sake, but to gain access to economic activity that is otherwise inaccessible, to wit “(l)icenses, permits, passports, and visas” (p. 599). Shleifer and Vishny distinguish between cash and in-kind bribes, and between corruption with and without theft. Corruption without theft is where the official sells the service and demands a bribe on top of the government price, keeping the bribe for personal use, passing the price of the sale to the government, and recording the sale. Corruption with theft is where the official sells the service, keeps the money, and does not record the sale, so that nothing is passed on to the government.

Susan Rose-Ackerman (2004) analyzes the “common definition” of corruption: the “misuse of public power for private or political gain.” Narrower forms of corruption are left out; she specifies “morally corrupting activities” (p. 1) and “run-of-the-mill constituency-based politics” (p. 1). Corruption is also called “state capture,” a narrower form, according to Rose-Ackerman. She defines “state capture” as “the problem of creating open democratic/market societies in states where a narrow elite has a disproportionate influence on state policy” (p. 1).

Svensson (2005) calls the phenomenon both corruption and public corruption. He starts with the “common definition,” “the misuse of public office for private gain” (p. 20), which is very close to that of Shleifer and Vishny. Svensson elaborates on this definition; his point is that the phenomenon is not “clear-cut” and thus cannot have a universal definition. He focuses on bribery but notes that bribery is not the only form of corruption.

Hellman and Kaufmann (2001) refer to the phenomenon as “state capture.” For them, state capture is a form of “grand corruption.” They define state capture as “the efforts of firms to shape the laws, policies, and regulations of the state to their own advantage by providing illicit private gains to public officials” (p. 1). Their definition of state capture is lucid, but what it omits (what state capture is not) has implications for measuring corruption. They state that most types of corruption influence how laws are “implemented,” whereas state capture refers to how those laws are “formed.”

Transparency International (TI) calls corruption simply “the misuse of entrusted power for private gain”5 (p. 1). TI further differentiates between “according to rule” corruption and “against the rule” corruption. A facilitation payment, where a bribe is paid to receive preferential treatment for something that the bribe receiver is required to do by law, constitutes the former. The latter, on the other hand, is a bribe paid to obtain services the bribe receiver is prohibited from providing.


3 How Are We “Measuring” Corruption?

Corruption is clandestine, which makes it difficult to measure. Measuring it directly may be impossible; thus most measurements are made by proxy. To add insult to injury, what is measured is not corruption itself but the perception of corruption. These perceptions are obtained through surveys. The respondents vary: the proverbial stakeholder, individuals, officials, multilateral donors, NGOs, domestic firms, foreign firms, foreign firms with domestic headquarters, and so on have been surveyed. The questions asked also vary. Many organizations conduct surveys, most with their own take, tailoring the respondents and the questions to their particular needs. The World Bank has the most widely used survey, the Business Environment and Enterprise Performance Survey (BEEPS). Knott (2003), in a Transparency International report, counted 14 unique surveys, including BEEPS. The surveys ask respondents for relevant background data as well as their perceptions. Kaufmann, Kraay, and Mastruzzi (2006) argue that these perceptions are the best and only information available.

There are two main methods for measuring corruption. One is to track a country’s institutional features, which are derived from the country’s aggregate survey results. The other is to audit specific projects.

3.1 Institutional Features

Quantifying corruption is a relatively nascent field. One of the oldest methods for measuring corruption is tracking a country’s institutional features, a tool that grew in popularity during the 1990s. The World Bank, the European Bank for Reconstruction and Development, the United Nations, Transparency International, Business International, and other institutions, as well as regional groups and individual countries, measure and compare institutional features.

The definition of “institutional features” is broad and fluid. It generally encompasses a country’s legal system, economic policies, property rights, political system, and culture. Narrower definitions depend on the surveys and their codebooks. Sometimes institutional features are defined by the investigation itself. Di Tella and Schargrodsky (2003) examined public hospitals and their procurement officers. Hellman and Kaufmann (2001) analyze “six institutions: parliament, the executive apparatus, the criminal courts, the civil courts, the central bank, and political parties” (p. 2). In the Kurtzman Group’s Opacity Index 2004, institutions include “business and government corruption, an ineffective legal system, deleterious economic policy, inadequate accounting and governance practices, and detrimental regulatory structures” (p. 2). The measurements and comparisons are made at both the macro and the micro levels.

Two of the most frequently used surveys are the World Development Report and the World Business Environment Survey. The World Development Report was succeeded by BEEPS, one of whose primary purposes is to examine the relationship between state governance and firm governance. BEEPS “is a survey of over 4000 firms in 22 transition countries conducted in 1999–2000 that examines a wide range of interactions between firms and the state” (see note 6). According to Hellman et al. (2000, p. 5), “BEEPS represents the first major attempt to provide sound empirical measures of various forms of ‘grand’ corruption, such as ‘state capture’ (purchase of laws and decrees by enterprises) as well as corruption in public procurement, and to measure the characteristics of firms that engage in such forms of corruption” (p. 7).

One problem with analyzing institutional features, as previously mentioned, is that it is limited to telling us about the frequency and level of corruption. BEEPS does offer something more at the microlevel. In addition to frequency and level, institutional comparisons help our understanding of “uncertainties associated with corruption, the recipients of bribes, the extent of bribery associated with public procurement and the extent to which there is bureaucratic accountability in the provision of public services” (p. 36).

In addition to BEEPS, the World Bank published the World Business Environment Survey. The first survey, conducted during 1999 and 2000, covered more than 10,000 firms in 80 countries.
Questions in the survey focused on the quality of the investment climate as shaped by domestic economic policy; governance; the regulatory atmosphere, infrastructure, and financial impediments; and assessments of the quality of public services.

The Governance Indicators Project of the International Finance Corporation (part of the World Bank) conducted surveys every two years between 1996 and 2002 and annually since 2002. It collects data on six governance indicators: Voice and Accountability, Political Stability and Absence of Violence, Government Effectiveness, Regulatory Quality, Rule of Law, and Control of Corruption. The survey has gathered data on at least 206 countries for each indicator.

Transparency International’s Corruption Perceptions Index is one of the most important indices of corruption. The annual index ranked 41 countries in 1995, its first year, and had expanded to rank 180 countries by 2007. Transparency International relies on “14 different polls and surveys from 12 independent institutions” (FAQs, p. 2). Since 1996 Transparency International has also published its annual Bribe Payers Survey, which examines how likely firms from 30 exporting countries are to pay bribes. In addition, Transparency International publishes the Global Corruption Barometer, “a survey that assesses general public attitudes toward and experience of corruption,” and the National Integrity System, which looks at states’ institutions.

Freedom House’s Freedom in the World reports on the political rights and civil liberties of countries and territories. The first report, in 1972, examined 151 countries and 45 territories; the 2007 edition reported on 191 countries and 15 territories. The survey asks 25 questions, 10 on political rights and 15 on civil liberties. Freedom House publishes three additional reports: Freedom of the Press; Nations in Transit, which examines 29 formerly communist countries of Europe and Eurasia; and Countries at the Crossroads, “an annual survey of government performance in 60 strategically important countries.”

Some organizations have a regional focus. The Caucasus Research Resource Centers (CRRC) focuses on the South Caucasus, namely Armenia, Azerbaijan, and Georgia. The CRRC started data collection in 2003 and conducted its first survey in 2004. Since then, it has expanded its data collection in breadth and depth across the three countries.

The corporate sector also measures institutional features. The consultancy firm McKinsey & Company produces reports on a vast range of corporate issues and concerns. Much of the firm’s research is specific to its customers’ needs, but it also conducts surveys and research for knowledge creation. The firm published the results of its most recent foray into governance research in its quarterly reports between 2001 and 2003.

Survey data have inherent limitations. Respondents may lie or may simply have the wrong perception of corruption. Despite these limitations, Kaufmann et al. (2006) find “the correlation of perceptions of corruption from cross-country surveys of domestic firms tend to be very highly correlated with perceptions of corruption from commercial risk rating agencies or multilateral development banks” (p. 2). This observation does not show that surveys are fairly reliable indicators of corruption. Clearly, these indicators are subjective and not comparable with each other. More importantly, they are not helpful in quantifying corruption.


3.2 Auditing Projects

The other main method is to audit specific projects. An audit looks at a project’s finances or its outputs: where and how the project spends its money, or what the project returns. Benjamin Olken (2005) is one of the better examples of measuring corruption by auditing specific projects. Olken conducts financial audits of road projects in Indonesia. As in many other papers, his focus is not quantification but determining how much audits can reduce corruption. Olken, like the authors of the papers discussed below, does not set out a model to quantify corruption at the microlevel, or even at all. His research tests the Becker-Stigler hypothesis rather than pursuing quantification for its own sake, and it does so by analyzing microlevel data on a number of road projects in Indonesia. He seeks to determine whether (1) monitoring and (2) community participation reduce corruption.

Olken “designed and conducted a randomized, controlled field experiment in 608 Indonesian villages” (p. 2). All of the villages were to build roads. Olken randomly selected some villages and told them, after the project had been designed but before construction or procurement began, that they would be audited, increasing the probability of an audit from 4% to 100%. He also designed two community participation experiments: both groups received invitations to “accountability meetings”; the second group (but not the first) also received anonymous comment forms to be completed and deposited in a drop box at the meeting. The comment forms increased participation.

To measure corruption, Olken used experts to survey the completed projects. They determined for each project the quantity of materials used, the prices of supplies, and the wages paid. Olken then compared these estimates with the village’s reports of the cost of building the road. The possibility of an audit reduces corruption: the promise of an audit reduced missing expenditures by an average of 8 percentage points (p. 4).
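The comparison of independent estimates with reported costs can be illustrated with a minimal sketch. It assumes a simple share-based definition of missing expenditures, which is a simplification chosen here for illustration and not necessarily Olken’s exact specification; the class, the field names, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoadProject:
    """One audited project; all field names are hypothetical."""
    reported_cost: float   # what the village reported spending
    est_materials: float   # experts' estimate: quantity x price of materials
    est_wages: float       # experts' estimate of wages actually paid


def missing_share(p: RoadProject) -> float:
    """Share of reported spending not accounted for by the experts' estimate."""
    estimated = p.est_materials + p.est_wages
    return (p.reported_cost - estimated) / p.reported_cost


def mean_missing_share(projects: list[RoadProject]) -> float:
    """Average missing share across a group of projects."""
    return sum(missing_share(p) for p in projects) / len(projects)


# Made-up figures, for illustration only.
control = [RoadProject(100.0, 50.0, 20.0), RoadProject(200.0, 110.0, 40.0)]
audited = [RoadProject(100.0, 55.0, 25.0), RoadProject(200.0, 120.0, 50.0)]

# Difference in average missing share between audited and control villages.
effect = mean_missing_share(audited) - mean_missing_share(control)
print(f"audit effect: {effect * 100:.1f} percentage points")
```

Because the treatment was assigned at random, a difference in group means of this kind can be read as the effect of the audit announcement, which is why the design supports a statement in percentage points.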
Community involvement also resulted in lower missing expenditures, but the effect did not reach statistical significance.

The available data impose limitations on what we can analyze and how. A country’s institutional features contribute to our understanding of macro corruption, but count little toward quantification. Audits are too specific to yield a country-wide or even sector-wide understanding of corruption. Thus we must use survey data, which allow us to examine microlevel corruption. Audits have drawbacks too: they “are not suited for cross-country comparisons or for monitoring over time” (Kaufmann et al. 2006, p. 1).


Auditing a “project cannot distinguish between corruption, incompetence and other sources of noise” (p. 2).

4 How to Measure Corruption?

In this section, we closely examine five recent studies attempting to “measure” corruption: (1) Di Tella and Schargrodsky (2003); (2) Reinikka and Svensson (2004); (3) Olken and Barron (2007); (4) Gorodnichenko and Peter (2007); and (5) Gokcekus and Muedin (2008). What these papers have in common is not the quantification of corruption per se. Each is a unique contribution to our understanding of corruption in its own context. The authors quantify corruption rather than just measure its incidence. Granted, these studies focused on areas other than cost-benefit analysis, but their methodologies can be applied to the sort of analyses Rose-Ackerman found too difficult. The models developed in these papers have the potential to give the policymaker the ability to conduct a cost-benefit analysis of a reform package before its implementation. The models measure corruption at the microlevel. They are also scalable, adapting with equal ease to the country in the aggregate or to specific sectors.

4.1 Di Tella and Schargrodsky (2003)

Di Tella and Schargrodsky (2003) perform microlevel analyses in their test of the Becker-Stigler model of the relationship between wages and corruption, applied to hospital purchasing agents in Buenos Aires. They analyze the prices paid for identical basic medical supplies across a number of hospitals in Buenos Aires over a period of time divided into three parts. Auditing is low in the first part, high in the second (the crackdown), and subsiding in the last. They find that auditing reduced prices by 10%. Anecdotal evidence indicates that corruption accounts for the high prices paid for supplies, but they cannot rule out “lack of motivation” or “lack of information” (p. 271).
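The price comparison across the three audit phases can be sketched as follows. This is a minimal illustration of comparing mean prices by phase, not the authors’ regression; the phase labels mirror the text, while the function name and the prices are hypothetical.

```python
from statistics import mean

# Hypothetical unit prices paid for one identical supply item, grouped by
# audit phase. The phase labels follow the text; the numbers are made up.
prices_by_phase = {
    "low_audit": [10.0, 11.0, 9.0],
    "crackdown": [9.0, 9.5, 8.5],
    "audit_subsides": [9.5, 10.0, 9.0],
}


def pct_change_vs_baseline(
    prices: dict[str, list[float]], baseline: str
) -> dict[str, float]:
    """Percent change in each phase's mean price relative to the baseline phase."""
    base = mean(prices[baseline])
    return {phase: 100.0 * (mean(v) - base) / base for phase, v in prices.items()}


changes = pct_change_vs_baseline(prices_by_phase, "low_audit")
for phase, change in changes.items():
    print(f"{phase}: {change:+.1f}%")
```

A raw comparison like this cannot by itself separate corruption from the “lack of motivation” or “lack of information” explanations noted above; it only shows how the price series moves with audit intensity.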
Di Tella and Schargrodsky surveyed 360 doctors and nurses within Buenos Aires; the respondents perceived corruption as moderate to high, an average level for Argentina. Insofar as Di Tella and Schargrodsky test the Becker-Stigler model, they follow it using data derived from their surveys. They determined the prices paid for basic medical supplies from reports that the Government of the City of Buenos Aires required each of its 33 hospitals to file on the prices of certain basic medical supplies. Di Tella and Schargrodsky interviewed procurement officers in each hospital to determine their wages; they also collected relevant demographic data for the officers such as “gender, age, tenure on the job, marital status, head-of-household status, and education” (p. 276).

4.2 Reinikka and Svensson (2004)

Reinikka and Svensson (2004) describe the results of their research on where capitation grants given for education in Uganda ended up. They determined that local officials captured most of the grants. Like other researchers, they determined that political participation played a large role in the size of capture. The better off the community a school served, the less it suffered from capture. Schools “on average, received only 13 percent of the grants.” Most schools failed to receive funding as a result of capture.

Reinikka and Svensson focus on “a large public educational program in Uganda,” a capitation grant to cover schools’ nonwage expenditures, “financed and run by the central government, using district offices as distribution channels.” They also used surveys. They designed “a public expenditure tracking survey—to gauge the extent to which public resources actually filtered down to facilities” (p. 684). The survey covered 250 schools and allowed them to measure the difference between the central government’s capitation grants and the amounts received by the schools. They examined data from central ministries, local governments, and schools. Their school-specific measure of capture is the ratio of capitation grants received to intended capitation grants from the center (their equation 1), “where a low value indicates extensive capture” (p. 685). The survey also analyzed the school records needed to comply with the requirements of the central government. Local officials siphoned off 87% for personal gain or for other government uses; 73% of schools received less than 5% of intended funds; 10% received 50% or more of the intended grant.
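The school-specific capture measure described above lends itself to a short worked example. The sketch below computes the ratio of grants received to grants intended and the kind of summary shares reported in the text; the function name and all figures are hypothetical.

```python
def capture_ratio(received: float, intended: float) -> float:
    """Grants received / intended grants; a low value indicates extensive capture."""
    if intended <= 0:
        raise ValueError("intended grant must be positive")
    return received / intended


# Hypothetical (received, intended) pairs for four schools.
schools = [(0.0, 1000.0), (40.0, 1000.0), (130.0, 1000.0), (600.0, 1000.0)]
ratios = [capture_ratio(r, i) for r, i in schools]

# Summary statistics of the kind reported in the text.
share_below_5pct = sum(x < 0.05 for x in ratios) / len(ratios)
share_at_least_half = sum(x >= 0.50 for x in ratios) / len(ratios)
total_received_share = sum(r for r, _ in schools) / sum(i for _, i in schools)

print(f"schools receiving < 5% of intended grant: {share_below_5pct:.0%}")
print(f"schools receiving >= 50% of intended grant: {share_at_least_half:.0%}")
print(f"share of all intended funds that arrived: {total_received_share:.1%}")
```

Because the measure is a simple ratio per facility, it can be aggregated to the district or national level by summing the numerators and denominators, which is what makes it usable for cost-benefit calculations.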
4.3 Olken and Barron (2007)

Olken and Barron (2007) examine the degree to which standard pricing theories from industrial organization are consistent with actual patterns of bribes and extortion payments. They examine the bribes paid by truck drivers on two routes through Aceh. Drivers make the round trip with an assistant. Olken and Barron placed surveyors on a number of trucks; each worked as an assistant while making the observations. Drivers make three kinds of corrupt payments to officials: at checkpoints, at weigh stations, and as protection payments. Police and military officers demand bribes from the drivers. Despite corruption’s elusiveness, Olken and Barron’s surveyors measured it on over 300 trips. The illegal payments account for 13% of a trip’s cost. The authors also determined the breakdown of the remainder of each trip’s cost.

Direct observation allowed Olken and Barron to observe corruption in action. They determined the exact cost of each bribe as well as its beneficiary. They made their observations over a period when troops were withdrawing from the province, which allowed them to determine the differences in bribes paid at higher and lower troop levels. Admittedly, direct observation is too difficult to use in many situations, but when it can be used it provides the policymaker with a precise assessment of corruption. Such a measurement lends itself to cost-benefit analysis. Direct observations such as this allow the policymaker to conduct an analysis in the aggregate, and they are equally adaptable to smaller analyses at the level of the road, the truck, the cargo, or even the weigh station.

4.4 Gorodnichenko and Peter (2007)

Gorodnichenko and Peter claim their study is the first quantification of corruption using micro data. They observe “labor market outcomes, household spending, and asset holdings” in Ukraine. They find that public sector workers make 24% to 32% less than private sector workers. They also find that corruption accounts for “0.9–1.2% of Ukraine’s GDP” (p. 965).
Their methodology is to estimate the residual wage differentials between the public and private sectors, compare these differentials with the sectoral differences in household expenditures and asset holdings, and then use the conditions of labor market equilibrium to compute a monetary value of unobserved nontaxable compensation (i.e., bribery) at the aggregate level (p. 964). Gorodnichenko and Peter follow the trend of analyzing survey data; they use the Ukrainian Longitudinal Monitoring Survey (ULMS). The survey contains a variety of data, including schooling, experience, tenure, full-time status, union participation, firm size, and region, some of it more significant than the rest. Many of these data types are common to household surveys, and this portability permits the model to be repeated for many developing countries.
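The logic of this methodology can be conveyed with a back-of-envelope sketch. It is a deliberately simplified accounting exercise, not the authors’ estimator: it assumes the residual gaps have already been estimated, and it attributes the part of the wage gap that does not show up as a consumption gap to unobserved income. The function name, parameters, and numbers are all hypothetical.

```python
def implied_bribe_share_of_gdp(
    wage_gap: float,         # residual public-private wage gap, fraction of private wage
    consumption_gap: float,  # residual public-private gap in household spending, same units
    private_wage: float,     # average annual private-sector wage
    public_workers: int,     # number of public-sector employees
    gdp: float,              # annual GDP, same currency as private_wage
) -> float:
    """Aggregate unexplained compensation as a share of GDP (simplified assumption)."""
    # Part of the wage shortfall that is not mirrored by lower spending.
    unexplained = max(wage_gap - consumption_gap, 0.0)
    aggregate = unexplained * private_wage * public_workers
    return aggregate / gdp


# Made-up inputs, for illustration only.
share = implied_bribe_share_of_gdp(
    wage_gap=0.28,
    consumption_gap=0.03,
    private_wage=10_000.0,
    public_workers=400_000,
    gdp=100e9,
)
print(f"implied unobserved compensation: {share:.1%} of GDP")
```

The point of the sketch is only the direction of the inference: a large wage gap combined with a small gap in spending and assets implies income that the wage data do not record.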


Gorodnichenko and Peter examine three sectors of the economy: private, public, and state-owned enterprises. They observe that “sectoral differences in expenditures and asset holdings are considerably smaller than the sectoral differences in wages” (p. 981). They attribute the difference to bribes. Based on their conclusion that corruption is 0.9% to 1.2% of Ukraine’s GDP, Gorodnichenko and Peter advise that 1% of Ukraine’s GDP be used to reduce corruption (p. 988).

4.5 Gokcekus and Muedin (2008)

Gokcekus and Muedin design a model to quantify corruption at the microlevel using the human capital earnings model. Their model differs greatly from the others. Unlike the research covered above, they seek to quantify corruption in itself, using a human capital earnings equation to quantify administrative corruption in the public sector. Regression analyses are conducted on information from available surveys administered to public officials in Albania. After accounting for officials’ characteristics (e.g., schooling, experience, gender, and type of agency) and for features of the public and private sectors, they deduce that administrative corruption averaged 2.6 times officials’ current salary in Albania, equivalent to 16.7% of the country’s GDP.

Drawing on human capital earnings theory, which suggests that wages are related to productivity, and that productivity in turn depends on such economic factors as knowledge and job experience as well as on “other non-economic factors, such as gender and place of employment” (p. 5), they decompose wage differentials for civil servants. Gokcekus and Muedin also examine the costs of moving from the public sector to the private sector when they examine the age, agency, and positions of civil servants. They derived an estimate of corruption in Albania’s public sector based on public officials’ actual salaries in the public sector and their willing-to-accept salaries in the private sector.
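The idea of inferring corruption income from the gap between actual and willing-to-accept salaries can be sketched as follows. This is a raw, unadjusted comparison assumed here for illustration; the paper itself works through a human capital earnings regression that controls for officials’ characteristics. The function name and the survey responses are hypothetical.

```python
from statistics import mean


def corruption_multiple(actual_public: float, wta_private: float) -> float:
    """Gap between willing-to-accept private salary and actual public salary,
    expressed as a multiple of the actual public salary."""
    return (wta_private - actual_public) / actual_public


# Hypothetical survey responses:
# (actual public salary, willing-to-accept private salary).
responses = [(100.0, 350.0), (120.0, 420.0), (80.0, 300.0)]

avg_multiple = mean(corruption_multiple(a, w) for a, w in responses)
print(f"average implied extra income: {avg_multiple:.2f} x current salary")
```

A regression-based version would replace the raw gap with the part of the gap left unexplained after controlling for schooling, experience, gender, and agency type.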
This study was the first attempt anywhere in the world to quantify administrative corruption at the microlevel by looking at public officials’ salaries. To judge the robustness of the paper’s findings, further studies are necessary to look into the estimates of governance in Albania and to conduct analyses based on both public and private sector data. Cross-country studies might also be undertaken to compare and contrast these findings with findings from other transitioning economies or developing countries.

After accounting for factors that affect wages, such as schooling, experience, gender, agency type, personal characteristics, and public sector features, they conclude that money from bribes makes up a large share of a public official’s income; they estimate the amount at 257.3% of a public official’s salary. Such a figure implies a significant impact of corruption on the growth of the public sector relative to the private sector, which is particularly significant for economies in the midst of transition. With corruption making up the bulk of a public official’s income, it is no wonder that bloated public administrations with weak private sectors are so prevalent in corrupt countries. Private sector wages cannot compete with such a high amount of corruption income in the public sector. Without appropriate incentives in the private sector, few people will aim to enter that sector and few will educate and prepare themselves for it. With these characteristics, there are few enticements for FDI or capital inflows; this will further retard the progress of privatization and the path to economic growth.

5 Concluding Remarks: What Is Next?

There are numerous reasons why the policymaker needs to know whether a program is realizing real net gains. Accountability is increasingly demanded. The donors and stakeholders of the international institutions demand accountability from the World Bank, the United Nations, and other institutions. Donors of all sizes also want to know that there are real returns on their investment. When the fight is against corruption, the only way to determine the net returns is first to quantify corruption. Accountability is likely to remain a priority for donors in the future. More important to the long-term sustainability of the developing world is administering the projects with the highest returns. Policymakers need to be aware of the strengths as well as the weaknesses of perception-based indicators and of auditing projects. It is also important to recognize the emerging wave of studies that focus on measuring corruption.
Unless there is a clear understanding of the potential benefits of eradicating corruption, proponents of these reforms may end up without the public’s support behind them.

Notes

1. For details, see Bardhan (1991), Grindle (1997), Gupta, Davoodi, and Alonso-Terme (1998), Klitgaard (1998), Mauro (1997), Rose-Ackerman (1999), Tanzi and Davoodi (1997), and World Bank (2000b).


2. Bardhan (1991), Gupta et al. (2002), Knack and Keefer (1995), Klitgaard (1998), Mauro (1997), Tanzi and Davoodi (1997), and World Bank (2000a).
3. However, Egger and Winner (2005) have noted a positive relationship between corruption and FDI in their empirical cross-country analyses.
4. Graeff and Mehlkop (2003) point out that some regulation from the public sector can help reduce corruption and that a large public sector in a rich country tends to have a lower level of corruption.
5. For details, visit Transparency International’s Web page on “Frequently Asked Questions About Corruption,” http://www.transparency.org/news_room/faq/corruption_faq
6. For details, see the BEEPS interactive dataset at http://info.worldbank.org/governance.

References

Bardhan, P. 1997. Corruption and Development: A Review of Issues. Journal of Economic Literature 35 (3): 1320–1346.
Bjorvatn, K. and T. Soreide. 2005. Corruption and Privatization. European Journal of Political Economy 21 (4): 903–914.
Di Tella, R. and E. Schargrodsky. 2003. The Role of Wages and Auditing during a Crackdown on Corruption in the City of Buenos Aires. The Journal of Law and Economics 46 (1): 269–292.
Egger, P. and H. Winner. 2005. Evidence on Corruption as an Incentive for Foreign Direct Investment. European Journal of Political Economy 21 (4): 932–952.
Gokcekus, O. and J. Knoerich. 2006. Does Quality of Openness Affect Corruption? Economics Letters 91 (2): 190–196.
Gokcekus, O. and A. Muedin. 2008. Quantifying Corruption in the Public Sector by a Human Capital Earnings Equation. International Review of Economics 55 (3): 243–252.
Gorodnichenko, Y. and K. Sabirianova Peter. 2007. Public Sector Pay and Corruption: Measuring Bribery from Micro Data. Journal of Public Economics 91 (5–6): 963–991.
Grindle, M. 1997. Getting Good Government: Capacity Building in the Public Sector of Developing Countries. Boston: Harvard Institute for International Development.
Gupta, S., H. Davoodi, and R. Alonso-Terme. 2002. Does Corruption Affect Income Inequality and Poverty? Economics of Governance 3 (1): 23–45.
Gupta, S., H. Davoodi, and E. Tiongson. 1998. Corruption and the Provision of Health Care and Educational Services. Working Paper 00/116, International Monetary Fund, Washington, DC.
Harms, P. and H. Ursprung. 2002. Do Civil and Political Repression Really Boost Foreign Direct Investments? Economic Inquiry 40 (4): 651–663.


Hellman, J. S., G. Jones, D. Kaufmann, and M. Schankerman. 2000. Measuring Governance, Corruption, and State Capture: How Firms and Bureaucrats Shape the Business Environment in Transition Economies. Washington, DC: World Bank Institute Governance, Regulation and Finance and European Bank for Reconstruction and Development Chief Economist’s Office.
Hellman, J. and D. Kaufmann. 2001. Confronting the Challenges of State Capture in Transition Economies. Finance and Development 38 (3): 31–35.
Kaufmann, D., A. Kraay, and M. Mastruzzi. 2006. Measuring Corruption: Myths and Realities. Washington, DC: World Bank. www.worldbank.org/wbi/governance/pdf/six_myths_measuring_corruption.pdf
Klitgaard, R. 1998. International Cooperation against Corruption. Finance and Development 35 (1): 3–6.
Klitgaard, R. 2000. Subverting Corruption. Finance and Development 37 (2): 2–5.
Knack, S. and P. Keefer. 1995. Institutions and Economic Performance: Cross-Country Tests Using Alternative Institutional Measures. Economics and Politics 7 (November): 207–227.
Knott, L. 2003. Measuring Corruption Workshop Report. Merida, Mexico: Transparency International.
Kurtzman, J., G. Yago, and T. Phumiwasana. 2004. The Opacity Index 2004. MIT Sloan Management Review 46 (1): 38–44.
Lambert-Mogiliansky, A. 2002. Why Firms Pay Occasional Bribes: The Connection Economy. European Journal of Political Economy 18 (1): 47–60.
Lambsdorff, J. 2003a. How Corruption Affects Persistent Capital Flows. Economics of Governance 4 (3): 229–243.
Lambsdorff, J. 2003b. How Corruption Affects Productivity. Kyklos 56 (4): 457–474.
Mauro, P. 1997. Why Worry about Corruption. IMF Economic Issues Series No. 6. Washington, DC: International Monetary Fund.
Mauro, P. 2002. The Persistence of Corruption and Slow Economic Growth. IMF Working Paper Series No. 02/213. Washington, DC: International Monetary Fund.
Olken, B. 2005. Monitoring Corruption: Evidence from a Field Experiment in Indonesia. National Bureau of Economic Research. www.nber.org/papers/w11753.
Olken, B. and P. Barron. 2007. The Simple Economics of Extortion: Evidence from Trucking in Aceh. NBER Working Paper No. W13145.
Reinikka, R. and J. Svensson. 2004. Local Capture: Evidence from a Central Government Transfer Program in Uganda. Quarterly Journal of Economics 119 (2): 678–704.
Rose-Ackerman, S. 1999. Corruption and Government: Causes, Consequences, and Reform. Cambridge: Cambridge University Press.
Rose-Ackerman, S. 2004. The Challenge of Poor Governance and Corruption. Copenhagen Consensus. www.copenhagenconsensus.com/Files/Filer/CC/Papers/Governance_and_Corruption_300404_%286.4MB_version%29.pdf
Shleifer, A. and R. Vishny. 1993. Corruption. Quarterly Journal of Economics 108 (3): 599–617.


Schulze, G. G. and B. Frank. 2003. Deterrence versus Intrinsic Motivation: Experimental Evidence on the Determinants of Corruptibility. Economics of Governance 4 (2): 143–160.
Svensson, J. 2005. Eight Questions about Corruption. Journal of Economic Perspectives 19 (3): 19–42.
Tanzi, V. and H. Davoodi. 1997. Corruption, Public Investment, and Growth. IMF Working Paper No. 97/139. Washington, DC: International Monetary Fund.
Transparency International. 2005. Frequently Asked Questions about Corruption. Transparency International.
Wei, S. 2000. Local Corruption and Global Capital Flows. Brookings Papers on Economic Activity (2): 303–346.
World Bank. 2000a. Anticorruption in Transition: A Contribution to the Policy Debate. Washington, DC.
World Bank. 2000b. The BEEPS Interactive Dataset. info.worldbank.org/governance/beeps
You, J. and S. Khagram. 2005. A Comparative Study of Inequality and Corruption. American Sociological Review 70: 136–157.


CHAPTER 8

New Interpretations of Indices of Economic Freedom

King Banaian and William Luksetich

1 Introduction

The Heritage Foundation/Wall Street Journal Index of Economic Freedom is the best known index documenting the factors affecting economic freedom and showing the relation between measures of economic well-being (Gross Domestic Product in Purchasing Power Parity, hereafter GDPPPP) and economic freedom. Economic freedom as defined in their annual Index of Economic Freedom is “the absence of government coercion or constraint on the production, distribution, or consumption of goods and services beyond the extent necessary for citizens to protect and maintain liberty itself. In other words, people are free to produce, consume, and invest in the ways they feel are most productive” (Beach and Miles 2005, p. 1). The index is the unweighted average of 10 factors deemed to be equally important in determining the level of economic freedom in a country. It includes measures of trade policy, the fiscal burden of government, government intervention in the economy, monetary policy, capital flows and foreign investment, banking and finance, wages and prices, property rights, regulation, and informal market activity (p. 2). Each factor for each country in the survey is rated on a 1–5 scale, with 1 representing the greatest degree of freedom and 5 the least. Usually a score of 1, 2, 3, 4, or 5 is assigned, although in some categories the scoring is finer, that is, 1, 1.5, 2, and so on.

The validity of the weighting system has been examined by Richard Roll (2004) and Lewis Snider (2003, pp. 181–228). Both apply principal components analysis (PCA) to measures of political and economic freedom. PCA is an effective technique for taking a set of indicators that relate to a concept such as economic freedom and determining how many independent dimensions exist. The n indicators are reduced to a set of m < n orthogonal components that are linear combinations of the indicators. Roll argues that an equal weighting of the indicators in the Heritage measure is not refuted by PCA evidence. Snider finds that the high correlations allow for a reduction of the data to no more than three dimensions or components. Moreover, “Both the correlation and the factor analyses clearly show how pervasively the attributes of secure property rights and reliable and impartial contract enforcement affect the role of government in society” (p. 219).

While Roll’s major concern in his paper is the correlation among the measures of economic freedom and whether equal weighting of the measures is the appropriate scheme, Snider goes a step further and emphasizes the causal effects of economic freedom on economic performance. For example, in one place Snider writes “. . . secure property and reliable and impartial contract enforcement are essential for long-term investment of the sort that promotes economic growth” (p. 185). Elsewhere, he emphasizes “that a low political risk, a favorable investment climate, and a minimum of government intervention in the economy contribute significantly to increased foreign direct investment per capita which, in turn, should promote economic growth” (p. 219).

Clearly, the role of economic freedom in affecting economic growth, while not the purpose of Roll’s discussion, is recognized in the Heritage Foundation’s publications. Beach and Driscoll (2003, p. 27) note, “Properly constructed constitutions incorporate the concept of negative liberty, constraining governments to the protection of person and property.
A system of private property fosters economic growth and wealth creation.” Robert Pollock (2003) discusses the lack of progress in the Middle East. He first notes the importance economists Milton Friedman and Robert Lawson place on the role of the rule of law as more basic than property rights (p. 31). Subsequent to his discussion of the particulars in the area Pollock concludes: After all, the Arabs do not lack the desire for freedom-according to the UNDP, about 50 percent of adolescents polled say they would like to emigrate. They do not lack for talent, as countless success stories of those who have already do so attest. And they do not lack an understanding of markets, as anyone who has ever visited an Arab souk would know. The problem is bureaucracy, corruption, and uncertainty make it difficult to

New Interpretations of Economic Freedom


build a business bigger than a market stall. If accountable government and the rule of law could be brought to the region, fortunes could grow rapidly. (p. 33)

Mart Laar’s (2003) essay contrasts with Pollock’s, as he notes three key lessons that led to much different results in Estonia. First, the progress observed in Estonia relied on the recognition of the rule of law and that “There can be no market economy and democracy without laws, clear property rights, and a functioning justice system” (p. 36). Second, adopt reforms and stick with them. For gain, there must be some pain. It is difficult to live with the pain, but it must be done. Finally, competition must be supported and individuals must be allowed to keep the income they earn. Estonia abolished tariffs, became a free trade zone, dropped subsidies, and introduced a flat tax. Corporate taxes were abolished on income reinvested domestically (p. 36).

Scandinavia’s well-known social welfare systems have long been believed to have handicapped these countries because of their high incidence of taxation and generous social welfare programs. Their disincentive effects have resulted in lower economic growth than would otherwise have occurred. Sara Z. Fitzgerald (2003) has noted that in recent years the level of economic freedom in these countries has been improving.

All of these countries have improved their overall scores since the past year, with four out of five now ranked “free” on the Index of Economic Freedom. Notably, Sweden and Iceland have achieved the rank of “free” for the first time. Only Norway, which has adopted some market-orientated reforms, remains “mostly free” (p. 39).

According to Fitzgerald, the privatization taking place in the Scandinavian countries, together with an open trade policy, will increase foreign investment. Moreover, the strong rule of law will also fuel economic growth in these countries.

As investors continue to abandon markets that are riddled with corruption, whether in Southeast Asia or Latin America, they will be more likely to look to the Nordic countries, where their investment would be buttressed by the rule of law. It is up to the governments in Scandinavia to continue to institute sound market-oriented policies to lift their economies to even greater heights of growth and prosperity (p. 45).

2 Two Issues

It seems clear that indices of economic freedom, particularly the one under consideration in this chapter, provide an adequate and useful measure of the degree of economic freedom within countries. Nevertheless,


King Banaian and William Luksetich

some of the components of the overall index are more likely to be determinants of economic freedom, while others are expressions of the degree of economic freedom. For example, extensive government regulation, the presence of illegal markets, and corruption are expressions of the absence of property rights; consequently, while they are associated with the absence of economic freedom and its accompanying prosperity, it is the absence of private property rights that is the causal factor in lesser economic prosperity. Moreover, high taxation and government spending are also expressions of the degree of economic freedom and the lessening of property rights. It is difficult to imagine a free monetary regime without significant private property rights. On the other hand, there is the possibility of a significant degree of property rights even though there might be significant trade barriers having the effect of restraining foreign investment and imports, while encouraging exports. When looking at causal factors within the index of economic freedom, one might be encouraged to look at the degree of property rights and the extent of free trade policy, especially while checking the major factors affecting economic growth.

Second, examinations of the relationship between economic prosperity and economic freedom usually take the form of plotting GDP at purchasing power parity (PPP) on the vertical axis and the degree of economic freedom on the horizontal axis. Fitting a regression line between the two measures shows that as economic freedom increases, so does economic welfare. Moreover, it appears that the relationship between the two measures is a smooth one. It seems plausible that this is the case, although it also seems plausible that a country may have to reach some level of economic freedom before there can be significant economic progress.

Both issues were examined in our previous work (Banaian and Luksetich 2001) on central bank independence and economic performance. The Cukierman index of central bank independence is an index comprising a set of factors indicative of the central bank’s independence of the political sector. The components of the central bank index are continuous numerical values, and studies of the effects of central bank independence use either the index itself or some weighted average of its components. We dissected the central bank index into its component parts and regressed inflation rates of 54 countries on each of the components. Our results showed that it was only the term of office of the governor(s) and the structure of conflict resolution that were associated with lower inflation rates. Moreover, we also found that only the most wide-ranging powers to the central bank in resolving conflicts with the


government lead to lower inflation rates. We concluded that without such powers, economic agents may be very skeptical of the ability of central banks to resist government pressure to deviate from the goal of price stability.

In this chapter, we model the usual relationships between measures of economic welfare and indices of economic freedom dichotomously. Our results are interpreted as showing that the relationship between the two is not continuously linear; rather, they show that a minimum level of economic freedom must be attained before substantial improvements in economic welfare are attained. Moreover, most of the components of the economic freedom measure are irrelevant in affecting economic welfare. Indeed, it is only the presence of significant degrees of private property rights that is important in affecting economic welfare. We are able to use the linear measures to expand the results.

Table 8.1 contains the simple correlations between the overall country freedom scores and country scores in the various categories. Note the high correlations between the overall index and the property rights, regulation, and informal market scores. Moreover, note the high correlations between the term of office of the governor(s) and the structure of conflict resolution that were associated with lower inflation rates. Also, we found that granting the most wide-ranging powers to the central bank in resolving conflicts with the government will lead to lower inflation rates. We concluded that without such powers, economic agents may be very skeptical of the ability of central banks to resist government pressure to deviate from the goal of price stability.

3 Empirical Results

The measurements in the Heritage Index are indicators of economic freedom, not necessarily causal factors. As noted earlier, the most free countries receive a score of 1 and the least free a score of 5. For ease of interpretation, we reverse the scores, so that the least free receive a score of 1 and the most free a score of 5. Table 8.2 contains the simple regression results for the relation between per capita GDP in 1990 international dollars and the components of the economic freedom measure. With the exception of the fiscal policy and government intervention measures, all of the components of the overall index of economic freedom are positively and significantly related to per capita GDP. The fiscal policy measure is a weighted average of a country’s top marginal income tax rate (25%), its top corporate marginal tax rate (50%), and the year-to-year change in


government expenditures as a percent of GDP (25%). The government intervention measure comprises measures of state and federal ownership of production facilities and of government consumption as a percent of GDP. Privatization factors are also taken into account. Not included in this measure are government regulation and the presence of wage and price controls. Note the strong relation between GDP per capita and property rights, regulation, and the presence of informal markets. Not surprisingly, the simple correlations between property rights and the latter two measures are 0.81 and 0.86, respectively.

Most of the components of the overall index of economic freedom take on values from 1 to 5. We constructed a set of dummy variables for each measure (taking on the value of 1 if in a specific category, 0 otherwise) and regressed per capita GDP on these variables. The coefficients on these variables are relative to the omitted category, the least free measure (i.e., when a country takes on the value of 1). We do this in an attempt to ascertain whether the relation between GDP per capita and the index or its component parts is a continuous or a threshold type of relation. The results reported in table 8.3 show that the usual situation is one in which substantial economic freedom is required before there is a significant impact on GDP. Once again, the equations with property rights, regulation, and informal markets as right-hand side measures have the greatest explanatory power.

Table 8.4 shows the relation between per capita GDP and property rights, followed by the relation between GDP and property rights paired with each of the other components of the overall index of economic freedom. While some of the other measures are statistically significant in these equations, none adds any explanatory power to the estimates. Finally, in table 8.5 we pair property rights with the trade, monetary policy, and informal market components of the index. Only the informal market variable adds to the explanatory power of the equation, but this measure is a consequence of the loss of property rights or economic freedom, not a cause of the decrease in economic freedom. The property rights index may be picking up whether property rights are lacking, while the informal market index and the regulation index may be picking up the extent of property rights. Running the same regressions using the dummy variable approach yields the same results. However, these estimates add the information that the effects of property rights (or other measures) on GDP take hold only if there are substantial degrees of economic freedom.
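The dummy-variable approach described above can be sketched as follows. This is only an illustration under stated assumptions: the data are made up, the function names reverse_score and dummy_regression are hypothetical, and the reversal is assumed to be 6 minus the original score so that 1 to 5 maps to 5 to 1. With one dummy per category and the least free category omitted, each coefficient is simply the difference between that category's mean log GDP and the mean of the omitted category, which is what makes a threshold visible.

```python
import numpy as np

def reverse_score(score):
    """Flip a 1 (most free) .. 5 (least free) score so that 5 is most free."""
    return 6 - score

def dummy_regression(scores, log_gdp, levels=(1, 2, 3, 4, 5)):
    """OLS of log GDP on category dummies, omitting the lowest level.

    Returns a dict mapping 'const' and each non-omitted level to its
    estimated coefficient (relative to the omitted category).
    """
    scores = np.asarray(scores)
    y = np.asarray(log_gdp, dtype=float)
    others = levels[1:]  # levels[0] is the omitted (least free) category
    columns = [np.ones(len(y))] + [(scores == lv).astype(float) for lv in others]
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["const", *others], beta))

# Hypothetical scores (original scale) and log per capita GDP for ten countries.
# The invented data have almost no response until the reversed score reaches 4.
raw = np.array([5, 5, 4, 4, 3, 3, 2, 2, 1, 1])
rev = reverse_score(raw)
log_gdp = np.array([7.0, 7.2, 7.1, 7.1, 7.0, 7.4, 9.0, 9.2, 10.0, 10.2])

coefs = dummy_regression(rev, log_gdp)
for name, value in coefs.items():
    print(name, round(float(value), 3))
```

If the relation were continuous and linear, the coefficients would rise in roughly equal steps; coefficients near zero for the lower categories followed by large ones for the upper categories are the pattern the text calls a threshold relation.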


4 Relation to Other Studies

Banaian and Luksetich (2003) employ PCA to determine the number of dimensions of central bank independence. Rather than the 17 measures offered by the most popular index of central bank independence, their paper finds 3 dimensions. Caudill, Zanella, and Mixon (2000) use PCA on both the Heritage and Fraser indices and find that the former has two dimensions and the latter four. The first principal component predicts growth better than the Fraser index itself. Snider (2003) uses PCA on different measures of economic freedom, along with rankings on corruption and credit ratings, to establish three dimensions: (1) political risk; (2) domestic investment climate; and (3) government intervention in the economy. Snider shows the same dimensionality for the Heritage measure we use and the popular alternative measure from the Fraser Institute’s Economic Freedom of the World.1

Heckelman and Stroup (2000, 2004) argue that PCA is not an appropriate technique for arriving at a proper aggregation because it fails to provide for any conceptual link between the selection of components in the factors and economic theory. Nor is there any a priori reason that the first factor obtained in PCA will be correlated with economic growth, and results tend to be sensitive to the choice of countries, years, or scaling of the variables. They use hedonic regression instead. Since they use the Fraser index and do not include measures of property rights (introduced to the Fraser index only after 1995), their results are not directly comparable.

Our data contain a newer set of observations than those in the above-cited studies, and to verify these results we conducted PCA on the 10 components of the Heritage Index. The results are in table 8.6. As a rule of thumb, we would retain only those components with eigenvalues greater than 1.
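As a rough illustration of the kind of analysis reported in table 8.6, the sketch below runs PCA on a correlation matrix and counts the components with eigenvalues greater than 1. The data are simulated, the names pca_on_correlations, data, and retained are hypothetical, and the Varimax rotation used in the chapter is not reproduced. The final lines recompute the cumulative shares from the four eigenvalues printed in table 8.6, assuming that the 10 standardized components give a total variance of 10.

```python
import numpy as np

def pca_on_correlations(data):
    """Eigen-decompose the correlation matrix of data (rows = countries).

    Returns eigenvalues in descending order, unrotated loadings, and the
    cumulative share of total variance explained.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scale each eigenvector (column) by the square root of its eigenvalue.
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0, None))
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, loadings, cumulative

# Simulated indicators: three share a common driver, one is independent.
rng = np.random.default_rng(0)
n = 150
common = rng.normal(size=n)
data = np.column_stack([
    common + 0.3 * rng.normal(size=n),
    common + 0.3 * rng.normal(size=n),
    common + 0.3 * rng.normal(size=n),
    rng.normal(size=n),
])

eigvals, loadings, cumulative = pca_on_correlations(data)
retained = int(np.sum(eigvals > 1.0))  # eigenvalue-greater-than-one rule

# Eigenvalues as printed in table 8.6; shares assume total variance of 10.
table_eigs = np.array([4.9556, 1.1356, 1.0168, 0.80376])
table_shares = np.cumsum(table_eigs) / 10
print(np.round(table_shares, 3))
```

The recomputed shares agree with the cumulative percentages printed in table 8.6 to three decimals. Note that the fourth eigenvalue in that table (0.80) lies below a strict cutoff of 1, so the rule of thumb should be read as a guide rather than a hard filter.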
Because PCA is sensitive to scaling and the scales of these variables are under question, the analysis is performed on the correlations of the variables rather than the values themselves. The factors are rotated using the Varimax rotation, to show more easily the loading of each variable on the factors. Nearly half of the variance in the scores is found on one factor, which is heavily weighted in four variables—property rights, informal markets, government regulation, and international trade. The second factor loads


almost only the fiscal variable. The third factor loads government intervention with foreign investment, banking and finance, and wage and price controls. Monetary policy or inflation control dominates the fourth factor, but it appears not to account for much variation in economic freedom between the countries. These results differ substantially from previous studies.

One may be tempted to dismiss PCA as being a rather arbitrary means to organize the data. And the negative weights on some measures might cause concern. To illustrate, we regressed GDP per capita on the four principal components above.2 The advantage of the procedure is that the 4 components are orthogonal and therefore remove the problem of multicollinearity that would arise if we had regressed GDP on the 10 components together. The results are shown in table 8.7. We then transform the four factors back into the original components to see the effects. In the GDP regression, government intervention has the wrong sign and is highly significant. Property rights continue to be highly significant, as do trade, regulation, and informal markets, as we found in table 8.4.

We cannot be certain that these results are correct. There is no assurance, for example, that one of the principal components that we dropped for adding little to the variation of the matrix of economic freedom measures does not correlate with GDP growth, or that a later factor might not switch the full effect of government intervention to positive in the per capita GDP regression. Caudill, Zanella, and Mixon find that the sixth principal component is helpful in predicting GDP growth. One must wonder what economic theory accounts for the first and the sixth being important, but not components two through five.

5 Conclusions

We agree with Heckelman and Stroup (2004, p. 15) that “any conclusions regarding the role of economic freedom in promoting growth based on studies relying on the aggregated [economic freedom index] may be premature,” but some results appear to be available from looking at the total of the research. Aggregation creates a problem insofar as some of the elements of these indices of economic freedom may be prior to others; some may depend on the presence of others. The one at the base of many of these, in our view, is property rights. The property rights measure is a summary made by experts of “the degree to which a country’s laws protect private property rights and the degree to which its government enforces those laws” (Beach and Miles, 2005, p. 8).3 It


seems only reasonable that those assessments are made on the basis of things like government intervention or wage and price controls. We found in tables 8.4 and 8.5 that once property rights are included, only a few of the remaining nine measures add any explanation to differences in living standards. Even in these cases, we cannot be sure trade, regulation, and informal markets are independent of property rights; as we saw in table 8.1, they are highly correlated. The principal components regression in table 8.7 suggests they might have independent effects, but as we stated before, those results should be treated cautiously.

One should also be cautious about using the cardinal scales within these indices. We found that moving from 1 to 2, or even 2.5, on many of the measures within the Heritage Index has little effect on per capita GDP. This would suggest that researchers may want to reduce the number of points on the scales used. It also suggests that results in the later tables might be influenced by mismeasurement of the attributes of economic freedom. Moreover, it appears that changing the weighting scheme does not help matters greatly.4

What we hope the reader will take from this chapter is a stronger sense of caution generally about measures of economic freedom. We have argued here that results one would expect (that economic freedom improves living standards) have not held up to scrutiny when subjected to more rigorous testing. Our results on the Heritage Index confirm results found previously for the Fraser Institute measure. We do not think the need is for a new index, though. We argue instead that we should try to be sure what we mean by economic freedom and to focus on fewer factors.
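The principal components regression referred to above (table 8.7) can be sketched in a few lines. This is an illustration under assumptions, not the chapter's exact procedure: the data are simulated, the function name pcr is hypothetical, components are taken unrotated from the correlation matrix, and the transformation back to the original attributes is assumed to be the usual one, multiplying the retained eigenvectors by the coefficients estimated on the component scores.

```python
import numpy as np

def pcr(X, y, k):
    """Principal component regression using the top k components.

    Returns (gamma, beta): coefficients on the component scores, and the
    implied coefficients on the standardized original variables.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize columns
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    V = eigvecs[:, np.argsort(eigvals)[::-1][:k]]     # top-k eigenvectors
    scores = Z @ V                                    # orthogonal regressors
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    gamma = coef[1:]
    beta = V @ gamma                                  # back to the variables
    return gamma, beta

# Simulated data: three correlated indicators, one unrelated indicator.
rng = np.random.default_rng(1)
n = 120
base = rng.normal(size=n)
X = np.column_stack(
    [base + 0.5 * rng.normal(size=n) for _ in range(3)] + [rng.normal(size=n)]
)
y = 8.0 + 0.9 * base + 0.2 * rng.normal(size=n)

gamma2, beta2 = pcr(X, y, k=2)       # keep two components
gamma_all, beta_all = pcr(X, y, k=4)  # keep all: equals OLS on standardized X
print(np.round(beta2, 3))
```

Two properties of this sketch bear on the cautions in the text. The sign of each eigenvector is arbitrary, so the signs of the component coefficients carry no meaning on their own; only the back-transformed coefficients do. And those back-transformed coefficients depend on how many components are kept, which is why a dropped component could change the estimated effect of any single attribute.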

Appendix

Table 8.1 Correlation Matrix of Variables

Variable                         (1)    (2)    (3)    (4)    (5)    (6)    (7)    (8)    (9)    (10)   (11)
(1) Overall economic freedom     1.00
(2) Trade                        0.69   1.00
(3) Fiscal                       0.21   0.14   1.00
(4) Government intervention      0.43   0.19   -0.01  1.00
(5) Monetary policy              0.53   0.22   -0.03  0.07   1.00
(6) Foreign investment           0.78   0.46   0.04   0.33   0.32   1.00
(7) Banking                      0.79   0.46   0.17   0.46   0.33   0.66   1.00
(8) Wage and price controls      0.72   0.40   -0.02  0.38   0.38   0.61   0.61   1.00
(9) Property rights              0.86   0.60   0.07   0.24   0.34   0.63   0.58   0.56   1.00
(10) Regulation                  0.82   0.50   0.14   0.23   0.34   0.60   0.57   0.56   0.81   1.00
(11) Informal markets            0.85   0.60   0.14   0.19   0.36   0.60   0.56   0.51   0.86   0.75   1.00

Table 8.2 Living Standards and Aspects of Economic Freedom

Dependent Variable Is Log of Per Capita GDP (1990 International Dollars)

Component                  Coefficient   Standard Error   R2
Overall score              1.757*        0.139            0.51
Trade                      0.839*        0.099            0.32
Fiscal policy              0.304         0.182            0.01
Government intervention    0.212         0.153            0.01
Monetary policy            0.437*        0.102            0.10
Foreign investment         0.901*        0.129            0.24
Banking and finance        0.672*        0.116            0.18
Wage and price controls    0.855*        0.170            0.14
Property rights            1.059*        0.076            0.56
Regulation                 1.326*        0.112            0.48
Informal markets           1.139*        0.068            0.65

Note: * indicates significance at 1% level (shown in bold in the original). Results are simple regressions of per capita GDP on each measure. No controls are used, 153 countries in sample.

Table 8.3 Regressions Treating Steps in Freedom Components as Not Equidistant

Component                  Value Is 2          Value Is 3          Value Is 4         Value Is 5         Standard Error   R2
Trade                      0.59155 (1.939)     0.54819 (1.689)     2.6545 (8.657)     3.5337 (3.681)     1.3181           0.39
Monetary policy            -0.39291 (0.574)    0.31394 (0.569)     0.72469 (1.332)    1.3764 (2.759)     1.5998           0.10
Foreign investment         -0.35198 (0.503)    0.24625 (0.367)     1.5536 (2.254)     2.6991 (3.454)     1.449            0.26
Banking and finance        -0.05617 (0.103)    0.55104 (1.061)     1.064 (2.019)      2.7566 (4.547)     1.504            0.20
Wage and price controls    -0.71107 (0.84)     -0.99381 (1.317)    0.73973 (0.9818)   2.5339 (1.546)     1.4656           0.24
Property rights            0.07594 (0.2145)    0.88584 (2.439)     2.6703 (6.343)     3.5076 (9.019)     1.1553           0.59
Regulation                 0.30979 (0.871)     2.3052 (6.35)       3.2266 (7.558)     4.0034 (5.329)     1.3756           0.52

                   Government intervention   Informal markets
Value Is 1.5       -0.01691 (0.017)          -0.24546 (0.592)
Value Is 2         -1.7263 (1.928)           0.12851 (0.518)
Value Is 2.5       -1.0714 (1.207)           1.1103 (3.919)
Value Is 3         -0.75211 (0.86)           1.6755 (5.625)
Value Is 3.5       -0.80019 (0.928)          2.7203 (5.226)
Value Is 4         -0.03 (0.04)              3.22 (10.09)
Value Is 4.5       -1.692 (1.557)            3.4999 (7.946)
Value Is 5         -1.9699 (1.088)           3.9056 (11.71)
Standard Error     1.62                      0.92325
R2                 0.08                      0.67

Note: Absolute t-statistics in parentheses.

Table 8.4 Property Rights Plus Other Measurements

Added measure              Coefficient          Property Rights     Standard Error   R2
(Property rights only)                          1.0591 (13.96)      1.2466           0.56
Trade                      0.27068 (2.786)      0.90266 (9.706)     1.193            0.58
Fiscal policy              0.18752 (1.546)      1.051 (13.89)       1.2352           0.56
Government intervention    -0.13259 (1.268)     1.0825 (13.89)      1.2416           0.56
Monetary policy            0.11678 (1.555)      1.0171 (12.69)      1.235            0.56
Foreign investment         0.05735 (0.452)      1.0311 (10.51)      1.2532           0.56
Banking and finance        -0.02132 (0.205)     1.0702 (11.44)      1.2546           0.56
Wage and price controls    -0.14037 (0.956)     1.1081 (12.1)       1.2473           0.56
Regulation                 0.4679 (2.708)       0.77889 (6.114)     1.1964           0.58
Informal markets           0.85888 (6.58)       0.3255 (2.502)      0.9739           0.66

Note: Absolute t-statistics in parentheses.

Table 8.5 Property Rights, Trade, and Other Elements of Freedom

                    Equation 1          Equation 2          Equation 3
Property rights     0.86359 (8.996)     0.27415 (2.073)     0.31929 (2.447)
Trade               0.26797 (2.773)     0.16038 (1.805)
Monetary policy     0.11306 (1.539)                         0.05388 (0.8798)
Informal markets                        0.8105 (6.126)      0.84348 (6.385)
Standard error      1.1823              0.9594              0.9762
R2                  0.58                0.66                0.66

Note: Absolute t-statistics in parentheses.

Table 8.6 Principal Component Analysis (PCA) of Heritage Economic Freedom Index

Variable                   Factor 1   Factor 2   Factor 3   Factor 4
Trade                      -0.762     -0.087     -0.121     -0.037
Fiscal                     -0.087     -0.988     -0.000     -0.022
Government intervention    -0.056     0.026      -0.907     -0.080
Monetary policy            -0.185     0.016      -0.048     0.942
Foreign investment         -0.620     0.013      -0.468     0.250
Banking and finance        -0.506     -0.183     -0.628     0.263
Wage and price controls    -0.480     0.099      -0.542     0.383
Property rights            -0.904     0.018      -0.170     0.164
Regulation                 -0.818     -0.073     -0.191     0.221
Informal markets           -0.885     -0.071     -0.109     0.193
Eigenvalues                4.9556     1.1356     1.0168     0.80376
Cumulative %               0.496      0.609      0.711      0.791

Table 8.7 Principal Component Regression (T-Stats in Parentheses)

Dependent Variable Is Per Capita GDP

Principal component 1      -6.7196 (14.73)
Principal component 2      -5.5386 (5.81)
Principal component 3      3.5267 (3.50)
Principal component 4      -3.0555 (2.70)
R2                         0.64
Standard Error             1.0153

Transformed Back to Original Freedom Attributes (Dependent Variable Is Per Capita GDP)

Trade                      0.30144 (9.89)
Fiscal                     0.05368 (0.50)
Government intervention    -0.33244 (4.71)
Monetary policy            0.01106 (0.18)
Foreign investment         0.13912


Notes

1. For more on the measurement of the Fraser index, see Gwartney and Lawson (2001).

2. In the version of this chapter presented to the Western Economics Association meetings we included a regression with foreign trade, measured in dollars, as a dependent variable. Further research showed the results are fragile with respect to the year chosen for study, or even for an average of years. We have not included that part of the chapter.

3. Heritage uses a combination of reports from the Economist Intelligence Unit, the Commercial Guides from the U.S. Department of Commerce, and the U.S. Department of State’s reports on human rights practices. The Commerce Department guides are derived from embassy reports and the State Department.

4. For example, we attempted to simply recode the property rights measure dichotomously, using the original lesser two categories as a 0 and the upper three categories as a 1. We get log(GDP per capita) = 1.94 + 6.63 × recoded property rights, R2 = 0.33. Both coefficients are highly significant. The results are somewhat worse than those for property rights provided in table 8.3 but not at all implausible.

References

Banaian, K. and W. A. Luksetich. 2001. Central Bank Independence, Economic Freedom, and Inflation Rates. Economic Inquiry 40 (1): 149–161.

Beach, W. W. and M. A. Miles. 2005. Explaining the Factors of the Index of Economic Freedom. In 2005 Index of Economic Freedom, ed. M. A. Miles, E. J. Feulner Jr., M. A. O’Grady, A. I. Eiras, and A. Schevey. Washington, DC: Heritage Foundation and Dow Jones.

Beach, W. W. and G. P. O’Driscoll. 2003. The Role of Property Rights in Economic Growth: An Introduction of the 2003 Index. In 2003 Index of Economic Freedom, ed. G. P. O’Driscoll, E. J. Feulner Jr., and M. A. O’Grady. Washington, DC: Heritage Foundation and Dow Jones.

Caudill, S. B., F. C. Zanella, and F. G. Mixon Jr. 2000. Is Economic Freedom One Dimensional? A Factor Analysis of Some Common Measures of Economic Freedom. Journal of Economic Development 25 (June): 17–40.

Fitzgerald, S. Z. 2003. Scandinavia’s Changing Political and Economic Landscape. In 2003 Index of Economic Freedom, ed. G. P. O’Driscoll, E. J. Feulner Jr., and M. A. O’Grady. Washington, DC: Heritage Foundation and Dow Jones.

Gwartney, J. and R. Lawson. 2001. The Concept and Measurement of Economic Freedom. Presented to the Analysis and Measurement of Freedom: Theoretical, Empirical, and Institutional Perspectives. A conference held in Palermo, Italy, September 27–29.

Heckelman, J. and M. Stroup. 2000. Which Economic Freedoms Contribute to Growth? Kyklos 53 (4): 527–544.


Heckelman, J. and M. Stroup. 2005. A Comparison of Aggregation Methods for Measures of Economic Freedom. European Journal of Political Economy 21 (4) (December): 953–966.

Laar, M. 2003. How Estonia Did It. In 2003 Index of Economic Freedom, ed. G. P. O’Driscoll, E. J. Feulner Jr., and M. A. O’Grady. Washington, DC: Heritage Foundation and Dow Jones.

Pollock, R. 2003. In the Middle East, Arbitrary Government Feeds Rage. In 2003 Index of Economic Freedom, ed. G. P. O’Driscoll, E. J. Feulner Jr., and M. A. O’Grady. Washington, DC: Heritage Foundation and Dow Jones.

Roll, R. 2004. Weighting the Components of the Index of Economic Freedom. In 2004 Index of Economic Freedom, ed. M. A. Miles, E. J. Feulner, and M. A. O’Grady. Washington, DC: Heritage Foundation and Dow Jones.

Snider, L. 2003. Comparing Measures of Economic Freedom: The Good, the Bad and the Data. In Global Risk Assessments: Issues, Concepts & Applications, ed. J. Rogers, 181–228. Riverside, CA: Global Risk Assessment.

CHAPTER 9

On the Methodology of the Economic Freedom of the World Index

Robert A. Lawson

If you can’t measure it, measure it anyway.
Milton Friedman

To measure is to know.
Lord Kelvin

1 Why Measure Economic Freedom?

Anyone who has taught macroeconomics knows that students sometimes have difficulty grasping the enormity of the concept of GDP. The usual definition given is “the market value of all final goods and services produced in a nation in a year,” followed by a number: $13,620 billion according to the latest estimate (Bureau of Economic Analysis 2007). But what does it mean? It is just a number to them. To make it seem more concrete, I ask my students to imagine a long printout that lists every activity in America this year: production of 10 million cars, 1.2 billion haircuts, 2,430 major league baseball games. Then I ask them to imagine the same printout but with dollar values instead of quantities: $200 billion worth of cars, $12 billion in haircuts, $2 billion in ticket sales at major league games. Finally I ask them to imagine adding up all the numbers. Slowly it dawns on them what we are talking about. Clearly the total production of the United States is a big, multidimensional thing and GDP boils it down to a single, mind-bogglingly huge number.


Why do we go to so much trouble to measure GDP? The simple answer is that we want to know how much we have produced this year relative to last year. We also want to know how much we’ve produced (per person) relative to Japan or other countries. Despite the fact that GDP is a single number, we know it represents a multidimensional thing, and we worry about the ability of the number to tell us anything useful. We wonder, for instance, whether today’s number is comparable with yesterday’s or if the United States’ number is comparable with Japan’s. But because these questions and others are important to us, we persevere, doing our best to adjust for price changes over time and purchasing power parity differences among countries. The bottom line is that unless we take the time to measure GDP, we simply cannot begin to address many questions we are interested in.

Now consider the concept of economic freedom. It is not too difficult to come up with a quick, working definition such as “the ability of individuals to consume, produce, and voluntarily trade with others without interference” that would satisfy most.1 Economists would probably agree that freedom is an economic good in the sense that people prefer more of it to less. We may be interested in a number of questions about this economic freedom thing: How much freedom do we have? Are we more or less free than we used to be? Are the Japanese freer than us? Do societies with more economic freedom perform differently than those with less?

Milton Friedman, more than anyone else, was responsible for elevating the concept of economic freedom in our minds. Friedman (1962, p. 9) wrote, “I know of no example in time or place of a society that has been marked by a large measure of political freedom, and that has not also used something comparable to a free market to organize the bulk of economic activity.” Milton and Rose Friedman (1980, p. 148) wrote “a society that puts freedom first will, as a happy by-product, end up with both greater freedom and greater equality,” and argued that the economic success of Hong Kong was the result of its high level of economic freedom (p. 37). Free market advocates like the Friedmans like to paint a rosy picture of economic freedom leading to higher growth and incomes, less poverty, more equality, more civil rights, and so on.

Meanwhile many other scholars of the same period argued that economic freedom would lead to ruin. Harrington (1962), Galbraith (1967), and Thurow (1980) all argued forcefully that the United States should reject economic freedom in favor of greater government taxation, regulation, and industrial planning to solve various social problems like poverty, inequality, slow growth, and business cycles.

Methodology—Economic Freedom Index


Everyone seemed to agree on the ends (higher incomes and growth are good; poverty, inequality, and business cycles are bad) but they disagreed on the means for achieving those ends. The debate was mostly unscientific. It was as if two chemists who disagreed about a particular chemical reaction decided to argue about it in the hallway rather than go to the lab to run the necessary experiments to decide who was correct. The problem for the great debate between free market advocates and central planners was the inability to test their competing hypotheses empirically. Either economic freedom results in higher incomes, faster growth, more equality, less poverty, and so on as the free market advocates claim or it does not. It is as simple as that, but to test these hypotheses, we first need to measure economic freedom. 2 The Economic Freedom of the World Index More than two decades ago, Milton Friedman and Michael Walker at Canada’s Fraser Institute conducted a series of conferences that focused on defining and measuring economic freedom for a large number of countries. Approximately 60 of the world’s leading scholars, including Nobel Prize winners Gary Becker and Douglass North, participated in the series of 6 symposia. The early meetings concentrated on the conceptual issues about the meaning and definition of economic freedom while the later workshops focused more on the nuts and bolts of putting together a measure (Walker 1988; Block 1991; Easton and Walker 1992). Ultimately these efforts led to the publication of Economic Freedom of the World, 1975–1995 by Gwartney, Block, and Lawson (1996) and subsequent annual volumes (e.g., Gwartney and Lawson 2007). The Economic Freedom of the World (EFW) index is now a widely used variable in scholarly studies of everything from economic growth to income equality to military conf lict (e.g., Berggren 2003; De Haan, Lundström, and Sturm 2006; Scully 1999; Gartzke 2007). 
The purpose of this chapter is not to discuss the EFW index results or its empirical implications. Instead, this chapter will discuss the various conceptual and methodological issues and trade-offs involved in the creation of such an index.

3 What Is Economic Freedom?

Although Lord Kelvin said, “to measure is to know,” it is quite useful in practice to know what you want to measure. The creators of the EFW index were very deliberate about creating a clear understanding,


Robert A. Lawson

in their minds at least, of what they meant to measure. Simply put, the EFW index is designed to measure the consistency of a nation’s institutions and policies with economic freedom. In order to achieve a high EFW rating, a country must provide secure protection of privately owned property, evenhanded enforcement of contracts, and a stable monetary environment. It also must keep taxes low, refrain from creating barriers to both domestic and international trade, and rely more fully on markets rather than the political process to allocate goods and resources.

Institutions and policies are consistent with economic freedom when they provide an infrastructure for voluntary exchange, and protect individuals and their property from aggressors. Personal ownership of self is an underlying postulate of economic freedom. Because of this self-ownership, individuals have a right to choose, that is, to decide how they will use their time and talents. On the other hand, they do not have a right to the time, talents, and resources of others. Thus, they do not have a right to take things from others or demand that others provide things for them. We debated then and continue to debate what belongs and does not belong in an economic freedom index, and the deeply philosophical viewpoints summarized above inform our choices throughout the process.

4 Types of Indexes

Before embarking on the creation of any index, one must decide what type of index to create. There are several distinct approaches.

4.1 Surveys

One of the simplest ways to measure something is to ask people. At one of the symposia leading to the creation of the EFW index, Milton and Rose Friedman conducted a brief experiment asking participants to simply rank-order 11 countries according to economic freedom (Easton and Walker 1992, pp. 280–282). Indeed for a time the Fraser Institute pursued a plan to conduct a large-scale survey of economic freedom.
Alas, the initial results once tabulated were judged to be so inconsistent that these plans were shelved in favor of the current approach. The problem seemed to be that it was difficult, with the very limited resources available, to get a large enough sample of knowledgeable people in enough countries who understood the concept of economic freedom.


One common challenge facing surveys is that survey respondents may fall victim to “success bias,” a tendency to give high marks to all questions, regardless of the content of the question, when times are good and low marks when economic conditions are bad. For these reasons, along with others, economists in particular appear reluctant to embrace survey methods and prefer so-called hard data.2

This is not to say that surveys have no place in cross-country measurements. The problem is that many areas of interest are simply hard to quantify using dollars, percentages, or other common yardsticks. Without a survey approach many important areas of economic life would be left unexamined. Furthermore, surveys are cheap. At least in a relative sense, survey data are often less expensive to produce than comparable data from other approaches.

A number of very good international surveys exist. The World Economic Forum conducts an annual Executive Opinion Survey as a part of its Global Competitiveness Report that asks business executives questions about the business climate. Many of the issues addressed would be difficult if not impossible to assess with traditional data. There are typically dozens of survey respondents per country representing a range of firms: public and private, foreign and domestic, large and small, and across many sectors (see Geiger and Loades 2006). A different but very similar survey is conducted by IMD (2007).

4.2 Expert Panels

Another popular approach is to convene a small panel of experts to evaluate conditions within a country. The advantage of this approach over standard surveys is that the persons selected are more likely to be very knowledgeable about the country, as that is why they were selected in the first place. That is also the disadvantage. No one person or even small group of people is likely to know much about all aspects of an economy, so expert panels risk sample bias.
Of course, these can be even cheaper than traditional large-scale surveys and cost is not a trivial concern when dealing with many countries. Several organizations utilize this method with good impact. The PRS (Political Risk Service) Group’s International Country Risk Guide relies on a worldwide network of experts to provide various kinds of risk scores to business clients. Increasing one’s confidence in this source is the fact that PRS is a for-profit concern whose very existence hinges on providing valuable information to its clients. The Freedom House’s


(2006) well-known Freedom in the World indexes of political rights and civil liberties are also best characterized as expert panel ratings. Another example of an expert panel approach is the Heritage Foundation’s Index of Economic Freedom, which competes with the EFW index (Kane, Holmes, and O’Grady 2007).

Another disadvantage of the expert panel approach is that these indexes tend to be “black boxes.” Neither the PRS Group nor Freedom House is particularly forthcoming about the methodology or underlying data used to construct the ratings. The Heritage Foundation has improved on this margin but it is still difficult to see how the rating of a particular country was derived.

4.3 Case Studies

Case studies would seem to have little place in a discussion about constructing cross-country measures of economic institutions. Case studies are incredibly expensive to do well and are by their very nature difficult to generalize from country to country. As a result, until recently at least, case studies have played little role in cross-country indexes.

Nonetheless, the World Bank’s Doing Business (2007) project is an important new attempt to bring the case study approach to the table. The methodology of this project is a modified form of that employed by Hernando de Soto (1989) almost two decades ago. De Soto had associates in Peru go through the required procedures to start a generic business legally in several locations and kept track of the time and monetary cost. As in de Soto’s study, the methodology of the World Bank project begins with a generic experiment, such as starting a business, dismissing a worker, or collecting a contractual debt.
The various requirements that must be met in order to legally undertake the activity are identified and leading law firms and other professionals that generally handle such matters are contacted and asked to provide estimates for both the time (measured in days) and money cost that would typically be incurred complying with the mandated regulations. Special care is taken to assure that the generic cases are comparable across both countries and time periods. The focus of the cases is on business activities that are highly relevant to small and medium-size domestic companies rather than foreigners doing business in the country. In this respect, the data are reflective of how the legal and regulatory environment affects the activities of domestic entrepreneurs.


4.4 Hard Data

Of course hard data like marginal tax rates, inflation rates, government spending, tariff rates, and so on can be important indicators of economic freedom. They are cheap to acquire, to the user at least, and reasonable care is taken to assure the datasets are comparable across countries. The problem is that hard data can only take you so far. From the standpoint of the EFW index, cross-country differences in quality of the legal system and regulatory policies have proven particularly difficult to measure using hard data.

4.5 Aggregations

Many international indexes are aggregations of data from other sources including those above. Transparency International’s (2006) well-cited Corruption Perceptions Index is one example, as are the World Bank’s (2007) Governance Indicators ratings. The EFW index is an aggregation index. All of the data used in the creation of the EFW index are from other sources representing all of the above types. Table 9.1 shows the components of the EFW index along with a brief notation indicating the primary source. Complete source information is available from Gwartney and Lawson (2007).

5 Methodological Choices in the EFW Index

Even after deciding to use an aggregation approach to construct the EFW index, there are many methodological hurdles to be overcome, and these choices, like all choices, involve trade-offs.

5.1 Third-Party Data and Transparency

Authors of the EFW index, which is an aggregation type index, do not generate any data on their own. All is obtained from third-party sources mostly available at the international level (i.e., country-specific sources are rarely used). In addition, to the fullest extent possible the raw data and sources are clearly identified for outsiders to see. It is impossible to avoid all subjectivity, however. The authors must select which variables to include in the index based on their understanding of the concept of economic freedom, and not everyone will necessarily agree with these choices.


Primarily our reluctance to generate our own ratings reflects our discomfort with trying to evaluate the situation in far-distant countries that we know very little about. But there is another important, unintended benefit to relying only on third-party data. We cannot be easily manipulated politically. As the EFW index has become more widely known, politicians and other interests of various kinds have become interested in seeing their countries do better (or sometimes worse) in the index ratings. I once received a phone call of complaint from the ambassador to the United States from a major Western European nation. One prominent Asian nation every year invites to dinner one of our publication network contacts, who has nothing whatsoever to do with the construction of the index, to attempt to get inside information about the upcoming numbers. Free market reformers, some of whom are our friends, also can complain bitterly about the EFW index results. Some who are involved in government want to see their country’s ratings go up to reflect the progress they have made, while others, typically those outside of power, want the ratings to fall so they can complain more about the lack of freedom in their country. Whenever we have been contacted with complaints, our reply is always the same, “Hey it’s the data speaking not us. If you have a problem with the numbers, that’s not our fault.” The methodological rule requiring third-party data has turned out to be a nice way to insulate the EFW project from attempted political manipulation.3

Finally, one major disadvantage to using only third-party data is that we are somewhat hostage to the third parties that produce the data. There have been instances when the sources have changed methodologies and in some cases stopped reporting a variable altogether. Such instances require the EFW index to make adjustments over time.

5.2 Coverage

From the beginning the EFW index has attempted to cover as many countries as possible.
The first edition in 1996 included ratings for 102 countries and the latest edition has ratings for 141 countries. Rating this many countries means that you have to restrict your analysis to broad areas of economic freedom. If a country has a particularly idiosyncratic violation of economic freedom that is shared by no other country, it would be hard to include in the index. One of the downsides to using third-party data from published sources is that it is impossible or extremely difficult to get data for some countries of interest. For example, the EFW index does not include


Cuba or North Korea. While there is little doubt both would be at the very bottom of the rankings, the absence of published international data precludes their inclusion in the EFW index. The more subjective, expert panel approach employed by the Heritage Foundation does not suffer from this problem, and it is able to rate more countries, including Cuba and North Korea.

5.3 Index Numbers

One of the challenges in creating an index is to decide how to convert the raw numbers into index numbers. When the original sources are themselves index numbers, this is quite straightforward, usually requiring only an algebraic manipulation. For example, the World Economic Forum’s Executive Opinion Survey (EOS) uses a 1–7 scale. To convert to the 0–10 EFW scale merely requires applying the following formula: $\mathrm{EFW}_i = ((\mathrm{EOS}_i - 1) / 6) \times 10$.

Some of the raw variables are categorical in nature. For example, one EFW component is the freedom to hold foreign currency bank accounts domestically and abroad. The rating is determined simply to be a 10 if citizens can do both, 5 if they can do one but not the other, and 0 if they can do neither. Categorical variables of this type require some kind of arbitrary conversion method.

When the data are cardinal and continuous in nature, the EFW index uses a conversion formula and allows the distribution of the index numbers to mirror the distribution of the raw data. For such variables, a maximum and minimum value are determined and the following formula applied: $((X_{\max} - X_i) / (X_{\max} - X_{\min})) \times 10$. With the formula as written, countries near the minimum value will get ratings close to 10 and countries near the maximum value will get ratings near 0 (for a variable where a higher raw value should earn a higher rating, the numerator would instead be $X_i - X_{\min}$), and the distribution of index ratings between 0 and 10 will end up reflecting the distribution of the raw data between the minimum and maximum values.

5.4 Weights

After having decided how to convert the raw data into index numbers, there still remains the task of deciding how to combine the various components into the final index.
Components are grouped together into one of five areas based on theory. The five areas are (1) Size of Government: Expenditures, Taxes, and Enterprises; (2) Legal Structure and Security of Property Rights; (3) Access to Sound Money; (4) Freedom to Trade Internationally; (5) Regulation of Credit, Labor, and Business.
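
The three conversions described under Index Numbers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the EFW authors' code: the function names are hypothetical, clipping the result to the 0–10 range is an added assumption, and the continuous conversion follows the formula as printed, so higher raw values map to lower ratings.

```python
def rescale_survey(score, low=1.0, high=7.0):
    """Map a survey score on a low-high scale (1-7 for the EOS) onto 0-10."""
    return (score - low) / (high - low) * 10.0


def rate_foreign_currency_accounts(allowed_domestically, allowed_abroad):
    """Categorical rating: 10 if both are allowed, 5 if only one, 0 if neither."""
    return 5.0 * (int(allowed_domestically) + int(allowed_abroad))


def rescale_continuous(x, x_min, x_max):
    """Apply ((x_max - x) / (x_max - x_min)) * 10 to a continuous raw value.

    Values outside [x_min, x_max] are clipped to the 0-10 range; the
    clipping is an assumption added for this sketch.
    """
    rating = (x_max - x) / (x_max - x_min) * 10.0
    return max(0.0, min(10.0, rating))
```

For instance, a survey score of 4 sits at the midpoint of the 1–7 scale and so lands at the midpoint of the 0–10 scale. The choice of `x_min` and `x_max` for each variable is left to the caller here, since the text does not say how those bounds are set.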


Data on tariffs naturally belong in Area 4 while data on tax rates belong in Area 1, and so on. But how should the components be weighted within the area and among the areas to create the final index? Over the years, a number of different weighting methods ranging from the subjective views of “experts” to principal component analysis have been tried. In most cases, the choice of weighting method exerts little impact on the rating and ranking of countries. So in an effort to keep the procedure simple and transparent, a simple average is now used to combine the components into area ratings and the area ratings into summary ratings.4 This is not meant to imply that all components and areas of economic freedom are equally important. For some purposes, clearly some of the components are more important than others. Researchers who want to weight the components and areas to suit themselves are invited to do so. Although Lawson (2006) questions the practice, disaggregating the index for use in econometric studies is increasingly common (Heckelman and Stroup 2000, 2002; Sturm, Leertouwer, and De Haan 2002).

5.5 Comparability among Countries and over Time

In order to assure comparability among countries, the EFW index requires that most of the data be available. For example, if there are five components in a particular area, the EFW index would require that four of them be present in order to compute the area rating. If you required that 100% of the data be available, it would not be possible to construct a rating for more than a few countries. On the other hand, if you set the bar too low, you risk comparing some countries with two components against others with five. Since the distributions of the components are not all the same, this is problematic. Requiring that most, but not necessarily all, of the data be present seems to be a reasonable compromise between assuring comparability and achieving coverage of many nations.
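
The simple-average rule and the data-availability requirement can be illustrated with a short Python sketch. The function names are hypothetical, and the 80 percent threshold is an assumption taken from the "four of five" example in the text; the chapter does not state an exact cutoff, nor whether the same cutoff applies when area ratings are combined into a summary rating.

```python
def simple_average(ratings, min_share=0.8):
    """Average the ratings that are present, skipping missing ones (None).

    Returns None when fewer than min_share of the ratings are available,
    so that sparsely covered countries are not rated at all.
    """
    present = [r for r in ratings if r is not None]
    if not ratings or len(present) / len(ratings) < min_share:
        return None
    return sum(present) / len(present)


def summary_rating(areas, min_share=0.8):
    """Average components into area ratings, then areas into one summary.

    `areas` is a list of lists, one inner list of component ratings per area.
    Applying the same threshold at both levels is an assumption of this sketch.
    """
    area_ratings = [simple_average(components, min_share) for components in areas]
    return simple_average(area_ratings, min_share)
```

With five components of which one is missing, the area rating is the mean of the remaining four; with two missing, no area rating is produced. Because missing values are simply skipped, no reallocation of explicit weights is needed, which is the computational convenience the author points to in note 4.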
Another more serious concern is comparability over time. The problem is that the underlying data are more complete in recent years than in earlier years. As a result, changes in the index ratings over time for a particular country may reflect the fact that some components are missing in some years but not in others. This is similar to comparing GDP or a price index over time when we know that the underlying goods and services are constantly changing. The problem of missing or changing components threatens the comparability of the index ratings over time. In order to attempt to correct this issue, the EFW index includes a “chain-linked summary index” that is based on the 2000 rating as a


base year. Changes to the index going backward (and forward) in time are then based only on changes in components that were present in adjacent years. For instance, the 1995 chain-linked rating is based on the 2000 rating but is adjusted based on the changes in the underlying data between 1995 and 2000 for those components that were present in both years. If the common components in 1995 were the same as in 2000, then no adjustment was made to the 1995 summary rating. However, if the 1995 components were lower than those for 2000 for the overlapping components between the two years, then the 1995 summary rating was adjusted downward proportionally to reflect this fact. Correspondingly, in cases where the rating for the common components was higher in 1995 than for 2000, the 1995 summary rating was adjusted upward proportionally. The chain-linked ratings are constructed by repeating this procedure backward in time to 1970 and forward through the most recent year in the index.

The chain-linked methodology means that a country’s rating will change across time periods only when there is a change in ratings for components present during adjacent years. This is precisely what one would want when making comparisons across time periods. Researchers using the data for long-term studies should use these chain-linked data.

6 Conclusion

The EFW index is an aggregation index created from a combination of hard data, surveys, expert panels, and case studies from various third-party sources. The index emphasizes rating a large number of countries with a transparent methodology that is insulated from political pressure. Its methodology involves numerous trade-offs. Despite the many problems associated with trying to summarize a multidimensional concept like economic freedom into a single number, the availability of this measure has energized empirical work in this area, and has greatly reduced the ideological content within the larger debate.
The widespread use of the index in the literature is a testament to the value researchers have found in the project.
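
For readers who want to see the chain-linking procedure of Section 5.5 in concrete form, the following Python sketch gives one possible reading of it. It is not the EFW authors' implementation: the function name and data layout are hypothetical, and "adjusted proportionally" is interpreted here as scaling the neighboring year's chain-linked rating by the ratio of the average rating of the common components in the two years.

```python
def chain_link(ratings_by_year, base_year):
    """Build chain-linked summary ratings around a base year.

    ratings_by_year maps year -> {component_name: rating or None}.
    The base-year rating is the simple average of its available components.
    Every other year is linked to its neighbor nearer the base year using
    only the components present in both years (an assumed interpretation).
    """
    def mean(values):
        return sum(values) / len(values)

    def link(target, anchor):
        t, a = ratings_by_year[target], ratings_by_year[anchor]
        common = [k for k in t if t[k] is not None and a.get(k) is not None]
        if not common:
            raise ValueError(f"no common components for {target} and {anchor}")
        ratio = mean([t[k] for k in common]) / mean([a[k] for k in common])
        chained[target] = chained[anchor] * ratio

    years = sorted(ratings_by_year)
    pos = years.index(base_year)
    base = [v for v in ratings_by_year[base_year].values() if v is not None]
    chained = {base_year: mean(base)}
    for i in range(pos - 1, -1, -1):      # walk backward from the base year
        link(years[i], years[i + 1])
    for i in range(pos + 1, len(years)):  # walk forward from the base year
        link(years[i], years[i - 1])
    return chained
```

In this sketch, a component that first appears in a later year does not move the chain-linked rating in that year, because it has no counterpart in the adjacent year; only changes in the shared components do.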

Appendix

Table 9.1 The Areas and Components of the EFW Index

Area 1: Size of Government: Expenditures, Taxes, and Enterprises
  A General government consumption spending (WB, IMF)
  B Transfers and subsidies as a percentage of GDP (WB, IMF)
  C Government enterprises and investment (WB, IMF)
  D Top marginal tax rate
    i Top marginal income tax rate (PW)
    ii Top marginal income and payroll tax rates (PW)

Area 2: Legal Structure and Security of Property Rights
  A Judicial independence (GCR)
  B Impartial courts (GCR)
  C Protection of property rights (GCR)
  D Military interference in rule of law and the political process (ICRG)
  E Integrity of the legal system (ICRG)
  F Legal enforcement of contracts (DB)
  G Regulatory restrictions on the sale of real property (DB)

Area 3: Access to Sound Money
  A Money growth (WB, IMF)
  B Standard deviation of inflation (WB, IMF)
  C Inflation: Most recent year (WB, IMF)
  D Freedom to own foreign currency bank accounts (IMF)

Area 4: Freedom to Trade Internationally
  A Taxes on international trade
    i International trade tax revenues (% of trade sector) (WB, IMF)
    ii Mean tariff rate (WB, WTO)
    iii Standard deviation of tariff rates (WTO)
  B Regulatory trade barriers
    i Nontariff trade barriers (GCR)
    ii Compliance cost of importing and exporting (DB)
  C Size of the trade sector relative to expected (WB, IMF)
  D Black-market exchange rates (MRI)
  E International capital market controls
    i Foreign ownership/investment restrictions (GCR)
    ii Capital controls (IMF)

Area 5: Regulation of Credit, Labor, and Business
  A Credit market regulations
    i Ownership of banks (WB)
    ii Foreign bank competition (WB)
    iii Private sector credit (IMF)
    iv Interest rate controls/Negative real interest rates (WB, IMF)
  B Labor market regulations
    i Minimum wage (DB)
    ii Hiring and firing regulations (GCR)
    iii Centralized collective bargaining (GCR)
    iv Mandated cost of hiring (DB)
    v Mandated cost of worker dismissal (DB)
    vi Conscription (MB)
  C Business regulations
    i Price controls (IMD)
    ii Administrative requirements (GCR)
    iii Bureaucracy costs (GCR)
    iv Starting a business (DB)
    v Extra payments/Bribes (GCR)
    vi Licensing restrictions (DB)
    vii Cost of tax compliance (DB)

Source: IMF = International Monetary Fund; WB = World Bank; GCR = World Economic Forum, Global Competitiveness Report; ICRG = PRS Group, International Country Risk Guide; DB = World Bank, Doing Business; IMD, World Competitiveness Yearbook; PW = Price Waterhouse; WTO = World Trade Organization; MB = Military Balance; MRI = MRI Banker’s Guide to Currency. For more information, see Gwartney and Lawson (2007).

Notes

1. Sen (1999) would probably not be one to agree with this definition as he prefers a definition of freedom based on positive rights.
2. It is worth remembering though that just about all “hard data” starts out with someone filling out a form of some kind so the dichotomy between hard data and soft data is not as clear-cut as many believe.
3. The Heritage Foundation has not been so lucky. In one instance that became public, The Washington Post suggested that a close relationship between Malaysian politicians and the Heritage Foundation may have resulted in higher ratings for Malaysia on its Index of Economic Freedom. See Thomas B. Edsall. Think Tank’s Ideas Shifted As Malaysia Ties Grew. The Washington Post, Sunday, April 17, 2005, p. A01.
4. From a computational cost point of view, the simple average is vastly easier to calculate in cases where we have missing values. Spreadsheets can easily skip over empty cells when calculating simple averages. But if you use explicit weights, you have to reallocate the weight of missing variables to the other variables. While not impossible, this requires a lot of programming within a spreadsheet.

References

Berggren, N. 2003. The Benefits of Economic Freedom: A Survey. Independent Review 8 (2): 193–211.
Block, W. (ed.) 1991. Economic Freedom: Toward a Theory of Measurement. Vancouver: Fraser Institute.
Bureau of Economic Analysis. 2007. http://www.bea.gov/national/index.htm. (July 25, 2007)
De Haan, J., S. Lundström, and J. E. Sturm. 2006. Market-Oriented Institutions and Policies and Economic Growth: A Critical Survey. Journal of Economic Surveys 20 (2): 157–191.
Easton, S. T. and M. A. Walker (eds.). 1992. Rating Global Economic Freedom. Vancouver: Fraser Institute.
Freedom House. 2006. Freedom in the World 2006: The Annual Survey of Political Rights and Civil Liberties. New York: Rowman and Littlefield.
Friedman, M. 1962. Capitalism and Freedom. Chicago: University of Chicago Press.
Friedman, M. and R. Friedman. 1980. Free to Choose: A Personal Statement. New York: Harcourt Brace Jovanovich.
Galbraith, J. K. 1967. The New Industrial State. Boston: Houghton Mifflin.
Gartzke, E. 2007. The Capitalist Peace. American Journal of Political Science 51 (1): 166–191.
Geiger, T. and E. Loades. 2006. The Executive Opinion Survey: Gauging the Business Climate. In The Global Competitiveness Report: 2006–2007, ed. A. Lopez-Claros, M. E. Porter, X. Sala-i-Martin, and K. Schwab. Geneva: World Economic Forum.
Gwartney, J., W. Block, and R. Lawson. 1996. Economic Freedom of the World, 1975–1995. Vancouver: Fraser Institute.
Gwartney, J. and R. Lawson. 2007. Economic Freedom of the World: 2007 Annual Report. Vancouver: Fraser Institute.
Harrington, M. 1962. The Other America: Poverty in the United States. New York: Macmillan.
Heckelman, J. C. and M. D. Stroup. 2000. Which Economic Freedoms Contribute to Growth? Kyklos 53 (4): 527–544.
Heckelman, J. C. and M. D. Stroup. 2002. Which Economic Freedoms Contribute to Growth? Reply. Kyklos 55 (3): 417–420.
International Institute for Management Development. 2007. IMD World Competitiveness Yearbook 2007. Lausanne: IMD.
Kane, T., K. R. Holmes, and M. A. O’Grady. 2007. 2007 Index of Economic Freedom. Washington, DC: Heritage Foundation.
Lawson, R. A. 2006. On Testing the Connection between Economic Freedom and Growth. Economic Journal Watch 3 (3): 398–406.
PRS Group. International Country Risk Guide. http://www.prsgroup.com/ICRG.aspx.
Scully, G. W. 2002. Economic Freedom, Government Policy and the Trade-Off between Equity and Economic Growth. Public Choice 113 (1–2): 77–96.
Sen, A. 1999. Development as Freedom. New York: Alfred A. Knopf.
de Soto, H. 1989. The Other Path: The Economic Answer to Terrorism. New York: Basic Books.
Sturm, J. E., E. Leertouwer, and J. De Haan. 2002. Which Economic Freedoms Contribute to Growth? A Comment. Kyklos 55 (3): 403–416.
Thurow, L. C. 1980. The Zero-Sum Society: Distribution and the Possibilities for Economic Change. New York: Basic Books.
Transparency International. 2006. Corruption Perceptions Index 2006. http://www.transparency.org/policy_research/surveys_indices/cpi
Walker, M. A. (ed.). 1988. Freedom, Democracy, and Economic Welfare. Vancouver: Fraser Institute.
World Bank. 2007. Worldwide Governance Indicators: 1996–2006. http://info.worldbank.org/governance/wgi2007/
World Bank and International Finance Corporation. 2007. Doing Business How to Reform. Washington, DC: World Bank and International Finance Corporation.


CHAPTER 10

Government Structure, Strength, and Effectiveness

Joshua C. Walton, Apanard Angkinand, Marina Arbetman, Marie Besançon, Eric M. P. Chiu, Suzanne Danis, Arthur T. Denzau, Yi Feng, Jacek Kugler, Kristin Johnson, and Thomas D. Willett

1 Introduction

There has been an explosion of cross-national datasets purporting to measure various aspects of political institutions and the strength and stability of governments. These have come from a variety of literatures in International Relations, International Political Economy, and analyses of economic growth, among others. This explosion has left the researcher with an embarrassment of riches and an entangling dilemma: what do the measures mean, and which might be most useful for particular empirical purposes? In this chapter, we attempt to bring some order to this choice with our analyses and discussions of the wide variety of data now available.

In this overview, we discuss different concepts of good governance, present some of the datasets, and offer initial comments on them. Because of the daunting breadth of this topic, we will not attempt to provide a comprehensive, detailed explanation of all available measures. Rather we will explore in broad strokes the general dimensions by which governments are categorized and evaluated.

The first dimension under review is that of the state’s structure, specifically the structure that delineates the decision-making process of


Walton, Angkinand, Arbetman et al.

the state’s governmental apparatus. This dimension in particular tends to be the basis on which we distinguish different kinds of states. Is a government a democracy, an autocracy, or a dictatorship? Does it have a presidential or parliamentary system? Is the executive unconstrained, or constrained by a system of checks and balances? Each of these is a question regarding the state’s political decision-making structure.

The second dimension we explore is that of state strength. In order for a state’s decision-making structure to operate smoothly, or at all, it is important that there be a sufficient level of confidence in the state apparatus: confidence that the state will be able to gather the resources it needs to carry out its essential functions and provide for some level of public goods, and confidence that the state apparatus itself will be able to prevent, or failing that, resist, any external or internal threats to its own survival. A lack of confidence in either of these areas necessarily results in critical vulnerabilities for the state, and at the end of this road lies state failure.

The third dimension is that of the effectiveness of the state at achieving its broad ends. The list of objectives pursued by governments is long and varied; among them we can find particular types and levels of public goods provision, particular ideological or religious points of view, and a wide variety of others. In this chapter, we look at measures relating to those objectives relevant to what is generally termed “good governance.” Even this term is overly broad for our purposes, and so we shall limit our focus to those measures related to the smooth and effective operation of the apparatus.

This chapter will also make some effort to describe some of the failings endemic to the construction and use of these measures.
It is important to note that some measures are constructed as objective measures of observed phenomena, while others are constructed as subjective measures of informed opinion regarding particular characteristics of the countries under study; still others seek to aggregate measures of one or both types into indices purporting to capture broader concepts. There are advantages and disadvantages to each of these approaches, as will be discussed in a later section. Whatever measure is used, however, it is critical that the researcher be on guard against analytical confusion, which is to say that researchers should exercise care when selecting variables for use in research queries to ensure that the variable in use actually measures the concept under study.

This chapter is organized in the following way. Section 2 looks at the structure of the state, discussing democracy, the relationship of the sizes

Structure, Strength, and Effectiveness


of winning coalition and selectorate, and the number and nature of veto players within the state. Section 3 looks at the ways by which a state can be considered strong, looking first at relative political capacity, a measure of the government’s ability to extract resources from its populace. The chapter then considers measurement of political instability and state failure. Section 4 turns to consideration of the aims toward which governments can strive; topics in this section include competence, accountability, and the reduction of corruption. Section 5 provides some discussion of a wide range of governance measures available for use in research and some of the issues related to their use, and Section 6 concludes the chapter.

2 Structural Characteristics of the State

Every state, being an apparatus run by people and by which people are governed, must have in place some set of processes and institutions by which the person or people in charge are selected, and by which the rules and policies are set. In this chapter we term these processes and institutions the structural characteristics of the state, and it is to them that we look first in our survey.

2.1 Democracy

Many attempts have been made to measure democracy, and the strength of the state, but it is not clear which, if any, are best for most uses in empirical work. Given this, we suggest choosing the measure that most closely matches the concept in the theory or hypothesis being tested. For example, if a researcher studying democracy conceives of it as involving a broadly enfranchised citizenry eligible to vote but also necessarily including a broad range of personal freedoms, such as freedoms of speech and assembly, that researcher may be disappointed by a measure that scores a country as a democracy on the sole basis of having regular elections.
For reasons such as this, researchers must take care to select, from among the varied definitions and measures of democracy, those that are most relevant to their particular research aims.

Various definitions of democracy have been provided, some relatively value-free, others quite clearly value-laden. Among the value-free are Schumpeter’s (1976, p. 269) definition of democracy as “. . . that institutional arrangement for arriving at political decisions in which

individuals acquire the power to decide by means of a competitive struggle for the people’s vote . . .”; Lipset’s (1959, p. 71) definition of it as a “. . . political system which supplies regular constitutional opportunities for changing the governing officials . . .”; and Huntington’s (1984, p. 10) definition of a political system as a democracy “. . . to the extent that its most powerful collective decision-makers are selected through periodic elections in which candidates freely compete for votes and in which virtually all the adult population is eligible to vote . . .”

Beyond the traditional value-free definitions of democracy we have a number of value-laden definitions.1 Diamond (1997, p. 3) presents two different components of democracy: (1) the presence of regular, free, multiparty elections and (2) the requirement that elected officials hold effective power, that officials be accountable, that true political choices exist, and that there be broad pluralism and freedom. According to Bollen (1993), democratic rule involves (1) government accountability to the population and (2) individuals being entitled to (in)direct participation in government. Vanhanen (1990, p. 10) defines democracy as “. . . a political system in which ideologically and socially different groups are legally entitled to compete for political power and in which institutional power holders are elected by the people and are responsible to the people . . .”; to Margolis (1979, p. 26), a democratic system is one that “. . . emphasizes facilitation of individual self-development and self-expression as the primary goals of government. The object of the government is to keep open for the individual a wide range of options and values . . .”

Numerous measures of democracy have been created in recent years, drawing on each of these definitions and many others. Some are discrete 0–1 measures: is the country a democracy or not? Gasiorowski (1996) and Przeworski et al.
(2000) have developed such measures, and they reflect the obvious problems of trying to capture what can be a complex concept in a binary measure. Three more continuous measures have also been generated. Gurr (1990), as part of the Polity datasets, attempts to measure institutionalized democracy. This involves aggregating measures of the presence of competitive political participation, the presence of guarantees of openness and competitiveness of executive recruitment, and the existence of institutional constraints on executive power. The data are the most extensive of any of the measures, covering more than 150 countries from 1800 to the present on an 11-point scale.

Perhaps the most debated topic in this area is the effect of democratic governments versus autocratic or dictatorial governments on

a variety of outcomes including economic phenomena such as inflation and growth. The extensive literature on this subject has made clear that the question, as phrased, is much too simplistic. There are many varieties of democracies, autocracies, and dictatorships, and their performances vary greatly across a range of dimensions.2 Different institutional structures such as presidential versus parliamentary systems have been found to be important for some types of economic outcomes.3 Likewise, there is disagreement about whether democracy is a 0–1 variable or whether one should consider a continuum of degrees of democracy.

2.2 Winning Coalitions and the Selectorate

Political scientists have developed the concept of the winning coalition, the group whose support is essential for a chief executive to survive in office. Bueno de Mesquita et al. (2003, hereafter BdM2S2) have developed a model in which, as the ratio of the winning coalition to the group that selects the leader (the selectorate) increases, it becomes increasingly inefficient for the chief executive to focus on diverting resources to the winning coalition to the exclusion of other members of society. A key assumption is that politicians seek to maximize their probability of political survival. Politicians allocate their resources between goods that can be consumed exclusively by members of the winning coalition (private goods) and goods that serve the public at large (public goods) with the goal of maximizing this probability. As the winning coalition becomes larger, the amount of private goods received by each member of the winning coalition becomes smaller, rendering private goods a less and less efficient way of ensuring political survival. Consequently, as the ratio of the winning coalition to the selectorate increases, the chief executive focuses more on providing public goods while limiting attempts to corner private goods for political insiders.
Since macroeconomic stability can be considered a public good, we should expect greater stability in environments with a high winning coalition/selectorate ratio. The BdM2S2 model, however, does not deal with issues of time inconsistency, and hence its relevance for macro stability issues is open to question (time inconsistency problems will be discussed at greater length in a later section on political instability). The critical metric in the BdM2S2 study is W/S, a measure of the proportion of those with a say in choosing the leader whom the leader must please in order to survive in office (bounded between 0 and 1). That is, $W/S = \dfrac{\text{size of the winning coalition}}{\text{size of the selectorate}}$.4

This measure follows the logic of Mancur Olson’s (1982) concept of encompassing groups, where a small group may have tremendous scope for extracting large proportional gains from a large group, that is, there is a large scope for redistribution. With a large group, however, the ability to obtain proportional gains from others is less. Thus with a large coalition there is greater incentive to adopt policies that grow rather than redistribute the pie.

Consider, however, the type of situation recently emphasized by Amy Chua (2002), where an ethnic minority has produced much of an economy’s wealth. In such cases gains to popular democracy could easily foster redistributive policies that substantially hurt growth. Clearly in such cases having a strong system of guarantees of property rights would be quite important.

It is also not clear that increases in the relative size of the winning coalition, W/S, would improve the incentives for rulers over its whole range. At very low levels, modest increases in W/S could increase the amount of patronage the leader would have to offer to supporters. If the ruler were already extracting the maximum possible from the populace, this would just come from a reduction of the ruler’s share, but if taxation were below the maximum revenue level, say because the leader had some public interest concerns or wanted to reduce the chances of political instability, then initial increases in wealth could increase the incentives for higher taxation and/or the diversion of revenue from public to private uses. Having a well-functioning democracy requires much more than having a large W over S.5

2.2.1 Measures of Winning Coalition and the Size of Selectorate

BdM2S2 use POLITY IV’s (Marshall and Jaggers 2000) collection of data plus Arthur Banks’ (2003) cross-national time series data, which include a number of institutional variables, to construct an approximation of an index of the Winning Coalition (W) for the years through 1999.
They use another POLITY variable, Legislative Selection (LEGSELEC), as an initial indicator of Selectorate (S).

The winning coalition measure is a composite index based on the following four variables:

1. REGTYPE (1–4): taken from Banks’ data. Add one point to W when REGTYPE is not missing and is not equal to code 2 or 3, so that the regime type is not a military or military/civilian regime. Code 1 is a civilian regime.
2. XRCOMP (1–3): taken from POLITY, measures the competitiveness of executive recruitment. Another point is assigned to W if XRCOMP is larger than or equal to code 2. Code 1 means that the chief executive is selected by heredity. Code values of 2 and 3 refer to greater degrees of responsiveness to supporters, indicating a larger winning coalition.
3. XROPEN (1–4): taken from POLITY, measures the openness of executive recruitment. It contributes an additional point to W if the executive is recruited in a more open setting than heredity (larger than 2).
4. PARCOMP (0–5): taken from POLITY, measures the competitiveness of participation. One point is added to W if PARCOMP is coded as 5, meaning that “there are relatively stable and enduring political groups that regularly compete at national level.”

BdM2S2 divide by the maximum value of 4 for the W index. The normalized minimum value is then 0 and the maximum is 1. Again the progression from 0 to 0.25 to 0.50 to 0.75 and up to 1.0 is not linear.

2.2.2 The Size of the Selectorate (Banks 2003)

BdM2S2 use Legislative Selection (LEGSELEC) as a proxy for the selectorate. LEGSELEC measures the breadth of the selectiveness of the members of each country’s legislature. POLITY codes this variable as a trichotomy, with 0 meaning that there is no legislature. A code of 1 means that the legislature is chosen by heredity or ascription or is simply chosen by the effective executive. A code of 2, the highest category, indicates that members of the legislature are directly or indirectly selected by popular election. The larger the value of LEGSELEC, the more likely it is that S is large. BdM2S2 divide LEGSELEC by its maximum value of 2 so that it varies between 0 and 1.

2.2.3 The Loyalty Norm W/S

Finally, BdM2S2 construct a variable that measures the strength of the loyalty norm,6 by dividing W by $(\log((S + 1) \times 10))/3$. They make this transformation of S to avoid division by 0 and to ensure that the index for S is never smaller than W.
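The scoring rules in Sections 2.2.1 to 2.2.3 can be sketched in a few lines of Python. This is only an illustrative reading of the rules as summarized above, not BdM2S2's own code: the function names are hypothetical, missing values are assumed to be represented by `None`, and because the text does not state the base of the logarithm, the natural log is used as a default assumption that can be swapped out.

```python
import math

def winning_coalition_index(regtype, xrcomp, xropen, parcomp):
    """Normalized W index: one point per condition met, divided by 4."""
    points = 0
    # REGTYPE: a point when present and not military (2) or military/civilian (3).
    if regtype is not None and regtype not in (2, 3):
        points += 1
    # XRCOMP: a point when coded 2 or higher.
    if xrcomp is not None and xrcomp >= 2:
        points += 1
    # XROPEN: a point when coded larger than 2.
    if xropen is not None and xropen > 2:
        points += 1
    # PARCOMP: a point only for the top code, 5.
    if parcomp == 5:
        points += 1
    return points / 4

def selectorate_index(legselec):
    """Normalized S index: LEGSELEC (0, 1, or 2) divided by its maximum of 2."""
    return legselec / 2

def loyalty_norm(w, s, log=math.log):
    """W divided by (log((S + 1) * 10)) / 3; the log base is an assumption."""
    return w / (log((s + 1) * 10) / 3)

# Hypothetical country-year: civilian regime, competitive and open executive
# recruitment, top participation code, popularly elected legislature.
w = winning_coalition_index(regtype=1, xrcomp=3, xropen=4, parcomp=5)
s = selectorate_index(legselec=2)
print(w, s, loyalty_norm(w, s))
```

Passing `log=math.log10` shows how sensitive the transformed index is to the choice of base, which is one reason to check the original source before reusing the formula.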
2.2.4 Summary Comments

While initial evidence suggests that the W over S variable constructed by BdM2S2 has some useful explanatory power, its connection to their theoretical model seems far from tight. Of course to the extent that unconstrained democracy does promote economic instability, the implication is not that authoritarian regimes are to be preferred but rather

that democracy needs to be complemented by institutions such as central bank independence to promote low inflation and budgetary procedures that limit the incentives to run large deficits. It should also be noted that Kevin Clarke and Randall Stone (2007) of the University of Rochester argue that BdM2S2 improperly apply a residualization estimation technique, and that this misapplication exaggerates the empirical results of the BdM2S2 study. When Clarke and Stone apply the correct estimation techniques, most of BdM2S2’s key findings are no longer supported.

2.3 Veto Players

One concept of a strong government is one that can implement policies without needing the agreement of others. This may be far from an ideal governance structure, however, since it conveys the power for arbitrary decision making and scope for government leaders to exploit their country. It can also undermine the ability to make many kinds of commitments in a credible manner. This is especially important for generating the favorable climate for investment that is necessary for economic growth. It can also be important for the financing of government through loans. North and Weingast (1989) analyze how the increased check on the power of the English crown that accompanied the Glorious Revolution in 1688–1689 (and the structure of the Bank of England, formed in 1694) substantially increased perceptions of the creditworthiness of the English government.

Of course, too many checks and balances can also have serious adverse effects on the governance process, resulting in gridlock. This suggests the desirability of some degree of balance between the extremes of unlimited power for the executive and roadblocks to any effective action. An important mechanism for improving the trade-off is constitutional limitations on the scope of government power and protection of individual rights through strong adherence to the rule of law.
While such a framework can reduce the power of the executive in some respects, it is likely to provide a much stronger overall governance structure and increase the ability of a government to make many types of commitments more credible. Such constitutional-type ground rules would include an independent judiciary and constitutional prohibitions on some types of actions, and could also include the establishment of independent regulatory agencies in some areas, such as the central bank. Within such a framework the literature on institutions and economic growth generally assumes “that, on average, the benefits of constraints

on executive discretion outweigh the costs of lost flexibility” (Henisz 2000, p. 4). This trade-off is likely more complex, however, when the issue is government response to shocks, since in this case there is usually a clear need for governmental action. In an insightful analysis of government responses to the Asian currency and financial crises of 1997, Andrew MacIntyre (2001) suggests the presence of a U-shaped relationship between effectiveness of responses and the degree of constraints on executive action. An absence of constraints can lead to instability and lack of credible commitment (Indonesia), while too many constraints can lead to gridlock (Thailand). Countries with an intermediate degree of constraints such as Korea responded much more effectively. Angkinand and Willett (2007) test this U-shaped hypothesis concerning responses to crisis for a set of 39 developing countries for the period 1980–1999 and find strong support for it.

The measures used were based on George Tsebelis’ (1995, 2002) concept of veto players. These are defined as individual or collective actors whose agreement is required for a change of status quo policies. Proxies for the number of veto players are available from two datasets: the Database of Political Institutions (DPI) from the World Bank collected by Beck et al. (2001), and Political Constraints constructed by Henisz (2000). The variable from DPI, called checks, is the number of checks and balances, adjusted for whether these veto players are independent of each other. The number of checks is based on the Legislative Index of Electoral Competitiveness (LIEC) or Executive Index of Electoral Competitiveness (EIEC), in the same dataset; the score is determined by the number of components present in the country. The minimum score of checks is assigned to be equal to 1 when LIEC or EIEC is less than 5, which indicates the absence of competitive elections for the relevant government branch.
In countries where LIEC or EIEC is greater than 5, in presidential systems, an additional veto point is counted for the chief executive, for each chamber of the legislature, and for each party coded as allied with the president’s party that has an ideological orientation closer to that of the main opposition party than to that of the president’s party. In parliamentary systems, additional veto points are counted for the chief executive and for every party in the government coalition (if that party is needed to maintain a majority or that party has a position on economic issues closer to the largest opposition party than to the party of the executive). Thus, the score increases linearly with the number of veto players in the political system. It can also be important to take into account

the policy preferences among these veto players. If several players have the same preferences then they reflect only one effective veto point.

3 Strength of the State

There are several different concepts of state strength. In the international relations literature the primary focus is on a country’s international power. In the comparative politics literature, however, the focus is on the state’s degree of autonomy from domestic societal pressures. In this concept, the strong state is one that has an institutional environment that protects it from excessive democratic or interest group pressures. A third concept is the strength of the institutional environment. Thus, for example, a mature state might be strong in the sense that there is little danger of an overthrow of its basic institutions, but weak in the sense that the executive has only limited scope for independent action without facing strong political penalties. While there is fairly general agreement that it is good to have a strong state of the first type, there is considerable controversy about the desirability of having a strong state in the latter sense.

Note also that the term “state” is used in different senses, ranging from referring to the whole nation at one extreme to referring solely to the executive at the other extreme, with the legislature and government workers being included in an intermediate usage. Our view is that there is no one best definition of state or strength. Each of the concepts and definitions is relevant for some purposes. The important thing is not to confuse the different concepts and definitions.

Some view state strength as simply the ability to stay in power, which might be part of any conception of state strength, but which is surely a minimalist approach on its own, and raises questions of distinguishing between the state and society.
A different concept involves the extent to which the state can enforce the rules of the political system, and thus its own laws and regulations: are these regularly ignored or compromised by non-state actors? Can the state change the nation’s culture and its social identity? Can the state choose and implement its foreign policies? Can it affect or influence the international political system? In making this judgment, is the measure defined relative to other countries, or to the country’s own history of governments?

At least three different categories of concepts and measures have been created, each based on different answers to the above questions. The first category is relational measures: what is the capability of the state to control the behavior of other states? The measures have been economic and military, absolute and relative to other states. Among the

economic measures have been GDP and GNP, both overall and in per capita terms. These can be just absolute measures of the sort used in economics, or defined relative to some benchmark level achieved by, say, the largest or richest economy. Military capabilities have to be measured using the amount spent. The typical data sources used are SIPRI (the Stockholm International Peace Research Institute), the United States Arms Control and Disarmament Agency, and the CIA World Factbook (various editions). Again, all of these can be measured in absolute terms or relative to some benchmark standard.

Somewhat distinct from the relational ideas are structural ones. What is the ability of the state to create rules, norms, and customs of operations for the international system that are valued by that state? Some measures are quite general, while others focus on particular categories. The general measures are of the external capabilities of a state, measured in terms of state spending. The spending can be on military items, foreign policy, and foreign assistance. Sources include various national budget documents and OECD national account data.

As the world has become more complex and inhibitions restraining the use of military force have increased, the measurement of a nation’s strength or power in the international arena has become much more complicated and power has become less fungible from one area to another, a condition that Keohane and Nye (2001) term “complex interdependence.” The soft power of persuasion has commensurately become more important relative to the hard power of military might (Nye 2004). There has also been increased recognition that effective power comes not just from capabilities but also from perceptions of the state’s willingness to use them.

A final structural concept is capabilities that reflect the increasing complexities of the international system.
The number of states in a regional grouping or in the overall world system is one measure related to these complexities. Similarly, the number of regional groupings below the world level in some area such as trade or economic cooperation, and their size and overlaps, is another way to try to capture the complexity in specific types of international interactions.

The third set of ideas about state strength involves the domestic capabilities that enable a state to influence the international system over the longer term. For example, what are the resources available to the state that could be utilized to affect foreign activities? This can be measured by governmental revenues or spending (absolutely or relative to GDP).7 A typical source would be the International Financial Statistics (IFS) issued by the IMF, or SIPRI.

The level of resources might seem very large, but the level of discretion in reallocating them may vary widely, depending on their other uses. One might try to measure the degree of overreach of a state by its fiscal deficit or its debt service payments. Again, the IFS is a useful cross-national source.

Finally, one might define state strength as state actors being sufficiently strong politically that they could, if needed, reallocate resources to foreign activities. This can be measured in various ways, but the measures seem to fall into two groups: direct measures of political strength, and measures of the absence or presence of societal pressures for domestic actions. Political strength has been measured by the number of political parties in a ruling coalition (with more reflecting weakness), public support of the ruling party or executive, and the legislative success of the executive’s agenda. Sources include the Lijphart Election Archive and Elections around the World. Domestic pressures that might preclude reallocating resources to foreign activities include inflation, unemployment, and crime levels. The OECD is a good general source of international data of this type, and for the United States the Departments of Commerce and Justice provide similar data, in addition to public opinion surveys reported in the media.

Related to this last category of domestic pressures that make it difficult to reallocate resources to foreign activities are measures of political instability, which will be discussed at more length in a later section, but which merit some brief review here. Various direct measures and related signals have been used to capture this notion. The Banks event dataset attempts to capture data on riots, demonstrations, and similar activities for a wide variety of countries since 1815.
Another attempt to create a single-dimensional measure is the Socio-Political Instability (SPI) index of Venieris and several coauthors (Stewart and Venieris 1985; Venieris and Gupta 1986). This is based on the Banks data, and uses several of Banks’ categories in its aggregation. The measure has been shown to be correlated with various economic phenomena.

We now turn to a measure that seeks to capture the capability of a state to acquire the resources to carry out its policies; this measure, Relative Political Capacity (RPC), is discussed in the following section.

3.1 Relative Political Capacity

Relative Political Capacity (RPC) is a concept introduced by Organski and Kugler (1980, p. 72) to represent the ability of a state to carry out

its policies. The initial measure they created was a measure of the ability of the state to extract resources (i.e., taxes) that could be used by the government. They argue that this is a signal, not a direct measure, of a state’s strength. In spite of its being at best a proxy for the desired measure, subsequent empirical work has shown RPC to be a useful explanatory variable for a wide variety of state policies and their effectiveness. Some related measures such as Relative Political Reach (RPR) have been derived by Arbetman and Kugler (1997) and others.

The basic Organski-Kugler measure is the ratio of actual taxes raised to the estimated taxable capacity of the state. The specification derives from the public finance literature on tax effort, with some further changes in Snider (1986) and other works. A crucial point is that the choice of measures of a nation’s economy will determine the measured RPC for that nation. This process introduces some degree of subjectivity into the measurement, as the art of specifying a regression equation has not been reduced to a science.

The amendment provided by Snider suggests this more clearly. Snider was studying the Persian Gulf states, many of which export state-owned minerals such as oil and natural gas. These comprise the bulk of state revenue for these countries. Depending on how the contracts with the oil companies doing the extraction work are written, they may be counted as taxes or otherwise. Regardless of this definitional issue, the issue of how to account for this revenue source in properly measuring RPC arises. These states, given the usual economic measures used to measure tax effort, may be viewed as making very high efforts when oil prices are high, and low efforts when they drop. But does this change really reflect anything about what RPC is supposed to measure? Accounting for this factor is not easy, and just using a binary variable for these mineral economies doesn’t capture the problem well at all.
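The ratio of actual taxes to estimated taxable capacity described above can be illustrated with a deliberately simplified sketch. It assumes a tax-effort equation with a single explanatory variable fitted by ordinary least squares; published specifications use several structural covariates, and the function names, the covariate, and the numbers below are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def relative_political_capacity(actual_tax_ratio, covariate):
    """RPC = actual tax/GDP ratio divided by the ratio predicted by the fit."""
    intercept, slope = fit_line(covariate, actual_tax_ratio)
    predicted = [intercept + slope * x for x in covariate]
    return [actual / pred for actual, pred in zip(actual_tax_ratio, predicted)]

# Made-up data: tax revenue as a share of GDP and one structural covariate
# (for instance, a trade-openness index) for five hypothetical countries.
tax_ratio = [0.12, 0.18, 0.15, 0.25, 0.20]
covariate = [0.2, 0.4, 0.5, 0.7, 0.9]
for value in relative_political_capacity(tax_ratio, covariate):
    print(round(value, 2))
```

A value above 1 means the country collects more than the fitted equation predicts for its economic structure. Changing the covariate changes the predicted capacity and therefore every country's score, which is the specification sensitivity the text warns about.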
This is a crucial issue in measuring RPC for these types of states, while other problems beset attempts to control for other structural factors. Using a measure like RPC should involve care with respect to the specification of the tax effort equation used to set the standard against which a state’s revenue is judged. In some work, this will not be a problem, especially if all the economies are sufficiently similar. But to the extent they differ structurally, care must be used or the measure will involve more error than need be, along with biases related to the economic and political factors that are ignored in specifying the effort equation.

Another conceptual problem is that this approach implicitly assumes that more taxation is better than less. In lower income countries with

modest levels of government capacity to raise revenue, this is a reasonable assumption. For the advanced industrial countries, however, levels of taxation are more a reflection of political choice than political capacity. Thus, this approach is much more attractive for application to lower income countries.

An alternative approach to implementing this concept that is less subject to the previous qualification has been suggested by Willett (1997). There is a fairly high degree of consensus among economists about the relative economic inefficiency costs of a number of different types of taxation. The lower a country’s political capacity, the more likely it is to rely on economically inefficient forms of taxation such as inflation and tariffs rather than lower cost forms of taxation such as income taxes, value added taxes, and sales taxes. The share of indirect or trade taxes could be used as an alternative to RPC to examine robustness.

In spite of these caveats, RPC will continue to be used across a growing range of policy areas and outcomes, for the reason that it often works. So long as careful attention is paid to the interpretation of the results when the economies are heterogeneous, this approach should continue to be useful.8

3.2 Political Instability

One hallmark of a strong government is that it can create and maintain a stable environment for societal and economic growth and development. Unstable environments lead to volatility and uncertainty on the part of citizens about the long-term viability of their prospects, economic and otherwise. Highly unstable political environments beget concerns not just about the long term, but also nearer-term doubt about one’s ability to safeguard one’s own personal property and safety. Like many issues of such paramount importance, political instability is difficult to define precisely.
This poses an acute problem for researchers, who must choose how best to operationalize political instability from among many competing definitions, which often differ in meaningful ways. There are a number of different measures of political instability available, but there are two broad types of concepts. One has to do with instability in the physical sense of riots, destruction, and coups. The second type may occur within even the most peaceful environments; this is the probability of changes in government through elections or parliamentary realignments.

Perhaps the most direct approach to measuring political instability is to measure instances when the power structure of the government has changed hands. Of these the most dramatic and consequential shifts for most research purposes are those of irregular government change (i.e., power changing hands outside of planned elections or other forms of anticipated transfers of power). As these tend to be brought on by violence and social turmoil, many measures of political instability are constructed as functions of the number of episodes of politically motivated violence and unrest, such as riots, coups and coup attempts, revolutions, strikes, assassinations and assassination attempts, and so on.

Once episodes and their relative severities have been identified, the task still remains to aggregate the relevant occurrences into an index suitable for use in empirical research. There is little guidance as to how best to construct such indices, though scholars generally agree that the variables included should be chosen in a theory-driven manner, and that the aggregation method should be statistically appropriate rather than arbitrary. Researchers often use principal components analysis to address both concerns simultaneously by identifying the common component of these factors and using that as their political instability variable. Another method is to estimate political instability as a probability of irregular government change dependent on these violence and unrest factors, as in Feng (2003).

Regular government change (i.e., government turnover that conforms to an established transfer of power mechanism) is also used in a number of studies as a measure of political instability. Generally these studies are concerned with policy formation in industrialized countries, where even an orderly transfer of power can influence the eventual outcomes.
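The principal components aggregation of violence and unrest counts mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular author's procedure: the indicators are standardized, the first component of their correlation matrix is found by simple power iteration, and the event counts and function names are made up for the example.

```python
import math

def standardize(column):
    """Rescale a column to zero mean and unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def first_principal_component(rows, iterations=200):
    """Return (weights, scores) for the first principal component.

    rows: one list of indicator values per observation (e.g. country-year).
    """
    cols = [standardize(list(col)) for col in zip(*rows)]
    k, n = len(cols), len(rows)
    # Correlation matrix of the standardized indicators.
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
             for j in range(k)] for i in range(k)]
    # Power iteration converges to the leading eigenvector.
    weights = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        v = [sum(corr[i][j] * weights[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in v))
        weights = [x / norm for x in v]
    # Each observation's index value is its weighted sum of indicators.
    scores = [sum(weights[j] * cols[j][i] for j in range(k)) for i in range(n)]
    return weights, scores

# Hypothetical counts of riots, coup attempts, and strikes for five country-years.
events = [[0, 0, 1], [1, 0, 2], [3, 1, 4], [5, 2, 7], [2, 1, 3]]
weights, scores = first_principal_component(events)
print(weights)
print(scores)
```

The resulting scores are relative: they rank observations within the sample rather than measure instability on an absolute scale, so the index changes whenever the sample or the set of indicators changes.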
Common measures include the number of observed “significant” government changes (such as changes in the party holding executive office, cabinet portfolio reshuffles, etc.), as proposed by Grilli et al. (1991) and others. This form of political instability operates largely through its influence on shortening the effective time horizons of important political actors. This is one of the aspects of democracy that is most debated. On the one hand, the prospect of “voting the bum out” has long been one of the hallowed principles of democracy. On the other hand, frequent changes in government can offer incentives for politicians to adopt short time horizons, exacerbating problems of time inconsistency, that is, situations where policies have good short-term effects and bad long-term effects. In such cases, the temptation to pursue short-term gain at the cost of long-term pain may prove too strong to resist when the policymaker

202

●

Walton, Angkinand, Arbetman et al.

expects that someone else will be in office when the pain finally arrives. With well-informed, far-sighted voters this would not be much of a problem, but the concept of rational ignorance from public choice theory explains why voters are not always well-informed, and why there may thus remain incentives for politicians to generate political business cycles. Such controversies have generated a large literature, with many of the conflicts in views among researchers coming down to whether the glass is half-full or half-empty.9 The political business cycle is sometimes, but not always, played out. Inefficiencies due to time inconsistencies can show up in many other dimensions as well, such as in protectionist trade policies. Where these time inconsistencies are important, there is a case for exempting such activities from direct discretionary control, either through limitations on behavior such as debt limits and inflation targets, or by delegating decision-making authority to independent brokers such as the International Trade Commissions and independent central banks. In such cases, while the government is made weaker in terms of the range of its discretionary authority, it is made stronger in its ability to withstand societal pressures.

Another factor considered to be of interest in studies of political instability is the underlying political fragmentation of the society. While it is difficult to measure directly, a number of proxies have been developed, some of which look at strictly political differences (Franzese 1998), and some of which delve into religious and ethnolinguistic fragmentation (Mauro 1995).

3.3 Failed States Index

Definitional problems stalk state failure measures. To be effective, selected measures should support the author’s definition of the state’s core purpose.
Variations in definitions will substantially affect the choice, and therefore effectiveness, of measures as can be seen by the following three definitions (Chesterman, Ignatieff, and Thakur 2005): (1) the social contract definition, under which state failure is the incapacity of the state to deliver on basic public goods; (2) the legitimate use of force monopoly definition, under which state failure is the breakdown of authority structures; and (3) the legal capacity definition, under which state failure is the incapacity to exercise legal powers effectively. The study of state failure encompasses many important research areas that may be broadly categorized into three questions. Each of the following research questions requires a different set of variables.

1. What are the predictors of state failure from endogenous threats? What are the predictors of state failure from an internal (structural and governance quality) perspective? Is there more than one set of predictors when states are differentiated, for example, by geographic, cultural, historical, or resource endowment variables?
2. What are the predictors of state failure from exogenous threats? What are the predictors of state failure from an exogenous shock perspective?
3. What are the indicators of state failure? What are the outcome variables that reveal a state’s level of failure?

These research topics are not cut-and-dried in terms of how to create measures and research designs that avoid or deal with the many complicating factors associated with observed failed and failing states. Some of these complicating factors are as follows:

• Western Definitional Bias. How is Western bias removed from the classification of states as “successful” or “failing”? Should success in a least developed country have the same criteria as success in a developed country?
• State within a State. Should a state be considered failing if the central government is failing, yet functioning polities remain that provide critical services of government, for example, Hezbollah in Lebanon or warlords in Afghanistan?
• Governance Capacity and Effect of Regime Type. How is the capacity of some states to gather resources and cope better than others with the pressures that lead to failure recognized?
• Harboring of terrorists. Is a state a failure by definition if it harbors terrorists, yet otherwise has a strong government?

From the topics of interest that rely on the use of state failure variables and the potential pitfalls facing these measures, we have the bases for an evaluation of state failure indices and datasets. We can evaluate and compare them by the following eight criteria:

1. Indicators versus predictors. Is the analysis assessing the indicators, or predictors, of state failure?
2. Quantitative analysis versus qualitative analysis. Is the research based on empirical quantitative data or is it qualitative?
3. Differentiated versus global. Are states differentiated to accommodate a focused analysis by a defined geographic, historical, resource endowment, cultural or economic difference, or is the system analyzed globally as one model?
4. Multiple years versus limited years. Does the index cover a multiyear timespan or is it limited to a few years or less?
5. Multiple states versus case study (one or two cases). Are a sufficient number of states evaluated to build meaningful data, or are just one or two states analyzed using a case study methodology on a comparative basis?
6. Fixed analysis versus variable analysis. Is the analysis fixed, or is the analysis open to the user adjusting variables and their weighting?
7. Transparent analysis versus opaque analysis. Is the analytical methodology transparent so it may be reproduced, or is it opaque so that it cannot be tested or subjected to falsification tests?
8. Public domain versus fee basis. Is the information free and in the public domain for scholars to analyze, or is the information fee-based and available only to paying clients?

Ideally a strong state failure index will measure either predictor variables (causes of state failure) or indicator variables (consequences of state failure), will use quantitative data that is differentiated by a meaningful classification methodology, will span multiple years and multiple states, and will be a fixed analysis that is transparent and in the public domain. We shall now discuss one prominent measure of state failure, Foreign Policy’s Failed States Index, and evaluate it on this set of criteria.10

In 2004 state failure research surged forward with Foreign Policy Magazine’s “Failed States Index” (FSI) (2005), a collaboration between the Fund for Peace and the Carnegie Endowment for International Peace. Published annually, this index was designed to provide an “early warning and assessment of societies at risk of internal conflict and state collapse.” The index provides a hierarchical ranking of 177 states based on their aggregated performance against 12 political, economic, military, and social instability indicators. Rank-ordered results (by relative failure and trend performance) are available for the past three years. This index immediately became controversial. This section examines the quality and effectiveness of the FSI in measuring state failure. To critique the index, it evaluates the FSI in four areas: (1) what is the definition of state failure upon which the measures are based? (2) what is the state failure research question? (3) how are other multidisciplinary concerns that impact state failure addressed? and (4) how does the index rate when evaluated against the eight criteria?

The FSI does not explicitly define what constitutes state failure; however, the measures selected indicate that its definition encompasses all three definitions presented above: the incapacity of a state to deliver basic public goods; the breakdown of authority structures; and the incapacity of the state to exercise its powers effectively. The FSI uses 12 economic, military, and social instability indicator categories that together address all three research questions. Four categories measure state failure predictors from endogenous threats, one category measures state failure predictors from exogenous shocks, and seven categories measure state failure indicators. The FSI does accommodate some concerns from related disciplines and issues, including the capture of a “state within a state” and four measures that capture various components of effective governance.

The FSI combines predictor and indicator measures in one index, which unfortunately mixes the causes and consequences of state failure to the detriment of the utility of the index. The index is quantitative and has significant coverage of 177 states, but because of its recent emergence it spans only three years, a fact that severely limits time series analysis until more history can be accumulated. The data are undifferentiated with respect to some meaningful factors such as geography, resource endowment, or economic development. Consequently, dissimilar countries are often grouped together and compared in one global analysis; for example, developed countries challenged by narco-terrorism are compared with economically isolated countries engaging in genocide. Finally, the analysis is fixed, with the capability to be variable. The analysis is in the public domain and transparent, if purchased.

The FSI is an important index. At the time of initial publication the index generated significant interest because it was the first index of its kind to capture, quantify, and rank states whose conditions were deteriorating. Since this initial publication, much work has been done to begin to define, focus, and separate the complex interactions of the multiple variables that define the large subject of state failure. This later work has established clear opportunities for improvement in the FSI. These improvements include (1) a clear statement of the definition of state failure; (2) a clear focus on the research question, which requires separating the predictor (cause) and indicator (consequence) variables; (3) definitions of multidisciplinary areas; (4) a meaningful differentiation of states so that comparisons are based on common groups; and (5) consensus among scholars as to what the definitions and measures of state failure are, so that the variables are fixed and analyses
by interested scholars can be conducted using the same assumptions and same data.

4 Effectiveness of the State

Any evaluation of the effectiveness of the state must, to some degree, involve an underlying normative component, that is, some conception of what the state ought to be doing. When one measures government effectiveness on any given dimension, the underlying normative element is carried along with it, sometimes hidden below the surface. When one studies corruption, for example, the underlying normative assumption is that more corruption is bad, and less corruption is good. This can create some tricky complications for the positive social science researcher, who must go to some length to become aware of the underlying assumptions and assess their validity. In the corruption example given above, there are cogent arguments that corruption may “grease the wheels” of an otherwise unresponsive bureaucratic institution, and therefore may not be an altogether bad thing. Economists and political economists in particular are interested in analyzing not just the intended consequences but also the unintended consequences of institutions, policies, and decision-making processes, and researchers who fail to note the underlying normative assumptions driving their research may unknowingly blind themselves to potentially interesting findings or courses of inquiry.

The number of available government effectiveness indicators is staggering, and runs the gamut of services a government may potentially be expected to provide. In this section we will touch briefly on only one of these, corruption, before moving to a discussion of one significant clearinghouse for governance indicators across a broad range.

4.1 Corruption

Indicators of corruption have been discussed at length in the chapter by Omer Gokcekus and Justin Myzie in this volume, but are worth a brief review here as well.
A basic problem with this data is cross-country comparability, as each expert must be well-versed and up-to-date on their own country, but is also expected to be able to evaluate some neighboring countries. This greatly limits cross-national comparability, but may still allow the data to be useful as time series. Other corruption measures have been created more recently. The most widely cited and used is the TI (Transparency International) data made available on their Web site
(Transparency.org) for free. Alesina and Weder (2002) show that the average ratings for the ICRG and TI corruption measures correlate at the 0.87 level. This is quite high, although the sample is only about 70 countries. The correlations of each with corruption measures other than TI and IMD (the World Competitiveness Index measure of corruption) are smaller, and in many cases below 0.5. The various indices are capturing different information, and the only clear difference between them is the type and number of sources. Kaufmann, Kraay, and Mastruzzi (2007) advise using several sources in trying to evaluate the corruption level of a country, but do not suggest how to aggregate them, nor how to compare their value.

4.2 Governance Dataset

At the World Bank, Kaufmann, Kraay, and Mastruzzi (2004, 2007) have created a set of measures designed to capture key features of governance:

1. Voice and accountability;
2. Political stability and the absence of violence;
3. Government effectiveness;
4. Regulatory quality;
5. Rule of law;
6. Control of corruption.

They selected and reaggregated a number of existing data sources, and wisely made no attempt to create a single index of governance. This avoids the problem of attempting to force a variety of variables into a single-dimensional measure. On the other hand, one might question whether there really are 6 distinct dimensions of governance contained in the many data measures used (in the data used for the 2006 measures, some 310 individual variables were utilized, taken from 33 different datasets; Kaufmann, Kraay, and Mastruzzi 2007, p. 4). It would be useful to attempt to measure the effective dimensionality of this underlying data and to see how much of the underlying variance is captured in the resulting six measures. In contrast to some other attempts at data reduction, the authors do not employ factor analysis to create the six indices. Instead, they assign each of the 310 variables to one of the 6 categories a priori. This is attractive, but note that it means that, on average, some 52 variables were aggregated into each of the 6 measures. With this many underlying factors, one might wonder how much each one, and especially the
new ones added in the latest round, adds to the information in the resulting aggregate measures.

How are the variables aggregated? The authors appear to hold an a priori belief that some variables measure the same thing, although the motivation for the following procedure is not clearly stated:

In some cases we use several individual variables from a single data source in our aggregate indicators. When we do so, we first compute a simple average of these variables from a single source, and then treat the average of these individual questions as a single observation from that data source. (Kaufmann, Kraay, and Mastruzzi 2007)
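A minimal sketch may help fix ideas about this two-step aggregation: the within-source averaging quoted above, followed by the precision weighting that the authors apply across sources. It is an illustration under stated assumptions, not the authors' estimation procedure. The function names, source labels, ratings, and error variances are all hypothetical; the ratings are assumed to be already on a common scale; and the error variances are simply taken as given, whereas an unobserved components model would estimate them from the data.

```python
def source_score(variables):
    """Step 1: simple average of the individual variables drawn from one source."""
    return sum(variables) / len(variables)

def precision_weighted(scores, error_variances):
    """Step 2: weighted average with weights proportional to 1 / error variance."""
    weights = [1.0 / v for v in error_variances]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

# Hypothetical, already-rescaled ratings for one country on one indicator.
sources = {
    "survey_a": [0.25, 0.75],      # two questions from the same source
    "expert_b": [1.0],
    "expert_c": [0.0, 0.5, 1.0],
}
# Assumed error variances (hypothetical); noisier sources get less weight.
variances = {"survey_a": 1.0, "expert_b": 2.0, "expert_c": 4.0}

names = list(sources)
scores = [source_score(sources[n]) for n in names]
indicator = precision_weighted(scores, [variances[n] for n in names])
equal_weighted = sum(scores) / len(scores)
```

In this toy example the precision-weighted value (about 0.64) differs only slightly from the equal-weighted mean of the three source scores (about 0.67), which is the kind of insensitivity to weights that arises when sources largely agree.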

Once this is done, the variables assigned to a particular indicator are precision weighted and a weighted average is calculated as the indicator. This procedure is based on an “unobserved components model,” described in Kaufmann, Kraay, and Mastruzzi (2004). This is a latent variable extraction technique that can be used with an unbalanced panel of underlying data. This makes it easy to incorporate new data measures as they become available, and to utilize the complete country coverage of each measure, although it may vary from year to year. The authors attempt to guard against criticism about the precise weighting of the underlying components with statements to the effect that “since the underlying data sources on average are quite correlated with each other, the choice of weights used to construct the aggregate indicator does not substantially affect the estimates of governance that we report” (Kaufmann, Kraay, and Mastruzzi 2007, p. 11). The high correlations that make such concerns irrelevant also make one wonder why the use of so many measures has any particular value.

5 Methodological Survey of Governance Measures

The sheer number and variety of governance indicators available can appear as a daunting thicket to the governance researcher unfamiliar with the voluminous literature. Fortunately, the United Nations Development Programme (UNDP), the European Commission, and the World Bank’s Worldwide Governance Indicators project (of which the governance dataset discussed in the previous section is a product) have spearheaded a major effort to map the terrain. While there are far too many indicators to be discussed at length in this chapter, we are pleased to recommend the 2004 UNDP Governance Indicators: A User’s Guide as an excellent expansive survey and handbook.

Another useful guide is a 2003 report by the World Peace Foundation, in which Marie Besançon (2003) summarizes over 50 different governance indicators, classifying each according to a number of criteria. Among the criteria mentioned, worthy of note is the distinction between subjective and objective measures. Objective measures are constructed based on observed facts, while subjective measures often rely on surveys of experts or informed observers. Both approaches carry with them advantages and disadvantages. Objective measures are generally considered to be more operationally reliable in that they tend to have clearly defined and replicable measurement processes. That said, these processes may, by their formulaic approach, miss underlying nuance. For example, an objective measure of protests would duly record equal-sized large protests in the United States and in the People’s Republic of China as equal to each other, when in fact the two protests may have widely different domestic implications. Subjective measures, on the other hand, represent the assimilation of expert experience and knowledge through internal, unknown, and unreplicable processes in the minds of those whose opinions are surveyed. While the aggregation of these opinions may give some informative bearing on observed events and phenomena, they do not rise to the researcher’s preferred standard of objective fact.

Several other general problems plague most governance datasets in use today. In most of them, several measures are provided of what seem to be conceptually distinct factors. The tie between the concepts and the measures is not always tight, and quite often these conceptually distinguished data measures are highly correlated. In all too many cases, adding more measures conveys almost no new information not already contained in the smaller subset. Collinearity presents daunting interpretational and apparent statistical problems to the analyst, and needs to be dealt with very carefully. This problem is very common in the measures that attempt to provide a single-dimensional summary index for a collection of subindices. New researchers criticize existing measures and attempt to add further subindices to the existing ones. Adding a new, highly correlated subindex to the existing ones, and diluting the weight of all by taking a simple average to get the new index, is very unlikely to change the resulting empirical analysis much. It sounds good to measure some new conceptual category and bring it into the overall measure, but the execution often accomplishes little. A further problem in such attempts to create a single aggregate index is that there is often a lack of serious analysis of the proper weighting of
the subfactors. These are typically aggregated by taking a simple average, implicitly assigning the same weight to each subindex. This may make little sense. Further, adding a new subindex that is highly correlated with an existing one merely doubles the weight on the correlated index, a choice often made without considering whether it makes sense. With two subindices, for example, adding a third that nearly duplicates the first gives that factor an effective weight of two-thirds rather than one-half.

The very issue of aggregation itself is often not carefully thought through. Some aggregate categories can be conceptually disaggregated into a few quite distinct factors. If each of these factors can be measured with some accuracy, it may make more sense to avoid aggregating them at all. Trying to tease out the separate effect of each factor may make more sense than acting as if one can properly aggregate them in a thoughtless way. Further, even if such an aggregation might be sensible in one use of the data, the aggregation and its implied weighting may make far less sense in a different use of the measure. Far more care and thought in using aggregates is needed across the board with these new types of data.11

6 Conclusions

For any given characteristic of governments and governance, multiple measures of that characteristic exist, each with its own definition and operationalization technique; they rarely, if ever, overlap perfectly. If you wish to evaluate democracies, and “a democracy” for your purposes is simply “a country that holds regularly scheduled elections,” then you would be well advised to select a measure of democracy whose definition is as close to yours as may be found. Likewise, if you are researching political instability and don’t consider the 1994 Republican takeover of the U.S. Congress to be an example of said instability, you should think twice about using a measure that does consider it to be such an example. You may not be capturing the type of political instability that is relevant for your study, and your results are likely to be less credible for it.
Another feature of the datasets that needs to be kept in mind in using them is the data sources. In other words, how was the data created? One cut is whether the data measure is reasonably objective or largely subjective. Are we counting the share of seats in the lower house of the national legislature held by the largest party in the governing coalition (objective), or asking someone how much political risk they perceive in investing in some country (subjective)? This distinction is not entirely clear-cut. Some statistically constructed measures, such as RPC (Relative
Political Capacity), seem to be objective, in that they rely on statistically processing reasonably objective government-provided data. Others rely on subjective, survey-based data; these may be more useful for capturing nuance than objective measures, but it must not be forgotten that they necessarily are measures of perceptions rather than of objective reality.

Notes

1. We refer to these definitions as being “value-laden” due to their inclusion of components that are usually associated with some normative end of democratic rule and unnecessary to meet the strictest technical definition of democracy, that is, rule derived at its core through regularly scheduled elections in which those ruled are able to participate (albeit to varying degrees of enfranchisement). This distinction is made solely in order to preserve the utmost clarity when differentiating between the various definitions; it is not intended to impute any pejorative connotations. Indeed, the value-laden definitions, though they must necessarily involve some level of subjective evaluation when deciding whether or not a particular country meets the definition of a democracy at a given time, are likely to better capture those elements that are usually thought of when referring to democracies in the common parlance than will the strict technical definitions.
2. For one recent study, see Feng (2003).
3. For example, Drazen (2000), Alesina et al. (1997), and Persson and Tabellini (1990).
4. Bueno de Mesquita et al. (2003) define the selectorate as the set of people in a country who have a legal right to participate in the selection of the government leadership. The winning coalition consists of those members of the selectorate whose support is essential to the incumbent government.
5. It should be noted that the BdM2S2 authors explicitly avoid equating any potential W/S balances with democracies, autocracies, or dictatorships.
6. It is the probability of being in a successor-winning coalition. When either the size of the winning coalition shrinks or the size of the selectorate grows, defecting becomes riskier. This risk of exclusion from a challenger’s long-term winning coalition drives loyalty to the current leader.
7. Note that the use of GDP as the denominator is probably incorrect. More appropriate is the total of resources available to nationals, which is better measured by GNP.
8. Estimations of RPC may be obtained online at the Claremont Graduate University Web site free of charge.
9. For further discussion of political business cycles see Willett and Keil (2004).
10. Further information on failed states measures may be found at www.cgu.edu/pages/5162.asp.

11. An example is useful to illustrate this point, and we will use one from the UNDP’s “Governance Indicators: A User’s Guide.” As the guide points out (p. 13), the Freedom in the World 2003 political freedom index scores countries from 0 (not free) to 4 (highly free) on three dimensions of political freedom, one of which is the freedom to form and join political or quasi-political organizations; the scores for the three dimensions are then summed for the cumulative political freedom score. It is possible, therefore, for a country to ban political and quasi-political organizations entirely and still score 8 points out of 12 on the political freedom scale. This is not to say that this score is invalid, but it does suggest that it is worthwhile for researchers using aggregated measures to devote some time to examining the underlying components of the aggregated scores where available.
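The arithmetic in note 11 can be checked with a short sketch. The function name and the labels for the two dimensions that the note does not name are hypothetical placeholders; only the 0 to 4 scale, the three dimensions, and the summing rule come from the note.

```python
def political_freedom_score(dimensions):
    """Sum three dimension scores, each on the 0 (not free) to 4 (highly free) scale."""
    assert len(dimensions) == 3 and all(0 <= d <= 4 for d in dimensions)
    return sum(dimensions)

# Hypothetical country: organizations banned outright (0), top marks elsewhere.
# "dimension_b" and "dimension_c" are placeholder labels, not names from the guide.
country = {"organizations": 0, "dimension_b": 4, "dimension_c": 4}
total = political_freedom_score(list(country.values()))
lowest = min(country.values())
```

Here `total` is 8 out of a possible 12 even though `lowest` is 0, which is why inspecting the components alongside the aggregate is worthwhile.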

References

Alesina, A. and B. Weder. 2002. Do Corrupt Governments Receive Less Foreign Aid? American Economic Review 92 (4): 1126–1137.
Angkinand, A. and T. D. Willett. 2007. Political Influences on the Costs of Banking Crises in Emerging Market Economies: Testing the U-Shaped Veto Player Hypothesis. Forthcoming in Macroeconomics and Finance in Emerging Market Economies.
Arbetman, M. and J. Kugler. 1997. Relative Political Capacity: Political Capacity and Political Reach. In Political Capacity and Economic Behavior, ed. M. Arbetman and J. Kugler. Boulder, CO: Westview.
Banks, A. S. 2003. Cross-national Time-Series Data Archive. Electronic database. Available at http://www.databanks.sitehosting.net. (March 5, 2008).
Beck, T., G. Clarke, A. Groff, P. Keefer, and P. Walsh. 2001. New Tools in Comparative Political Economy: The Database of Political Institutions. World Bank Economic Review 15 (1): 165–176.
Besançon, M. 2003. Good Governance Rankings: The Art of Measurement. World Peace Foundation Report 36. Cambridge, MA: World Peace Foundation.
Bollen, K. A. 1993. Liberal Democracy: Validity and Method Factors in Cross-national Measures. American Journal of Political Science 37: 1207–1230.
Bueno de Mesquita, B., A. Smith, R. M. Siverson, and J. D. Morrow. 2003. The Logic of Political Survival. Cambridge, MA, London: MIT Press.
Chesterman, S., M. Ignatieff, and R. C. Thakur. 2005. Making States Work: State Failure and the Crisis of Governance. New York: United Nations University Press.
Chua, A. 2002. World on Fire: How Exporting Free Market Democracy Breeds Ethnic Hatred and Global Instability. New York: Doubleday.
Clarke, K. and R. Stone. 2007. Democracy and the Logic of Political Survival. American Political Science Review (forthcoming).
Diamond, L. 1997. Prospects for Democratic Development in Africa. Hoover Essays in Public Policy 74. Stanford: Hoover Institution Press.

Drazen, A. 2000. Political Economy in Macroeconomics. Princeton, NJ: Princeton University Press.
Elections around the World. Electronic database. Available at http://www.electionworld.org/election/indexfrm.htm. (March 5, 2008).
Feng, Y. 2003. Democracy, Governance and Economic Performance: Theory and Evidence. Cambridge, MA: MIT Press.
Foreign Policy. 2005. Failed States Index. Foreign Policy 149: 56–65.
Franzese, R. J. 1998. Are Budget Deficits Used Strategically? Mimeo. University of Michigan. Available at http://www-personal.umich.edu/~franzese/DebtPaper.Short.pdf.
Gasiorowski, M. J. 1996. An Overview of the Political Regime Change Dataset. Comparative Political Studies 29 (4): 469–483.
Grilli, V., D. Masciandaro, G. Tabellini, E. Malinvaud, and M. Pagano. 1991. Political and Monetary Institutions and Public Financial Policies in the Industrial Countries. Economic Policy 6 (13): 341–392.
Gurr, T. R. 1990. Polity II: Political Structures and Regime Change, 1800–1986. Ann Arbor, MI: Inter-University Consortium for Political and Social Research.
Henisz, W. J. 2000. The Institutional Environment for Economic Growth. Economics and Politics 12: 1–31.
Huntington, S. P. 1984. Will More Countries Become Democratic? Political Science Quarterly 99 (2): 193–218.
International Monetary Fund. Various years. International Financial Statistics. Electronic dataset. Washington, DC: International Monetary Fund.
Kaufmann, D., A. Kraay, and M. Mastruzzi. 2004. Governance Matters III: Governance Indicators for 1996, 1998, 2000, and 2002. The World Bank Economic Review 18 (2): 253–287.
Kaufmann, D., A. Kraay, and M. Mastruzzi. 2007. Governance Matters VI: Governance Indicators for 1996–2006. World Bank Policy Research Working Paper No. 4280. Available at SSRN: http://ssrn.com/abstract=999979
Keohane, R. O. and J. S. Nye. 2001. Power and Interdependence. 3rd Edition. New York: Addison-Wesley.
Lijphart Election Archive. Electronic database. Available at http://dodgson.ucsd.edu/lij/.
Lipset, S. M. 1959. Some Social Requisites of Democracy: Economic Development and Political Legitimacy. American Political Science Review 53: 69–105.
MacIntyre, A. 2001. Institutions and Investors: The Politics of the Economic Crisis in Southeast Asia. International Organization 55 (1): 81–122.
Margolis, M. 1979. Viable Democracy. New York: St. Martin’s Press.
Marshall, M. G. and K. Jaggers. 2000. Polity IV Project: Political Regime Characteristics and Transitions, 1800–1999. Unpublished manuscript, University of Maryland, Center for International Development and Conflict Management.
Mauro, P. 1995. Corruption and Growth. Quarterly Journal of Economics 110 (3): 681–712.

North, D. C. and B. R. Weingast. 1989. Constitutions and Commitment: The Evolution of Institutions Governing Public Choice in Seventeenth-Century England. Journal of Economic History 49 (4): 803–832.
Nye, J. S. 2004. Soft Power: The Means to Success in World Politics. PublicAffairs.
Olson, M. 1982. The Rise and Decline of Nations: Economic Growth, Stagflation, and Social Rigidities. New Haven: Yale University Press.
OECD National Accounts Database. Electronic database. Available at http://www.sourceoecd.org. (March 5, 2008).
Organski, A. F. K. and J. Kugler. 1980. The War Ledger. Chicago: University of Chicago Press.
Persson, T. and G. Tabellini. 1990. Macroeconomic Policy, Credibility and Politics. London: Harwood.
Przeworski, A., M. Alvarez, J. A. Cheibub, and F. Limongi. 2000. Democracy and Development: Political Regimes and Economic Performance, 1950–1990. Cambridge: Cambridge University Press.
Schumpeter, J. A. 1976. Capitalism, Socialism, and Democracy. Cleveland: World Publishing.
Snider, L. 1996. Growth, Debt, and Politics: Economic Adjustment and the Political Performance of Developing Countries. Boulder, CO: Westview.
Stewart, D. B. and Y. P. Venieris. 1985. Sociopolitical Instability and the Behavior of Savings in Less-Developed Countries. Review of Economics and Statistics 67: 557–563.
Stockholm International Peace Research Institute Military Expenditures Database. Electronic database. Available at http://www.sipri.org/contents/milap/milex/mex_database1.html. (March 5, 2008).
Transparency International. 2007. Global Corruption Report 2007. Cambridge: Cambridge University Press.
Tsebelis, G. 1995. Conditional Agenda-Setting and Decision-Making Inside the European Parliament. The Journal of Legislative Studies 1 (1): 65–93.
Tsebelis, G. 2002. Veto Players: How Political Institutions Work. New York: Russell Sage Foundation.
United Nations Development Programme. 2004. Governance Indicators: A User’s Guide. Prepared by M. Sudders and J. Nahem. Electronic document. Available at http://www.undp.org/governance/docs/policy-guide-IndicatorsUserGuide.pdf. (February 28, 2008).
United States Central Intelligence Agency. Various years. CIA World Factbook. Available at http://www.cia.gov/library/publications/the-world-factbook/index.html.
Vanhanen, T. 1990. The Process of Democratization: A Comparative Study of 147 States, 1980–88. New York: Crane, Russak.
Venieris, Y. P. and D. K. Gupta. 1986. Income Distribution and Sociopolitical Instability as Determinants of Savings: A Cross-sectional Model. Journal of Political Economy 94 (4): 873–883.


Willett, T. D. 1997. Alternative Approaches to Estimating Political Capacity. In Political Capacity and Economic Behavior, ed. M. Arbetman and J. Kugler, 297–302. Boulder, CO: Westview.
Willett, T. D. and M. Keil. 2004. Political Business Cycles. In Encyclopedia of Public Choice, vol. 2, ed. C. Rowley and F. Schneider, 411–415. New York: Springer-Verlag.


Index

adjusted trade flows, 16, 18
Anderson and Neary measures, 24–25
Annual Report on Exchange Arrangements and Exchange Restrictions (AREAER), 83, 85–87, 89, 93, 94, 99
audits, 145
black market premium, 19, 22
budget deficits, 33–34, 38, 46
Business Environment and Enterprise Performance Survey (BEEPS), 63, 142–143
capital account liberalization, 82, 84, 87
Capital Account Openness Index (CAOI), 86
capital controls, 81–102
central bank independence (CBI), 33–55, 158, 161, 194
central bank’s objectives, 38
classical welfare theory, 104
composite social indicators, 103
Computable General Equilibrium, 25
conflict resolution, 36, 38–40, 42, 44, 46, 158
corruption, 21, 130–153, 156–158, 161, 177, 206–207

credit crunch, 35
Cukierman, Miller, and Neyapti (CMN), 37
Cukierman, Miller, and Neyapti (CMN) index, 37–38
Cukierman, Webb, and Neyapti (CWN) index, 33–34, 36–37
Cumulative Reform Index (CRI), 60–62, 71
democracy, 6, 157, 188–194, 201, 210
Di Tella and Schargrodsky measure of corruption, 142, 146
direct lending, 42–43
direct measures of welfare, 113–117
Doing Business (DB) indicator, 69–71, 176
Easterlin’s paradox, 114, 120, 122
Ebrill and Havrylyshyn ITPR measure, 62, 72
economic freedom, 34, 64, 155–185
Economic Freedom of the World Index, 173
economic growth, 9, 15, 18, 34, 65, 82, 114, 140, 156–158, 161, 173, 187, 194, 200


economic transformation, 57
exchange rate, 19, 21, 46, 83–87
extended national accounts, 103, 107, 115, 118
failed states, 202, 204, 211
financial stability, 35, 44, 45
fiscal conditions, 57–59, 63, 64, 72
fiscal freedom (FF), 65, 66, 76
fiscal reform, 57–63, 69
fiscal system, 57
free trade, 15, 18, 157, 158
Freedom in the World, 144, 161, 176, 184
Gini coefficient, 73, 105, 112, 115
Gokcekus and Muedin measure of corruption, 149
Gorodnichenko and Peter measure of corruption, 148
governance, 2–3, 143, 187–215
Grilli, Masciandaro, and Tabellini (GMT) index, 36–38, 41–43
Heritage Foundation Economic Freedom Index (EFI), 64, 162
household consumption, 107
human development index (HDI), 111–112
Index of Economic Freedom, 2, 155, 176
Index of Tax Policy Reform (ITPR), 62
Index of the Capture Economy (ICE), 63, 75
inequality, 73, 103, 112, 114, 115, 118, 131, 172, 173
inflation, 33, 34, 37–43, 45–46, 59, 114, 122, 158, 159, 162, 177, 191, 194, 198, 200, 202

International Monetary Fund, 23, 46, 83, 104
Kee, Nicita, and Olarreaga TRI, 39, 40
Martinez-Vazquez and McNab reform indices, 60
measure of economic welfare (MEW), 107
measures of trade openness, 16
Mercantilist Index of Trade Policy (MTRI), 25–26
monetary policy, 33–36, 41, 43, 45–47, 49, 51, 52, 155, 160
nontariff barriers (NTB), 19–27
objective happiness, 103, 116, 117
Olken and Barron measure of corruption, 147, 148
openness measures, 16
Overall Reform Index (ORI), 60–61
physical quality of life indicator (PQLI), 110–112
political fragmentation, 202
political instability, 200–201
preparation for tax reform, 60
prevalence of tax holidays, 61
price liberalization, 39, 58
price stability, 33, 35, 36, 41–46, 52, 159
price-based measures, 19
Reinikka and Svensson measure of corruption, 147
Sachs-Warner measure, 22, 29
selectorate, 189, 191–193, 211
social stability, 139
stability of the tax system, 60

state capture, 63–64, 72, 141, 143
state strength, 188, 196–198
subjective well-being (SWB), 103, 113, 114, 118, 119
subprime mortgage crisis, 35
Suits index, 73
tariffs, 19, 20, 91, 157, 180, 200
Tax Policy Reform (TPR), 62, 63
taxation, 59, 67, 70, 157, 158, 172, 192, 200
timing of tax reform, 60
trade barriers, 16, 21, 26, 158
trade flows, 16, 25, 57
trade openness, 9, 15–31, 82, 91


trade ratios, 17, 18, 22
trade restrictions, 15, 16, 19, 21, 25
Trade Restrictiveness Indices (TRIs), 16, 23, 25–28
transition, 37–39, 57–64, 71, 72, 86, 121, 122, 127, 135, 143, 150
transition economies, 37, 57, 63, 72, 149
turnover, 34, 40, 41, 58, 201
veto players, 189, 195
welfare cost, 20, 67
World Bank’s Trade and Production Database, 25