DILEMMAS OF ENGAGEMENT: EVALUATION AND THE NEW PUBLIC MANAGEMENT
ADVANCES IN PROGRAM EVALUATION

Series Editor: Bob Stake
Co-Series Editor: Saville Kushner

Volume 6: Telling Tales: Evaluation and Narrative – Edited by Tineke Abma
Volume 7: Visions of Quality: How Evaluators Define, Understand and Represent Program Quality – Edited by Alexis P. Benson, D. Michelle Hinn and Claire Lloyd
Volume 8: School-Based Evaluation: An International Perspective – Edited by David Nevo
Volume 9: Evaluating the Upgrading of Technical Courses at Two-Year Colleges: NSF's Advanced Technological Education Program
ADVANCES IN PROGRAM EVALUATION VOLUME 10
DILEMMAS OF ENGAGEMENT: EVALUATION AND THE NEW PUBLIC MANAGEMENT EDITED BY
SAVILLE KUSHNER University of the West of England, Bristol, UK AND
NIGEL NORRIS University of East Anglia, Norwich, UK
Amsterdam – Boston – Heidelberg – London – New York – Oxford Paris – San Diego – San Francisco – Singapore – Sydney – Tokyo JAI Press is an imprint of Elsevier
JAI Press is an imprint of Elsevier
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA

First edition 2007

Copyright © 2007 Elsevier Ltd. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-7623-1342-6
ISSN: 1474-7863 (Series)

For information on all JAI Press publications visit our website at books.elsevier.com

Printed and bound in the United Kingdom
07 08 09 10 11 10 9 8 7 6 5 4 3 2 1
CONTENTS

LIST OF CONTRIBUTORS vii

PREFACE ix

THE NEW PUBLIC MANAGEMENT AND EVALUATION
Nigel Norris and Saville Kushner 1

CONSTITUTIVE EFFECTS OF PERFORMANCE INDICATOR SYSTEMS
Peter Dahler-Larsen 17

POETRY, PERFORMANCE AND PATHOS IN EVALUATION REPORTING
Leslie K. Goodyear 37

EVALUATING COMPLEX PUBLIC POLICY PROGRAMMES: REFLECTIONS ON EVALUATION AND GOVERNANCE FROM THE EVALUATION OF CHILDREN'S TRUSTS
Chris Husbands 53

PROGRAMME EVALUATION IN A DYNAMIC POLICY CONTEXT
Paul Mason 67

SCHOOL SELF-EVALUATION
Ron Ritchie 85

CHANGING CONTEXTS AND RELATIONSHIPS IN EDUCATIONAL EVALUATION
Katherine E. Ryan 103

ON THE IMPORTANCE OF REVISITING THE STUDY OF ETHICS IN EVALUATION
Thomas A. Schwandt 117

NEW PUBLIC MANAGEMENT AND EVALUATION UNDER DECENTRALIZING REGIMES IN EDUCATION
Christina Segerholm 129

EVALUATION AND TRUST
Nigel Norris 139

CONTRIBUTORS TO THIS VOLUME AND THEIR CONTRIBUTIONS 155
LIST OF CONTRIBUTORS

Peter Dahler-Larsen – Department of Political Science and Public Management, SDU-Odense University, Odense M, Denmark

Leslie K. Goodyear – Education Development Center, Inc., Newton, MA, USA

Chris Husbands – School of Education and Lifelong Learning, University of East Anglia, Norwich, UK

Saville Kushner – Faculty of Education, University of the West of England, Frenchay Campus, Coldharbour Lane, Bristol, UK

Paul Mason – Centre for Applied Social and Community Research, Institute of Applied Social Studies, University of Birmingham, UK

Nigel Norris – Centre for Applied Research in Education, University of East Anglia, Norwich, UK

Ron Ritchie – Faculty of Education, University of the West of England, Frenchay Campus, Coldharbour Lane, Bristol, UK

Katherine E. Ryan – College of Education, University of Illinois at Urbana-Champaign, IL, USA

Thomas A. Schwandt – College of Education, University of Illinois at Urbana-Champaign, IL, USA

Christina Segerholm – Pedagogiska Institutionen, Umeå Universitet, Umeå, Sweden
PREFACE

In our public institutions it sometimes feels as though the tectonic plates of social and political contracts are shifting. Familiar coordinates are displaced or left stranded as we survey new territorial configurations and have to work out again how to find our way in public administration. The economic revolution embraced by neo-liberals and conservatives found its counterpart in a governance revolution in the very institutions that have always been designed to protect us from sharp historical and political lurches one way or another. Our public institutions – mostly coinciding with what we can call our professional institutions – which were once the filters of social and political change have become co-opted into social reform and are now all-too-frequently the conduits of that political change. As a result, police officers, nurses, teachers, social workers and others are expected to be agents of changes which have been designed by proactive and assertive governments and not by or in concert with themselves. What bends them to this task is a newly assertive approach to governance of our institutions – what has been widely dubbed The New Public Management.

The new public management is alleged to be ideological in essence – turning hitherto independent organizations into neo-liberal instruments promoting politicized models of market and consumer action. But it is also alleged to be free of ideology – a rationality that transcends values-based arguments and speaks of its own self-worth quite independent of your moral or political leanings. Either way, the revolution being wrought – the intensification of competitive individualism, market discipline, performance-based accountability, resource-based management – seems to be shaking the roots of our democracies, making us look again at the social and political relations that gave rise to these lumbering institutional giants of the social world.

Professional institutions, for example, somehow model social relations, putting on public show and subjecting to public scrutiny the values which underpin a collective approach to how we respect persons. Public service is about service delivery, but it also is expected to be a model of the contemporary expression of social justice – not just to do with what we are treated with, but how and why we are treated at all. Change the way public institutions relate to
themselves (e.g. think of accountability systems) and we change the way they interface with the citizenry – and then we find we are offering a different vision of social justice and respect for persons. This is what seems to be happening with the new public management and this is why it is worth putting together a volume on this topic.

The volume is titled Dilemmas of Engagement, for the new public management does, indeed, present many evaluators with dilemmas on how to engage with new institutional realities. Some of these realities conform to an ethic somewhat short of that expected in contemporary evaluation – respect for diversity and values-pluralism, independence, self-determination, common democratic rights. This is not to say that NPM is, of itself, against these things, but that in its practice these humanistic dimensions to human relations often have to yield to other priorities such as efficiency, corporate values, standardization and coercive leadership – for this is the model of public institution we have bought into, for good or ill. How, then, to engage with these institutions in ethical evaluative roles requires some careful thought and keen skills of negotiation.

Not much of this volume addresses that question – how to engage. It is, however, designed to look in various ways at this institutional revolution – from inside its values base and from outside; addressing it directly in descriptive and analytical ways, or obliquely through reflections and case-based explorations. The focus, however, is on evaluation and how the practice is touched and affected by new forms of public management – how a small sample of evaluators is responding to the engagement challenge. Hence, the volume is part of the Elsevier series with the generic title, Advances in Programme Evaluation. It is not all the time that the following papers will be advancing our understanding of evaluation practice, but together they do, hopefully, advance our understanding of the context for that practice. The volume has a coherence given by the focused and frequently overlapping concerns of the authors, but has little discipline in seeking to 'cover all the angles'. In that sense, this reflects the nature of the new public management field – for the realities of NPM and its implications and consequences sprawl across social, political and organizational life – but across personal and inter-subjective lives, too, as we shall see. Indeed, all authors in this volume and probably most of its readers will be touched by the new public management and will likely have a view of it. We hope and trust that these papers will help reflect on and advance those views.

Saville Kushner and Nigel Norris
Editors
THE NEW PUBLIC MANAGEMENT AND EVALUATION

Nigel Norris and Saville Kushner

ECONOMICS, LOGIC AND DEMOCRACY

David Marquand (2004) opens his celebrated book on the 'decline' of public service by arguing (pace Tawney) that there is a fundamental tension between capitalism and democracy – the former dedicated to an essential economic inequality (competition); the latter dedicated to rights-based equity derived from civic ideals. His book highlights public sector institutions as the arena in which the inevitable tussle between the two is played out. The neo-liberal movement provides the ideological vehicle for reform of public institutions, designed to 'root out the culture of service and citizenship' (p. 2) – i.e., those civic ideals. At the heart of the struggle is professionalism, the combination of competence, judgement and principle, and neo-liberalism takes careful aim at it. Of course, there are alternative views. Many see the new public management as the growth of public accountability and the breaking of the stranglehold of privileged and elitist professions – there is undoubtedly a populist vein running through new forms of public management. Whether in the No Child Left Behind legislation in the USA or in the work of OfSTED in the UK – both subjecting schools and teachers to corporatist and centralised control – the citizen is supposedly reassured that there is a more direct link
between their interests and the strictures placed upon professional practice and institution. Marquand is characterising the new public management as a historic battleground between competing moral systems. Where many analysts treat the new public management as a paradigm shift in the way post-industrial societies organise their public, private and governmental sectors and regulate their interrelationship – a response to new political and economic logics – he sees writ-large major historical developments on an operatic scale. Few leading writers in this field discount the ideological underpinnings to new forms of public management and many express concerns about the commonly perceived threat to public service values that it poses. However, a persistent theme running through their writings is of the universal reach of this movement – a transcendent organisational logic that carries ideology and politics away. Lane (2000, p. 2), for example, argues that it represents: The convergence in styles of public sector management around the world, despite all cultural and religious differences.
And, too, many writers see – sometimes implicitly – that the new public management represents or demands a new institutional settlement, a new social contract with the professions, for example. As Power (1997) shows, much of the success of the new public management has been the consensus to break the hegemony of the professions, to discipline them with market competition and to use that competition as a surrogate for public accountability. But, again, Marquand brings our attention back to the underpinning political/economic constitution of society – the democratic base of government–institution–citizen power relationships. He calls, in the end, for a new constitutional settlement. Marquand may or may not have historical reason on his side – many of those reading this volume will have their own view, especially since many, if not most, will be actors directly involved with some version of the new public management – as willing recruits, as resisters or engaged in what has been called ‘creative compliance’. Whatever the case, there is clearly much at stake for our democracies as the institutions which make up democratic foundations are destabilised in the process of waves of reform. This, alone, highlights the importance of evaluation as the principal means for making such issues transparent. The link between evaluation and democracy has been long and well drawn – starting with MacDonald (1976) who counterpointed democratic evaluation with bureaucratic and autocratic evaluation (see below); and revisited more recently by House and Howe (1999) who emphasise the deliberative dimension. But in this context, evaluation comes
to the fore since knowledge control lies at the heart of all new public management projects. Whether in terms of the rise of Power’s ‘audit society’, the internalisation of the evaluation function with an attendant erosion of evaluation independence or in the parallel growth of low-trust accountability regimes, a key element of new forms of public management is the proliferation of indicators and targets intended to make performance transparent, but also intensifying control over it.
THE FORMS AND FUNCTIONS OF NEW PUBLIC MANAGEMENT

Much has been written about the public management reforms that began in the late 1970s, as a response to economic crises and fears about the growing costs of public services and welfare. In one sense, there is nothing new about what came to be known as the 'new public management'. The ideas it was based on were well worn and taken together might be thought of as an ideology about how best to organise and control the public sector. At the ideological heart of the new public management was the conviction that the public sphere of services, organisations and institutions would be improved by the application of public choice theory, business values, micro-economics and market mechanisms. Raymond Callahan's study of the social forces that shaped the administration of public schools in America provides a chilling account of the application of business values, aided by scientific management, to education between 1900 and 1930. 'As the business-industrial values and procedures spread into the thinking and acting of educators', says Callahan (1962, pp. 246–247), 'countless educational decisions were made on economic or on non-educational grounds'. The ideology was given renewed emphasis in the mid-1970s and 1980s by a variety of right-wing think tanks such as the Centre for Policy Studies, the Adam Smith Institute and the Institute of Economic Affairs. Norris (1995, p. 271) observed that the extension of business practices to the public sector was having a 'profound effect on public administration and the character of central and local government'. Christopher Hood, who has written extensively about the new public management, described it as comprising seven doctrines: '(i) hands on professional management in the public sector; (ii) explicit standards and measures of performance; (iii) greater emphasis on output controls; (iv) shift to disaggregation of units in the public sector; (v) shift to greater competition in the public sector; (vi) stress on private sector styles of management practice; (vii) stress
on greater discipline and parsimony in resource use' (Hood, 1991, pp. 4–5). Dunleavy and Hood (1994) describe the shift from old public administration to new public management as consisting of: reworking budgets to be transparent in accounting terms, with cost attributed to outputs not inputs, and outputs measured by quantitative performance indicators; viewing organisations as a chain of low-trust principal/agent relationships (rather than fiduciary or trustee–beneficiary ones), a network of contracts linking incentives to performance; disaggregating separable functions into quasi-contractual or quasi-market forms, particularly by introducing purchaser/provider distinctions, replacing previously unified functional planning-and-provision structures; opening up provider roles to competition between agents or between public agencies, firms and not-for-profit bodies; and deconcentrating provider roles to the minimum feasible size agency, allowing users more scope to 'exit' from one provider to another, rather than relying on 'voice' options to influence how public service provision affects them. And, writing a little later, Hood (1995, p. 95) summarised the key features of the new public management as 'the idea of a shift in emphasis from policy making to management skills, from a stress on process to a stress on output, from orderly hierarchies to an intendedly more competitive basis for providing public services, from fixed to variable pay and from uniform and inclusive public service to a variant structure with more emphasis on contract provision …'. The development of the new public management has been characterised by Lane (2001) as a shift from long-term to short-term contracting. Long-term contracts may be implicit in the relationships that employees and clients have with a public bureau or they may be explicit in governance structures or the relationships that the bureau has with suppliers. By contrast, short-term contracts are much more likely to be explicit and detailed. Indeed, what gives principals confidence and control is the ability to tightly define the work and its results in the contract such that performance can be monitored. Christensen and Laegreid (2002, p. 268) have noted the 'one dimensional emphasis of the new public management on economic norms and values'. Echoing the significance of economic thinking in the new public management, other writers have drawn attention to: the promotion of 'consumer sovereignty' (Aberbach & Christensen, 2005, p. 226); equating
democracy with consumer choice (Box, Marshall, Reed, & Reed, 2001); the importance of economic incentives (Gregory & Christensen, 2004); and the introduction of market mechanisms (Van Berkel & Van der Aa, 2005). Ferlie, Ashburner, Fitzgerald, and Pettigrew (1996, p. 225) observe that 'there had been a substantial transfer of private sector models and concepts into the public sector organizations in an attempt to make them more like firms'. In their comparative analysis of public management reforms involving 10 countries, Pollitt and Bouckaert (2000, p. 159) found that one of the most frequent and powerful motives for reform was 'tighter control of public expenditure'. Reporting on a study of performance management arrangements in Finland, the Netherlands, Sweden and the United Kingdom, Pollitt (2006, p. 41) concludes that 'in northwestern Europe, performance measurement has become almost universal'. The new public management has also been described as going hand-in-hand with managerialism (Clarke & Newman, 1993). Managerialism can be defined as embodying 'the right to manage' (creating the political space for management), 'the necessity of management' (making managers manage) (Kettl, 1997), and an undue faith in the heroic power of managers to transform productivity. New relationships between government and institutions, professions and service providers have greatly increased the demand and opportunities for evaluation and the development of evaluative systems. As Purdue (2005, p. 123) notes, the new public management introduced a culture to the public sector 'that was characterised by constant monitoring and the construction of targets and league tables for every public service'. Evaluation and evaluative systems are vital in dealing with the transaction costs involved when public services are organised through a variety of market and quasi-market arrangements. Transaction costs are the costs of exchange, including the governance structures within which transactions are organised (Williamson, 1981). Evaluation is also needed to better understand and manage the process of innovation, improve its products and disseminate and institutionalise new ideas and practices. It is given added impetus by the rhetoric of 'modernisation' that accompanies the new public management. In education, for example, the new public management finds expression in policies such as local financial management and financial delegation, vouchers and tax credits, increasing parental choice and encouraging quasi-markets, national testing and league tables, allowing schools to opt out of local control, charter and trust schools, outsourcing schooling to for-profit and not-for-profit organisations and public–private partnerships.
THE INSTITUTIONALISATION OF EVALUATION AND THE NEW BUREAUCRACY

There are probably many sources of nourishment for belief in evaluation. One consequence of the continued loss of tradition and growth of individualism is less shared certainty about desirable futures and less conviction that traditional institutions can be trusted to work effectively and efficiently for the public good. There is, too, a loss of faith in just what the public good might consist of in the face of rapid social change, value pluralism and the potential conflicts inherent in fragmented societies and competitive nation states. In times of uncertainty, evaluation can contribute to legitimacy and rational control. Evaluation and applied social research more generally offer some purchase on the complexity of modern social life. While the new public management has doubtless given impetus to the institutionalisation of evaluation (Segerholm, 2003), there are also other forces that have contributed to its ubiquity. The decline of tradition has increased the importance of reflexivity to individuals and organisations and has given a special place to evaluative mechanisms in social life more generally. Reflexivity can be thought of as processes of self-monitoring and self-development. For individuals, it is important in the construction and presentation of self. For organisations, reflexivity is at the heart of successful adaptation to change. Evaluation, it could be argued, is essential for self-determination. But it is also a modern malaise. It used to be largely focused on innovation and social intervention programs. It is now embedded in institutions, governance structures and financial allocation mechanisms. In one way or another, almost everyone is touched by the technologies of evaluation. Over 30 years ago, Barry MacDonald, in a much-quoted paper, Evaluation and the Control of Education, delineated three distinct political forms of evaluation: autocratic, bureaucratic and democratic. He described bureaucratic evaluation as an unconditional service to government agencies that accepted their values and helped them accomplish their policy objectives. Today bureaucratic evaluation is woven into the fabric of public policy and public services through internal evaluation and external auditing, inspection and monitoring arrangements, performance management systems, and the collection and publication of performance indicators that have developed to evaluate institutions, services and programmes. The routinisation of evaluation, its incorporation into many aspects of social life, resonates with what Anthony Giddens (1991) calls the reflexivity of the self and with the individualism of market-based solutions to the organisation and delivery of public policy. The making and re-making of
self-identity and the organisation and re-organisation of public services call for regular evaluative feedback to affirm progress or otherwise. Through its routinisation, evaluation has become hyper-rational. It is ironic that the new public management, heralded by some as the antidote to the evils of bureaucracy (lack of responsiveness and flexibility, ineffectual, inefficient and self-serving bureaus), should proliferate its own new bureaucratic structures and processes. Market-based public management reforms have necessitated new control mechanisms. Max Weber's classic exposition of the characteristics of modern bureaucracy outlined the importance of rules, records, continuity, technical knowledge and the calculability of results (Weber, 1947, 1968). In his essay on bureaucratic structure and personality, Robert Merton (1940, p. 561), drawing from Max Weber, describes the chief merit of bureaucracy as 'its technical efficiency, with a premium placed on precision, speed, expert control, continuity, discretion, and optimal returns on input'. He goes on to note that 'the bureaucratic structure exerts a constant pressure upon the official to be methodical, prudent, disciplined' and if 'the bureaucracy is to operate successfully, it must attain a high degree of reliability of behavior, an unusual degree of conformity with prescribed patterns of action'. In important respects, this is exactly what modern evaluative mechanisms do. They exert a pressure for conformity with prescribed patterns of action and to a considerable degree they depend on the calculability of results. With the decline in the all-encompassing state monopolies and bureaus and the separation of policy-making and implementation and purchasing and providing, expertise, loyalty, discipline and conformity to organisational norms and values no longer act as such powerful forms of control. In their place are the surrogates for market values: targets, indicators, benchmarks and systems of regulation, including audit, monitoring and evaluation, to motivate service providers, manage performance and pursue value for money and adherence to contracts. State bureaus may have been hollowed out, but bureaucratic mechanisms remain. The need for evaluation is prompted by four conditions associated with the new public management. First, that contracts or framework agreements between government and providers are incomplete: not every step, outcome and eventuality can be anticipated. The incompleteness of contracts implies a need for evaluative scrutiny that moves beyond assuring contractual compliance. Second, bounded rationality and asymmetric information mean that principals or purchasers cannot be certain about potential and actual performance of the agent or service provider: efficiency and effectiveness are very difficult to judge at face value. Evaluation is a resource to be used to
increase ‘intelligence’ and ‘even up’ the distribution of information. Third, opportunism and strategic action on the part of agents and service providers is seen as a risk and evaluation has the potential to increase surveillance thereby possibly reducing opportunism and strategic action. Evaluation is thought to increase or promote trust. Interestingly, Perrow (1986) argues that a major source of bias in agency theory is that when dealing with agent/ principal relationships it is invariably assumed that it is agents not principals that will act opportunistically. Independent evaluation ought to have a role to play in reducing the likelihood of all forms of opportunistic behaviour, but is invariably focused solely on the work of agents rather than all parties to the contract. Hannson (2006, p.159) has described the rapid growth of evaluations in organisations as the growth of a new ‘bureaucratic instrument’. An important way in which evaluation has become institutionalised is through the creation of internal evaluation units and the development of policies, procedures and standard contractual conditions under which evaluation is to be undertaken, especially when commissioned by government agencies (House, Haug, & Norris, 1996). Anyone familiar with the requests for evaluation proposals or invitations to tender for evaluation contracts in the United Kingdom will be aware that many provide an extensive methodological commentary on the desired design amounting to the pre-specification of the preferred approach to evaluation. Under such circumstances evaluation is an extension of the commissioning agency’s organisation, it is an outsourced service controlled by contract whereby the processes and products of evaluation are determined in advance and closely monitored on a regular basis. At its most extreme, the evaluation policies of government agencies can specify that the approach to evaluation must be the randomised experiment, what House (2006) calls ‘methodological fundamentalism’. Other expressions of the bureaucratisation of evaluation include the development of quality indicators for qualitative evaluations, research governance frameworks and good practice guidelines from professional evaluation associations (Kushner, 2005). Some evaluation specialists will also be familiar with what may be premature demands of politicians and administrators for ‘hard’ evidence of the results and/or impact of social intervention programs. The political patience needed for interventions to be properly tried and tested is in short supply. The priority given to economic-type calculations of performance has probably contributed to a climate of unrealistic expectations or wishful thinking about the ease and speed with which outcomes can be detected and measured.
THE POLITICAL ECONOMY OF EVALUATION

When Senator Robert Kennedy insisted on inserting a mandatory evaluation requirement into Title One of the Elementary and Secondary Education Act 1965, he did so as a way of ensuring that Federal dollars were spent on programmes for disadvantaged children, rather than on general aid for schools (Norris, 1990, p. 19). The impulse to know that public money is spent as intended and to good effect is unremarkable, but overemphasis on a concern for accountability and value for money has far-reaching consequences. The call for effectiveness, efficiency and accountability creates an almost inexhaustible demand for evaluation or evaluative mechanisms. As Robert Merton (1936) noted many years ago, planned social action can have unanticipated and unintended consequences. Evaluation is no exception. It is often institutionalised with insufficient consideration of social, political and economic consequences. It is too easily assumed to be a good thing irrespective of context. However, institutionalisation does not mean that evaluation is fully integrated, accepted or acceptable (Laubli Loud, 2004). Very few people want to be evaluated. Indeed, individuals and organisations that are subject to evaluation, especially impersonal systems of measurement, usually find ways of presenting themselves and their work to highlight success and reduce the visibility of shortfall or failure. There are costs associated with the careless or routine use of evaluation – deception, impression management, the distortion of organisational goals, stress and cynicism. Institutionalised evaluation may occasion a decline in legitimacy through the erosion of evaluation independence. One of the risks associated with the evaluative systems of the new public management is that organisations take on two competing forms: one that conforms to the auditable requirements of the primary stakeholders and monitoring bodies and another that is the more usual operational culture. The discrepancy between the organisation as auditable object and the lived culture of the organisation can be a source of inefficiency and cynicism. Systems of evaluation that are tied to rewards and punishments, creating winners and losers, have the potential to reduce openness and honesty in favour of carefully selected and presented countenances. Stephen Ball's analysis of performative regimes in education indicates some of the social consequences of institutionalised evaluative systems. He notes two paradoxes of performative regimes (Ball, 2003, p. 225). First, that 'organizational fabrications are a way of eluding or deflecting direct surveillance' yet 'the work of fabricating the organization requires submission to the rigours and the disciplines of competition'. At its most extreme, authentic
and meaningful work is undermined and professional values narrowed by a vigorous emphasis on target-oriented performance. Second, Ball notes that the evaluative systems 'which appear to make public sector organizations more transparent may actually result in making them more opaque, as representational artefacts are increasingly constructed with greater deliberation and sophistication'. Given such paradoxes, it can hardly be surprising that the result is a weakening of trust and an intensification of anxiety within the public sphere. While theories of evaluation may emphasise its role in informing democracy and societal learning, the day-to-day practice of evaluation weakens the conditions for learning from cumulative experience. Where organisations providing public services are forced to compete in order to simulate the discipline of market forces, any commitment to shared learning is overshadowed by considerations of market advantage. Competitive forces may be thought to act as a spur to change as organisations try to stay ahead of their rivals. But at the same time, such forces may act as a deterrent to social experimentation. It is likely that, in the wake of evaluation, organisations and individuals become more risk averse, representing a threat to long-term innovation. 'Quieting reform' was the poignant phrase that Robert Stake used to convey one of the effects of evaluation on social action programmes. With its emphasis on explicit performance standards and standardisation, the new public management tends to reduce the diversity of practice in favour of officially sanctioned constructs of 'best practice'. The application of market principles and the assumed disciplines of business values to public goods such as education and health are clearly not without economic costs. Public sector organisations operate in conditions that are far from the circumstances and forces that make markets work and thus these forces have to be artificially created and sustained. One of the largely unacknowledged weaknesses of new public management practices is the tendency to externalise costs, thus giving the illusion of efficiency. Evaluation is one example of externalised costs. Some of the cost of evaluative regulation has been incorporated into the work of street-level bureaucrats, professionals, public sector organisations and publicly funded bodies. And, as Janice Gross Stein (2002, p. 130) has observed:

What governments do may be more costly in public markets than what they do as managers. The information they require to create and regulate public markets in health care and education is considerable: they have to provide 'consumer' protection when they contract for services, certify providers, and extensively monitor and evaluate results. All of these require active – and often expensive – government action.
With the ascendancy of neo-liberal economic thinking in the late 1970s, the public sector and its professionals were viewed as monopolistic suppliers motivated by self-interest. This change in policy-makers' beliefs about the nature and motivation of those in the public sector has been graphically described by Le Grand (2003) as a shift from 'knights' to 'knaves'; from altruistic and communitarian values to the pursuit of self-interest but disciplined by empowering citizens as consumers and applying market mechanisms. The icon of homo economicus appeals to liberal values of autonomy and individual freedom while at the same time legitimating the need for surveillance and controls to harness the pursuit of self-interest. The sociologist Gary Marx (2004) has defined contemporary surveillance as 'scrutiny through the use of technical means to extract or create personal or group data, whether from individuals or contexts'. The development of systems of surveillance has accelerated rapidly since the early 1980s (Norris, McCahill, & Wood, 2004; Stanton & Stam, 2003; Marx, 2002; Norris & Armstrong, 1998), a trend that was given added impetus by the events of 9/11. Surveillance is now a socially accepted part of everyday life. Despite Michael Power's (1997, p. 123) suggestions to the contrary, as evaluation has been absorbed into government and governance arrangements it has become a form of surveillance.
THE SOCIAL RELATIONS OF EVALUATION

There is a notable discrepancy between the ways in which program evaluation is being conceptualised and the ways in which evaluation has been institutionalised in the public sphere. In her presidential address to the American Evaluation Association, Donna Mertens (1999) discussed how evaluators can better represent the interests and values of marginalised groups without recourse to naïve advocacy and without ignoring issues related to social justice. She drew on 'transformative theory' to suggest how evaluation can be more inclusive and attend better to issues of disadvantage. Notwithstanding the rhetoric of impartiality, participation and inclusion in evaluation theory, inequalities of power and status predominate in the everyday social practice of evaluation. The social relations of the new public management construct sharp distinctions between the actors in public services: executive decision-makers and their agents, purchasers and providers, customers and professionals, evaluators and the evaluated. Mostly, those subject to evaluation, such as teachers, have little or no say about the way in which it is done. Similarly, the intended beneficiaries of public services are
largely uninvolved in setting the agenda for evaluation. The power to define quality and quantity standards, how oversight will be exercised, where attention will be focused, what will and will not be taken into account lies with executive decision-makers, their agents and their lawyers. It is often a matter of contract or service level agreement, specified in advance and tied to rewards and penalties. Crucially, quality and performance standards are (as Dahler-Larsen in this volume shows) constitutive of the organisation in two respects: they shape the organisation by emphasising the selective importance of some things over others; they come to stand for the organisation and the relative status of its members. Insofar as performance standards are internalised they can take the place of professional autonomy in shaping the form and direction of the organisation. The power to set the agenda and monitor compliance, the power of the purse and the power of internalised norms and values in line with performance targets add up to commanding social forces; nevertheless they rest on a certain quiescence, an acceptance of legitimacy that cannot simply be taken for granted. The minimal reciprocity, respect and trust that are too often evident in the new public management and the forms of evaluation it has occasioned are also constitutive of creative organisational and individual responses to its social order. Social relations in this context are often calculative and strained by inequalities of power and distrust of motives. Bureaucratic evaluation may create social solidarity in individual organisations and groups (for example, a British university putting on a collective front in the face of institutional audit by the Quality Assurance Agency), but the quasi-markets favoured by the new public management are as likely to engender competitive social relations between service providers as they are to encourage partnership and co-operation. The social relations of institutionalised evaluation are too frequently lacking in care. Following Blustein (1991) we can distinguish four different uses of 'care': to care for, to have care of, to care about and to care that. Much of what is claimed for evaluation has too little regard for the first three senses of 'care'. Having care of people and caring for and about people get little mention in discussions about evaluation; instead, the discourse of evaluation emphasises the technical and political. In a way that is easy to overlook, caring for and about people is at the heart of evaluation. Most evaluation is avowedly practical in its aims. Ostensibly, it is oriented towards the improvement of social life and well-being; whether it is focused on school improvement, improving the coordination of services for young children or enhancing the student experience in universities, it concerns people and the quality of their social relations. However, the role of
evaluation in the new public management is to provide the information that enables the social relations of public services to be viewed and managed as economic relationships. A primary role for evaluation is thus to hone social relationships and purposes so that they are more closely aligned to management objectives and to increasing surplus value. The danger is all too familiar: ends and means become separated; means become ends in themselves and ends are thought to justify means. In education, the focus on test scores and other measurable results, for example, leads to a lack of attention to the felt needs of young people and sometimes to their basic needs as well. The desirable end of improved individual and aggregate performance can all too easily take precedence over other important values such as enjoyment of learning, equality, inclusiveness, respect and care. Too tight a focus on school performance keeps out of view the conditions in the family and wider community that contribute to the child's well-being and educational outcomes.
REALISTIC EVALUATION

In recent years, a significant turn (or, more accurately, return) in evaluation has been towards realism (Byng, Norman, & Redfern, 2005; Pawson, 2002; Pawson & Tilley, 2001; Tilley, 2000; Leeuw, Van Gils, & Kreft, 1999; Ho, 1999). There is much to commend the evaluation work of those following in the critical realist tradition of Donald Campbell and his colleagues, especially in their lack of methodological dogmatism and their creative approaches to evaluation design. Campbell's elaborations of the ideals and logic of the experimenting society, an evolutionary learning society, call to mind a very different relationship between evaluation and citizens than that offered by the new public management. According to Campbell and Russo (1999, p. 17) the experimenting society would be an 'accountable, challengeable, due-process society'; in Karl Popper's terms, an open society. In contrast, the new public management envisages not an experimenting society but an efficient and competitive society, with evaluation as its overseer and information quartermaster. The call to realism in evaluation is, however, deficient in an important respect. Little is known about the practice of evaluation (Henry & Mark, 2003). Studies of evaluation practice are rare. There are some notable exceptions – House, Glass, McLean, and Walker (1978), Norris (1995), House et al. (1996), Christie (2003) and Kushner (2000), for example. But, overall, the practice of evaluation lacks critical scrutiny; it lacks reflexivity.
To put it another way, there is no realistic theory of evaluation; that is, one that looks at the contexts of evaluation, different evaluative mechanisms (i.e., theories in action) and the pattern of values and outcomes. In this chapter, we have explored some of the negative connotations and consequences of evaluation in the service of new forms of public management. We have been deliberately critical. The purpose, however, is to begin to develop a different kind of realism in evaluation than the one that is typically associated with 'critical realists' or theory-driven evaluation. C. Wright Mills (1959, p. 106) observed that 'if social science is not autonomous, it cannot be a publicly responsible enterprise'. Much of the day-to-day practice of evaluation is far from autonomous, or, we suspect, publicly responsible.
REFERENCES

Aberbach, J. D., & Christensen, T. (2005). Citizens and consumers. Public Management Review, 7(2), 225–245.
Ball, S. (2003). The teacher's soul and the terrors of performativity. Journal of Education Policy, 18(2), 215–228.
Blustein, J. (1991). Care and commitment. Oxford: Oxford University Press.
Box, R., Marshall, G., Reed, B., & Reed, C. (2001). New public management and substantive democracy. Public Administration Review, 61(5), 608–619.
Byng, R., Norman, I., & Redfern, S. (2005). Using realistic evaluation to evaluate a practice-level intervention to improve primary healthcare for patients with long-term mental illness. Evaluation, 11(1), 69–93.
Callahan, R. (1962). Education and the cult of efficiency. Chicago: University of Chicago Press.
Campbell, D. T., & Russo, J. (1999). Social experimentation. London: Sage.
Christensen, T., & Laegreid, P. (2002). New public management: Puzzles of democracy and the influence of citizens. Journal of Political Philosophy, 10(3), 267–295.
Christie, C. (2003). What guides evaluation? A study of how evaluation practice maps onto evaluation theory. New Directions for Evaluation, 97(Spring), 7–35.
Clarke, J., & Newman, J. (1993). The right to manage: A second managerial revolution? Cultural Studies, 7(3), 427–441.
Dunleavy, P., & Hood, C. (1994). From old public administration to new public management. Public Money & Management, July–September, 9–16.
Ferlie, E., Ashburner, L., Fitzgerald, L., & Pettigrew, A. (1996). The new public management in action. Oxford: Oxford University Press.
Giddens, A. (1991). Modernity and self identity. Cambridge: Polity Press.
Gregory, R., & Christensen, J. G. (2004). Similar ends, different means: Contractualism and civil service reform in Denmark and New Zealand. Governance: An International Journal of Policy, Administration, and Institutions, 17(1), 59–82.
Hannson, F. (2006). Organizational use of evaluations: Governance and control in research evaluation. Evaluation, 12(2), 159–178.
Henry, G., & Mark, M. (2003). Towards an agenda for research on evaluation. New Directions for Evaluation, 97(Spring), 69–80.
Ho, S. Y. (1999). Evaluating urban regeneration programmes in Britain. Evaluation, 5(4), 422–438.
Hood, C. (1991). A public management for all seasons. Public Administration, 69(Spring), 3–19.
Hood, C. (1995). The 'New Public Management' in the 1980s: Variations on a theme. Accounting, Organizations and Society, 20(2/3), 93–109.
House, E. (2006). Democracy and evaluation. Evaluation, 12(1), 119–127.
House, E., Haug, C., & Norris, N. (1996). Producing evaluations in a large bureaucracy. Evaluation, 2(2), 135–150.
House, E., & Howe, K. (1999). Values in evaluation and social research. Thousand Oaks, CA: Sage.
House, E. R., Glass, G. V., McLean, L. D., & Walker, D. F. (1978). No simple answer: Critique of the follow-through evaluation. Harvard Educational Review, 48, 128–160.
Kettl, D. F. (1997). Revolution in public management: Driving themes, missing links. Journal of Policy Analysis and Management, 16(3), 446–462.
Kushner, S. (2000). Personalising evaluation. London: Sage.
Kushner, S. (2005). Qualitative control: A review of the framework for assessing qualitative evaluation. Evaluation, 11(1), 111–122.
Lane, J. E. (2000). New public management. London: Routledge.
Lane, J. E. (2001). From long-term to short-term contracting. Public Administration, 79(1), 29–47.
Laubli Loud, M. (2004). Setting standards and providing guidelines – The means to what end. Evaluation, 10(2), 237–245.
Le Grand, J. (2003). Motivation, agency and public policy: Of knights & knaves, pawns & queens. Oxford: Oxford University Press.
Leeuw, F., Van Gils, G., & Kreft, C. (1999). Evaluating anti-corruption initiatives: Underlying logic and mid-term impact of a World Bank program. Evaluation, 5(2), 194–219.
MacDonald, B. (1976). Evaluation and the control of education. In: D. Tawney (Ed.), Curriculum evaluation today: Trends and implications. London: Macmillan.
Marquand, D. (2004). Decline of the public. Cambridge: Polity Press.
Marx, G. T. (2002). What's new about the 'new surveillance'? Classifying for change and continuity. Surveillance & Society, 1(1), 9–29. http://www.surveillance-and-society.org/
Marx, G. T. (2004). Surveillance & Society. http://web.mit.edu/gtmarx/www/surandsoc.html
Mertens, D. (1999). Inclusive evaluation: Implications of transformative theory for evaluation. American Journal of Evaluation, 20(1), 1–14.
Merton, R. K. (1936). The unanticipated consequences of purposive social action. American Sociological Review, 1(6), 894–904.
Merton, R. K. (1940). Bureaucratic structure and personality. Social Forces, 18(4), 560–568.
Mills, C. W. (1959). The sociological imagination. Oxford: Oxford University Press.
Norris, C., & Armstrong, G. (1998). Introduction: Power and vision. In: C. Norris, J. Moran & G. Armstrong (Eds), Surveillance, closed circuit television and social control. Aldershot: Ashgate.
Norris, C., McCahill, M., & Wood, D. (2004). The growth of CCTV: Global perspectives on the international diffusion of video surveillance in publicly accessible space. Surveillance & Society, 2(2/3), 110–135. http://www.surveillance-and-society.org/index.htm
Norris, N. (1990). Understanding educational evaluation. London: Kogan Page.
Norris, N. (1995). Contracts, control and evaluation. Journal of Education Policy, 10(3), 271–285.
Pawson, R. (2002). Evidence-based policy: The promise of realist synthesis. Evaluation, 8(3), 340–358.
Pawson, R., & Tilley, N. (2001). Realistic evaluation bloodlines. American Journal of Evaluation, 22(3), 317–324.
Perrow, C. (1986). Economic theories of organization. Theory and Society, 15(1/2), 11–45.
Pollitt, C. (2006). Performance management in practice: A comparative study of executive agencies. Journal of Public Administration Research and Theory, 16(1), 25–44.
Pollitt, C., & Bouckaert, G. (2000). Public management reform: A comparative analysis. Oxford: Oxford University Press.
Power, M. (1997). The audit society. Oxford: Oxford University Press.
Purdue, D. (2005). Performance management for community empowerment networks. Public Money & Management, April, 123–130.
Segerholm, C. (2003). Researching evaluation in national (state) politics and administration: A critical approach. American Journal of Evaluation, 24(3), 353–372.
Stanton, J., & Stam, K. (2003). Information technology, privacy and power within organizations: A view from boundary theory and social exchange perspectives. Surveillance & Society, 1(2), 152–190. http://www.surveillance-and-society.org/index.htm
Stein, J. G. (2002). The cult of efficiency. Toronto: Anansi Press.
Tilley, N. (2000). Realistic evaluation: An overview. Paper presented at the founding conference of the Danish Evaluation Society, September 2000. http://www.danskevalueringsselskab.dk/Materiale_fra_DES.asp
Van Berkel, R., & Van der Aa, P. (2005). The marketization of activation services: A modern panacea? Some lessons from the Dutch experience. Journal of European Social Policy, 15(4), 329–343.
Weber, M. (1947). The theory of social and economic organization. New York: Free Press. (Part 1 of Max Weber's Wirtschaft und Gesellschaft, translated by A. M. Henderson and Talcott Parsons.)
Weber, M. (1968). Economy and society. New York: Bedminster Press. (A translation of Wirtschaft und Gesellschaft: Grundriss der verstehenden Soziologie by Fischoff et al., edited by Guenther Roth and Claus Wittich.)
Williamson, O. E. (1981). The economics of organization: The transaction cost approach. American Journal of Sociology, 87(3), 548–577.
CONSTITUTIVE EFFECTS OF PERFORMANCE INDICATOR SYSTEMS

Peter Dahler-Larsen

INTRODUCTION

Evaluation in general and performance indicator systems in particular play an increasing role in society. We do not have a long historical set of experiences which helps us understand what exactly happens when, say, performance data for schools are made public on the internet and in the news, because the emerging rules of the game in what some observers call the ''knowledge society'' (Stehr, 1994, 2001) and ''reflexive modernization'' (Beck, 1997a, 1997b) have inaugurated new relations between evaluation and performance data on the one hand and political, organizational and practical realities on the other. More specifically, the social and political significance of evaluation and performance data increases due to the following changes. First, the ideology of new public management (NPM) has advanced the idea that the ongoing performance measurement of public institutions should constitute an important input into political decision making and resource allocation. This ideology translates into techniques which are designed to enhance the rationality and accountability of each institution. As a corollary, every
teacher and every nurse should behave with the awareness of being under organizational scrutiny. Second, at the same time, this and other ‘‘recipes for good and modern organization’’ (Røvik, 1998) suggest that evaluations should no longer be stand-alone activities, but should be organizationally mainstreamed and integrated in the ongoing organizational processes of planning, decision making, strategy making, learning and development. This discourse suggests ways to amplify and multiply the effects of performance measurements as these are carried out more systematically and routinely than before. Managerial and legal regimes help integrate evaluation technology into an institutional order. Third, the public arena and the specific rules in this arena are increasingly important for the social representation of evaluative information and its implication. The public arena not only makes performance data available, for example, in the form of user-oriented data on the internet but also interprets, edits and presents evaluative data (Boyle, Breul, & Dahler-Larsen, 2006) in the form of league tables or headline news when international comparisons of national school data are presented with reference to dramatic imagery and metaphors from sport or war (Stronach, 1999). The public arena enacts societal games about the consequences of evaluation, not only in terms of direct political decisions, but also in terms of regimes which produce and attribute societal and institutional risks (Rothstein, Huber, & Gaskell, 2006) and distribute blame (Hood, 2002). As a result, performance data become more visible, more widespread, more controversial and more ripe with socially productive power and political significance. A higher number of stakeholders in capacities such as ‘‘regulators’’, ‘‘politicians’’, ‘‘users’’, ‘‘professionals’’, ‘‘citizens’’, etc. are likely to be affected by or even defined by performance data. In our attempts to understand evaluation processes (including the ‘‘use’’ of evaluation) which may be long, complex and nonlinear, we cannot trust, at face value, the rhetoric undergirding new public management. Keywords such as ‘‘transparency’’, ‘‘visibility’’, ‘‘documentation’’ and ‘‘measurement’’ are promoted along with indicators, but not a great deal is mentioned about their wider practical and social consequences. Anecdotal evidence and scattered observations suggest that there are unconventional and perhaps surprising effects of performance indicators. Consider the following observations: Example 1. Waiting times are long for incoming patients in emergency rooms. An indicator system is set up which monitors the time from
a patient enters the emergency room until he or she first meets a member of the staff. As a result, the hospitals hire ‘‘hello nurses’’ who immediately approach incoming patients with a greeting and say, ‘‘Hello, my name is Elizabeth’’. Not much else is changed.

Example 2. Hospitals make a contract with top administrators according to which waiting lists shall be reduced in return for increased budgets. An indicator for ‘‘waiting lists’’ is constructed. It measures the number of days patients wait from their first consultation at the hospital to their actual hospitalization. Hospitals manage to reduce waiting lists dramatically. They do so by reducing the number of patients accepted for first consultation.

Example 3. Schools are compared with respect to average grades in league tables without statistical control for socioeconomic and other differences between the pupils. This indicator of school quality is criticized for low validity by school professionals and statistical experts. Nevertheless, one of its effects may be to make it easier for some parents to identify and choose schools where well-scoring pupils attend, regardless of whether the schools are statistically responsible for the good grades. The publication of these data may thus lead to increased social segmentation among schools.

Based on observations like these, the basic proposition in this chapter is that performance monitoring systems are not only ways of measuring and seeing, but should be fundamentally understood as ways of doing something. This chapter examines the constitutive effects1 of indicator systems. Constitutive effects exemplify what researchers are beginning to identify as a ‘‘performance paradox’’ (van Thiel & Leeuw, 2002), an umbrella term for situations where more measurement of quality may lead to everything but better quality. Constitutive effects cover the many subtle and not so subtle ways in which performance indicator systems guide particular values, orientations, interpretations and practices in the direction of some constructions of social reality rather than others. Constitutive effects include reactions to evaluation. I seek to demonstrate the relevance of the concept, first by showing the limitations of some conventional views on the effects of indicator systems. Then I will outline my conceptual perspective with reference to literature on the role of knowledge in contemporary society. Next, I will carve out specific variations of constitutive effects with illustrative examples. Finally, I will discuss implications for evaluators.
For linguistic variation and theoretical inspiration I shall sometimes refer to ‘‘evaluation’’, ‘‘evaluative knowledge’’ or, more broadly, to theories of ‘‘knowledge’’, although my thematic focus and most of my examples specifically concern effects of performance indicator systems, i.e., the institutionalized production of quantified and standard-based measures of some definition of quality to be used for political, managerial and administrative purposes. It is left to the reader to consider whether various forms of evaluation have similar effects.
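Example 3 above turns on a statistical point that is worth seeing worked through: a league table of raw average grades ranks schools partly on the composition of their intake, whereas a comparison that controls for pupil background can rank the same schools quite differently. The short sketch below is not part of Dahler-Larsen's argument; it is an illustrative aside in which the school names, the single ‘‘ses’’ (socioeconomic) index and the grade figures are all invented, and a simple least-squares adjustment stands in for the more elaborate value-added models used in practice.

```python
# Illustrative sketch: raw league-table ranking vs. a background-adjusted ranking.
# All numbers are hypothetical; the single covariate "ses" stands in for the
# socioeconomic and other pupil differences mentioned in Example 3.

schools = {
    # school: (mean pupil socioeconomic index, mean grade)
    "A": (0.9, 7.8),
    "B": (0.2, 6.9),
    "C": (0.5, 7.1),
    "D": (0.8, 7.2),
}

# Raw ranking: highest mean grade first (the unadjusted league table).
raw_rank = sorted(schools, key=lambda s: schools[s][1], reverse=True)

# Adjusted ranking: rank schools by the residual of grades after a simple
# least-squares fit of grade on the socioeconomic index across schools.
n = len(schools)
mean_ses = sum(v[0] for v in schools.values()) / n
mean_grade = sum(v[1] for v in schools.values()) / n
cov = sum((v[0] - mean_ses) * (v[1] - mean_grade) for v in schools.values())
var = sum((v[0] - mean_ses) ** 2 for v in schools.values())
slope = cov / var
residual = {s: v[1] - (mean_grade + slope * (v[0] - mean_ses)) for s, v in schools.items()}
adjusted_rank = sorted(schools, key=lambda s: residual[s], reverse=True)

print("raw league table:   ", raw_rank)
print("background-adjusted:", adjusted_rank)
```

With these hypothetical numbers, two of the four schools swap places once intake is taken into account, which is precisely the kind of gap between a published indicator and a defensible judgement of quality that the validity critique in Example 3 points to.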
THE LIMITATIONS OF CONVENTIONAL CRITIQUES OF EFFECTS OF PERFORMANCE INDICATORS

Because of the centrality of ‘‘use’’ to the definition of evaluation (Vedung, 1997), the quantitative and qualitative variations of use are of immense interest to evaluators. Controversy over use has been the most important driving force behind the development of the field (Hellstern, 1986). However, the repertoire of concepts describing use is altogether unclear (Mark & Henry, 2002), murky and paradoxical. An increasing number of categories of use have emerged, such as accountability, learning, conceptual, strategic, tactical, symbolic and process use, but these are rarely strictly defined, they relate to different aspects of evaluation and to different aspects of outcomes, and they are partly overlapping in unclear ways (Mark & Henry, 2002). Despite this ‘‘overgrown’’ typology, many evaluators still have a blind spot concerning the many uses of evaluation which are not planned, and not instrumental, because the very definition of evaluation often promises ‘‘use’’ or ‘‘good use’’ (Mark & Henry, 2002). The dependence of this idea on a tacit normative framework is not analytically satisfactory, especially if it inhibits a fuller understanding of the actual rather than promised effects of evaluation. When evaluators take positions in relation to potential unconventional, unforeseen and perhaps complex effects of indicator systems, they sometimes unthinkingly categorize new phenomena into old categorical boxes. I shall therefore deal at some length with the inadequacies of conventional typifications of such effects. For example, some suggest that there are, under some circumstances, ‘‘diseases’’ in indicator systems (Bouckaert & Balk, 1991). This terminology assumes, conversely, that there are ‘‘healthy’’ ways of measuring quality. This medical/biological metaphor is analytically inadequate. It is not evident what ‘‘healthy’’ evaluation might mean. A similar argument can be made about ‘‘dysfunctional consequences of performance measurements’’, a notion which presupposes a functionalist utopia in which uncontroversial
‘‘goals of the organization’’ (Ridgway, 1956) are supposedly implemented without reflection. The distinction between destructive and constructive uses of performance indicators (Munro, 2004) is also based on a similarly taken-for-granted and largely implicit positive norm. The same is true for ‘‘perverse effects’’ (Munro, 2004). Others argue that performance indicators can be ‘‘misused’’ (Perrin, 1998). Again, the distinction between use and misuse is based on a subjective, normative framework which is often not made clear. In strict analysis, however, one person’s use can be another person’s misuse (Shulha & Cousins, 1997). Furthermore, applying derogatory terms to certain consequences does not help us understand these consequences and explain their origins. In the same family of judgmental thinking there is a critique which suggests that performance indicators are expressions of a particular neoliberal ideology corresponding to the interests or ‘‘hidden agendas’’ of powerful decision makers. Of course, there are ideological overtones and power differentials in evaluation, and hidden agendas do exist. However, it is one thing to identify these phenomena and another thing to empirically demonstrate that all outcomes of particular forms of evaluation can be explained with reference to these ‘‘hidden agendas’’. Such an explanation would assume a uniformity and linearity in social and political processes which is inconsistent with more complex paradigms held by those who actually study such processes (March & Olsen, 1976; Beck, 1997a, 1997b; Rothstein et al., 2006). Interestingly enough, perhaps the most frequent critique of performance indicator systems maintains that these systems have many unintended effects (McBriarty, 1988; Courtney, Needell, & Wulczyn, 2004; Smith, 1995) – a logic almost the opposite of the above ideological stance. Here, the critical point is not so much that the effects are inherently despicable, because unintended effects may in fact be both positive and negative. Instead, the observation is that indicator systems have ramifications which are not and often cannot be foreseen by their architects. Unintended effects are a classical theme in sociology and organization theory, and given the complex nature of the contemporary social, organizational and political contexts in which indicator systems operate, the unintended effects of indicators are likely to be many and diverse. Although this hypothesis is charming and seems to be consistent with practical experiences, it might be built on conceptual quicksand. Does unintended mean simply ‘‘not intended’’ or ‘‘counterintentional’’? If unintended is the logical opposite of intended, what exactly is the set of intentions which an observer applies as an analytical standard?
Assume the analyst identifies with the architect of a particular indicator system, or with the ‘‘principal’’ in a principal-agent model of public management (Smith, 1995, p. 283). Do only the intentions of that architect at a given point in time (before the implementation of the indicator system) count? If yes, are these intentions empirically mapped or simply assumed? If empirically mapped, how are unofficial intentions registered? If assumed, what are the theoretical assumptions about the intentions of the policy architect and his or her degree of rationality? If the architect is a political–organizational machinery rather than a specific person, do the same assumptions hold? How clearly do we expect the architect to omnisciently predict all effects of his/her construction in order to say the effects are ‘‘intended’’? If we lift the restriction in time, do we allow the architect of a performance indicator system to learn over time and develop new preferences as his/her experiences with the indicator system evolve? If we lift the restriction on who holds intentions, whose intentions count? Perhaps the most fundamental problem with the idea of unintended effects, however, is that it is practically impossible to empirically determine a certain effect as ‘‘unintended’’. This would require knowledge about intentions which is out of reach. Consider Example 3 above. Suppose parents use data with poor validity to choose schools for their children. Suppose the indicators make it easier for parents to make this choice. Perhaps it is consistent with the preferences of socioeconomically privileged people to be able to identify schools where other socioeconomically privileged people send their children, too. Can an analyst guarantee that this effect was intended? Or unintended? Why should an analytical observer remove the tension, the controversy and the reality from this situation just to place it squarely on one side of this distinction or the other? Both the concepts of ‘‘intended’’ and ‘‘unintended’’ effects assume a distinction between planning and outcome where ideas and actions are clearly separated. That may not be how the world works. Let me illustrate this point with reference to what van Thiel and Leeuw (2002) call ‘‘measure fixation’’. Measure fixation happens when practitioners focus on exactly what is being measured as an indicator of quality, often at the expense of genuine quality. This may be termed a ‘‘trivial’’ form of measure fixation because it still operates with a sense of ‘‘genuine quality’’ from which the measure fixation deviates. In this sense, measure fixation is ‘‘unintended’’. In contradistinction, under advanced measure fixation, the indicator provides a definition of quality along with a way of measuring it. For example: ‘‘intelligence is what we measure in
an intelligence test’’. Or: ‘‘Nobody knows what school quality is, but in our county we think that good test scores constitute an important aspect of school quality’’. With advanced measure fixation it is not possible to demonstrate a cleavage between genuine quality and quality measured by an indicator, since the latter helps define the former. This is a constitutive effect which should be taken seriously and which is not properly understood when categorized as ‘‘pathological’’ or ‘‘unintended’’. The trick is that while the indicator is socially constructed, the phenomenon it is supposed to measure is also socially constructed. A critique of the measure, which holds the phenomenon constant as something we all know and agree on or rationally intend, is therefore insufficient. In addition, it is generally inadequate to capture such interesting phenomena as constitutive effects only in terms of what they are not. Constitutive effects deserve to be studied in their own right in terms of what they are. To understand the socially constructed nature of the effects of performance indicator systems, references to the literature on the role of knowledge in contemporary society will be useful.
KNOWLEDGE AS SOCIALLY PRODUCTIVE

A constitutive perspective on knowledge suggests that knowledge is not an essence, but a social accomplishment. The same is true for the effects of knowledge. Knowledge is ‘‘open-ended’’ in the sense that its ‘‘use’’ is not an inherent property, but depends on the articulation, representation and appropriation of knowledge in particular contexts (Woolgar, 2004). Effects thus mean social accomplishments rather than causal outcomes. Stehr (1994, p. 95) defines knowledge as a capacity for social action. He emphasizes the socially productive role of knowledge in contemporary society not only with regard to ‘‘the appropriation of appropriated nature’’ (p. 103) but also with regard to the organization of the social order (rather than the material interaction with nature). Briefly stated, under modernity knowledge shapes social relations. However, he immediately points out that knowledge as a capacity for action often assumes a bureaucratic or otherwise smoothed social order with linear structures which are ‘‘prepared for data processing’’ (p. 103). This assumption is, of course, often out of place. Therefore, knowledge rarely produces opportunities for perfect planning of a particular social intervention. Instead, the typical result is an increasingly fragile social order (Stehr, 2001; Giddens, 1990, p. 45). This means an order which
is constantly ripe for change due to its ongoing integration and interaction with knowledge. Much in the same spirit, Giddens (1994) explains how social practices are described and transformed in the light of incoming knowledge. Formal political and administrative systems, as well as a number of informal reflexive processes, operate with a mutual interplay between knowledge, data and social relations. Modernity is already fundamentally reflexive and sociological, to paraphrase Giddens (1994, p. 43). A number of factors help explain why the transmission of knowledge into changing social relations is far from linear and straightforward. The first is unequal distributions of power. Another factor is the considerable influence of the backfiring consequences of earlier applications of knowledge. Another is that changes in values follow logics other than those governing changes in the systems which produce knowledge. Therefore streams of knowledge are confronted with value changes with which they are never fully synchronized. Finally, as a sort of synthesizing point, the reflexivity of modern social relations is itself a nonlinear and thus de-stabilizing factor (Giddens, 1994, pp. 44–45). While systematically produced knowledge could earlier play a social role similar to that of traditional religion with respect to authority and certainty, uncertainties in knowledge are becoming clearer today. In today’s knowledge society, it also becomes obvious that each new piece of knowledge does not always reduce ignorance and exclude alternative views. Instead, there are complex social forces seeking control over different knowledge productions, and knowledge is not only accumulated to approximate truth. Some pieces of knowledge are partly contradictory, suggesting or recommending different social pathways. Different knowledge-producing systems on societal or institutional levels (such as governments and schools) interact in spiralling regulatory logics which paradoxically produce more risk and more need for control (Rothstein et al., 2006). Applying these general observations about knowledge and fragile social orders to evaluation processes, it becomes clear that classical notions of use of evaluative knowledge operate with very restricted assumptions. Instrumental use is a feedback from evaluation which relates selectively to only one particular aspect of social reality, i.e., the one which has to do with decisions to improve the quality of the object under evaluation. The reflexivity and social productivity of knowledge suggested by Giddens and Stehr indicate a number of other possible options. People under evaluation are not only objects but also subjects who construct their own knowledge about the evaluation, its process and results. The following section hypothesizes about how knowledge from and in evaluation reflexively feeds back into
social reality in a number of ways. These will be illustrated with examples of specific, constitutive effects.
MANIFESTATIONS OF CONSTITUTIVE EFFECTS

Indicators Define Interpretive Frames and World Views

In a very fundamental sense, indicators are, like evaluation in general, a form of ‘‘assisted sense-making’’ (Mark, Henry, & Julnes, 2000). They offer interpretive keys which draw attention, define discourse and orient actions in certain directions. Their most basic function is not to verify what goes on in a particular area of activity, but to construct a definition of that activity so that it can be verified. An illustrative example is a survey instrument which helps managers check how their families evaluate their presence in the family (Hochschild, 2004). Although presented as a progressive tool to remind managers of the importance of family life, the survey also helps constitute family life as something reduced to a limited number of rationalized, quantifiable and manageable dimensions. It also introduces the idea that the focal person can, will or should change his or her family behavior with the intention of influencing his or her survey scores. Evaluative knowledge may thus colonize areas of social life which were otherwise not prone to such explicit discursive specification and quantification of their inherent value. A similar constitutive, transformative (and thus threatening) effect of evaluative knowledge upon the idea and self-understanding of communities is discussed by Schwandt and Dahler-Larsen (in press). A very important political aspect of indicators rests not with measurement or verification, but with the very definition of dimensions of what should be registered.

Indicators are Constitutive of ‘‘Content’’

Indicators define what is central in work. Evaluation criteria help determine what actors should strive to accomplish in a given activity. Evaluation sometimes forces otherwise more intuitive and implicit forms of practice to formulate explicit criteria of success (Munro, 2004, p. 1082). As evaluation is the occasion to do this for the first time in many areas of activity, evaluation has a constitutive effect on defining the central explicit dimensions of work activities.
When testing becomes a widespread social practice, it may trigger a range of remarkable reaction patterns among teachers and pupils due to its capacity to define what counts in successful teaching. A concept such as ‘‘teaching to the test’’ describes some of these reactions. As an example, a pre-test is given in which the teacher places many of the questions that he/she knows will be in the real test. After the pre-test the pupils have the opportunity to see the correct answers. In the real test, the students’ ability to remember the answers to the questions given in the pre-test is then rewarded. Teaching thus becomes a matter of learning how to pass tests. Explicit criteria sometimes invite ‘‘short cuts’’ and gaming in order to score better with regard to established indicators (Courtney et al., 2004) (see also Example 2 above). In fact, the more political and organizational importance is invested in a particular indicator of the central aspect of work, the more the indicator is prone to be the object of complex social processes which sometimes threaten the validity and reliability of the indicator, making it more difficult to understand what the indicator actually means (Vulliamy & Webb, 2001). The political and organizational investments in the indicator may even make it more difficult to openly discuss its interpretation. When important, indicators sometimes evoke a range of irrational activities, such as skimming or creaming (Fountain, 2001, p. 66), aimed solely at affecting the scores on indicators rather than at the quality of the performance itself. However, only if it is known what quality is can particular indicators be criticized for low validity or ‘‘corruption’’ of practice. In an earlier section, this was referred to as trivial measure fixation. On the other hand, with advanced measure fixation, test scores become a socially privileged form of knowledge and help establish socially dominating definitions of quality. Under such circumstances, pre-tests with incorporated test questions and teaching how to pass tests might be a strategically meaningful way to improve school quality as it is defined. If the reduction of a waiting list is defined exactly as in Example 2 above, then the actors in the evaluated system were not corrupt but took this specific evaluation criterion very seriously. In this light, evaluation criteria are templates for knowledge which count as political communication about what is desirable within an area of work and which are highly socially productive. Perhaps politicians, managers, consultants and administrators do not comprehend the full political implications of their sometimes careless definition of some performance indicators. They may get what they want without understanding the ramifications. In fact, evaluation criteria may become goals in themselves. They may attract all attention to their own icons or symbols of quality and define what
is not measured as non-existent – so-called tunnel vision (van Thiel & Leeuw, 2002). ‘‘Quality’’ is perhaps an ‘‘essentially contested concept’’ (Gallie, 1956) open to ongoing social construction, but when a given indicator system operationalizes quality in a certain way, the operationalization itself takes the place of a socially valid definition. For example, in Denmark in recent years many schools have focused on so-called ‘‘soft qualifications’’ and ‘‘social competence’’ along with the easily and conventionally quantified measures of quality, but these new concepts are difficult to get hold of. As strategies and techniques for the operationalization and measurement of ‘‘soft qualifications’’ emerge, practical definitions of this phenomenon become available and start to circulate in the field. An apparently methodical/technical operationalization, i.e., the choice of an indicator, is thus an essentially interpretive action with socially productive implications. It cannot be criticized for low validity to the extent that it co-defines the conceptual definition of what it intends to measure at the same time as it measures it. Some indicators have more cultural and institutional back-up than others. Indicators which redefine the value of public activities in the light of customer satisfaction seem to be in line with contemporary expectations. In fact, performance indicator systems emphasizing consumer values may be one of the most important mechanisms for importing market discourse into public organizations, despite their undermining effect on civic virtues (Fountain, 2001, p. 71).

Indicators are Constitutive of Social Identities and Relations

Indicators stipulate social identities and their mutual relations. For each social category defined by indicator systems, norms are often set in a standardized way. A simple but powerful constitutive effect flows from a comparison of indicators across social units such as institutions, groups or individuals. While good standards are inherently difficult to set on the basis of logical argument (Stake, 2004), some indicator systems simply let the mathematical average set the norm against which individual units are compared. Although mathematically simple and philosophically unsophisticated, this approach has remarkable social repercussions. This form of comparison implies not only that the identities of the different units are of the same nature, but also that they stand in a competitive relationship which extends so far as to install in each of them an interest in lowering the scores of the competitors, because the average defines the standard. In a performance appraisal of personnel, a forced distribution of ratings had
severe motivational effects on that half of the staff which was categorized as average or below (McBriarty, 1988). There are thus interesting constitutive links between the properties of measurement scales and social identities and relations. When a quantitative comparison is made, differences in scores emerge in ways which depend on the nature of the measurement. From the mathematician Mandelbrot it is known that with increasing sensitivity of measurement systems, more fine-grained differences will be found (Bouckaert & Balk, 1991; van Thiel & Leeuw, 2002). Such differences often command political, managerial and professional attention. In this way, performance indicator systems help constitute the problem or the risk they are supposed to monitor (Rothstein et al., 2006), and in this light they reconfigure the relations between the involved social actors. Indicator systems are often followed by recommendations about steps which each unit can take to improve quality, thus paving the way for the installation of a set of common steering techniques among the units (Rose, 2003). As Foucault (1991) suggests, evaluative discourse paves the way for strategies and techniques that allow for self-definition and self-regulation of subjects. Indicators help constitute the involved actors in particular capacities such as ‘‘patients’’, ‘‘clients’’ or ‘‘consumers’’. Indicators of ‘‘consumer satisfaction’’ may tacitly promote the identity of a ‘‘consumer’’ in higher education at the expense of other identities such as ‘‘student’’ (Cheney, McMillan, & Schwartzman, 1997). The apparent freedom of the consumer is bought at a price. The consumer does not have any responsibility for the formation of policies. The consumer is also the object of a range of control, socialization and normalizing mechanisms (Bauman, 1983), and his or her preferences are often kept under strict quantitative control in performance indicator systems. Perhaps as a reciprocal role to the consumer, indicator systems also influence the very definition of the professional, if not a reduction of the professional role. More often than not, standardized indicators reduce professional discretion in teaching practices. The balance between professionals and managers may shift as many indicator systems emphasize how organizations internally control themselves rather than make direct examination of the practice itself (Munro, 2004, p. 1079). In fact, indicators based on customer satisfaction often cannot be understood on the basis of substantive insight into the characteristics of activities, but instead make sense as a strategic alliance between managers and customers at the expense of professionals.
The collegial relations between professionals such as teachers are affected to the extent that the indicator system clusters some teachers (in schools or groups) and sets the clusters up against each other. The effects of performance indicators on professional self-understanding, commitment and morale are perhaps among the most critical issues.

Indicators are Constitutive of Time Frames

The constitutive effect of an indicator system not only manifests itself in the definition of the content of the work but also establishes time frames within which the outcomes of an activity are expected to manifest themselves. Indicators with narrow time frames may privilege some activities at the expense of others. Testable qualifications may defeat ‘‘Bildung’’, and a fast cure may defeat long-term prevention. In a drama, the ‘‘content’’, the ‘‘timing’’ and the ‘‘characters’’ are related. The same is true for the corresponding types of constitutive effects of performance indicators. Højlund and la Cour (2001) demonstrate how digital time measurement of care for the elderly fundamentally changes both the content of the services and the relations between the caretaker and the recipient. In a similar vein, we have already noted how time frames, defining what the ‘‘normal child’’ is supposed to achieve at a certain age, co-produce a distinction between ‘‘normal’’ and ‘‘underachieving children’’. When consumer satisfaction is introduced as an evaluation criterion in higher education, the idea of instant gratification often comes with it, and together these factors tend to redefine the role of the teacher towards students. Time is often a privileged dimension to measure, because it is general and abstract (like money). For this reason, indicator systems often transform more substantive dimensions of quality into measures of the speed of delivery (Munro, 2004, p. 1080), but as we have seen, this move has washback effects on the content of work and on social relations.

Constitutive Effects May be Displaced Over Time and Across Levels of Analysis

Test cultures in educational institutions may have broad and large-scale effects that feed back into wider cultures and mentalities in a society. An interesting study of examination systems in China has demonstrated century-long effects on culture and social structure. Particular examination criteria also locked the whole country into a particular educational strategy
which over the years proved to be competitively unproductive (Suen, 2003). Test systems and indicator regimes have so-called ‘‘washback effects’’ (Bailey, 1996) on curricula, educational planning and teaching practices. Among these effects one might also count new political and institutional arrangements. For example, in Denmark, an international wave of educational tests paved the way for the establishment of a national evaluation centre. Over the years, this centre expanded its tasks and became a significant player in the field of education as such. Recently, regimes describing the publication of school data were legally introduced, and regimes to measure the quality of individual teachers were put on the drawing board by a think tank.
DO CONSTITUTIVE EFFECTS OCCUR DETERMINISTICALLY?

At a presentation I gave, some doctors said that publication of indicators for individual doctors would lead the profession to cheat with the reporting of data. I replied that cheating could not be assumed to happen automatically. Cheating had to depend on the values and ethics of the profession and its members. They could not disagree. Some critics of indicator systems assume that these systems have deterministic negative effects. Their underlying ontology is surprisingly similar to that of performance indicator gurus who promise automatic positive effects of these systems. None of these assumptions, however, is consistent with the social constructivist view of knowledge advocated in this chapter. According to this view, the effects of indicator systems are complex and depend on interpretations, relations and contexts. In a Foucaultian perspective, discourses, strategies and techniques definitely pave the way for some social constructions rather than others, but they are not fully imposed and implemented upon the social body as a whole. Techniques and their effects are of different orders (Gordon, 1980, p. 246). Why do constitutive effects occur more strongly in some contexts than in others? A laundry list of factors one might study further comprises the following. First, performance indicator systems are backed up by varying degrees of institutionalization. Some are normatively recommended, while others are supported by legal frameworks and/or financial incentives. Voluntary innovations are adopted with more enthusiasm in organizations (Scott, 1987), but compulsory institutional ones are backed by tougher sanctions. Binding mechanisms without normative support may lead
to organizational hypocrisy, i.e., a discrepancy between talk and action, between external procedures and internal practices. Each organization in each context may strike a particular negotiated compromise between the official rhetoric about performance indicators and their actual colonization of organizational practices. Second, performance indicator systems operate at different levels of analysis. Some studies suggest that indicators focusing on individuals rather than groups or organizations have a stronger behavioral impact, at least in individualistic cultures. Third, publication of data may reinforce the constitutive effects of performance indicators. However, it makes a difference in whose name the data are published (Andersen & Dahler-Larsen, 2006). Furthermore, the nature and quality of a public debate about the data and their implications should make a difference. Fourth, no performance indicator system stands alone. Many public agencies seek to promote documentation and transparency through evaluation, each from their own perspective (Sahlin-Andersson, 2004). Control of public institutions is often redundant, overlapping and partly confusing. Along with formal systems, a number of informal norms, professional values, conscience and trust regulate public institutions to varying degrees in various contexts. Paradoxically, what is from the perspective of new public management an optimal mix of these factors (strong incentives, individual data, publication and an exclusive focus on indicators) may be exactly those conditions which produce the most intensive constitutive effects of indicator systems. Alternatively, the reason why some performance indicator systems sometimes do not create strong constitutive effects may be that some of the conditions are relaxed compared to what is prescribed by NPM. In some countries, NPM regimes are introduced with a weak or missing link to financial incentives. Professional ethics, conscience and traditions may prevent NPM regimes from having far-reaching constitutive effects. However, paradoxically, the ideology of NPM does not recognize the importance of such ‘‘soft’’ values.
CONCLUSION

Constitutive effects of performance indicators exist. They are complex, relational and contextual. They emerge through definitions, distinctions and measurements which enhance some social constructions of reality rather
than others. Indicators do not communicate only about an activity, but also meta-communicate about the identities and roles of human beings and their relations. Indicator systems produce socially relevant labels which ‘‘stick’’ to practices and people, and which help organize social interaction in particular ways. Indicators are ‘‘societing’’ (Woolgar, 2004, p. 454). Indicators are a way of making politics. However, politics can be regarded as an ever-present aspect of evaluation (Karlsson Vestman & Conner, 2006). It is not a ‘‘dark side of life’’ which should be ‘‘expelled’’ from the ‘‘rational, well-planned and well-intended’’ noble art of evaluation. To categorize constitutive effects of evaluation as unintended, negative, pathological and dysfunctional is to assume a technical, pure and unrealistic model of how knowledge can be applied to social realities. There is an open-endedness and non-determinism in the concept of constitutive effects. It suggests that the finality of evaluation is determined neither by the data as such nor by the intentions behind evaluations, but by complex processes of social construction. This view suggests a number of important tasks for evaluators and evaluation researchers who wish to maintain and contribute to a sense of social responsibility. An important task is to study the constitutive effects of performance indicator systems in practice. Compared to the widespread beliefs in the value of performance indicator systems as political and administrative regimes, relatively little is known about their extrinsic impact on social realities. While this chapter, like Modell (2004), suggests that performance measurement does not work according to its own myth, much remains to be known about what else is happening. It would be helpful to know more about the socially constructed nature of the validity of indicators. Indicators can be developed with more or less advanced substantive understanding of the relationships between policy, programs and outcomes in particular fields (Courtney et al., 2004). The reflexive nature of the relation between indicators and professional practices should be taken into account. It would also be informative to map some of the washback effects of indicator systems on larger cultural, organizational and democratic systems. Of special interest is how knowledge flowing from indicator systems is handled in media discourse in the public arena. Also of special interest is the extent to which performance indicators reflect the concerns and interests of citizens and society at large, although none of these of course constitutes a gold standard, since they are debatable in democratic discourse. For all these reasons, it is important that evaluation researchers (or at least some of them) remain in the public arena. Without too much reductionism and simplification it is fair to say that some evaluators have
responded to what they perceive as unfair, corrupt, manipulative, managerial and control-oriented performance indicators in society by developing participatory, emancipatory, empowering, deliberative, dialogue-oriented, user-oriented and responsive forms of evaluation. These moves have often implied transporting evaluation into smaller social contexts such as neighbourhoods and schools where face-to-face dialogue can thrive and personal, dialogue-oriented relations develop. Many of these initiatives have respectable normative agendas and have contributed to the vitality of the field of evaluation, but many of them have substituted community for society as their arena for evaluative practice and self-development for societal autonomy (Castoriadis, 1987) as the object of their normative concern. It is important that some evaluators find it worthwhile to remain clearly grounded in the public arena in society in order to deliver research about how public regimes actually function, educate journalists about how to understand evaluative data, qualify the public debate about the meaning of indicators and discuss with citizens and politicians how performance indicators play subtle and not so subtle roles in the handling of democracy.
NOTE

1. Both theoretically and empirically, constitutive effects extend beyond conventional types of use such as ‘‘accountability’’, ‘‘learning’’, ‘‘enlightenment’’, ‘‘strategic use’’, etc.
REFERENCES

Andersen, V. N., & Dahler-Larsen, P. (2006). The framing of public evaluation data. Transparency and openness in Danish schools. In: R. Boyle, J. Breul & P. Dahler-Larsen (Eds), Open to the Public. Evaluation in the public arena. New Brunswick: Transaction.
Bailey, K. M. (1996). Working for washback: A review of the washback concept in language testing. Language Testing, 13(3), 257–279.
Bauman, Z. (1983). Industrialism, consumerism, and power. Theory, Culture and Society, 1(3), 32–43.
Beck, U. (1997a). Risk society: Towards a new modernity. London: Sage.
Beck, U. (1997b). The reinvention of politics. Rethinking modernity in the global social order. Cambridge: Polity Press.
Bouckaert, G., & Balk, W. (1991). Public productivity measurement: Diseases and cures. Public Productivity and Management Review, 15(2), 229–235.
Boyle, R., Breul, J., & Dahler-Larsen, P. (Eds) (2006). Open to the Public. Evaluation in the public arena. New Brunswick: Transaction.
Castoriadis, C. (1987). The imaginary: Creation in the social-historical domain. Stanford: Stanford University Press.
Cheney, G., McMillan, J. J., & Schwartzman, R. (1997). Should we buy the student as consumer metaphor? Working paper at: http://mtprof.msun.edu/Fall1997/Cheney.html
Courtney, M. E., Needell, B., & Wulczyn, F. (2004). Unintended consequences of the push for accountability: The case of national child welfare performance standards. Children and Youth Services Review, 26, 1141–1154.
Foucault, M. (1991). Governmentality. In: G. Burchell, C. Gordon & P. Miller (Eds), The Foucault effect: Studies in governmentality (pp. 87–104). Chicago: University of Chicago Press.
Fountain, J. E. (2001). Paradoxes of public sector customer service. Governance, 14(1), 55–73.
Gallie, W. B. (1956). Essentially contested concepts. Proceedings of the Aristotelian Society, 56.
Giddens, A. (1990). The consequences of modernity. Cambridge: Polity Press.
Giddens, A. (1994). Modernitetens konsekvenser. [Consequences of modernity.] Copenhagen: Hans Reitzels Forlag.
Gordon, C. (Ed.). (1980). Power/Knowledge: Selected interviews and other writings 1972–1977. Brighton: Harvester Press.
Hellstern, G. M. (1986). Assessing evaluation research. In: F. X. Kaufmann, G. Majone & V. Ostrom (Eds), Guidance, control and evaluation in the public sector (pp. 279–313). Berlin: de Gruyter.
Hochschild, A. (2004). Gennem sprækker i tidsfælden. In: M. H. Jacobsen & J. Tonboe (Eds), Arbejdssamfundet (pp. 109–130). Copenhagen: Hans Reitzel.
Hood, C. (2002). The risk game and the blame game. Government and Opposition, 37(1), 15–37.
Højlund, H., & la Cour, A. (2001). Standardiseret omsorg og lovbestemt fleksibilitet – organisationsændringer på et kærneområde i velfærdsstaten. Nordiske Organisasjonsstudier, 2(3), 91–117.
Karlsson Vestman, O., & Conner, R. F. (2006). The relationship between evaluation and politics. In: I. Shaw, J. C. Greene & M. M. Mark (Eds), Handbook of evaluation: Policies, programs and practices. London: Sage.
March, J., & Olsen, J. P. (1976). Ambiguity and choice in organizations. Oslo: Universitetsforlaget.
Mark, M., & Henry, G. (2002). A theory of evaluation influence: The multiple mechanisms and pathways through which evaluation can influence attitudes and action. Paper presented at the EES conference in Seville, October 12, 2002.
Mark, M., Henry, G., & Julnes, G. (2000). Evaluation. An integrated framework for understanding, guiding, and improving public and nonprofit policies and programs. San Francisco: Jossey-Bass.
McBriarty, M. A. (1988). Performance appraisal: Some unintended consequences. Public Personnel Management, 17(4), 421–433.
Modell, S. (2004). Performance measurement myths in the public sector: A research note. Financial Accountability and Management, 20(1), 39–55.
Munro, E. (2004). The impact of audit on social work practice. British Journal of Social Work, 34, 1075–1095.
Perrin, B. (1998). Effective use and misuse of performance measurement. American Journal of Evaluation, 19, 367–379.
Ridgway, V. F. (1956). Dysfunctional consequences of performance measurements. Administrative Science Quarterly, 2, 240–247.
Rose, N. (2003). At regere friheden – en analyse af politisk magt i avanceret liberale demokratier. In: C. Borch & L. T. Larsen (Eds), Perspektiv, Magt og Styring. Luhmann og Foucault til diskussion (pp. 180–199). Copenhagen: Hans Reitzels Forlag.
Rothstein, H., Huber, M., & Gaskell, G. (2006). A theory of risk colonization: The spiralling regulatory logics of societal and institutional risk. Economy and Society, 35(1), 91–112.
Røvik, K. A. (1998). Moderne organisasjoner. Trender i organisasjonstenkingen ved tusenårsskiftet. Bergen-Sandviken: Fagbokforlaget.
Sahlin-Andersson, K. (2004). Presentation at the conference of the Swedish Evaluation Society (SVUF) in Stockholm in May.
Schwandt, T., & Dahler-Larsen, P. (in press). Evaluation and community. Evaluation.
Scott, W. R. (1987). The adolescence of institutional theory. Administrative Science Quarterly, 32, 493–511.
Shulha, L. M., & Cousins, B. (1997). Evaluation use. Theory, research and practice since 1986. Evaluation Practice, 18, 195–208.
Smith, P. (1995). On the unintended consequences of publishing performance data in the public sector. International Journal of Public Administration, 18, 277–310.
Stake, R. E. (2004). Standards-based and responsive evaluation. Thousand Oaks: Sage.
Stehr, N. (1994). Knowledge societies. London: Sage.
Stehr, N. (2001). The fragility of modern societies – Knowledge and risk in the information age. London: Sage.
Stronach, I. (1999). Shouting theater in a crowded fire: ‘‘Educational Performance’’ as cultural performance. Evaluation, 5(2), 173–193.
Suen, H. K. (2003). Some very long-term and some chronic effects: The invisible dimension of the consequential basis of validity. Paper presented at the 2003 International Symposium on Developing High-stakes Testing, Seoul, Korea.
van Thiel, S., & Leeuw, F. (2002). The performance paradox in the public sector. Public Performance and Management Review, 25(3), 267–281.
Vedung, E. (1997). Public policy and program evaluation. New Brunswick: Transaction.
Vulliamy, G., & Webb, R. (2001). The social construction of school exclusion rates: Implications for evaluation methodology. Educational Studies, 27(3), 357–370.
Woolgar, S. (2004). Marketing ideas. Economy and Society, 33(4), 448–462.
POETRY, PERFORMANCE AND PATHOS IN EVALUATION REPORTING

Leslie K. Goodyear

Language is constitutive. ‘‘Language does not ‘reflect’ social reality, but produces meaning, creates social reality. … Language is how social organization and power are defined and contested and the place where our sense of selves, our subjectivity, is constructed’’. (Richardson, 2000, p. 929, italics in original)
INTRODUCTION

In this chapter, I present a methodological ‘‘response to the increasingly de-personalized managerialist program and evaluation world’’ (Kushner, personal communication, 10/11/2005). This response highlights and addresses important evaluation challenges regarding the presentation of findings to audiences (representation), evaluation use, accountability and authority. Decisions regarding representation of evaluation findings are often afterthoughts, coming at the end of an evaluation process and relying on commonly held assumptions about data, findings, stakeholders, communication and purposes for evaluation. In this chapter, I will
consider alternative ways to think about creating and presenting evaluation findings. This new point of view challenges the assumptions that underlie more conventional notions of evaluation reporting and brings new questions regarding the representation of evaluation findings to the conversation. Issues of representational format in reporting touch on issues of:

evaluation use – How, by whom and for what purpose are evaluations used?
accountability – For what information and to whom are evaluations and evaluators accountable?
authority – Who has the authority to present evaluation findings and to speak for those represented?

My work on issues of representation in evaluation came out of a convergence of my interest in performance and poetry, a concern for evaluation use and a desire to bring some of the insights of postmodernism regarding representation to evaluation practice. As a graduate student in program evaluation, I became increasingly concerned that evaluation reports – whether presenting quantitative or qualitative data, formative or summative findings, for academic or practitioner use – were not able to convey the richness of program experiences as I saw them during the evaluation process. Often, these reports were flat and one-dimensional, reporting on the lives of participants as though they were tallied outcomes of programs; de-humanized programmatic elements to be counted, measured and objectified. I felt that the emotion behind the participants’ stories of their experiences – the highs and lows, the learnings and setbacks and the reality of their lives as they related them to the program – was missing from these reports, and I wanted to find a way to convey all of this (and more!) in evaluation reporting. I was interested in finding other ways to report evaluation findings that would honor participants’ complex lives and highlight the emotion, drama and subtleties of the stories told by participants. I wanted to know how evaluation audiences would react to a different kind of report; one that put them in closer contact, so to speak, with the participants in the program. I wondered if there was a way to take the systematic information gathered in an evaluation – in my case, a participatory evaluation – and somehow create a report that would offer a different kind of information, and therefore a different kind of experience to an audience. And I wondered if that audience would be differently
motivated to act on the information conveyed in a novel form of evaluation reporting. Could evaluation findings be presented poetically or performatively, and if so, what would it mean for understanding and action, and for evaluation use? So, I did what a good graduate student does when faced with such a question: I immersed myself in the literature! I found that I was not the first to think of representing research as performance or poetry. Sociologists such as Laurel Richardson (1990, 1991, 1992, 1993, 1994, 1995a, 1995b, 2000), Richardson and St. Pierre (2005), Patti Lather (1990), Lather and Smithies (1997), Marianne Paget (1995), Norman Denzin (1994, 1997, 2000, 2003), Denzin and Lincoln (2000, 2005) and Corrine Glesne (1997) had been experimenting with such novel representations as poetic transcriptions, fiction and performance. Postmodernists had been confronting the ‘‘crisis of representation’’ (Dickens & Fontana, 1994, p. 2) and dealing with the challenges of re-presenting experience, authority and researcher presence for quite some time. These sociologists had experimented with new forms of representation mostly for their own academic interests and for audiences of peers. The evaluation literature, however, was less robust (read: virtually non-existent, except for Denzin (2002) and MacNeil (2000)). How could I bring these theories and experiments to the more practical, applied realm of program evaluation? What experimentation could be done to understand what an audience might get from a performative representation of research or evaluation findings? In this chapter, I describe three unconventional representations1 of evaluation findings I created and presented to different audiences in an exploration of the connection between the format for presentation of evaluation findings and the audience reactions and plans for possible use of those findings. As an analytic frame for the chapter, I ask the following three questions:

– What should we consider when creating representations of evaluation findings?
– How do audiences react to poetic and performative representations?
– By what criteria should we judge these representations?

With this piece, I hope to articulate what these representational approaches bring to evaluation and to move beyond what is, in my opinion, a common perspective (that using a poem or performance is simply a nice way to engage audiences) to thinking methodologically and epistemologically about these creative representational approaches.
WHAT SHOULD WE CONSIDER WHEN CREATING REPRESENTATIONS?

How we are expected to write effects what we can write about. (Richardson, 1990, p. 16, italics in original)
My experimentation with creative representations of evaluation findings centered on using performance and poetry as representational formats. Over the course of a few years (1996–1999), I created two performances and one set of poems to represent an HIV/AIDS prevention program and associated datasets. I created the first performance from verbatim transcripts of interviews of program participants who also took part in a participatory evaluation of the HIV/AIDS prevention program. For the second performance, I used multiple secondary data sources that were generated from a quasi-experimental study focused on the HIV/AIDS prevention program. And for the poetry, I used other secondary data sources – national HIV/AIDS surveillance surveys of both adults and youth – and my own analytic understanding of the programs and issues in the field of HIV/AIDS prevention (see Goodyear, 1997, 2001). As I thought about how I might represent the prevention program participants’ experiences using creative representational formats, I confronted many challenges, some common to evaluators, some more specific to attempts to poetically and performatively re-present. Although all evaluators make representational choices in the writing and presenting of their evaluation reports, I found that creating a performance or set of poems highlighted the challenging nature of these representational choices and brought the critical issues of the postmodern crisis of representation – e.g., authority, voice, researcher presence, etc. – front and center. What was I hoping to convey about the program and its participants to the audiences? What was I hoping the audience would learn? What elements of the participants’ experiences would I re-present using this format? How could I convey the sense of emotion and meaning making that I understood was present in the program? As an exploration of representational decision making, I consciously asked and answered a number of questions similar to these as I created the representations. The questions are represented in the following table, thematically organized according to the larger representational issues they address.
Framing Questions for Creating Representations

Purpose of presentation: What do I hope to achieve with this representation (performance/poem)? Who is the audience for the representation? What do I want the audience to learn, to understand or to do?

Data/textual issues: Should the performance be an actual conversation from the transcript? Or dialogue created from arranging excerpted quotes? Should those quotes be attributed to the actual participants who said them? Can I create dialogue from program reports and other documents? Should I create composites in order to enhance the drama or move the performance along? What should be the focus of the performed dialogue? The dramatic moments? Conversation that represents the range of topics addressed in the program? Should it represent one moment or event, or the progression of conversation over time? A typical conversation? Should the poetry be a poetic transcription or poetry I write from my analytic understanding? Should I, my voice, be included in the performance or poetry? If so, should my parts also be from transcripts? Do performances and poems always have to come from qualitative data?

Staging: Should this textual performance/poem be performed live? Or is it meant to be read? Who should perform this piece? People who represent the backgrounds of the program participants? Actors? Should I perform it? Should it be read aloud from a script, or memorized, like a play? Should there be props that simulate the program experience (table, chairs, coffee, bagels, etc.)? Should each character be identified with a name tag or other identifier? Or should they remain unnamed? Should the setting of the representation – the way in which the performance is staged – be familiar to the audience?

Background: What background information should be given to the audience about the program? About the program participants as a group or individually? About the evaluation? About the process of creating the poem or performance? About my role in the evaluation and creation of the piece?

Presenting identities: Who is being represented? What parts of people’s lives are foregrounded and what parts are ignored? What aspects of ‘‘us’’ (evaluators) and ‘‘them’’ (the program participants and stakeholders) are most important to present? In relation to what – the program? A specific activity? Each other? To me, the evaluator? To their backgrounds and expectations? What are the important elements of identity that need to be conveyed in the reporting of evaluation findings?

Voice: Whose voices are prominent? Whose are missing? Who is speaking? For whom?

Authority: How do I represent myself in the performance or poetry? Should I? How would my description/performance/poems differ from those created by the program participants? Or one that we might create together? How do we acknowledge our position as evaluators – a position of authority – and creators of representations without overshadowing the content or the voices of the participants? What should I tell the audience about my role in the evaluation and my interest in performance?
While some of these questions are common to evaluators no matter which representational format they choose, I wondered whether, had I created a more traditionally styled evaluation report, I would have asked and answered these or similar questions. Or, in other words, should evaluators always ask and answer questions such as these when creating representations of evaluation findings?2 Why are these questions important for evaluators to answer as they create representations of evaluation findings? Before I offer a response to these questions, let me first describe how audiences reacted to the poetic and performative representations I presented. Their reactions highlight some of the issues raised by the questions I asked during the creation of the representations.
HOW DO AUDIENCES REACT TO CREATIVE REPRESENTATIONS?

After creating the representations, I wanted to know how audiences would react to such presentations of evaluation findings and whether these formats would be acceptable, challenging or even dismissed out of hand. I presented these representations of evaluation findings to multiple audiences who represented groups of people who would have a stake in the information conveyed in the presentations. These groups included: evaluators; HIV/AIDS researchers; staff of the program; and program participants. After each presentation, I facilitated a group discussion focusing on their reactions to the presentation, what they learned from it and how they thought they might use the information. After recording, transcribing and analyzing the data from these conversations, I organized their responses into themes that represented the issues they expressed in the group discussions. A discussion of some of these themes follows.

The Pull of the Drama: Form vs. Content

In each audience discussion, points were raised about the tension between giving in to the inherent drama of a performance or reading of poetry and resisting that drama by reading from a data-based text. Many of the comments that followed the performances included suggestions about how to make this a better performance: memorize the script; have actors play the parts; do not move so much; stand behind a screen; use different markers for the participants such as different colored cards, or a hat on one and a scarf on another. Some
audience members felt that with performance, the content came through in a stronger way. Others felt that the action in the performance made them lose much of the content of the experiences that were being portrayed. Some also felt that although they were less able to focus on the content than if they were to read a representation of the same experiences, they were able to understand the emotional nature of the experiences more clearly through performance or poetry. Regardless of the reaction to the performance – whether it was too dramatic or not dramatic enough – the discussion audience members engaged in regarding this issue of drama was robust and contentious.
What Evaluation Should and Should Not Be, and How It Should Be Presented

Audience members articulated a framework of expectations: in order to be evaluation, certain kinds of information need to be presented, including information regarding the program and summaries of the impact of the program on its participants. In addition, according to many in the stakeholder audiences (particularly the evaluators who participated), the presentation of evaluation findings needs to attribute value to the information presented and render a judgment about the program in the representation. In response to the representation of survey data as poetry, the discussion relating to the importance of including a summary of program impact was particularly heated. Many stakeholders felt that the presentation of evaluation findings as poetry – even if it conveyed a new or different, more emotional or artistic, aspect of familiar information – was hard to access without some summary information to grasp onto. One participant articulated some of these concerns when he said, ''I think I heard the emotion and the feeling, but I didn't really hear information.'' Many audience members felt as though without a structure on which to hang their understandings, e.g., in the form of a summary, something was missing from the presentation. Similarly, for the performance that represented the participants' experiences in the HIV/AIDS prevention program, some evaluator audience members felt as though the presentation conveyed important information, such as ''a lot of the main cultural, personal and living experiences of dealing with the threat of AIDS,'' but that it was lacking key elements that would make it evaluative. One clarified, ''I thought it conveyed a lot about the women, but basically nothing about the program,'' while another added that he had thought something similar, hearing, ''stories about the
participants, but nothing that was evaluative, or context.’’ More than one audience member voiced concern that important aspects of evaluation were missing from this presentation.
The Audience Becomes the Evaluator

Many evaluator audience members felt as though they were left hanging at the end of the performance and poetry presentations, without information from the presenter that would help them to draw any meaning or conclusions from the information. Some felt as though this meant that it was left in the hands of the audience to make those connections and generate conclusions, that in a way, ''the listener became the evaluator,'' which, to them, was problematic. (See Abma, 1997, and Rosenau, 1992, for a discussion of the concept of readerly and writerly texts.) According to audience members, representing evaluation findings performatively or poetically offers an opportunity for dialogue about the program, about the meaning of the representation, about the information conveyed and about the quality of the evaluation. It is through these discussions that audience members become the evaluators. In addition, creative representations such as these offer the opportunity for program participants to be an audience for their own work, having it represented back to them for reflection and discussion.
Evaluation's Rigor vs. Evaluation's Accessibility

This tension between needing to make evaluation understandable to multiple audiences and maintaining high standards for evaluation reports – e.g., validity, reliability – was related to an audience member's position within the field of evaluation or position within a program or agency. Position, either within the field or within an organization, can also serve as a marker for educational level, relationship to research and evaluation and even philosophical stance. The positions represented by those who participated in the group discussion/interviews included evaluator, researcher, program staff and program participant.3 When it came to differences in how audience members perceived the acceptability of these creative forms of representation, I found that audience members' position – as an evaluator, program staff person, program participant or researcher – served as a lens through which they viewed issues of representation in evaluation.
Within the audience discussions, there were differences in the ways evaluation professionals and program staff and participants talked about issues that affect the presentation of evaluation findings. Many evaluator audience members were not willing to give up some of the ‘‘standards,’’ as they saw them, of evaluation and research, such as objectivity and value assessment, validity and generalizability, while for those who were program staff and participants, these ‘‘standards’’ were not at issue. In the minds of some of the evaluator audience members, the unconventional representations were interesting, but would not fit for representations of evaluation findings because they did not possess the elements that they believe are necessary standards for evaluation. For those audience members who were not evaluators, these issues were less likely to be of concern. In fact, for those who were program participants, these were not concerns at all. Program staff were more likely to discuss whether the presentation, or information included in the presentation, could be used to recruit participants, to convince funders of the importance of the program’s process and results, to improve the program by giving feedback on its implementation to program staff and administrators or to offer community members a view into the ways in which the program operates. In other words, while evaluators who participated in the interviews were primarily concerned with issues of research validity and reliability – at the possible expense of audience understanding – the program staff and parents prioritized understanding and ‘‘buy-in’’ over meeting standards of postpositivist social inquiry.
Looking for More Connection to Program Participants/Beneficiaries (Wanting Something More Personal)

Audience members who were parents in the HIV/AIDS prevention program articulated another perspective on the importance of making the presentation of evaluation findings meaningful. They talked about looking for presentations that contain information that comes from the heart and that are moving, even disturbing or colorful, in how the findings are presented. They suggested that a presentation of evaluation findings should give the audience ''food for thought that gives somebody something to say, hmmm.'' For these stakeholders, valuable, meaningful program evaluation information is presented not in the form of sterile, scientific information to be used in a research or evaluation setting but in a form that conveys the emotion and urgency of the problem and those whose lives are affected by it.
Exemplifying this kind of understanding, an evaluator related what she understood from the representation, saying, ''I got out of it a sort of rising awareness among the women, about the meaning of the training for their personal lives. I also got out of it that they have become political. In the sense that they have become very concerned about community … and concerns they have about sharing the information with the kids.'' The human qualities explicit in the representations allowed the respondents to personalize the information presented. The performance represented, as articulated by one of the program staff, ''the living example of how the program works.'' As people who implement, evaluate and participate in AIDS prevention programs, these stakeholders said they could relate to the way the unconventional representations portrayed ''the complexity of process and reaction to a program,'' and the ''feelings associated with various aspects of the HIV/AIDS issue.'' For example, as one researcher interview participant remarked, ''But it really is an interesting approach and I think it shows an attempt to connect, which might actually get to people in a way that a bunch of charts and so forth might not.'' A program staff audience member added that she thinks that a positive element of unconventional representational formats is that they are ''more inclusive, allowing others to have more flexible interpretations.''
The Power of the Presentation: Its Possibilities and Its Cautions

A researcher audience member was concerned that interpretations of data and interpretations of presentations be distinguished, stating, ''I, as researcher and presenter, may find it reasonable for my audience to reach 'various and conflicting' interpretations of the data that I have gathered, but I don't want them to have various and conflicting interpretations of my conclusions from the research. … At a coffee house reading I can sip my mochalattifrappuchino and sagely mumble how poems are free agents and we never know which truth they will unleash to bite us in the ass. In a presentation of results that laissez-faire attitude is not acceptable.'' I have included these reflections from audience members of my presentations to highlight the ways in which evaluation stakeholders accept or reject the possibility of using creative formats for representing evaluation findings. These reactions highlight the role that position – whether one is an evaluator, program staff person or program participant, for example – plays in accepting new formats for evaluation reporting. They also bring to life some of the evaluation issues – evaluation use, accountability, power and
democratic dialogue – that are addressed by the representational questions I raised in the previous section of this chapter. Obviously, there are no right answers for any of the questions I raise; these are questions posed for consideration and for framing the important aspects of representations of evaluation findings. Another framing of these representational issues brings them to a new, more general level, addressing challenging issues in the field, including:

Evaluation use: What is the purpose of the representation? What do I want the audience to learn or do as a result of this representation? How can the format for representing evaluation findings contribute to evaluation use?

Accountability: Should the representation be an actual conversation from interview transcripts? Who should perform the piece? Who is being represented and who is doing the representing? How do I represent myself in the representation? To whom is this representation accountable? To those it portrays? To those who funded it? How does the representation of evaluation findings demonstrate to whom the evaluation and the evaluator are accountable?

Power: Who should present the representation? Whose voice is prominent? Whose voice is missing? Who is speaking? For whom? How would my description, performance or poems differ from those created by the program participants? Or one that we might create together? Am I ''speaking for,'' or ''speaking with'' (Alcoff, 1994)? How does an evaluator exercise power through the representation of evaluation findings? How are issues of power addressed, highlighted or ignored depending on the representational format for presenting evaluation findings?

Democratic dialogue: What do I hope to achieve with this representation? What do I want the audience to do with the information presented in this format? How can this representation bring the audience into dialogue with the program participants and other stakeholders? How can evaluators leverage the format for presenting evaluation findings to promote democratic dialogue?
BY WHAT CRITERIA SHOULD WE JUDGE THESE REPRESENTATIONS?

However, these questions, and the issues they raise, point to a larger question: what should be the criteria by which representations of evaluation findings
are judged? While criteria for judging the quality of an evaluation report already exist – articulated in the American Evaluation Association's Guiding Principles for Evaluators (2004) and the Joint Committee on Standards for Educational Evaluation's Program Evaluation Standards (1994) – these are not sufficient for determining the value and quality of poetic, performative or other forms of creative representations of evaluation findings. I would argue that any criteria designed specifically to judge the quality of creative formats for representing evaluation findings should supplement, not supplant, already existing quality criteria, just as creative representations of evaluation findings cannot, and should not, take the place of other formats for representing evaluations (Sparkes, 1995). These new formats should be seen as additions to the lexicon of possibilities for presenting evaluation findings to audiences. And new criteria for judging the quality of representations should build upon the already accepted criteria for good evaluation, which include: systematically collected data; evaluator competence; integrity and honesty; respect for people; and responsibilities for the general and public welfare (Guiding Principles for Evaluators, 2004); utility; feasibility; propriety; and accuracy (Program Evaluation Standards, 1994). A starting place for thinking about what these criteria might look like is the criteria that Richardson (2000) outlined for judging the quality of creative analytical processes ethnography.

Criteria for evaluating CAP ethnographies (Richardson, 2000):

Substantive contribution – Does this piece contribute to our understanding of social life? Does the writer demonstrate a deeply grounded (if embedded) social scientific perspective?

Aesthetic merit – Does this piece succeed aesthetically? Is the text artistically shaped, satisfying, complex and not boring?

Reflexivity – How has the author's subjectivity been both a producer and a product of this piece?

Impact – Does the piece affect me emotionally? Intellectually? Does it generate new questions? Move me to write? Move me to try new research practices?

Expression of a reality – Does this piece embody a fleshed out, embodied sense of lived experience? Does it seem ''true''?

However, it is important to note that those who have experimented with creative formats for representing research in the social sciences – Richardson's CAP, for example – have focused on the relationship of the
creator to the creation and the process of creating the text/performance/poem, not on how an audience understands or engages with the piece. The criteria outlined above begin to touch on the relationship between the performer/presenter and the audience, but for the representation of evaluation findings, criteria for audience understanding might need to go beyond questions of how an audience is moved emotionally by a presentation to how that experience is translated into action and change. What is at stake if we decide not to engage in a conversation about the role of representational format in the presentation of evaluation findings? What assumptions – about evaluation as a profession, about data and evaluation findings, about reporting, about objectivity and the way we ''do'' evaluation – are challenged by thoughtfully engaging in the poetic, performative or creative representation of evaluation findings to audiences? When we question the basic premises of the way we represent evaluation findings, what surfaces are new ''platforms from which to examine unexamined assumptions'' (Eisner, 1990, p. 89), allowing new discussions to emerge regarding the place of representation in evaluation and the place of evaluation in social programs, social policy and life generally. How can we balance the power of the presentational form with important aspects of evaluation reports, such as validity and transparency – particularly methodological transparency – and evaluation use? What should we expect of evaluation, for whom, and how can we use the representation of evaluation findings to deliver on our expectations? How can we use new representational formats to help audiences of evaluation stakeholders come to new understandings about the programs being evaluated, the people served by those programs, and the social world that creates and maintains those social programs? Maybe it is to inciting stakeholders' new understandings of programs and program participants – and to challenging representational assumptions – that evaluation and evaluators should be held accountable.
NOTES 1. Richardson (2000) coined the term ‘‘creative analytical processes’’ ethnography to describe what I called ‘‘unconventional representations’’ in my dissertation. The qualities of these representations include: the author has moved outside conventional social scientific writing; they invite people in and open spaces for thinking about the social; they adapt to the social/political world we inhabit; and the writing process and product are intertwined (Richardson & St. Pierre, 2005).
2. Even the language I use here divulges my perspective. I use the word ‘‘create’’ when talking about evaluation findings, which implies that findings, or evaluation reports, do not automatically appear from data, but that the evaluator has choices regarding what and how to represent his or her work, and is therefore the author of the representation. This means, then, that the evaluator is also choosing to ‘‘author’’ representations of people’s lives and experiences. This analysis inserts the evaluator into the mix as an authorial, or even authoritative, presence. Very postmodern. 3. I would also extend the analysis to people in positions as funders, administrators of government agencies and social inquirers.
REFERENCES Abma, T. (1997). Sharing power, facing ambiguity. In: L. Mabry (Ed.), Evaluation and the postmodern dilemma (pp. 105–119). Greenwich, CT: JAI Press. Alcoff, L. (1994). The problem of speaking for others. In: S. O. Weisser & J. Fleischner (Eds), Feminist nightmares. Women at odds: Feminism and the problem of sisterhood (pp. 285–309). New York: New York University Press. American Evaluation Association. (2004). Guiding principles for evaluators. Retrieved from http://eval.org/publications.asp#Guiding%20Prin, 6/13/2006 Dickens, D. R., & Fontana, A. (Eds). (1994). Postmodernism and social inquiry, New York: Guilford Press. Denzin, N. K. (1994). The art and politics of interpretation. In: N. K. Denzin & Y. S. Lincoln (Eds), Handbook of qualitative research (pp. 500–515). Thousand Oaks, CA: Sage. Denzin, N. K. (1997). Performance texts. In: W. G. Tierney & Y. S. Lincoln (Eds), Representation and the text: Re-framing the narrative voice (pp. 179–217). Albany: State University of New York Press. Denzin, N. K. (2000). Rock creek history. Symbolic interaction, 23(1), 71–81. Denzin, N. K. (2002). Performing evaluation. In: K. Ryan & T. Schwandt (Eds), Exploring evaluator role and identity (pp. 139–165). Greenwich, CT: Information Age. Denzin, N. K. (2003). Performing ethnography: Critical pedagogy and the politics of culture. Thousand Oaks, CA: Sage Publications. Denzin, N.K., Lincoln, Y.S. (Eds). (2000). Handbook of qualitative research (2nd ed). Thousand Oaks, CA: Sage Publications. Denzin, N.K., Lincoln, Y.S. (Eds). (2005). The Sage handbook of qualitative research (3rd ed). Thousand Oaks, CA: Sage Publications. Eisner, E. W. (1990). The meaning of alternative paradigms for practice. In: E. Guba (Ed.), The paradigm dialog (pp. 88–102). Newbury Park, CA: Sage. Glesne, C. (1997). That rare feeling: Re-presenting research through poetic transcription. Qualitative Inquiry, 3(2), 202–221. Goodyear, L. (1997). ‘‘A Circle that it’s time to open’’: Using performance as a representation of a participatory evaluation. Unpublished master’s thesis, Cornell University, Ithaca, NY. Goodyear, L. (2001). Representational form and audience understanding in evaluation: Advancing use and engaging postmodern pluralism. Unpublished doctoral dissertation, Cornell University, Ithaca, NY. Joint Committee on Standards for Educational Evaluation. (1994). The program evaluation standards. Thousand Oaks, CA: Sage.
Lather, P. A. (1990). Reinscribing otherwise: The play of values in the practices of the human sciences. In: E. Guba (Ed.), The paradigm dialog (pp. 315–332). Newbury Park, CA: Sage Publications. Lather, P., & Smithies, C. (1997). Troubling the angels: Women living with HIV/AIDS. Boulder, CO: Westview Press. MacNeil, C. (2000). The prose and cons of poetic representation in evaluation reporting. American Journal of Evaluation, 21, 359–367. Paget, M. (1995). Performing the text. In: J. Van Maanen (Ed.), Representation in ethnography. Thousand Oaks, CA: Sage Publications. Richardson, L. (1990). Writing strategies: Reaching diverse audiences. Newbury Park, CA: Sage Publications. Richardson, L. (1991). Postmodern social theory: Representational practices. Sociological Theory, 9(2), 173–179. Richardson, L. (1992). The poetic representation of lives: Writing a postmodern sociology. Studies in Symbolic Interaction, 13, 19–29. Richardson, L. (1993). Poetics, dramatics, and transgressive validity: The case of the skipped line. The Sociological Quarterly, 34(4), 695–710. Richardson, L. (1994). Writing: A method of inquiry. In: N. K. Denzin & Y. S. Lincoln (Eds), Handbook of qualitative research (pp. 516–529). Thousand Oaks, CA: Sage. Richardson, L. (1995a). Narrative and sociology. In: J. Van Maanen (Ed.), Representation in ethnography. Thousand Oaks, CA: Sage Publications. Richardson, L. (1995b). Writing stories: Co-authoring ‘‘The Sea Monster,’’ a writing-story. Qualitative Inquiry, 1(2), 189–203. Richardson, L. (2000). Writing: A method of inquiry. In: N.K. Denzin & Y.S. Lincoln (Eds), Handbook of qualitative methods (2nd ed, p. 30). Thousand Oaks, CA: Sage Publications. Richardson, L., & St. Pierre, E. (2005). Writing: A method of inquiry, In: N.K. Denzin & Y.S. Lincoln (Eds), The Sage handbook of qualitative methods (3rd ed., pp. 959–978). Thousand Oaks, CA: Sage Publications. Rosenau, P. (1992). Post-modernism and the social sciences: Insights, inroads and intrusions. Princeton, NJ: Princeton University Press. Sparkes, A. C. (1995). Writing people: Reflections on the dual crises of representation and legitimation in qualitative inquiry. Quest, 47, 158–195.
EVALUATING COMPLEX PUBLIC POLICY PROGRAMMES: REFLECTIONS ON EVALUATION AND GOVERNANCE FROM THE EVALUATION OF CHILDREN'S TRUSTS

Chris Husbands

COMPLEX PUBLIC POLICY PROGRAMMES

In this paper, I offer a series of reflections on some issues involved in evaluating complex public policy programmes in a context of rapid change and against the background of an activist government setting exacting standards for the delivery of public service outcomes. I argue that the overlay of pressures on programme evaluation as a result of the engagement with such complex public policy interventions presents a series of challenges for evaluation. By 'complex public policy programmes' I refer to either integrated or loosely coupled policy initiatives which share the following characteristics:

they depend on marshalling change agency across a number of levels of government;
they depend on monitoring performance outcomes across a range of service contexts;

they are not the focused remit of any individual agency or government level; and

they normally operate in areas of high political surveillance.

These characteristics arise from a number of features in public governance since the development of the 'new public management' of the 1980s. The concern to marshal change agency across a number of levels of government is deeply embedded in the shifting nature of governance: the 'hollowing out' of the nation state (Rhodes, 1997; Holloway, 2000) – 'too small for the big issues, too big for the small issues', as Dahrendorf memorably phrased it (quoted in Wilding, 1997) – and the consequent engagement of the local state and the voluntary and community sector in the development and delivery of national public policy. Attempts to align, and then to manage, multi-level governance arrangements around policy intervention also arise from a recognition of the limitations of command and control hierarchies, and the need for more flexible, adaptive systems to respond to increasingly complex and diverse patterns of social life (Benington, 1997). The concern with the monitoring of performance outcomes across a range of service contexts (e.g., Parston, 1998) and the development of policy initiatives which are not the focused remit of any individual agency is associated with the engagement of government with 'wicked' issues such as social exclusion and life cycle experiences (teenage pregnancy, drug abuse and the exclusion of the elderly) which cut across the work of different government and local government agencies, and provide persistent challenges to the 'silo thinking' which characterises classic government policy intervention (Wyman, 2001). High political surveillance is characteristic of complex public policy interventions for a number of reasons: the impatience of ministers and their policy-advisers, with an almost single-minded focus on the short-term, to identify results in immediate 'delivery'; the determination to deploy social audit to off-set the high risk associated with complex interventions involving collaboration across agencies; and the increasing concern in what has been called the 'evaluative society' to demonstrate social return on economic investment (Power, 1997). One of the contentions of this paper is that complex public policy interventions are increasingly becoming the tools of choice for government at national and local level in order to broker transformational change in structures and outcomes, and that evaluation is itself a tool for policy makers in the development and design of such complex interventions.
Complex public policy programmes have become common in the UK over the last decade and a half and have replaced earlier, more focused policy initiatives. In the later 1990s, government funded a series of initiatives which were designed to marshal innovative local solutions to long-standing economic and social difficulties. Early examples of such interventions in the UK included the Better Government for Older People programme (Hayden & Boaz, 2000), Health Action Zones (Barnes, Sullivan, & Matka, 2003) and Education Action Zones (NAO, 2001; Halpin, Power, Whitty, & Gerwirtz, 2004). Such initiatives were focused in area, and were often based on pilot areas developing different delivery and organisational models, disciplinary scope and expectations of outcomes. In more recent years, the focus of government effort has moved to more complex interventions covering larger populations and more ambitious cross-agency work. Sure Start (Tunstill, Alnock, Meadows, & McLeod, 2002), Children's Fund and Children's Trusts are all examples of ambitious complex public policy interventions: in each case, the intention is to address long-standing, cross-cutting barriers to well-being by marshalling a range of services and then to configure those services around the needs of the client group rather than around the groups responsible for delivering services. The 1997 election marked a significant change in the political context for the development of public policy intervention programmes in the UK. It brought to power a government which was institutionally conservative – unwilling to create new institutional structures – but socially ambitious, and committed to identifying existing structures and institutions and to harnessing a combination of devolved public managerial energy, private sector involvement and voluntary agency contributions to deliver social justice outcomes. This concern with 'standards not structures' – 'a banal catchphrase which first saw the light of day in The Blair Revolution, co-authored by Peter Mandelson and Roger Liddle and published in 1996' (Chitty, 1997, p. 3) – defined the political economy of Labour's education and social interventions: a strongly interventionist government mandating strong managerial interventions in local government, education and social care. Policy was to be more forward looking, more joined-up and more strategic (Cabinet Office, 1999), and government sought to energise policy responses to complex 'cross-cutting' issues, in crime, health, the environment and social exclusion. These were ambitious goals both in terms of their long-run aspirations – such as the elimination of child poverty within a generation – and in terms of their policy complexity, demanding interventions across layers of government and sectoral structure. They also pose very complex challenges for generating an understanding of the processes and outcomes of intervention and change. As Sanderson has commented:
processes and outcomes of intervention and change. As Sanderson has commented: In order to provide the capacity for learning in complex policy systems, evaluation must become a more ‘exploratory’ and ‘explanatory’ enterprise, to capture and understand system change in response to policy interventions, which is non-proportional and nonlinear. Thus, little change may be evident in a system for a long period but then major change may suddenly (and unpredictably) occur as ‘synergy’ occurs between a number of pre-conditions. y Approaches to evaluation which seek to isolate policy instruments or programmes in controlled situations will produce results of limited usefulness because they are context-bound, lacking the basis for generalization to guide action in other contexts. (Sanderson, 2000)
CHILDREN'S TRUSTS AND THE RECONFIGURATION OF CHILDREN'S SERVICES

I develop these ideas about complex policy interventions through the lens provided by Children's Trusts. Children's Trusts were first developed by government as part of proposals for the reforms of children's services following the Laming Report into the murder of Victoria Climbié (DH, 2003). The government outlined legislative plans for the most extensive reform of children's services for over 30 years, with the underlying principles of the reform to include greater integration of service planning, commissioning, management and delivery, greater involvement of children and young people in the design and provision of services provided for them, and a reconfiguration of the relationships between statutory, voluntary and private provision of services for children – including education, health and social care, and encompassing both universal services and targeted provision. Local authorities were invited to bid for pathfinder status, and in October 2003, 35 local authorities providing services for some 20% of the children of England were identified as pathfinder authorities and funded to explore ways of reconfiguring service provision and structure. At the same time, in October 2003, government published a wide-ranging consultation document, Every Child Matters, which outlined the design principles for new service provision, configured around children's needs, and identified five outcome areas for all children which local authorities and others were required to promote: children should enjoy and achieve, be healthy, stay safe, make a positive contribution and achieve economic well-being. These five 'Every Child Matters' outcomes became the lodestone for the design and commissioning of children's services. In 2004, barely a year after the 35 pathfinders had begun work, and before the outcomes of evaluation of
their work were known, the 2004 Children Act (DfES, 2004a) was passed, requiring all local authorities to move towards integrated provision of children's services, and government guidance required all local authorities to put in place children's trust arrangements by 2008. As described above, children's trusts constitute a complex public policy intervention; they were dependent on marshalling agency across a variety of levels of government – between national government guidance, support for implementation at government regional office level and provision of arrangements through local government. They depended on children's trust managers – themselves relatively junior figures in local government hierarchies – brokering agreements to participate in trusts, frequently involving pooling of budgets or pooling of governance arrangements, with the trusts involving not only local government social service departments, but also education departments in local government, increasingly autonomous schools, primary care trusts and a variety of voluntary and private sector agencies who were providers or commissioners of services for children. Trusts approached the change management challenge this posed in a variety of ways. Some focused on the needs of specific groups (for example, early years, disabled children, vulnerable children, children in one or more wards), whilst others took more ambitious routes and developed whole service trusts looking at the needs of all children. A key change driver was the high professional support given at the level of rhetoric to the five Every Child Matters outcomes. Although these mapped in some cases onto the core preoccupations of particular services (for example, 'be healthy' as the central concern of health workers), trusts and their managers worked to develop the understanding that the barriers to children's progress in one area were frequently the consequence of difficulties in others – the barriers to educational success and enjoyment or good health often lie outside schools and health services, so that effective children's trust responses to the challenges of Every Child Matters depended on building relationships between different agencies, either informally or formally. Finally, this was an area of high political scrutiny; the challenge of children's trusts was to improve outcomes for children. Government looked for early results, and results across a range of performance indicators. A multi-professional team from the University of East Anglia was commissioned to evaluate children's trusts (NECT, 2004). The evaluation team included educationalists, medical statisticians, health economists, community paediatricians, social work academics and demographers, and the evaluation design agreed between the University and the client – the Department for Education and Skills – outlined a mixed-methods evaluation strategy,
incorporating extensive surveys across all 35 pathfinders, a review of outcome indicators at national level and across the 35 pathfinders, work with panels of children and young people and their parents, as well as in-depth case study and interview work in a sub-set of pathfinders and a further three local authorities. The focus of the evaluation was to be on three aspects of the work of children's trusts:

processes – change processes within children's services and in the organisation, management, delivery and approach to children's services;

outcomes – evidence of change in outcomes for children, for groups of children or in children's perceptions of the services provided for them;

outputs – evidence of change in the ways of working, performance and organisation of both individual services and of collaborative or integrative relationships between services (NECT, 2004).

In the remainder of this paper, I use the national evaluation of children's trusts as a basis for four sets of reflections on the evaluation of complex public policy interventions. One of the pressing challenges of children's trusts for both practitioners and evaluators arose from their location in a developing context of multi-level governance. The national evaluation of children's trusts, like most policy evaluations, was commissioned by a government department. Henkel (1991), quoted by Sanderson (2000), sees policy evaluation:

… as a contribution to the control of the periphery by the centre, particularly in the management of resources. It stressed values of economy, efficiency, 'value for money' and effectiveness of performance. As a corollary, it assumed that evaluation would be summative, delivering authoritative judgements, based as far as possible on performance indicators or quantitative measures of input–output relationships and outcomes and set against predefined targets and standards.
Whilst the civil servants who commissioned and managed relationships with the evaluation team certainly stressed the importance of economy, efficiency and the demonstration of impact on notoriously stubborn performance indicators, the relation was more complex than this, and two aspects of the complexity merit brief discussion. Although the client for the evaluation was clearly national government, and the evaluation was commissioned as a part of the policy process, one of the strands in government thinking relates more generally to the emergence since 1997 of a more nuanced conception of multi-level governance, and an understanding that whilst policy leadership (though see below) rested at national government level, that policy was shaped in large part by the practical engagement with addressing complex issues in children’s lives in local government. This was not an evaluation of a
national policy, but an evaluation of the elaboration of policy options in a variety of settings. In some ways, the recognition of the limits of central government action links to Rhodes' conception of the 'hollowing out' of the state: the evaluation was conducted at a time when the capacity of central government to intervene directly to micro-manage policy implementation and to prescribe structural forms was becoming constrained – and was, indeed, further constrained in the course of the evaluation as support for implementation moved from national government (the Department for Education and Skills) to regional level (Government Regional Offices). Children's trusts provide an example of, and an opportunity to test out, developing arenas for enacting public policy and for delivery. A practical consequence of this was an open discussion about the nature of the performance levels and indicators which might form a template for the evaluation of impact: central government had few views on which were the most appropriate indicators against which to test performance, because it was not grappling with the development of local strategic plans for children's services. Whilst the explicit context of performativity was not underplayed, in practice, the opening up of discourse about what indicators might map local and regional development suggested some different configurations of the relationship between the 'core' and the 'periphery'. Conventionally, the widespread deployment of performance indicators by central government has been seen as a tool for the direct management of local government and others (van Helden & Tillema, 2005). Although this underlay the integration of children's services, in practice, local authorities had considerable freedom to identify local priorities – relating to given issues or sub-groups – and to nominate appropriate performance indicators. It is in this sense that the relationship between centre and localities becomes more problematic in an area of considerable complexity: indicators can be traded against each other, prioritised differently, and so on. A second set of issues arises from the issue of 'delivery' as an underlying mantra of complex policy interventions. There is something of a potential tension between the complexity of the institutional architecture, which can make rapid change difficult to negotiate and implement, and government's concern to see measurable impacts of 'delivery', which requires short lines of executive accountability. The relationship between service outputs and service outcomes is dealt with later; however, a key component of discourse in the focus on 'delivery' has been to devolve responsibility, accountability and managerial capacity to the 'front-line'. The Gershon Review of efficiency in public services examined 'new ways of providing departments, their agencies and other parts of the public sector with incentives to exploit
opportunities for efficiency savings, and so release resources for front-line public service delivery' (HM Treasury, 2004, p. 3). The DfES, quoting the Prime Minister, has referred to the importance of 'devolution to front-line professionals, freeing them to innovate and develop services built around the needs of the individual citizen' (DfES, 2004b). Setting aside the military metaphors, there is an interesting set of debates about where this 'front line' is located. If the 'front line' is in schools, is it to be found in the classroom (where operational services are delivered), or the headteacher's office (where services are planned and managed)? If it is predominantly around developing 'services built around the needs of the individual citizen', then the front line looks to be located at the point of service commissioning and governance. There is obvious scope for endless variation on this, but the repeated mantra of 'devolution to front line professionals' as a theme in government strategy poses questions for evaluators – and practitioners – about where the front line is located and, therefore, what is involved in 'delivery'. The final set of reflections relates to issues of method in the evaluation of intervention effects and consequences. Little of this is novel. Children's trust pathfinders, and the construction of children's trusts in particular, are intended to contribute to transformational change in the lives of children, both through a renewed focus on issues of vulnerability and through connecting issues of provision for the vulnerable to provision for all: the quality of what we do for some children depends on the quality of what we do for all. Benchmarking through performance indicators has been seen as a public sector alternative to the action of market forces (van Helden & Tillema, 2005). At both national and local level, this concern to broker transformational change is routinely expressed in terms of the transformation of quantitative time series indicators. These are complex in themselves, relating to the outcomes of multiple processes. For example, the educational and social attainment of looked after children and the effective operation of child protection arrangements have been pressing – totemic – concerns of those working in and with children's trusts. Both depend on inter-agency working and referral. There are obvious difficulties in relating trends in time series indicators to specific interventions. Paradoxically, it is easier to track the consequences of defined local interventions in relation to indicators such as these than it is to relate national or local authority trends to them. It is also possible – again, easier in relation to local interventions than complex interventions – to identify perverse trends in performance indicators. For example, more effective collaborative working in relation to child protection might either reduce referrals (successful interventions heading off referrals
for intensive support) or increase referrals (more effective collaboration promoting greater professional awareness of risk indicators, so increasing the likelihood of referrals). It is also the case that trends in child indicators are rarely the result of specific interventions. The educational attainment of looked-after children, for example, highlighted as a key performance indicator by government as a result of intensive media scrutiny, was on a long-term upward swing nationally – rising before children's trusts were developed and continuing to rise. However, the mental health of young people – less easily traceable as a national indicator (because no headline time series indicator existed) – was routinely cited as a matter of concern by teachers, headteachers, social workers and community paediatricians, and has become a major focus for local innovation in children's trust arrangements. The issue here is not whether some indicators are 'more' or 'less' reliable than others. It is rather that, given the range, scope and complexity of children's trusts as a complex public policy intervention, the range, scope and complexity of the indicator data available poses difficult challenges for both practitioners and evaluators (Boyne & Law, 2005). The identification of 'appropriate' indicators is difficult: the same indicator moving in different directions may be equally an indicator of success; given the range of indicators available, it may be that this sort of intervention can always be seen as a success (some indicators move in the desired direction) or a failure (some do not), and that 'failure' on some dimensions is a pre-requisite for 'success' in others (Boyne & Law, 2005). This dysfunctional consequence of the use of complex performance indicators in large public sector organisations has been commented on before (e.g., Smith, 1993), and lines of accountability, already complex in the public sector, are considerably more so in settings of pooled or shared governance (Smith, 1990). Campbell's Law suggests that 'the more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor' (Campbell, 1977). There is an awkward set of relationships here (Diagram 1) between outputs (organisational outputs) and outcomes (for children and young people) and between the performance of individual organisations and the networked relationships they establish. Given that the desired outcomes of complex interventions are (almost by definition) difficult to achieve – these are difficult problems to address because they present complex organisational and social challenges – there can be obvious tensions between the priorities set for a partnership organisation such as a Children and Young People's Strategic Partnership (CYPSP) and the
priorities of contributory organisations. A shared commitment to delivering change is not always enough to carry multiple partners through the real challenges of setting budgets, making difficult choices, setting priorities and developing staff.

Diagram 1 (outputs from individual or joint service provision, set against outcomes for individual or joint service provision):

Individual service outputs, individual service outcomes – individual services (e.g. education) focusing on KPIs relevant to service provision (e.g. educational attainment).

Individual service outputs, joint service outcomes – individual service action to address indicators which impact across a range of services (e.g. collaborative/extended services managed from school to address health and social care).

Joint service outputs, individual service outcomes – collaborative cross-service action to raise a KPI relevant to one of the services irrespective of impact on other service measures (e.g. educational attainment of a specific group).

Joint service outputs, joint service outcomes – integrated governance and delivery arrangements across a range of services to address a range of social indicators across a range of outcome areas.

One of the challenges for evaluating complex interventions is therefore to unpack the ways in which different projects have come to their focus on particular indicators, but the pressures from national policy makers are either towards the highest profile indicators (such as child protection referrals or the attainment of looked-after children) or towards trading off lower profile indicators against higher profile indicators. If policy makers nationally have built structures which enable them to focus the attention of local policy makers and officials on the higher profile indicators, then evaluators inevitably occupy the contested territory between them: which indicators? Why? Why not others? The pursuit of which indicators shapes, and perverts, actions most strongly? The national evaluation of children's trusts, like all complex public policy interventions, poses a series of evaluation challenges, operating in a changing and very fluid conception of the delivery of social change, in a loosely cohered structure in which national initiative shapes but does not determine local action and in which local actions rub against each other. Others have considered the implications for evaluation theory (Sanderson, 2000) but there are also issues, touched on here, about underlying governance and
change conceptions, about the role of national and local government in structuring, driving and managing change initiatives:

Despite the difficulties involved in managing a complex world, governments … should do very much less in terms of detailed, short-term intervention. And they should spend much more time thinking about the overall framework of whatever particular problem is at issue. For it is here that governments have the potential to achieve a great deal. Less can be more. (Ormerod, 1998, pp. 188–189)
Complex public policy interventions represent ambitious attempts by central and local government to develop cross-cutting governance arrangements which address persistent socio-economic challenges. They are typically based on sophisticated analytical engagements with the interaction between agencies for change and social settings. Their intentions – making every child matter, reconfiguring services around the individual child, ending child poverty – are ambitious if often highly rhetorical. These ambitions demand high connectivity between national policy development and multi-layered implementation arrangements, themselves involving complex interaction between different organisations. Given the scale and complexity of such interventions, they also place high demands on evaluation as a potential source for learning and advanced implementation: these are areas where there are no 'easy answers' in either policy terms or in developing local implementation plans. It is easy for evaluators to find themselves dealing with 'part of the picture', not least since 'part of the picture' is what policy developers and implementers themselves see. As a result, evaluation needs to face in at least four different ways: towards central government and its policy makers, who seek to learn from evaluation lessons about 'what works'; towards local implementers, who want to know 'what others have done and are doing' (Coote, Allen, & Woodhead, 2004); towards professional communities, who are interested in the relationship between national policy development and local implementation; and towards a wider academic and evaluation community. Of course these audiences overlap, and their interests intersect. The construction of knowledge about 'what works' – which itself involves issues of efficiencies, input–output relationships and long-term measurable effectiveness – may sit at odds with the generation of short-term 'practical' knowledge about implementation and 'what next'. Formative evaluators are familiar with the tensions which can arise in the field when actors want immediate feedback and advice, and the conflicts of interest which can arise. In the case of complex public policy interventions, one role of evaluators is to support the development of capacity to understand the relationship of the 'local',
the ‘short-term’ and the ‘immediate’ with the wider context of issues which are being faced and to enhance local capacity to elaborate sophisticated local solutions. This is not – pace some of the demands of national policy makers – to suppose that evaluation is about the uncovering of ‘best practices’ or to develop templates for future activity, but to see the process of evaluation as a source of deep learning which can connect the local to the global, and the immediate to the strategic.
REFERENCES Barnes, M., Sullivan, H., & Matka, E. (2003). The development of collaborative capacity in health action zones: A final report of the national evaluation. Birmingham: University of Birmingham. Benington, J. (1997). New paradigms and practices for local government. In: S. Kraemer & J. Roberts (Eds), The politics of attachment. London: Free Association Books. Boyne, G. A., & Law, J. (2005). Setting public service outcome targets: Lessons from local public service agreement. Public Money and Management, August, 253–261. Cabinet Office. (1999). Modernising government Cm 4310. London: The Stationery Office. Campbell, D. T. (1977). Keeping the data honest in the experimenting society. In: H. W. Melton & D. J. H. Watson (Eds), Interdisciplinary dimensions of accounting for social goals and social organizations. Columbus, Ohio: Grid Inc. Chitty, C. (1997). Editorial. Forum, 97(3), 1–6. Coote, A., Allen, J., & Woodhead, D. (2004). Finding out what works: Building knowledge about complex-community based initiatives. London: King’s Fund. DfES. (2004a). Children Act 2004. http://www.opsi.gov.uk/acts/acts2004/20040031.htm DfES. (2004b). Schools-achieving success. http://www.dfes.gov.uk/achievingsuccess/chap1.shtml DH [Department of Health]. (2003). The Victoria Climbie´ Inquiry. http://www.victoria-climbieinquiry.org.uk/finreport/downloadreport.htm Halpin, D., Power, S., Whitty, G., & Gerwirtz, S. (2004). Area-based approaches to educational regeneration: The case of the English education action zones experiment. Policy Studies, 25(1), 75–85. Hayden, C., & Boaz, A. (2000). Making a difference: Better government for older people evaluation report. Coventry: University of Warwick Local Government Centre. Henkel, M. (1991). Government, evaluation and change. London: Jessica Kingsley. HM Treasury. (2004). Releasing resources to the front line: Independent review of public sector efficiency. http://www.hm-treasury.gov.uk/media/B2C/11/efficiency_review120704.pdf Holloway, I. (2000). Is the British state hollowing out? The Political Quarterly, 71, 167–176. National Audit Office. (2001). Education action zones: Meeting the challenge: The lessons identified from auditing the first 25 zones. London: NAO. NECT. (2004). Children’s Trusts: Developing integrated services for children in England. National Evaluation of Children’s Trusts, Phase 1 Interim Report, DfES. http:// www.everychildmatters.gov.uk/strategy/childrenstrustpathfinders/nationalevaluation/ Ormerod, P. (1998). Butterfly economics: A new general theory of social and economic behaviour. London: Faber and Faber. Parston, G. (1998). What is managing for social result? London: Office for Public Management.
Power, M. (1997). From risk society to audit society. Soziale Systeme, 3(1), 3–21.
Rhodes, R. A. W. (1997). Understanding governance. Buckingham: Open University Press.
Sanderson, I. (2000). Evaluation in complex policy systems. Evaluation, 6(4), 433.
Smith, P. (1990). The use of performance indicators in the public sector. Journal of the Royal Statistical Society, 153(1), 53–72.
Smith, P. (1993). Outcome-related performance indicators and organizational control in the public sector. British Journal of Management, 4(3), 135–149.
Tunstill, J., Alnock, D., Meadows, P., & McLeod, A. (2002). Early experiences of implementing Sure Start. London: Department for Education and Skills.
van Helden, G. J., & Tillema, S. (2005). In search of a benchmarking theory for the public sector. Financial Accountability and Management, 21(3), 337–363.
Wilding, P. (1997). Globalisation, regionalisation and social policy. Social Policy and Administration, 31(4), 410–428.
Wyman, M. (2001). Thinking about governance. The Commonwealth Foundation Citizens and Governance Programme.
PROGRAMME EVALUATION IN A DYNAMIC POLICY CONTEXT

Paul Mason

INTRODUCTION

This chapter describes the evaluation of a large, complex social change programme in England, UK. During implementation, the programme experienced a series of changes to its form, function and governance, which in turn had impacts upon the practice of evaluation itself. These changes also raised questions about the place of evaluation within a policy context that increasingly focuses upon indicators and outcomes of effectiveness and exhibits other features and tensions that characterise the New Public Management. This chapter outlines the programme and its evaluation in this context, in order to explore these impacts and raise questions about how we learn from social change programmes.

The Children’s Fund Prevention Programme followed the work of the UK government’s Social Exclusion Unit, established to explore a series of policy issues seen as cutting across spheres of policy and government responsibility. Their report ‘Young People at Risk’ (SEU, 2000) identified gaps in preventative services for children and young people and argued for a greater emphasis on early intervention, more flexibility on the part of service providers and increased co-ordination of local provision in order to address the complex needs of vulnerable children and young people. This growing political recognition within the UK of the impact of social exclusion on
children’s health, education, and social and emotional well-being has generated a series of cross-cutting initiatives that attempt to address a complicated problem with a complex solution (Mason, Morris, & Smith, 2005). A number of programmes, which share features of Complex Community Initiatives (CCIs) (US) (Connell, Kubisch, Storr, & Weiss, 1995) and Area Based Initiatives (ABIs) (UK) (Smith, 1999), have sought to explore new ways of working and delivering services. CCIs and ABIs are complex multi-layered initiatives and programmes that seek change across a number of dimensions: individual, for example in health or education outcomes; populations, for example a high incidence of anti-social behaviour within a neighbourhood; communities, for example an historic lack of service access or engagement amongst particular communities of geography or identity; services, for example in developing services more responsive to users’ needs; and systems, for example in the way in which agencies work together to identify or respond to need (adapted from Barnes, Matka, & Sullivan, 2003). New Deal for Communities (NDC) – an area-based initiative for deprived neighbourhoods (ODPM, 2005); Sure Start – a programme of neighbourhood-based early years services for families (DfES, 2000); and Health Action Zones – a programme to address the determinants of health inequalities (Bauld & Judge, 2002), are all examples of this policy agenda. The establishment of the Children’s Fund was part of this broader programme of change initiated by the Labour Government, upon coming to power in 1997, which sought to develop new ways of working between government departments at the national level, and agencies and organisations developing and delivering services at the local and operational level (Cook, 2006). Within this broad ‘modernisation’ agenda (Newman, 2002) we can see features of the ‘New Public Management’ (NPM) (Ferlie, Ashburner, Fitzgerald, & Pettigrew, 1996; Hood, 1991; Pollitt, 1990). As we shall see in the discussion below, the Children’s Fund was organised around a broad centralised vision, delivered through local delivery partnerships formally constituted as Strategic Boards. Local partnerships set their own targets within the broad framework provided by Children’s Fund guidance (CYPU, 2001), yet central monitoring and accountability regimes were established. Partnerships were expected to commission local services from a range of providers, with contracts ensuring delivery against local targets. User involvement was a central theme, being seen as necessary in the
development and delivery of effective services. The initiative was managed by a new governance unit, the Children and Young People’s Unit (CYPU), and was expected to develop new ways of working, with learning from this intended to inform future child and family welfare policy. Thus governance and policy were explicitly linked.
THE CHILDREN’S FUND

The Children’s Fund initiative was launched, following an announcement by the Chancellor of the Exchequer in 2000, by a newly established cross-government policy unit, the CYPU, and guidance was issued for the development of local Children’s Fund Programmes (CYPU, 2001). Each of the 150 Local Authority areas in England (in 149 partnership arrangements) was required to produce a Children’s Fund strategic programme and establish a multi-agency Partnership Board to develop and oversee its delivery. The aim of the programme was explicitly linked to social exclusion by the overarching objective:

To provide additional resources over and above those provided through mainstream statutory funding, specific programmes and through specific earmarked funding streams. It should engage and support voluntary and community organisations in playing an active part and should enable the full range of services to work together to help children overcome poverty and disadvantage. (CYPU, 2001, p. 6)
This was underpinned by seven sub-objectives that encouraged local Children’s Fund partnerships to focus on effective collaborative working to address needs linked to education, health and crime. Partnerships were also expected to enter into a ‘strong, continuous, ongoing dialogue’ (CYPU, 2001, p. 13) with children, families and their communities in order to facilitate their participation in the development, design and delivery of Children’s Fund Programmes and services, and there was a sub-objective explicitly linked to this user involvement. A sub-objective stated that services should be ‘experienced as effective’ by users (ibid., p. 15), and a final sub-objective recognised the need to develop capacity within local services and agencies to deliver new ways of working. The central theme of the Fund was ‘prevention’: that social exclusion can only be reduced effectively by addressing problems before they become acute or established, and that over time funding should move from crisis to preventative services. A mapping of the Children’s Fund revealed that, in planning services, Partnerships developed programmes of
services that targeted geographical neighbourhoods, areas and communities but also particular target groups (NECF, 2004a). Local programmes were funded in three ‘Waves’, based upon a central assessment of levels of deprivation and need in each local authority according to a measure of the number of children in poverty, and the waves were phased so that areas of most need received funding first. Consequently, in many cases, programmes funded in the third wave, where need is lower than in the first two waves, are working with much lower budgets than those funded earlier. The first wave of programmes was funded in January 2001, ‘Wave Two’ started in February 2002 and ‘Wave Three’ in December 2002. Local Partnerships developed initial 3-year plans; the intention was for a 7-year programme ending in 2008. Funding was distributed on an annual basis, providing from the outset some uncertainty about the stability of the programme. The strategic plans submitted outlined the demographic features of the area, structures in place for the delivery of the programme, details of interagency collaboration including capacity building with voluntary and community groups and evidence of consultation with children and young people. They also provided information on intended strategies for the prevention of social exclusion and the participation of children, young people and their families in the development of provision. This approach was key to ensuring that local plans addressed local need, and thus there is a great deal of variety across the Children’s Fund programmes. Partnerships were also given financial flexibilities that allowed them to allocate their funding across the whole of the programme rather than equally across each year of operation. This enabled Partnerships to develop services, and strategic and operational partnerships, over time. Children’s Fund programmes were to explore new ways of working, to try innovative methods and interventions, and to establish what is effective in providing joined-up support for multi-faceted problems. In this respect, the Children’s Fund was an opportunity to pilot new ways of working so that over time national policy and local mainstream provision would change in emphasis and priority.
THE NATIONAL EVALUATION OF THE CHILDREN’S FUND

The National Evaluation of the Children’s Fund (NECF) was commissioned by CYPU through an open tendering process. The first stage of the process was the commissioning of a feasibility study, in 2002, which was
undertaken by a team who went on to conduct the evaluation itself. The feasibility study explored options for the structure and purpose of the evaluation, and favoured a model employing a qualitative strand that would address issues of process and fulfil a formative function, informing the development of policy and practice, and an impact strand that would establish outcomes for children and families. The feasibility study explored the structure and function of other national evaluations, and highlighted the need for an approach that brought evaluation practice and evidence together across government (CYPU, 2002). This study formed the basis of the tender for a discrete national evaluation of the Children’s Fund, and the contract was awarded in early 2003. NECF was funded for an initial 3-year period, but the design was coterminous with the intended life of the programme itself and thus ran from 2003 until 2008. The evaluation was to be reviewed in 2006, providing a mid-point within the structure for reflection; continuation beyond this date was, whilst intended, not guaranteed. The evaluation team established was multi-disciplinary, reflecting the nature of the initiative itself. The qualitative strand was undertaken by a team comprising researchers and practitioners from the disciplines of social policy, social work, education, health and economics. The quantitative strand, based in a separate institution, was developed by statisticians with backgrounds in applied education and longitudinal impact research. The qualitative strand employed two complementary frameworks for two elements of the evaluation. Activity Theory (Engeström, Miettinen, & Punamäki, 1999) was used for time-limited case-studies with a sample of partnerships from across the three Children’s Fund ‘waves’ to explore the processes of multi-agency working. The sites were selected to reflect key dimensions such as size of funding allocation, geographical locale and characteristics, the ability to link data with the strand assessing impact, and a classification of their Partnership’s structure. Activity Theory was employed to examine the focus of the activities of strategic partnerships and the operational workings of service providers and to relate these to outcomes, by establishing the ‘object’ upon which activity systems are acting. Activity Theory requires us to explore the activity system in a structured way: who is involved in the activity system; the roles different stakeholders play; the ‘rules’ within which activity takes place; and the ‘tools’ which are used to effect change or action. In the context of NECF, examples of activity systems are both strategic partnership structures and individual services, as well as the structures that link and include them, and examples of an object would include improved multi-agency working as well as particular outcomes for children such as reduced crime. Also of central importance is the local context within which services are developed and
delivered, for example, the histories of organisations and of their relationships with those they are working with to deliver a service. The second stage of NECF (2006–2008) would revisit some of these partnerships whilst expanding to include new ones. The second framework employed was for longitudinal research focusing upon Partnerships’ work with particular target groups. The work of two partnerships (to be expanded beyond 2006) was explored to understand different approaches to identifying and addressing the needs of groups that were the focus of local Children’s Fund strategies: black and minority ethnic children and families; disabled children; children involved in crime and antisocial behaviour; and refugee and asylum seeker children and families. A consortium of partnerships working with Gypsy/Traveller communities was the final focus for this work. The Theories of Change (ToC) approach involves working with those responsible for the development and delivery of a programme to establish the theory that lies behind the work taking place; this provides a framework for the evaluation. In producing a ToC for an initiative, stakeholders make explicit their theories of what (outcome) they hope to achieve (in the long, medium and short term), how (action) they expect to achieve them and why the proposed actions should deliver intended outcomes (rationale). This approach enables us to understand what programmes and services aim to achieve in the short term if they are to achieve their long term objectives, and to establish the extent to which they are able to achieve this over the course of the evaluation. There is also an emphasis on understanding the context within which the initiative operates, as this context may change and may impact upon activities and therefore outcomes (Connell & Kubisch, 1998). Once this framework has been established it guides the evaluation activity: the questions that are asked and the issues that are explored (see NECF, 2004a, 2006 for further details of these two frameworks). The quantitative strand used a number of existing data sources to estimate the impacts for children and families accessing Children’s Fund services. This was a deliberate decision to achieve a light touch evaluation which placed minimal burden on those managing and delivering the services, who had the existing centrally linked performance management and accountability systems. Rather than commissioning a longitudinal survey, with all the attendant costs and resource implications, modules were added to the Millennium Cohort Study (MCS) (see NECF, 2006 for more details), which shared an institutional base with this research team. This survey with families across England began in 2000, to explore from birth aspects of children and families’ lives and takes place in ‘sweeps’ across the sample at
2-year intervals. As this survey was already taking place in homes and gathering data about families, where those families had children within the Children’s Fund age range (5–13), additional questions were asked about their service use and their use of local Children’s Fund services, and a range of questions explored aspects related directly to the objectives of the Children’s Fund. Children aged 10 years and over were asked to complete a short survey of their own. School-based education databases were also drawn upon to explore the impact of school-based services. And the Family and Children Study (FACS) was used to compare geographical areas receiving Children’s Fund services with those that did not. The sweeps of MCS in 2003/2004 and 2005/2006 would be used to describe service users, then assess impact from service use (for more details see NECF, 2004a, 2006). A final and key element of the evaluation was the development of a ‘knowledge management strategy’. This committed NECF to the production of early and ongoing learning for practitioners and local partnerships as well as central government. Evidence was produced at regular intervals throughout the evaluation so that it could be used formatively and enable stakeholders in the Children’s Fund to respond to policy developments and learn from practice.
CHANGING POLICY, CHANGING GOVERNANCE

Children, families and communities are arguably at the heart of social policies. Indeed, the CYPU was established to work across government and the design of the Children’s Fund itself highlighted the multi-faceted nature of problems and possible solutions. As the New Labour Government sought to develop policy around social exclusion and pursued a ‘modernisation agenda’, it was recognised within the design of NECF that Children’s Funds would operate in a dynamic policy context. Nonetheless, a number of changes to policy and governance impacted upon the initiative and the evaluation, fundamentally altering the initiative, the context within which it was developing new ways of working and the place of learning from the evaluation itself.

Changes to the Children’s Fund

One sub-objective for Children’s Fund programmes to address concerned the reduction of involvement in crime and anti-social behaviour of 5- to 13-year-old children and young people. Against a backdrop of media and popular concern over rising crime and government conjecture about the need
to address ‘yob culture’, the Treasury’s 2002 Spending Review (a UK government biennial review of spending and priorities) confirmed the intended allocation for the Children’s Fund, but required that 25% of the allocation for each local programme be spent on crime reduction initiatives developed by the Youth Justice Board (YJB). The YJB is the government agency that oversees the youth justice system in England and Wales, and had developed a ‘menu’ of evidence-based interventions that were presented in guidance issued in 2003. Drawing on studies from the US and UK, the services developed by YJB work within a risk and protection paradigm that identifies risk factors for children and young people and the protective factors that can mitigate them. Each of the interventions from the menu was placed within this model, although there was a final option for ‘innovative’ practice. This stipulation from central government came at a time when most local programmes had already allocated their funding, although those from Wave 3 had yet to become established. Money had to be taken from existing services through cuts to their funding that reduced or withdrew it altogether, or through the cancelling of contracts for services that had not yet been commissioned or were not yet in place, affecting the integrity of local programmes and undermining the principle of local autonomy. The proportion of the allocation, being a quarter of total funds, also weighted programmes towards crime prevention and a narrow menu of interventions. This requirement brought new stakeholders into the Children’s Fund partnerships, requiring local Boards to reconsider their definitions of prevention and their strategies for achieving change. Later in 2003, CYPU announced that the funding to the Children’s Fund programme was to be reviewed, as partnerships had large amounts of their allocations that had not yet been awarded. This ‘under-spend’ was identified as a resource that could be spent by government elsewhere to support their (changing and developing) targets for children and families. Yet this underspend was the result of the financial planning undertaken by partnerships in developing their programmes over time in line with the original guidance framework. Participative work with communities of both geography and identity, who were often the most marginalised and disadvantaged, was developed over time in order that it was meaningful. In developing effective local services, partnerships took time to develop themselves, time to develop ways of identifying need and, crucially, ways of commissioning and contracting services. In applying for local funding, organisations and agencies needed time to build their capacity, particularly where they were intending to work in new multi-agency partnerships. Partnership Boards had used the permitted financial flexibilities to design their programmes over time, rather
than commission services too eagerly, yet these programmes were again under threat as most were only partially in place. In the December of that year, following pressure from Children’s Fund stakeholders across the local programmes, as well as national non-profit organisations involved at a local level and bodies representing local government, the previously indicated cuts across the programme were revised, but there was no commitment to funding levels and the initiative itself was seen as under threat (see ‘changing policy’ below). There was uncertainty about the levels of allocation until Spring 2004 when revised annual awards were announced. These entailed a cut for programmes, but were coupled with a commitment to continue the programme until 2006. Financial flexibilities were removed. In the autumn of 2004, it was announced that funding would be provided for Children’s Fund programmes for the period 2005–2008, and partnerships were required to develop new strategic plans (DfES, 2004a). Some financial flexibilities were restored but allocations were reduced, requiring further cuts and re-profiling of strategic programmes. Both of these developments had significant impacts upon the national Children’s Fund initiative and its local programmes. They altered the strategic programmes developed by local partnerships to meet multi-faceted needs in a joined-up way by altering their focus and impeding their intended design. They also created a great deal of uncertainty as to the future of the initiative for those directly involved in Children’s Fund programmes and partnerships, and for local agencies and organisations planning to deliver services. Both strategic planning and service development and delivery in such a context are difficult, and attentions and energies are directed elsewhere in managing change and uncertainty.
Changing Policy Context

The Children’s Fund was always intended to explore new ways of working and to identify effective practice for policy makers locally and nationally. But this policy context was itself radically altered in 2003 with the publication of a government Green Paper1 proposing new structures for the planning and delivery of services for children and families and new outcome areas for all services to address under the heading ‘Every Child Matters’ (DfES, 2003). It was also announced that all local authority areas would be required to establish multi-agency ‘Children’s Trusts’, to develop and commission collaborative services to address needs in a joined-up way, and that
these should be in place by 2006. The Green Paper was further developed and became legislation the following year with the Children Act 2004 (HMSO, 2004). Whilst the requirement to develop a Children’s Trust was revised to 2008 following consultation, the expectation that equivalent collaborative arrangements would be developed within local authorities and areas was maintained. The Act requires all local authorities to bring the planning of children’s services together in a way that echoes the intentions and practices of the Children’s Fund; but early documentation made little mention either of the programme or of the relationship between programmes and Trusts locally. Compounding the uncertainty around funding, Children’s Fund partnerships felt that their work to change ways of working was again under threat, in that there was no clear intention to learn from it. There was no single model for Trusts proposed, and stakeholders in children’s services within local authority areas began to concentrate their efforts on the local Trust arrangements, taking strategic and operational attention away from Children’s Fund programmes. This ‘Every Child Matters’ (ECM) agenda (DfES, 2004b) requires Trusts to commission services that address five outcome areas: being healthy; staying safe; enjoying and achieving; making a positive contribution; and economic well-being. The award of Children’s Fund funding for 2005–2008 stipulated that programmes should identify how appropriate and effective Children’s Fund services would be continued by Trusts, and Children’s Fund programmes and services thus needed to demonstrate their effectiveness within this new agenda. The outcomes have a framework for assessment, which includes a range of indicators that services must use to demonstrate impact, and a new regime of local inspection to ensure local authorities and their Trusts are delivering appropriate services. The first guidance to Children’s Funds (CYPU, 2001) had emphasised the need for monitoring and evaluation of services, but indicators were not provided and partnerships spent time developing their own systems and procedures to meet more detailed local requirements arrived at within the broad guidance. The only central monitoring required in a standard form across the programme was basic data about service use, and financial data relating to spend. When the five ECM outcome areas were announced, Children’s Fund partnerships and their contracted services began to find ways of demonstrating their achievements; but they were not designed to meet them as a priori objectives. Again, Children’s Fund partnerships perceived their position as one increasingly isolated from this new policy context, despite the broad similarity in achieving new ways of working for children’s services.
Changing Governance

The new outcome and indicator framework and development of Children’s Trusts built upon other changes to the governance of Children’s Fund programmes. The central change was the closure of the CYPU and the reorganisation of the regional teams that supported local programmes. The management of the Children’s Fund moved to a new Children, Young People and Families Directorate (CYPFD) based within the Department for Education and Skills (DfES) during 2003, meaning that those involved in the development of the initiative were lost within new arrangements and new responsibilities tied to ‘Every Child Matters’ and the five outcome areas. The symbolism of this was not lost on Children’s Fund partnerships and programmes who perceived this as another part of the cumulative loss of commitment to the programme within government. One programme was being replaced with another, with tighter central accountability and more precisely defined central targets and attendant indicators.
IMPACTS FOR NECF

These changes also impacted upon NECF, and it is worth outlining these impacts before we reflect on the implications for evaluation and learning from social programmes. The uncertainty around the initiative also applied to NECF: not only would the evaluation come to an end were the initiative to do so, but during that period there was also uncertainty about the funding committed to the evaluation and therefore about its design and application. In 2005 it was decided that although the Children’s Fund would continue until 2008, the evaluation would end in 2006. There could therefore only be partial learning: the design of the evaluation was compromised and the experiences of Children’s Funds beyond 2006 would be lost. In particular, the second sweep of MCS, which would have begun to assess impact, did not take place. With the loss of the CYPU, new relationships needed to be developed with those within government with new responsibility for the Children’s Fund. Those who had commissioned the evaluation and were familiar with its aims and design moved on, as had those with responsibility for policy development who had designed the initiative and were committed to it. The new agenda became the focus for the CYPFD, and the lack of explicit links between the Children’s Fund and the emergent arrangements led to uncertainty about where learning from the evaluation could be placed. The
highly theoretical nature of NECF meant that it focused upon the Children’s Fund as originally designed and did not explore the five ECM outcome areas. The uncertainty experienced by the initiative, whilst providing learning in itself (and we return to this below), impacted upon the practice of the evaluation. Those involved in Children’s Fund partnerships, staff within programme management teams, those involved in delivering services, were all experiencing uncertainty about their futures and about the role of the Children’s Fund programme. Some stakeholders were reluctant to engage with the evaluation as they saw it as an added burden on their time when they had to address the implications of cuts to budgets or services. Some strategic players moved their attentions to Children’s Trusts, seeing the Children’s Fund as increasingly irrelevant. And many stakeholders, particularly those working in front-line services, looked to the evaluation team to advocate on their behalf. The environment within which the research took place altered dramatically. In particular, programmes and services struggled to meet the new accountability agenda that they now needed to address and looked to the evaluation team to provide guidance and expertise; yet the evaluation was not designed to perform this function. This focus upon performance management and inspection, reliant upon indicators of outcomes, takes programmes and services away from learning and towards accountability and demonstration of effectiveness as defined by those outcome areas (a feature discussed in detail elsewhere in this volume by Dahler-Larsen, 2007).
LEARNING AND EVALUATION

In this chapter we have seen the experience of a major UK government initiative and how changes to its form and function, and to the context within which it operates, have also impacted upon its evaluation. This complicated story has lessons for evaluation beyond the stakeholders included in the discussion above. The changes to policy and the development of a new agenda for children’s services have taken place without reference to the learning from the Children’s Fund, despite its central aim of testing new ways of working to meet complex problems. Mention of the Children’s Fund was conspicuous by its absence from early documentation referring to ‘Every Child Matters’ (e.g. DfES, 2003, 2004b). The changes to policy were driven, in part, by the Laming Enquiry (Laming, 2003) following the death of Victoria Climbié. The case of this young girl, who died from the abuse by her carers that agencies had repeatedly failed to recognise and address, caused widespread
anger in the UK and prompted the government to commission an independent review of child protection. A key recommendation was greater co-ordination of children’s services and greater accountability, yet this recommendation had already been made in the early cross-cutting report which informed the creation of the Children’s Fund (SEU, 2000). Despite an explicit commitment by the New Labour government to ‘evidence-based policy’, a study of policy development in the UK highlighted the lack of consideration of evaluation evidence in the face of political pressures and an approach to policy-making that has a:

Basis of informed guess work and expert hunches, enriched by some evidence and driven by political and other imperatives. (Coote, Allen, & Woodhead, 2004, p. xi)
In some respects the Children’s Fund was a pilot programme, testing new ways of working with freedoms to be innovative; but it was not preceded by a pilot from which Children’s Funds could learn, nor did it have the formal status which may protect a pilot programme. There was therefore no formal structure for the learning from Children’s Funds to be placed within, as the CYPU was closed and policy priorities changed without reference to it. Yet a review of UK pilot programmes within government found that they too are rarely allowed to run their course, in the face of political timetables that do not allow for policy to be trialled and tested (Cabinet Office, 2003). One of the key social scientists in the development of the Sure Start initiative has claimed that it has been abolished in all but name since it was expanded and changed from its original neighbourhood family support focus to a concern with the provision of affordable childcare for parents in work, education or training (Glass, 2005). An early publication from NECF (2004b) highlighted messages for policy makers across government at all levels, which have been further elaborated here: the destabilising effects of rule changes and funding uncertainty; the changes in strategies and in the nature of leadership required as initiatives like this mature; and the importance of systems to enable bottom-up learning. All of these were lessons from the National Evaluation of Health Action Zones (Bauld & Judge, 2002), and reflect dominant themes from the literature relating to strategic and operational partnerships (Sullivan & Skelcher, 2002) and community development and user participation (Beresford & Hoban, 2005). Yet in a context for policy making where evidence competes with other requirements, such learning appears to be lost. NECF brought all the national evaluations working with disadvantaged communities and children and young people together for a seminar to share evaluation designs and findings. This was the first time such
a meeting had taken place, suggesting a lack of co-ordination across government to link and learn from the evaluations different departments and units commission (Coote et al., 2004). Local Children’s Fund programmes were initially required to commission their own local evaluations, but this stipulation was dropped with the cuts to funding in 2004. Where local evaluations have taken place, local experience has been captured, although their form and quality have varied widely (Spicer & Smith, 2006). With the changing policy context and the requirement to demonstrate effectiveness in addressing the five ECM outcome areas, local attention has now become focused on those outcome indicators. Children’s Fund-specific learning is therefore in danger of being lost, as funded services seek to demonstrate their effectiveness in the hope of securing their future within the new (Trust) commissioning arrangements. The learning about multi-agency working, partnerships and participation, and from a strategic approach to early intervention preventative services, may not be relevant in centrally driven wholesale change to structures for service planning and development. The changes to the initiative have moved it away from the development of strategy to the commissioning of services and it is in this way that the integrity of programmes was undermined. The Children’s Fund was intended to provide lessons for mainstream agencies and services, yet there has not been time for the lessons about preventative services to emerge and there is a danger that statutory sector driven Trusts will be unable to move away from their traditional focus on acute and crisis services. There is a danger that this framework will lead to a focus on a narrow range of key outcome areas in the way that the (NPM-characterised) reorganisation of the NHS has done (Ferlie & Fitzgerald, 2002); Trusts are not required to develop a preventative programme in the way that Children’s Funds were, and local authorities are increasingly tied to local service agreements that cut across areas of policy and service delivery and development, and that specify local outcomes against central government’s broad targets.
LEARNING FROM SOCIAL PROGRAMMES

We have seen through our discussion the precarious place of evaluation practice and evidence within policy development, and we have related this to a dynamic context characterised by features of NPM. NECF was a substantial evaluation project, yet this investment did not secure its place within the policy process. Evaluation evidence is one consideration, but it competes with others, particularly where the political environment is highly charged.
Whilst some authors have pointed to the differences between an NPM model and New Labour’s ‘modernisation’ agenda (Newman, 2002), it may be more appropriate to view the Children’s Fund as an NPM hybrid (Christensen & Lægreid, 2003) in that a managed balance was aimed for between central targets and devolved responsibilities, but one which for much of its life has existed in a state of flux. A tension inherent within NPM models – giving greater freedoms whilst increasing accountability – was exposed when the performance management structures in place failed to capture the developmental work local partnerships undertook. This lack of measurable, or measured, outcome or output led to the restructuring of and threats to the programme itself, irrespective of messages from evaluation. This in turn collided with political concerns around children both as victims (the Laming Enquiry) and as a threat (youth crime), bringing more centrally defined objectives and requirements, and ultimately a new national programme with higher central control, more clearly defined objectives (although still in many respects ambiguous for local programmes) and a much higher degree of accountability and control. The lack of reference to learning from the Children’s Fund further demonstrated a new and more centrally determined framework. Such structures can place restrictions on local practice, despite a rhetoric of local governance. A changing focus upon different indicators and outcomes can limit the learning from social programmes as attention is drawn to accountability and performance management functions and away from a reflection upon the processes that lie behind their achievement or failure in outcome terms. Evaluation is essential if we are to learn from social programmes, but it requires sustained commitment, as do the programmes themselves. Within a NPM paradigm the place for learning can be lost and evaluation may find the demands placed upon it move towards audit and accountability, or it may find itself vulnerable within a dynamic policy context that has central, political, concerns at its core; there is also a danger of a move away from evidence-based policy to what Cook (2006) refers to as ‘policy-based evidence-making’ (p. 112).
NOTE

1. In the UK political and legislative system, a Green Paper is an initial policy document issued to outline the government’s intentions for legislation in a particular policy area. This is followed by a White Paper detailing the planned legislation, which subsequently informs the Bill put before Parliament for debate and final agreement before passing into law.
REFERENCES

Barnes, M., Matka, E., & Sullivan, H. (2003). Evidence, understanding and complexity: Evaluation in non-linear systems. Evaluation, 9(3), 265–284.
Bauld, L., & Judge, K. (2002). Learning from Health Action Zones. Chichester: Aeneas.
Beresford, P., & Hoban, M. (2005). Participation in anti-poverty and regeneration work and research: Overcoming barriers and creating opportunities. York: JRF.
Cabinet Office. (2003). Trying it out: The role of ‘Pilots’ in policy-making. Report of a Review of Government Pilots. London: Cabinet Office.
Children and Young People’s Unit. (2001). Children’s Fund Guidance. London: DfES.
Children and Young People’s Unit. (2002). The Children’s Fund National Evaluation: A feasibility study. London: DfES.
Christensen, T., & Lægreid, P. (2003). Transforming governance in the new millennium. In: T. Christensen & P. Lægreid (Eds), New public management: The transformation of ideas and practice. London: Ashgate.
Connell, J. P., Kubisch, A. C., Storr, L. B., & Weiss, C. H. (1995). New approaches to community initiatives: Concepts, methods and contexts (Vol. 1). Washington, DC: The Aspen Institute.
Connell, J. P., & Kubisch, A. C. (1998). Applying a theory of change approach to the evaluation of comprehensive community initiatives: Progress, prospects, and problems. In: K. Fulbright-Anderson, A. C. Kubisch & J. P. Connell (Eds), New approaches to evaluating community initiatives: Theory, measurement and analysis (Vol. 2). Washington, DC: The Aspen Institute.
Cook, D. (2006). Criminal and social justice. London: Sage.
Coote, A., Allen, J., & Woodhead, D. (2004). Finding out what works: Understanding complex, community-based initiatives. London: King’s Fund.
Department for Education and Skills. (2000). Sure Start: A guide for second wave programmes. London: DfES.
Department for Education and Skills. (2003). Every child matters. London: DfES.
Department for Education and Skills. (2004a). Developing preventative services: Children’s Fund Strategic Plan Guidance 2005–2008. London: DfES.
Department for Education and Skills. (2004b). Every child matters: Next steps. London: DfES.
Engeström, Y., Miettinen, R., & Punamäki, R. (Eds). (1999). Perspectives on activity theory. Cambridge: Cambridge University Press.
Ferlie, E., Ashburner, L., Fitzgerald, L., & Pettigrew, A. (1996). The new public management in action. Oxford: Oxford University Press.
Ferlie, E., & Fitzgerald, L. (2002). The sustainability of the new public management in the UK. In: K. McLaughlin, S. P. Osborne & E. Ferlie (Eds), New public management: Current trends and future prospects. London: Routledge.
Glass, N. (2005). Surely some mistake? The Guardian, Wednesday, January 5, 2005.
HMSO. (2004). The Children Act 2004. London: TSO.
Hood, C. (1991). A public management for all seasons? Public Administration, 69(Spring), 3–19.
Laming. (2003). The Victoria Climbié Inquiry. London: TSO.
Mason, P., Morris, K., & Smith, P. (2005). A complex solution to a complex problem? Early messages from the National Evaluation of the Children’s Fund Prevention Programme. Children and Society, 19, 131–143.
National Evaluation of the Children’s Fund. (2004a). Developing collaboration in preventative services for children and young people: The National Evaluation of the Children’s Fund First Annual Report 2004. London: DfES.
National Evaluation of the Children’s Fund. (2004b). Summary of key learning points from NECF reports on participation, prevention and multi-agency working. London: DfES.
National Evaluation of the Children’s Fund. (2006). Working to prevent the social exclusion of children and young people: Final lessons from the National Evaluation of the Children’s Fund. London: DfES.
Newman, J. (2002). The new public management, modernization and institutional change: Disruptions, disjunctures and dilemmas. In: K. McLaughlin, S. P. Osborne & E. Ferlie (Eds), New public management: Current trends and future prospects. London: Routledge.
Office of the Deputy Prime Minister. (2005). New Deal for Communities 2001–2005: An interim evaluation (Research Report 17). London: ODPM.
Pollitt, C. (1990). The new managerialism and the public services: The Anglo-American experience. Oxford: Blackwell.
Smith, G. (1999). Area-based initiatives: The rationale and options for area targeting. CASE Paper 25. London: Centre for Analysis of Social Exclusion.
Social Exclusion Unit. (2000). Report of Policy Action Team 12: Young people. London: SEU.
Spicer, N., & Smith, P. (2006). Local evaluation of the Children’s Fund initiative: Opportunities, challenges and prospects. London: DfES.
Sullivan, H., & Skelcher, C. (2002). Working across boundaries: Partnerships in the public sector. Basingstoke: Palgrave Macmillan.
SCHOOL SELF-EVALUATION

Ron Ritchie

1. INTRODUCTION

In England, systematic school self-evaluation (SSE) began as a model for school improvement. However, since 2000, and in the context of increasing moves toward ‘new public management’, it has become a policy priority for the Government which is inextricably linked to the inspection regime and risks becoming more closely associated with accountability than improvement. Such policies can, paradoxically, compromise school improvement (Fuhrman, 1993). This chapter will explore ways in which teachers and school leaders have attempted to engage in school self-evaluation in ways that allow them to maintain control of the evaluation agenda and retain the focus on improvement and pupils’ learning. It illustrates attempts by schools to engage with and reduce the negative side-effects of target-driven policies and the drive towards competition between schools. In particular, it will consider the benefits and challenges of inter-school collaboration through networking. It should certainly not be read as an endorsement of current policy developments in England and does not seek, as other chapters in this volume do, to address the key question of what kind of society we want. It offers a pragmatic exploration of some schools’ creative responses to the context in which they currently operate, aimed at being more conducive to public rather than private good, that is, meeting the needs of the ‘knowledge society’ as opposed to the ‘knowledge economy’ (Hargreaves, 2003).
2. NATURE AND HISTORY OF SCHOOL SELF-EVALUATION AND ITS RELATIONSHIP TO SCHOOL IMPROVEMENT AND EFFECTIVENESS

The idea that headteachers and teachers in schools engage in aspects of self-evaluation (or self-review) to question whether they have been successful in achieving their goals is not new, but historically such activities have often not been systematic and not based on an explicit evidence base. Indeed, previous generations of school leaders have apparently accepted external evaluators (national or local inspectors) as more obvious ‘judges’ of their endeavours. The notion of self-evaluation, as a process to complement external evaluation, has gone in and out of favour over the last 30 years but has become widely accepted as valid in the last decade (MacBeath & McGlynn, 2002) and the importance of schools ‘speaking for themselves’ has been emphasised (MacBeath, 1999). However, evaluation for its own sake, or merely to inform others, has limited value – the more significant purposes of self-evaluation relate to how such self-critical questioning of practice and outcomes can inform decisions about how the situation can be improved for the benefit of pupils. The desire for constant improvement, in terms of pupil outcomes and the quality of provision, has been more explicit in recent years as governments (in England) have sought to ensure schools are accountable for their performance. School improvement is being used in this chapter as a label for the processes by which schools seek to become more successful at their core purpose of educating pupils. ‘School improvement is about raising student achievement through enhancing the teaching and learning process and the conditions that support it’ (Hopkins, Ainscow, & West, 1994). Of course, the nature of achievement and how it can be ‘measured’ is contested. There is an extensive research base and literature related to processes of school improvement (Hopkins & Reynolds, 2001). This contrasts with the related field of school effectiveness, which is less concerned with how schools improve and more concerned with what makes schools successful. Effectiveness is usually associated with student outcomes and a more effective institution is typically defined as one whose students make greater progress over time than comparable students in comparable institutions (Hopkins et al., 1994; Reynolds et al., 1996; Sammons, 1999). School effectiveness research seeks to identify factors that lead to successful schools rather than understanding how schools might become more effective. School self-evaluation has a close relationship with school improvement (Davies & Rudd, 2001) and the latter
could, to some extent, be regarded as dependent on the former. Additionally, self-evaluation, for example in terms of foci, can usefully be informed by the findings of school effectiveness research, but the links are more tenuous than with improvement. The history of school self-evaluation in England is complex (NCSL, 2005) and begins in the 1970s when tools for self-evaluation were first developed. However, it was in the mid-1990s that SSE saw rapid growth as local education authorities developed their own approaches (ibid., p. 10). By the end of the 1990s, the governmental organisation responsible for inspecting schools in England, the Office for Standards in Education (Ofsted), was encouraging more consistent approaches (Ofsted, 1998). Additionally, the range of comparative data available to schools for SSE, from central sources, was extended through, for example, Ofsted’s Performance and Assessment Reports (PANDAs) provided for individual schools. Over the last 30 years, numerous frameworks and tools for self-evaluation have been developed and evaluated in England and beyond (for a comparison of school self-evaluation across 18 European countries see MacBeath, Schratz, Meuret, & Jakobsen, 2001). However, externally provided frameworks can reduce leaders’ and teachers’ ownership of the processes of self-evaluation and potentially weaken the focus on pupils’ learning and classroom processes.
3. THE CURRENT POLICY CONTEXT IN ENGLAND AND THE LINK TO INSPECTION

Self-evaluation has been an explicit dimension of inspection in England since 1999 (Ofsted, 1999). Ofsted has more recently provided guidance for schools to clarify the relationship between self-evaluation and school improvement under what is called the New Relationship with Schools (Ofsted, 2005a). This guidance is also intended to clarify the role of SSE within the new framework for inspection introduced in 2005. This gives even greater significance to SSE and extends the formalisation of it through the provision of an extensive ‘Self-evaluation form’ (SEF), of some 35 pages, which schools are required to complete as a ‘summative document, intended to record the outcomes of … (an) ongoing process of rigorous self-evaluation’ (Ofsted, 2005b, p. 3). The guidance establishes four principles: SSE should not be undertaken solely for the purpose of inspection; schools should shape for themselves a
process that is simple and integrated with their routine management systems; schools should listen to, and do something about, the views of their stakeholders; the school’s recorded summary of its self-evaluation process should be updated at least annually. The SEF covers (through a set of specific questions): characteristics of the school; views of learners, parent/carers and other stakeholders; achievement and standards; personal development and well-being (of learners); the quality of provision; leadership and management; overall effectiveness and efficiency. The school is required to grade aspects of work on a four-point scale (outstanding; good; satisfactory; inadequate). Additionally, schools must provide a range of factual information and confirm ‘compliance with statutory requirements’. Whilst the guidance includes some useful material (and challenging questions) for school leaders, the inherent tensions in the principles established highlight the dilemmas schools face. A key question is whether schools can shape (and therefore ‘own’) processes which lead to genuine improvement within the rigid framework of the SEF. Can, and indeed should, such processes be considered ‘simple’ given the complexity of schools and processes of change? Is it helpful to grade all aspects on a four-point scale? Will the requirement to complete the form distort more innovative and generative processes of school improvement? It should also be noted that, at least in England, the positive impact of inspections on standards and school improvement is contested (Cullingford, 1999). Lee and Fitz (1998) remind us that whilst external inspection may be an appropriate instrument for judging schools, it cannot, on its own, lead to improvement.
4. SCHOOL LEADERSHIP AND SCHOOL SELF-EVALUATION

Headteachers are recognised as key to school improvement and ample evidence demonstrating their contribution is available (MacBeath, 1999; Ritchie & Ikin, 2000; Sammons, Hillman, & Mortimore, 1995). It is also hard to conceive of effective SSE that does not involve the headteacher in a significant way. However, effective school improvement and SSE require the active involvement of all participants in the school’s endeavours, especially classroom teachers. Schools are increasingly dealing with a diverse range of policy initiatives which have made it difficult for headteachers to take
individual responsibility for ‘leading’ all developments, even if they so desired. In the words of Ofsted: It is no longer true – if it ever was – that leadership and management are the sole responsibility of the headteacher. High-quality leadership and management must now be developed throughout a school’s organisation if these new challenges, many of which require working much more closely in partnership with other schools and agencies at all levels, are to be met successfully. (Ofsted, 2003, p. 35)
To meet this need for increasing the leadership capacity of schools, in England, notions of ‘distributed leadership’ are being promoted by the National College for School Leadership and others (Bennett, Harvey, Wise, & Woods, 2003). This approach extends the boundaries of leadership in schools to others, especially middle leaders and subject leaders. SSE is one of the school’s processes where distributed leadership can have a positive impact. Subject leaders, when they monitor teaching and learning and develop colleagues, have been recognised for some time as contributors to schools’ successes (Bell & Ritchie, 1999). In the context of SSE, the significance of appropriate ethos, climate and culture is also highlighted in the literature related to school improvement (Day, Hall, Gammage, & Coles, 1993; Ritchie & Ikin, 2000). The headteacher can play an important part in establishing a fully collaborative culture and self-critical ethos conducive to the approaches to SSE discussed below. However, other leaders and class-teachers contribute to the culture of the school and need to be convinced of and committed to the approach being advocated. Hay McBer, in their research on teacher effectiveness and leadership reported by the National Association of Head Teachers (NAHT, 2001), identify six key dimensions of a school ‘climate’, which have obvious links to ‘culture’: flexibility, reward, clarity, standards, responsibility and team commitment. These dimensions provide a framework for schools to evaluate the nature of the ‘climate’ in which they aspire to make improvements. The way in which headteachers and others taking on formal and informal leadership roles in schools approach leadership is another dimension relevant to understanding SSE. Hay McBer identify six leadership styles: authoritative, coercive, democratic, pacesetting, affiliative and coaching (NAHT, 2001, p. 11), some of which are more likely to contribute to effective SSE than others. In the first case study discussed below, one of these styles, coaching, was used with particular success in a primary school.
5. SELF-EVALUATION WITHIN A SCHOOL
In some respects, SSE is essentially something that happens within an individual school, and this section and the next review this aspect of its use. The first question to address is who sets the agenda for SSE. In England, Ofsted certainly make their agenda explicit through the SEF. However, it is an agenda which is relatively broad and, although the form itself might be seen as constraining, it does not prescribe the processes used to gather evidence or to analyse data. Schools can continue to make choices about aspects of focus and process, as was possible before the link between SSE and external inspection was formalised. The quality of teaching and learning is central to SSE and is an aspect being addressed in some schools through collaborative approaches to peer observation and coaching. The author was involved in supporting one large primary school, identified by Ofsted as failing and 'requiring special measures', to establish such processes to improve the quality of teaching, judged by Ofsted to be unacceptably poor (a judgement validated by others who worked in and knew the school well) (Ritchie, 2002). In this case, the school established a process which involved middle leaders (initially the deputy headteacher) working collaboratively with other staff through systemic professional development cycles (PDCs) of collaborative planning; peer observation of a lesson; and collaborative review and subsequent planning for improvement. It proved to be a successful model of school improvement and, along with other changes introduced by a new headteacher, led to a significant improvement in a relatively short period of time. Just over a year later, the school was judged by Her Majesty's Inspectorate (HMI) as 'a rapidly improving school which has notable strengths' and the special measures requirement was withdrawn (op. cit.). The school continued to make progress and produced ongoing improvement in its pupils' results, as judged by nationally prescribed standard assessment tests (SATs). The process introduced involved the collection of evidence from peer observation in classrooms, using the current published Ofsted criteria, and from teachers' self-evaluation, which was recorded by the 'coach' and formed an evidence base contributing to the school's ongoing record of self-evaluation. This record was regularly shared with the school's senior management team, the school's link Local Education Authority (LEA) adviser, the governors and HMI when they visited to judge the progress the school was making. The role the coach took in supporting colleagues could be described as 'zetetic'1 (Harland, 1990; Bell & Ritchie, 1999) and characterised by a 'let's enquire together' mode, rather than an expert/novice approach.
Research conducted at the time (op. cit., pp. 337–340) identified features of a fully collaborative culture (Day et al., 1993) that contributed to the success of the school. Other positive factors identified, specific to the PDCs, included: clarity with regard to the purpose and process of the PDCs; establishing a systematic and structured process; introducing the approach as part of a positive school ethos which emphasised improvement (a phrase regularly used by the headteacher was 'you don't have to be ill to get better'); the credibility, skills and qualities of the coach; the explicit support of the headteacher (direct and indirect, including emotional). The overall approach to improvement in the school had rational/coercive elements to it and there was evidence that some teachers initially felt 'threatened' by the idea of being observed, although as one commented, 'as the cycles went on your confidence builds … on completion it was rewarding and valuable'. The data indicate that it took time to encourage teachers to feel ownership of the process. The focus for development was the individual teacher's, but the 'judgement' of quality was essentially made by the coach (a senior colleague), informed by the teacher's self-evaluation (during the review discussion). The evidence that informed the judgement was systematically collected, but its validity could be challenged (if the Ofsted criteria are not accepted as valid in the particular context in which they were being used). Action, as a result of the evaluation process, was built into the cycles and therefore changes, where appropriate, resulted from the classroom observation. The school continued to use the PDCs and, in 2003, its next Ofsted inspection highlighted a number of key strengths, including: The monitoring and evaluation of the school's performance and taking effective action is excellent, including the monitoring, evaluation and development of teaching and induction of new staff. The school's appraisal procedures and action taken to meet its targets are excellent. The school's procedures for monitoring pupils' academic and personal performance are very good. The school is excellent at tracking pupils and setting targets for pupil improvement (Ofsted, 2003). The report goes on to say (in reference to the model of school improvement discussed above), 'a carefully considered balance of professional development opportunities to meet the needs of individual teachers, and to secure whole school improvement are successfully implemented. The very
effective links made between all these procedures and strategies, have enabled the school to make significant improvements in many aspects of its work, very quickly’. (ibid., para 63) This school, through its implementation of PDCs, provides an example of a self-evaluating institution (Ofsted, 1998). Rigorous and systematic practices were established that ensured teaching quality was consistently monitored and the evidence of that monitoring was used formatively for improvement. The school was able to provide HMI (who visited regularly to judge improvement) with detailed evidence of its approach to self-evaluation and evidence of the impact of its strategies. It provides an example of a professional learning community (Little, 1992). In many ways, it had become a ‘self-inspecting school’ (Ferguson, Earley, Fidler, & Ouston, 2000) within the current policy framework, although this could, of course, be seen as problematic and constraining in helping it tackle wider goals appropriate to the needs of the ‘knowledge society’ (Hargreaves, 2003).
6. THE CONTRIBUTION OF TEACHER EVALUATION AND CLASSROOM ACTION ENQUIRIES
The above example clearly involves teachers evaluating their own practice, but it was built on the model of coaching and prioritised evidence of practice from the coach in the context of classroom observation. This is seen in many schools, although often without the explicit intention of action planning for improvement, where subject leaders (in primary schools) and heads of department (in secondary schools) are involved in monitoring the quality of teaching through classroom observation to provide data for school self-evaluation. However, another source of data and, more importantly, another well-established model of improvement can be observed in many schools where teacher-led action enquiries are promoted, drawing on aspects of action research. Such approaches involve, in the best examples, whole-school support for classroom enquiries focused on issues that individual teachers have identified or which have been identified collectively by a group or by all staff. Numerous examples can be found in the literature (e.g. Franey, 2002) and, in England, such enquiries were encouraged through a funding stream for which teachers could apply, Best Practice Research Scholarships (BPRS) (Campbell & Jacques, 2001). Currently, in England, the Training and Development Agency for Schools (TDA) is promoting, through Postgraduate Professional Development (PPD) funding, the value of accredited
professional development through higher education institutions. Such programmes must, as a requirement of the funding, provide evidence of classroom and school impact, and the majority of these programmes engage teachers in work-based enquiries with considerable potential for contributing to SSE. It is not the intention of this chapter to consider these approaches in more detail, but to signal the contribution they can make, through the work of individual teachers or groups, to school self-evaluation and improvement.
7. NETWORKING FOR SCHOOL SELF-EVALUATION
Alongside the policy agenda in England, which prioritises SSE, is a range of funded initiatives that encourage schools to collaborate and network, such as:
Leadership Incentive Grants (LIGs) (see www.standards.dfes.gov.uk)
Excellence in Cities (EiC) (see www.teachernet.gov.uk)
Networked Learning Communities (NLCs) (see www.ncsl.org.uk)
Specialist Schools Network (see www.standards.dfes.gov.uk)
Leading Edge Partnerships (see www.standards.dfes.gov.uk)
Federations (see www.standards.dfes.gov.uk)
Primary National Strategy Learning Networks (see www.standards.dfes.gov.uk).
These networks, despite the tensions involved in encouraging schools to collaborate whilst other policy initiatives (league tables and funding related to pupil numbers) foster competition, offer schools the potential to work collaboratively on SSE. They also pose problems for schools in terms of increased complexity and possible distraction from their own missions – many secondary schools, for example, can find themselves in several networks with potentially conflicting demands on their time and inconsistent aims. The Networked Learning Communities initiative, managed through the National College for School Leadership, led to the funding of over 130 networks of schools – usually comprising six or more schools each (Hopkins & Jackson, 2003). All of these networks were required, as a condition of their funding, to focus on improving student achievement and, therefore, to support school improvement. Outcomes of networks were intended to be evident at five levels: pupil learning, adult learning, leadership learning, school-wide learning and school-to-school learning. This policy was extended in 2004, as part of the National Primary Strategy, to roll out networking to even more
schools. There is, however, little evidence of these NLCs making SSE an explicit outcome of networked learning. There are, of course, other benefits (and challenges) to networking that are well documented (Jackson, 2002). Much appears to be being learnt about networked learning within particular networks, but dissemination of this learning is proving less successful, as the national roll-out is demonstrating: in the author's experience, these networks are having limited impact in some areas. The author has worked with a number of networks over the last decade and this section includes examples of practice where the networks have contributed to individual schools' SSE. One group, with whom the author has worked as a co-facilitator and critical friend for five years, comprises the heads (approximately 10) of small schools in South Gloucestershire, in the West of England. The funding for the work was provided by the LEA and an LEA adviser (and trained Ofsted inspector) acted as a co-facilitator. The approach taken to the initiative was based on a model used in another LEA by the author (Ritchie & Ikin, 2000). The starting point for the initiative was the headteachers' values and beliefs about teaching, learning and leadership. They all expressed the opinion that external pressures and national initiatives were sometimes causing them to lose sight of the values they held – and some wondered whether those values needed to change. They were also aware that working in small schools brought with it particular challenges and opportunities. They went on to explore when and why those values and beliefs were apparently 'denied in action' in their professional work. This provided the stimulus for identifying what it was that could be changed and improved (as well as helping them understand what was outside of their control). Systematic plans for implementing changes and evaluating their impact were then developed. The goal of improving pupils' achievements and participants' leadership qualities was the key motivation for these actions. The group then engaged in a range of school-based action enquiries for the purposes of school improvement and acted as critical friends to each other. The group acted, in part, as a 'validation group' for these enquiries (McNiff, Lomax, & Whitehead, 1996). The enquiries, which they have shared with other local headteachers and at national events, contributed directly to SSE in their schools. The data collected and the processes involved have been used, in the context of Ofsted inspections, to provide inspectors with evidence of the schools' progress and their commitment to SSE and improvement. The foci have been varied: for example, the underachievement of boys in reading, the use of ICT throughout the school, assessment, and the development of social skills and behaviour.
Additionally, when Ofsted introduced the original SSE form (then known as the S4 form), the group agreed to support the first school that was due to experience the new methodology and to learn from the experience. This involved the author leading a structured discussion with the head after the inspection, with follow-up questions and discussion from the rest of the group. This session was audio-taped and transcribed. The transcript was used to facilitate a fuller discussion about issues and ways to better prepare for the SSE element of the inspection. These heads identified and collectively validated a number of benefits of networking, including:
opportunities to revisit and explore key beliefs and values
quality time for discussion and reflection
opportunities for new and experienced heads to work together and learn from each other
overcoming the feelings of isolation that working as a head in a small school can produce
sharing and discussing educational concerns
being able to openly admit mistakes and look for ways forward
collaborative work with LEA advisers, other heads and their own colleagues
gaining new perspectives
developing more objective points of view about their practice
asking challenging questions of themselves and others
monitoring and evaluating in a more systematic way
clarifying issues in the course of seeking solutions to problems
increasing understanding of issues such as management of change, school cultures, school development planning, target setting, etc.
improved morale and commitment
peer support in times of stress
The network has continued to meet regularly and recently bid for external funding to support its collective learning. Whilst there have been many successful aspects to its work, there have been significant challenges related to maintaining focus, maintaining a critical edge to the dialogue, protecting time for activities and dealing with changes in membership. However, this example illustrates, to some extent, the value of a dynamic process involving schools and their local authority working together 'to craft or continually negotiate the fit between external demands and the schools' own goals and strategies' (Honig, 2004, p. 1) in order to seek coherence in addressing the
policy agenda. In this case, unlike the American example, a higher education tutor plays a facilitating role.
8. LEARNING SETS AND PROFESSIONAL LEARNING COMMUNITIES
Another initiative, planned in collaboration with the same LEA, South Gloucestershire (now a local authority), has a more explicit focus on SSE. The author and another adviser from that LEA planned a university-accredited module entitled 'School Improvement through School Self-Evaluation'. It was targeted at experienced headteachers (in a context where nationally their needs were being less explicitly met) and was intended to support SSE through the establishment of facilitated learning sets, as a means of fostering professional learning communities (Jackson & Tasker, 2004) or 'communities of practice' (Wenger, 1998). The idea of the learning set was to 'promote and sustain the situated learning of its members' (Jackson & Tasker, 2004, p. 2). In this example, two learning sets were formed, each comprising seven heads and either the LEA adviser or the author as an HE facilitator. Each learning set began by establishing ground rules (especially related to trust) and, as with the other group discussed above, exploring values that members held. This led to an identification of school improvement issues that individuals decided to focus on with the support of the group. Within each set, four pairs were formed to act as critical friends for each other. The sets met approximately monthly (with pairs having contact in between), usually for half a day. At the meetings each participant had 'protected' space to talk about her/his issue and ongoing action. Usually this involved a 10–15 minute input from the participant followed by discussion. The critical friend kept notes through the discussion to feed back to the 'presenter' after the meeting. The group also identified 'input' they would like from the facilitators (for example, on data collection or leadership of change) to be addressed during the other half of each day, when they met as a whole group. At the end of the year, the group planned a conference for other heads (a 'leaders of learning forum') at which they shared their experiences and the outcomes of their enquiries, and provided workshops for other heads to engage with some of the issues with which they had been grappling. The initiative, whilst successful in many ways, especially in fostering more educative approaches to school improvement and school self-evaluation, was not without its difficulties. Getting headteachers to prioritise the time
for such activities was not always easy, since the demands of 'running a school' often lead to unpredictable circumstances that make attendance difficult. This then has a destabilising impact on the rest of the group, as the learning set cannot function as planned. Additionally, the time taken to build up trust within the group and to encourage genuinely critical engagement with issues and reflexivity is considerable. To quote one participant: 'Obviously it took time to establish the professional and personal trust needed to make such a structure work, especially as we did not know each other well. Once this was in place, aided by the structure of learning sets, and pairing people together, the experience for everyone was positive and very helpful. The facilitator was able to provide an objective view and helped the group by challenging our thinking, both in whole group information sessions and in our learning sets'. However, despite the challenges, the learning sets demonstrated, to some extent, all of the characteristics of professional learning communities identified by Jackson and Tasker (2004): shared values and vision; collaborative learning built around enquiry; collaborative and shared personal experience; supportive and shared leadership; collective responsibility for student learning and success; supportive conditions. These were heads who sought to live out what Michael Fullan calls 'the moral imperative' of working together for the collective good of students (Fullan, 2003). Overall, the approach was evaluated by participants as very successful. One reported: "From the outset it was clear that success would depend on the commitment to sharing with the group. Once the group dynamic was established the whole 'took off' for me, because, for the first time, I could learn in the way that best suited me … we got to know each other better quite quickly – the connections were established. I found out that I am not the only head to be facing 'x' – and that every other head has problems and challenges to which they are constantly seeking solutions." This group identified the following factors which led to the success of the initiative from their perspectives:
Organisation Factors
Personal time booked out in diary and adhered to
Collaborative work over a sustained period
Facilitation by LEA and HEI colleagues – providing up-to-date research and challenge in a non-threatening style
An out-of-school venue
Having ground rules set up at initial meeting
Collaboration
Validation from peers about what you are doing in school
Support from colleagues providing ideas, suggestions, constructive challenge
Improved and extended network of colleagues
Developed thoughts on own leadership styles and how others deal with challenges in their schools
Security of working in a trusted group and the honesty shown by colleagues
Encouragement for collaboration back in school
Personal Gains
Learning
Time away from school for personal reflection
Ownership of the focus for professional development
Motivation to get things completed for the next session
Overcoming feelings of isolation
Again, there were problems related to participants protecting time for the meetings. There were also issues, in some cases, around participants broadening the scope of enquiries to embrace wider staff groups, which was an intention of the initiative. The more formal organisation of the learning sets and the explicit link to SSE did help, in the facilitators' views, to ensure that the dialogue remained focused and critical. These examples of schools networking in the context of school self-evaluation demonstrate ways in which, despite the pressures on headteachers, some are seeking to work constructively within the constraints in which they find themselves, to favour collaboration over competition and to ensure the focus remains on the quality of pupils' learning and well-being.
9. DILEMMAS AND CHALLENGES
This chapter raises a number of issues. The key one is whether SSE, as an aspect of the accountability agenda, itself a significant feature of the wider public management agenda, is a threat to broader approaches to school improvement that seek to impact on the quality of pupils' experiences.
The above discussion has provided examples of schools where the approach to self-evaluation is authentic and valid (according to the views of the participants and other stakeholders), creative and learning-centred. It suggests that the accountability agenda need not, necessarily, take away from the focus on learning in schools, but that risk is certainly evident. A number of further questions then arise:
Can schools ensure that, within the context of SSE, teachers are the key agents and pupils the key focus of change?
Is SSE conducive to fully collaborative schools?
Does SSE (and the related framework of the SEF) encourage teachers and headteachers to be sufficiently self-critical and reflective, or does it foster 'playing the game' approaches?
Is SSE conducive to teachers and headteachers seeing themselves as learners?
Can schools strategically manage multiple, external demands?
The discussion has sought to illustrate that at least some schools in England have found ways of working that suggest positive responses to these questions. However, their experience also indicates the complexity of engaging in school improvement in a context which potentially constrains such endeavours. Many, given the opportunity, in my experience, challenge what they regard as an over-emphasis on accountability and inspection and the nature of some of the indicators (such as national testing) used to judge the success of schools. The formal introduction of SSE into the accountability regime in England undoubtedly provides a major challenge for schools. For schools like those discussed above, the culture of the schools and the approaches taken may allow them to deal with the dilemma of balancing such requirements with their own aspirations for school improvement. For others, according to the Government's own inspectors, the management culture of schools must change radically if the new inspection regime is to succeed. Those inspectors have reported that trials of streamlined inspections, where self-assessment plays a much bigger role, have shown that some schools are failing to provide the quality of self-evaluation needed. They found that schools with traditional 'top-down' management structures were ill-equipped for the new regime (TES, 2005). As schools in England begin to take on further change in the context of the Children Act (2004) and the 'Every Child Matters' agenda (www.everychildmatters.gov.uk), as another dimension of 'new public management', the complexity will become greater and traditional 'top-down'
management structures potentially even less viable. These new challenges, alongside the continuing accountability regime, will require new approaches to distributed leadership that are even more creative and learning-centred, and that empower teachers and related professionals (and pupils) to take control of local school improvement.
NOTE
1. A term used by Harland (1990) to describe an investigative approach which proceeds by enquiry (after Zetetick Philosophy).
REFERENCES
Bell, D., & Ritchie, R. (1999). Towards effective subject leadership in the primary school. Buckingham: Open University Press.
Bennett, N., Harvey, J. A., Wise, C., & Woods, P. A. (2003). Desk study review of distributed leadership. Nottingham: NCSL/CEPAM.
Campbell, A., & Jacques, K. (2001). Best Practice Researched: An investigation of the early impact of teacher research on classrooms and schools. British Educational Research Association (BERA) Conference, September 2001, University of Leeds.
Cullingford, C. (Ed.) (1999). An inspector calls: Ofsted and its effect on school standards. London: Kogan Page.
Davies, D., & Rudd, P. (2001). Evaluating school self-evaluation. Slough: National Foundation for Educational Research.
Day, C., Hall, C., Gammage, P., & Coles, M. (1993). Leadership and curriculum in the primary schools. London: Paul Chapman Publishing.
Department for Education and Skills (DfES). (2004). The Children Act. London: DfES.
Ferguson, N., Earley, P., Fidler, B., & Ouston, J. (2000). Improving schools and inspection: The self-inspecting school. London: Paul Chapman Publishing.
Franey, T. (2002). Working smarter together: The development of an enquiry team across twelve schools. Nottingham: National College for School Leadership.
Fuhrman, S. (Ed.) (1993). Designing coherent education policy. San Francisco: Jossey-Bass Publishers.
Fullan, M. (2003). The moral imperative of school leadership. London: Sage Publications.
Hargreaves, A. (2003). Teaching in the knowledge society: Education in the age of insecurity. Maidenhead: Open University Press.
Harland, J. (1990). The work and impact of advisory teachers. Slough: National Foundation for Educational Research.
Honig, M. I. (2004). Crafting coherence: How schools strategically manage multiple, external demands. Educational Researcher, 33(8), 16–30.
Hopkins, D., Ainscow, M., & West, M. (1994). School improvement in an era of change. London: Cassell.
Hopkins, D., & Jackson, D. (2003). Networked learning communities: Capacity building, networking and leadership for learning. Nottingham: National College for School Leadership.
Hopkins, D., & Reynolds, D. (2001). The past, present and future of school improvement: Towards the third age. British Educational Research Journal, 27, 459–475.
Jackson, D. (2002). Networked learning communities: The journey so far. Nottingham: National College for School Leadership.
Jackson, D., & Tasker, R. (2004). Professional learning communities. Nottingham: National College for School Leadership.
Lee, J., & Fitz, J. (1998). Inspection for improvement: Whose responsibility? Journal of In-Service Education, 24(2), 239–253.
Little, J. (1992). Teacher development and educational policy. In: M. Fullan & A. Hargreaves (Eds), Teacher development and educational change (pp. 170–193). London: Falmer Press.
MacBeath, J. (1999). Schools must speak for themselves. London: Routledge.
MacBeath, J., & McGlynn, A. (2002). Self evaluation: What's in it for schools? London: Routledge/Falmer.
MacBeath, J., Schratz, M., Meuret, D., & Jakobsen, L. (2001). Self-evaluation in European schools: A story of change. London: Routledge.
McNiff, J., Lomax, P., & Whitehead, J. (1996). You and your action research project. London: Routledge.
NAHT. (2001). Primary leadership paper 3: Teacher effectiveness and leadership. London: National Association of Headteachers.
NCSL. (2005). Self-evaluation: A guide for school leaders. Nottingham: National College for School Leadership.
Ofsted. (1998). School evaluation matters. London: Ofsted.
Ofsted. (1999). Inspecting schools: The framework. London: Ofsted.
Ofsted. (2003). Inspection report: Peasedown St John Primary School. London: Ofsted.
Ofsted. (2005a). A new relationship with schools: Improving performance through school self-evaluation. London: DfES.
Ofsted. (2005b). Primary self-evaluation form. London: DfES.
Reynolds, D., Bollen, R., Creemers, B., Hopkins, D., Stoll, L., & Lagerweij, N. (1996). Making good schools: Linking school effectiveness and school improvement. London: Routledge.
Ritchie, R. (2002). School improvement in the context of a primary school in special measures. Journal of Teacher Development, 6(3), 329–346.
Ritchie, R., & Ikin, J. (2000). Telling tales of school improvement. Birmingham: National Primary Trust.
Sammons, P. (1999). School effectiveness: Coming of age in the twenty-first century. Lisse: Swets and Zeitlinger.
Sammons, P., Hillman, J., & Mortimore, P. (1995). Key characteristics of effective schools: A review of school effectiveness research. London: Office for Standards in Education and Institute of Education.
TES. (2005). Will heads be able to inspect themselves? Times Educational Supplement, 28 January.
Wenger, E. (1998). Communities of practice: Learning, meaning and identity. Cambridge: Cambridge University Press.
CHANGING CONTEXTS AND RELATIONSHIPS IN EDUCATIONAL EVALUATION
Katherine E. Ryan
The 2004 Sixth Cambridge Evaluation Conference convened to examine how three decades of political and social change are shifting educational evaluation theory and practice. Certainly, preoccupation with educational accountability is part of this change (Ryan, 2004, 2005). The notion of accountability was introduced into the educational discourse around 25 years ago (Nevo, 2002). Nearly replacing external evaluation, educational accountability is now increasingly linked to the notion of external control of education within the United States and in the international arena (Nevo, 2002). In the early 80s, Cronbach et al. (1980) warned about leaving the notion of accountability undefined and unexamined. This warning is especially pertinent today because educational accountability discourses are entangled – going in several different directions at the same time. In common discourse, accountability means answering for responsibilities and conduct. Many educational accountability forms do reflect this notion of serving the public interests. For example, schools should be held accountable to ensure efficient and appropriate use of resources. Further, teachers and educators (e.g., principals) have obligations and responsibilities towards their students, parents, and the public. This can be characterized as professional accountability. The
audit culture’s performance indicators also serve stakeholders and public interests through monitoring by promoting educational equity. Educational inequalities are identified by within- and across-school achievement comparisons (Rizvi, 1990). However, the current form of educational accountability goes beyond identifying educational inequities and is distinct from and much more complex than ‘‘being answerable to.’’ In this paper, I critically examine how current notions of educational accountability have altered the educational evaluation landscape in these three decades. I begin with brief vignettes to instantiate how taken-for-granted educational relationships and evaluation practices have shifted. I briefly discuss political and social changes that contribute to altering these relationships and practices and the meaning of educational accountability. After presenting a critical analysis of educational accountability and these changed relationships and practices, I consider the implications of these changes for the evaluator’s role and relationships with stakeholders.
WHAT IT USED TO BE LIKE
So I want to take you back to about 20 years ago, when, as a first-year graduate student, I was doing a research assistantship with a professor interested in math test performance and motivation. This was the period when the U.S. was equally awed by the Japanese automobile import/production cycle and the Japanese educational system. Both systems were efficient and effective. As part of that assistantship, I helped some staff from the Midwest State Board of Education (MSBE) do a modest longitudinal study. It was all very informal – MSBE and Midwest University (MU) faculty thought it would be interesting to see how the students in 1984 compared with those in 1974 – a project called the Decade Study. MU faculty also wanted to pilot some scales assessing student motivation and test anxiety. The staff at the Midwest State Board was excited about doing the study. There was no difficulty getting schools to participate. Schools thought it was an honor and a privilege that the Board asked them to participate. This was an extremely rare event – that the state would be involved in administering a test, and a test that would compare one group of students to another. Students and schools participating in this study did no test preparation – let alone for 2–4 weeks; they did not attend test-taking skills workshops; there were no extraordinary security measures to protect test integrity; the results were not published in the newspaper; and no one said whether they
met standards or not. There were no standards. And the state sent a letter thanking the schools and students for participating and reported the students did well. It never occurred to anyone to make a judgment of merit or worth – to decide to judge on the basis of that math test how well the participating schools and students were doing.
WHAT IT IS LIKE NOW
Today it is 2005. I am still working with the Midwest State Board of Education – doing very different activities. In 2001, I was a public observer for the standard setting process (the determination of performance levels students are expected to achieve to be considered "proficient," "needs improvement," etc.) used to identify the test score cutoffs for the Midwest Standards Assessment Test (MSAT). Of course, the MSAT has both content and performance standards. The extent to which students have mastered these standards is assessed by large-scale assessment. All districts, each school in each district, each tested grade in each school, and each student in each tested grade receives either a 4 (exceeds standards), 3 (meets standards), 2 (below standards), or 1 (academic warning). Now I serve on the Midwest State Board of Education Technical Advisory Committee – a committee designed to critically analyze and strengthen the state assessment system. Prior to the passage of the No Child Left Behind Act of 2001 (NCLB, 2002), the committee deliberations were not subject to the Freedom of Information Act, so the committee was able to hold frank discussions about various weaknesses in the assessment systems. Since the passage of NCLB, the most recent committee charge is to address psychometric issues (e.g., scaling, equating, and standard setting) in conjunction with requirements posed by NCLB regulations, students with special needs, and English language learning. At the same time, these meetings are open to the press and the public. The most recent committee tasks include reviewing a standard setting process designed to reconsider (lower) cutoffs on an English Language Learner (ELL) Assessment that were set prior to NCLB. The pre-NCLB cutoffs currently in use were intended as "goals" (set higher than students were expected to perform), so now a substantial number of Midwest State schools are unable to reach "annual yearly progress" (AYP).1 Teams of bilingual education teachers were convened to consider what post-NCLB ELL students' scores should look like: 4 (exceeds standards), 3 (meets standards), 2 (below standards), or 1 (academic warning).
Today test security is an important issue, with the stakes higher than ever. The Technical Advisory Committee also devoted substantial time to deciding how to address an inadvertent statewide security breach of an MSAT 5th grade reading passage and follow-up items for a number of school districts. MSBE had to remove items from scoring. Both test theory and empirical investigations provide justification for removing some items without changing the estimates of students' scores appreciably. However, deciding the exact number of items that can be removed before score accuracy is affected is far from a perfect science. Certainly, the districts that needed only a small score increase to make "AYP" were doubtful.
WHAT HAPPENED
1984 to 2004 – so my question is, what happened? What is the lever? According to Stein (2001), a Canadian political theorist, and others (Suarez-Orozco & Qin-Hillard, 2004; Burbules & Torres, 2000), in the past 30 years political, economic, and social forces have converged to elevate efficiency to a value. It is "globalization that has pushed the notion of markets and their language of efficiency to the forefront of public consciousness" (Stein, 2001, p. 46). While acknowledging that globalization is a contested term, for the purposes of this discussion globalization is defined as the integration of markets and production through large multinational corporations based on the notion of efficiency (Stein, 2001; Burbules & Torres, 2000). At the same time, the rise of neo-liberal ideologies2 has taken place. However, how or whether these ideological shifts are intertwined with these economic changes is the subject of substantial discussion and debate (Biesta, 2004; Burbules & Torres, 2000; Carnoy, 1999). The audit culture is an instantiation of globalization. The audit culture reflects New Public Management (NPM) – one of the fundamental concepts being redefined by globalization. New Public Management refers to a set of initiatives characterized by a regulatory style that makes individuals and organizations accountable through auditable performance standards (Power, 1997). These standards are intended to improve internal performance and to make these improvements externally confirmable and public. Performance is formed by economy, efficiency, and effectiveness (Power, 1997). What does this have to do with education? Globalization is demanding more of education as markets have shifted from industrial production to services, with information technology receiving more attention
(Stein, 2001; Teachers College Annual Report, 2004). Education must change to meet the needs of the new knowledge-based society. With this shift to a knowledge-based society, there is the demand that students be educated for the new world order in order to remain competitive in the global economy. Intellectual resources and knowledge, instead of natural resources and industrial labor, are critical assets for continuing economic growth within a knowledge-based economy. Being educated for the new economy means workers need higher-level skills and knowledge to function successfully in the labor force and to compete successfully in the international arena. For instance, the current U.S. Department of Education strategic plan explicitly ties education to the economy and democracy with the following: "Since A Nation at Risk … we have acknowledged the importance of our education system to our economy … Now we acknowledge its importance … to the strength of our democracy itself" (p. 6) (www.ed.gov/pubs/stratplan2002-07).
The Audit Culture and Education
How can education be changed to meet the needs of the new knowledge-based society? How can education be reformed efficiently? Educational practices are being fitted to the logic of NPM to create an educational accountability based on auditing mechanisms. The U.S. educational accountability context reflects this audit intensification, particularly since the passage of NCLB. NCLB, which holds individuals and organizations accountable through auditable performance standards, illustrates how the NPM audit culture is shaping educational practices and relationships. This legislation essentially institutionalizes a reliance on performance indicators as a key mechanism for improving student achievement.
Educational Standards and Assessments
In the U.S., by law, statewide learning standards are proposed, with statewide tests administered to see if students are meeting the standards. States, school districts, public officials, educators, parents, and students are held accountable for improved test scores as the key educational outcome (Heubert & Hauser, 1999). In the United States, this model does engender some mistrust. For example, there is a sense that standard setting needs to be monitored closely. The vignette I presented illustrated how monitoring is incorporated in the standard setting process when I described my role as a
public observer. Being a public observer seems similar to being a poll watcher during elections. Poll watchers are hired to watch polling places to make sure there are no questionable practices (voter fraud, electioneering) occurring while the public casts their votes. Student achievement is also ranked and compared on international assessments like the Third International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA) that are used to examine cross-curricular competences (Stronach, 1999; Stronach, Halsall, & Hustler, 2002). These cross-country assessment comparisons are linked to economic performance in these countries in the global economy (Stronach et al., 2002). Quantitative performance indicators have come to represent and communicate the quality and quantity of education (Carnoy, 1999). The standards-based scoring for the MSAT is a concrete example of how quality and quantity are being represented by this kind of performance indicator. A "4" on the MSAT in Mathematics (on a scale of 1–4) means students' performance is above what is required to meet the standards, which is a "3." However, what math students know and the level at which they know the math is not conveyed. This is in contrast to the complex picture portrayed in traditional educational evaluations. There is an assumption that increases on this kind of quantitative measure represent more and better education. The Organisation for Economic Cooperation and Development (OECD), the International Association for the Evaluation of Educational Achievement (IEA), and the National Center for Education Statistics (NCES) have contributed to developing this view through their respective policy efforts. This is essentially an efficiency model that uses indicators of productivity (gains in achievement test scores) to represent increased school productivity.
Changing Practices and Policies
Historically, increased demands on education have been supported with resource increases (Lundgren, 2003). However, globalized education reform is committed to the notion that educational growth and change can be achieved through increased production and efficiency. Schools are now "managed" by the state with performance indicators. There is an economic relationship between the state and the schools (Biesta, 2004). Schools as "providers" are accountable to the state because of the financial resources the government invests in them, in effect establishing central control. There is an interest in and commitment to high-yield resources that provide substantial increases in student achievement at a low cost (Carnoy, 1999).
Educational accountability based on auditing practices is thought to be just this kind of "high-yield resource," where increased student achievement is expected with a minimal increase in resources. For instance, under NCLB, when schools and districts do not meet AYP for more than 2 years, a summative evaluative judgment of "unsuccessful" is triggered. Students may then legally enroll in another school, a policy reflecting a free-market approach to improving student learning and serving as a mechanism for reforming education. No extra resources are allocated to see if increased resources can assist in improving school performance. After 5 years of not meeting AYP, the school can be closed – essentially "going out of business." Fundamentally, educational accountability defined in terms of performance standards reflects the technical-managerial meaning of accountability (Biesta, 2004). This technical definition of accountability originates in the financial domain (Bernstein, 2004; Charlton, 2002), where accountability referred to documenting financial activities. This usage is now generalized to include all kinds of activities in substantive domains like education and healthcare as well as the financial domain (Stein, 2001). This technical-managerial accountability is a system of governance (Biesta, 2004). Cronbach et al. (1980) cautioned the evaluation community about leaving this form of accountability unexamined. The extent to which the meaning and logic of holding individuals, groups, and organizations responsible should be the same for financial expenditures and student learning is largely unchallenged. There is the assumption that students and financial resources are interchangeable units. If it works for finances, it should work for students.
Changing Relationships
Implementing a technical-managerial notion of educational accountability changes the relationship among the state, schools, and parents and students by introducing new institutional arrangements. Previously, within a Lockean liberal democracy, the state (government) was constituted as delegated, institutional power structures administering the interests of society.3 Power is vested in the people – citizens and the public (ordinary citizens interested in or affected by concerns and representing multiple viewpoints) hold the state to a collective accounting. Within these rearrangements, the relationship between the state and its citizens and their local communities can now be characterized in economic terms (Biesta, 2004). Instead of producing goods and services, governments (the United States and others) have turned to managing interests for citizens through indicators (House, 2004). Parents, students, and communities are
recast as consumers of public services like education (Biesta, 2004). At an abstract level, as consumers, parents and students have a "choice" about what school to attend when their school, as a service provider, has problems. However, it is the state that dictates the means and terms of holding schools accountable, not parents, students, or their communities. Configuring parents and students as consumers within educational relationships is intended to empower them by enabling school choice when their home schools are not adequate. This reconfiguration is not without complexity. As a consequence, parent, citizen, and community options to improve local schools are more limited. There is an assumption that parents and students would rather change schools than work with their local school to improve it. Further, within this configuration, there is no longer any direct accountability between schools and citizens and the public (communities, parents, and students) (Biesta, 2004). At least within the U.S., the current educational accountability system mediates and moderates this relationship. The schools are directly accountable to the state, not parents, citizens, and the community, in the form of standards and progress on meeting those standards. While perhaps not intended, this may be reducing the power of local communities, individuals, and groups to address specific local school issues like adequate financial resources (Biesta, 2004). When school improvement does come up for public discussion, it is often based around the recent school and district report cards. This effectively defines a discourse about learning and content standards and how to improve test performance. The conversation tends to be about improving reading test scores, not about improving students' reading or whether more resources are needed to improve reading. What is missing from this conversation? On the one hand, a great deal of this educational accountability is justified because of the shift to a knowledge-based society. Nevertheless, how educational accountability contributes to a knowledge-based society, or the connections between them, is not clearly articulated. A technical-managerial-based educational accountability is oriented towards quality assurance – systemic efficiency and effectiveness (Biesta, 2004; Stein, 2001). Quality assurance is committed to the best possible use of available resources to achieve public ends – not to determining what the ends are. In the case of public schools, determining the ends means addressing the purposes and goals of education and attendant values in a knowledge-based society. The example with ELL standards I presented in the vignette illustrates this
means–ends tension. Pre-NCLB ELL standards were set as goals for ELL students to attain. Instead of considering what to do to help students meet the earlier pre-NCLB standards, the post-NCLB standards are lowered. This is one of the ways post-NCLB standards are oriented towards efficiency and effectiveness. Standards are set or changed so that programs can meet AYP without additional resources. So the current form of educational accountability does not help with determining what the "ends" are. These ends include whether the educational system needs to be effective at basic literacy and numeracy, at developing creativity and critical thinking, and/or at developing civically engaged citizens. Currently, these ends remain notably absent in global educational reform that relies so heavily on the technical-managerial notion of educational accountability.
BACK TO THE FUTURE
Returning now to 20 years ago – it is clear a great deal has happened. My question now is, what will happen in the next decade? These changing practices and relationships have implications for where and how evaluators do their work. While these are challenging times, the evaluation community can make a difference in addressing these important issues. Below, I make a few small suggestions for how evaluators might go about mapping this changing terrain. In the literature, the ideology of globalization is often portrayed as an inevitable consequence of market forces that results in a top-down homogenization effecting social, political, and cultural changes (Carnoy, 1999; Lingard, 2000). The empirical effects of globalization are much more complex and hopeful (Lingard, 2000). This technical-managerial educational accountability, which reflects the current restructuring of educational systems, is likely linked to globalization. Nevertheless, globalization effects at the local level and the relationships between the local and the global are not well understood. So how educational accountability practices are instantiated locally is not clear. The notion of glocalization captures how local and national politics and social relations can mediate or moderate the effects of globalization on the local (Lingard, 2000). Educational issues, like educational accountability, the aims of education, and the like, are subject to glocalization. There are politics, local histories, and cultures that will shape how these educational issues, like accountability, actually play out at the local level.
Reframing Local Practices, Policies, and Relations
The local is one space or access point in which the educational evaluator can work – where local politics, histories, and cultures are likely to alter globalized educational accountability practices. Local educational practices and policies can be developed that are at odds with global practices, such as relying so heavily on standardized tests to represent educational quality. The globalized economic relationship among schools, state, parents, students, and the community can be reframed locally as a political relationship in which schools, parents, students, and the community come together to have a conversation about the common or public good (Biesta, 2004). The evaluator can play a key role by helping to reframe this relationship. This is at least one place where there is space to go beyond technical-managerial educational accountability to construct a democratic accountability (Ryan, 2004, 2005). Parents, students, citizens, professionals (teachers and administrators), and experts can join in dialogue and discussion to address the meaning of educational accountability and how that meaning is constructed, including accountable for what, how, and to whom (Ryan, 2005). This community, composed of parents, students, citizens, professionals (teachers and administrators), and experts, becomes the actor constructing the terms and means of educational accountability. Evaluators can assist by helping develop and design this kind of self-monitoring community. In democratic accountability, schools, students, parents, and their communities hold themselves responsible for what they are doing. They jointly engage in critical reflection about school and policy implementation. Further, schools and communities take ownership of external and internal standards. A first task is deciding what quality is and translating it into standards of effectiveness (House, 2004; Stein, 2001). However, it is critical to acknowledge that making these decisions about standards is a formidable task – "neither politically neutral nor easily done" (Stein, 2001, p. 77). Critical topics like the values and purposes of education are important to democratic discussion. This is where the "ends" that are overlooked in a technical-managerial educational accountability are decided in democratic accountability. It is in these discussions that other valued outcomes can be considered. These outcomes include the extent to which education can and should be aimed at developing innovative thinkers, knowledgeable and tolerant citizens, and literate and numerate citizens (Stein, 2001; Willinsky, 2004). However, these kinds of outcomes are not easily assessed with multiple-choice assessments. These kinds of complex knowledge and skills are best
represented by complex achievement tasks – problems and activities in which students engage. Standardized test score performance has an important place in representing student achievement; it is a static snapshot of each student's learning. However, the kinds of multifaceted understandings involved with broader educational outcomes, like innovative thinking, are best captured by multiple assessment methods that adequately represent students' knowledge and skills. A more important question about the evaluator's role in this new context – the changing circumstances of globalization – remains unanswered. Within a liberal democracy, the government is constituted as delegated, institutional power structures administering the interests of society. Like the institutional structures administering the interests of the public, within a liberal democracy the role of the evaluator was to serve the public interests (MacDonald, 1976; House & Howe, 1999; Ryan, 2004). A neo-liberal democracy that focuses on appropriate markets aimed at creating enterprising and entrepreneurial individuals signals very different arrangements. The evaluator's relationship to this ruling apparatus4 or these institutional power structures remains to be charted.
NOTES

1. Adequate yearly progress is (a) the percentage of reading and math scores that meet or exceed standards, compared with the annual state targets; and (b) the participation rate of students taking the state tests, which must meet or exceed 95%.
2. Under neo-liberalism, the state's role is to create an appropriate market, one that produces individuals who are enterprising and entrepreneurial (Biesta, 2004).
3. Here society is devoted to maximizing individuals' power and personal interests rather than communitarian ones (Habermas, 1996).
4. D. Smith (1987) defines the "ruling apparatus as that familiar complex of management, government, administration, professions, and intelligentsia, as well as the textually mediated discourses that coordinate it and penetrate it" (p. 107).
REFERENCES

Bernstein, D. (2004). Town meeting: Developing an AEA public statement on educational accountability. Panel presentation at the Annual Meeting of the American Evaluation Association, Atlanta, GA, November.
Biesta, G. J. (2004). Education, accountability, and the ethical demand: Can the democratic potential of accountability be regained? Educational Theory, 54(3), 233–250.
Burbules, N. C., & Torres, C. A. (Eds). (2000). Globalization and education: Critical perspectives. New York: Routledge.
Carnoy, M. (1999). Globalization and educational reform: What planners need to know. Paris: UNESCO International Institute for Educational Planning.
Charlton, B. G. (2002). Audit, accountability, quality, and all that: The growth of managerial technologies in UK universities. In: S. Prickett & P. Erskine-Hill (Eds), Education! Education! Education! Managerial ethics and the law of unintended consequences. Charlottesville, VA: Imprint Academic.
Cronbach, L. J., Ambron, S. R., Dornbusch, S. M., Hess, R. D., Hornik, R. C., Phillips, D. C., Walker, D. F., & Weiner, S. S. (1980). Toward reform of program evaluation: Aims, methods, and institutional arrangements. San Francisco, CA: Jossey-Bass.
Habermas, J. (1996). Three normative models of democracy. In: S. Benhabib (Ed.), Democracy and difference (pp. 21–30). Princeton, NJ: Princeton University Press.
Heubert, J. P., & Hauser, R. M. (Eds). (1999). High stakes: Testing for tracking, promotion, and graduation. Washington, DC: National Academy Press.
House, E., & Howe, K. (1999). Values in evaluation and social research. Thousand Oaks, CA: Sage Publications.
House, E. R. (2004). The role of the evaluator in a political world. The Canadian Journal of Program Evaluation, 19(2), 1–16.
Lingard, B. (2000). It is and it isn't: Vernacular globalization, educational policy, and restructuring. In: N. Burbules & C. Torres (Eds), Globalization and education: Critical perspectives (pp. 125–134). New York: Routledge.
Lundgren, U. P. (2003). The political governing (governance) of education and evaluation. In: P. Haug & T. A. Schwandt (Eds), Evaluating educational reforms: Scandinavian perspectives (pp. 99–110). Greenwich, CT: Information Age.
MacDonald, B. (1976). Evaluation and the control of education. In: D. Tawney (Ed.), Curriculum evaluation today: Trends and implications (pp. 125–134). London: Macmillan Education.
Nevo, D. (2002). Dialogue evaluation: Combining internal and external evaluation. In: D. Nevo (Ed.), School-based evaluation: An international perspective (pp. 3–16). Kidlington, Oxford: Elsevier Science.
No Child Left Behind Act of 2001, Pub. L. No. 107-110, 115 Stat. 1425 (2002).
Power, M. (1997). The audit society. New York: Oxford University Press.
Rizvi, F. (1990). Horizontal accountability. In: J. Chapman (Ed.), School-based decision-making and management (pp. 299–324). Hampshire: Falmer Press.
Ryan, K. E. (2004). Serving the public interests in educational accountability. American Journal of Evaluation, 25(4), 443–460.
Ryan, K. E. (2005). Making educational accountability more democratic. American Journal of Evaluation, 26(4), 443–460.
Smith, D. (1987). The everyday world as problematic: A feminist sociology. Boston, MA: Northeastern University Press.
Stein, J. G. (2001). The cult of efficiency. Toronto, ON: House of Anansi Press, Ltd.
Stronach, I. (1999). Shouting theatre in a crowded fire: Educational effectiveness as cultural performance. Evaluation, 5(2), 173–193.
Stronach, I., Halsall, R., & Hustler, D. (2002). Future imperfect: Evaluation in dystopian times. In: K. Ryan & T. Schwandt (Eds), Exploring evaluator role and identity (pp. 167–192). Greenwich, CT: Information Age Publishing.
Suarez-Orozco, M., & Qin-Hillard, D. M. (2004). Globalization: Culture and education in the new millennium. Berkeley: University of California Press.
Teachers College Annual Report. (2004, August 31). New rules, old responses. Retrieved October 1, 2004, from http://www.tc.columbia.edu/news/article.htm?id=4741
U.S. Department of Education. (2002). U.S. Department of Education strategic plan 2002–2007. Retrieved November 2004, from http://www.ed.gov/about/reports/strat/plan2002-2007/index.html
Willinsky, J. (2004). Keep the whole world at your fingertips: Education, globalization, and the nation. Paper presented at the Bureau of Educational Research Colloquium, University of Illinois at Urbana-Champaign, Champaign, IL, December.
ON THE IMPORTANCE OF REVISITING THE STUDY OF ETHICS IN EVALUATION

Thomas A. Schwandt

Ethical inquiry is fundamentally about the moral relations between human beings. For our purposes here, there is no need to enter a lengthy debate over what moral relations refer to, for they encompass just what one might think, namely, human interactions that are perceived as appropriate, honest, decent, just, proper, honorable, principled, kind, and so forth. Ethical inquiry is concerned with making distinctions between what is good and bad in our actions and in our ways of being with one another. Contrary to some contemporary accounts that regard moral/ethical reasoning and argument as primarily rooted in a capacity to understand and apply some set of moral principles that yields a context-free decision procedure, I side with those moral philosophers who accept the thoroughgoing practical and interpretive character of moral/ethical inquiry and decision making (e.g., Miller, 1996; Moody-Adams, 1997; Raz, 2003; Walzer, 1987). What this means is that ethics and morality are something we have to argue about; they are not something we discover. What the argument is about is our selves and the kind of life we take to be worth living; at issue is the moral meaning of our way of life as citizens, professionals, parents, friends, spouses and partners, and so on. This chapter deals with ethical inquiry in evaluation, but perhaps not as one might expect.
TYPES OF ETHICAL INQUIRIES IN EVALUATION

Traditionally, the kinds of ethical inquiry that one encounters in program and policy evaluation involve two broad matters: evaluator (mis)conduct – that is, concerns such as conflict of interest, contractual obligations, competence, integrity and honesty, and so forth – and protection of evaluation participants' rights to autonomy, privacy, informed consent, and so on. Ethical inquiry in both cases is typically guided by a set of ethical theories, standards, guidelines, or principles that an evaluator must interpret and apply in the situation at hand (Newman & Brown, 1996). For example, the Joint Committee's (1994) Propriety Standards speak to matters of both evaluator conduct – with respect to formal agreements/contracting, conflict of interest, and fiscal responsibility – and participants' rights. Similarly, the Guiding Principles for Evaluators, endorsed by the American Evaluation Association (see www.eval.org), address these two ethical concerns, stressing the evaluator's duty or obligation with respect to methodological competence, integrity and honesty, and respect for people. These are important matters in professional ethics in evaluation, situated largely within the discussion of the ethics of principles or rules. However, this view of ethical inquiry in evaluation is not my concern here. My primary interest lies with a third kind of ethical inquiry that involves examining the moral-political orientation of the practice of evaluation (Schwandt, 1997). At issue here is the conception of human well-being or the public or social good that evaluation as a social practice is intended to serve.1 This ethical matter is prefigured in both the Joint Committee Standards and the Guiding Principles for Evaluators. In the former, the Propriety Standard referred to as "Service Orientation" holds that evaluators ought to look beyond the self-interests of program developers and managers to serve program participants, community, and society (Joint Committee, 1994, p. 83). In the latter, the principle "Responsibilities for General and Public Welfare" states that an evaluator should "take into account the public interests and good, going beyond analysis of particular stakeholder interests to consider the welfare of society as a whole." Mark, Henry, and Julnes (2000) present a statement of the ethical aim of evaluation that perhaps many evaluators would find difficult to dispute. They argue that the raison d'être of policy and program evaluation is social betterment, understood as "the reduction or prevention of social problems, the improvement of social conditions, and the alleviation of human suffering" (p. 7). Moreover, they note, "evaluation contributes to social betterment by supplying information
that can be used in democratic deliberations and administrative actions. In other words, evaluation can assist democratic institutions and agents to better select, oversee, improve, and make sense of social programs and policies" (p. 8). One could read these concerns in a rather innocent way as simply signifying that a profession, by definition, is in business for the common good, as well as for the good of its members, or it is not a profession (Sullivan, 1995). However, I choose to read this in a more robust and profound way as suggesting that addressing the social welfare is a professional civic obligation in evaluation. Successfully realizing and performing that obligation depends on genuinely understanding and coming to terms with conceptions of human well-being (e.g., "social betterment") that, in turn, also shape the self-understanding of professional evaluation practice.
SITUATING THE MEANING AND SIGNIFICANCE OF THE MORAL-POLITICAL AIM OF EVALUATION

While the notion of the social good or human well-being is notoriously difficult to pin down and has been investigated and debated by ethical and political theorists from Socrates to the present day, it is not necessary for present purposes to define the notion as much as to situate it in present realities and suggest what investigating it might entail for the practice of evaluation. One useful way of framing questions of human well-being and how evaluation practice is tied to conceptions of human well-being is to consider how evaluation is linked to various theories of democracy, for it is in the latter that ideas of the public good and human well-being receive expression. This is the approach taken by House and Howe (1999), who argue that all evaluators have some broad idea of how their work will inform society and some conception of democracy and of evaluation's role in it, even if that conception is not made explicit. They suggest that the moral orientation of evaluation depends a great deal on whether the (implicit or explicit) framework guiding the evaluation is informed by a thin liberal version or a strong and substantive notion of democracy. The former model, often referred to as pluralist or interest group pluralism, regards democracy "as a mechanism for identifying and aggregating the preferences of citizens in order to understand which are held in the greatest number or with the greatest intensity" (Young, 2000, p. 21). A distinctive feature of this model is that it is highly skeptical about the possibility of assessing the normative and
evaluative objectivity of the interests or claims being promoted. Young (2000) explains that the pluralist or aggregative model of democracy:

Denies that people who make claims on others about what is good or just can defend such claims with reasons that are objective in the sense that they appeal to general principles beyond the subjective preferences or interests of themselves or others. On this subjective interpretation, if people use moral language, they are simply conveying a particular kind of preference or interest which is no more rational or objective than any other. (p. 23)
In other words, ends or values promoted by a policy or program are subjective, incapable of resolution via rational argument, and exogenous to the political process. Hence, in serving the political process, evaluation does not concern itself with appraising the ends or values of programs or policies; rather, it serves the framework of liberal democracy and exercises its professional civic obligation via its competent execution of two tasks – means depiction (investigating the effectiveness and efficiency of alternative means to interests, values, or ends preferred by a given interest group) and interest group depiction (i.e., if X is important to you, then the program or policy in question is good for the following reasons) (House & Howe, 1999). A different moral orientation of evaluation arises from within a strong, substantive, deliberative model of democracy. Barber (1984) argues:

Strong democracy is a distinctively modern form of participatory democracy. It rests on the idea of a self-governing community of citizens who are united less by homogeneous interest than by civic education and who are made capable of common purpose and mutual action by virtue of their civic attitudes and participatory institutions rather than their altruism or good nature. (p. 117)
The ethical aim of evaluation informed by this model of democracy is realized by submitting matters of both means and ends to reflection, including conflicting values and stakeholder groups in the evaluation, ensuring that there is deliberation and dialogue about conflicting views, facilitating that deliberation, helping media and policy makers make sense of conflicting claims by sorting through the good and bad information, bringing the interests of presumed beneficiaries of a policy or program to the table if they are neglected, and so on (House & Howe, 1999). Evaluation activities are conceived in this way because democracy is envisioned as a form of practical reason:

Participants in the democratic process offer proposals for how best to solve problems or meet legitimate needs, and so on, and they present arguments through which they aim to persuade others to accept their proposals. Democratic process is primarily a discussion of problems, conflicts, and claims of need or interest. Through dialogue others test and challenge these proposals and arguments. (Young, 2000, p. 22)
In this moral-political orientation, evaluators exercise their professional civic obligation by acting as analyst as well as mediator, and as one of many social actors actively engaged in the discursive construction of the value of a policy or program (see also Fischer, 2003). House and Howe's analysis can be extended to examine how evaluation practice acquires a particular moral-political orientation in the face of the severe attenuation taking place in notions of the public good and general welfare as a result of the emphatic turn towards the political-economic theory of neoliberalism in the past 30 years. Neoliberalism "proposes that human well-being can best be advanced by liberating individual entrepreneurial freedoms and skills within an institutional framework characterized by strong private property rights, free markets, and free trade. The role of the state is to create and preserve an institutional framework appropriate to such practices" (Harvey, 2005, p. 2). Several developments associated with the neoliberalization of society and the spread of new public management in the public sector are particularly worth considering. First, in brief, in the past century, political systems such as socialism, social democracy, and social liberalism – despite considerable ideological differences – viewed the public as "the social" – a collectivity or communality of citizens. According to some observers, neoliberal theory is associated with "the end of the social" as the "inevitable horizon for political thought and action" (the horizon in which educational and social programs are developed and evaluated):

The end of the social has corresponded to the return of the market and to a reworking of relations between government and capitalist markets … . New models of public management address a particular configuration of "the public", one inscribed with a utilitarian notion of the public sector's relationship to citizen-consumers … . New public management discourse addresses "a public" depicted as customers who relate to their government on the basis of an economic, rather than a social contract – through the logic of consumption – getting value for their dollars. (Hall, 2005, pp. 154–155)
The neoliberal market model is exceptionally weak in its notions of citizenship and community (Clarke & Newman, 1997; Pollitt, 1993), in large part because:

In so far as neoliberalism values market exchange as an ethic in itself, capable of acting as a guide to all human action, and substituting for all previously held ethical beliefs, it emphasizes the significance of contractual relations in the marketplace. It holds that the social good will be maximized by maximizing the reach and frequency of market transactions, and it seeks to bring all human action into the domain of the market. (Harvey, 2005, p. 3)
The ethical norms – impersonal, egoistic, exclusive, and want-regarding – structuring market relations (Tsakalotos, 2004) potentially transform the purpose of social and educational programs and influence the behavior of service providers such as teachers, doctors, counselors, and social service agency personnel of all kinds. A recent study of JobCentre Plus, a new organization formed from the merger of the Benefits Agency and the Employment Service in Britain and charged with delivering the government's work-focused welfare agenda, provides a case in point (Rosenthal & Peccei, 2004). JobCentre Plus is emblematic of the new public management and explicitly emphasizes the discourse of customers and service quality. The researchers examined how the putative social service agency itself is constructed as a business and how users of the agency – its customers (the terms client, claimant, or unemployed are never used) – are constructed in agency texts and talk. They found competing narratives. On the one hand, the customer is the sovereign figure of homo economicus assumed in neoclassical economic theory – portrayed as choosing, autonomous, active, and powerful. On the other hand, the customer is constructed as an object of control, in a state of obligation, and subject to surveillance (i.e., to monitor compliance with the rules associated with receiving benefits). Second, neoliberalism reconfigures the interactions between citizens and state away from the notion of a political relationship (government and citizens together concerned about the common good) towards the notion of an economic relationship. In so doing, it replaces the idea of accountability as a system of mutual responsibility among citizens with the idea of accountability as a system of governance characterized by a customer-oriented ethos. Biesta (2004) argues, "Choice has become the key word in this discourse. Yet 'choice' is about the behavior of consumers in a market where the aim is to satisfy their needs; it should not be conflated with democracy, which is about public deliberation and contestation regarding the common good" (p. 237). Harvey (2005) holds that neoliberalism is "profoundly suspicious of democracy": "Governance by majority rule is seen as a potential threat to individual rights and constitutional liberties … . Neoliberals therefore tend to favour governance by experts and elites. A strong preference exists for government by executive order and judicial decision rather than democratic and parliamentary decision-making" (p. 66). The point of this brief sketch is to suggest that neoliberalism provides another framework in which the moral-political orientation of evaluation is given meaning. In this framework, evaluation serves a fairly vacuous sense of the public good by supporting a style of formalized accountability between service providers and customers that becomes the new ethical and
political principle of governance (Power, 1997). What is particularly troublesome here is the explicit and sole equation of the value of social programs with performativity – the measurement of performance against indicators of target achievement. The movement towards an exclusive focus on performativity is subtle and surreptitious: subtle because the language of what is "of value" in social programs and policies has been recast in terms of quality assurance, outcomes, and performance and made to appear so relevant and cleverly targeted that it needs no defense; surreptitious because although an apparent concern for value is maintained and even made more prominent, the very notion is neutered and emptied of meaning. The social actions we know as policies and programs are rendered adiaphoric – that is, they are emptied of any socially defined ethical significance and are no longer to be judged in terms of moral criteria (good, bad, corrupt), but simply measured against technical indicators (purposes achieved, procedures followed) (Bauman, 1993).2 This is worrisome because the idea of judging the value of policies and programs is rendered moot, or so carefully finessed that the substitution of performance for the complex judgment of value goes unnoticed. To the extent that evaluation practice supports and reinforces a system of entrepreneurial governance "orchestrated by the logic of calculation and of measuring outcomes and results," it makes it increasingly difficult to formulate fundamental questions about the purposes of educational and social policies and programs and their relation to the common good (Hall, 2005).
RECONSIDERING ETHICAL INQUIRY IN EVALUATION

Ethical inquiry in evaluation ought to extend beyond a concern with preventing irresponsible and harmful evaluation and with acting competently. Avoiding misconduct and acting with methodological integrity is not the same as doing good evaluation. To address the latter, we must deal with both the complexities of defining the public good and the complexities of human lives that aim to enact oft-competing conceptions of the good (Hostetler, 2005). Without addressing what it means to contribute to "social betterment," to serve society, or what the public good and the welfare of society as a whole mean, how could it be possible for evaluators to take these into account? Hostetler argues that it is incumbent upon researchers to be knowledgeable and fluent regarding conceptions of human well-being for two basic reasons. First, the concepts used to
articulate the aims of educational and social programs are typically contestable. What does it mean to "improve social conditions," "alleviate human suffering," decrease poverty, reduce school violence, lessen recidivism, provide adequate housing, or learn to read? Even these seemingly obvious goods are subject to different understandings that form part of what comprises human well-being. Second, the concept of human well-being or the public good is complex. Even if one aim of an educational or social program is uniformly judged to be "good," it constitutes only one good. Rarely is it the case that good things come without costs. Hence, deciding on the human good involves tradeoffs. Hostetler's view suggests that somewhere along the line in their professional preparation, evaluators need to gain adequate awareness of, concern for, and understanding of issues of human well-being. Stated somewhat differently, there is a need for moral education as well as methodological training in evaluation. This does not mean that all evaluators ought to study moral philosophy. Questions of human well-being are routinely explored in literature, history, sociology, and anthropology. In fact, we are likely to learn more about moral thinking and behavior by undertaking fieldwork in familiar places – studying the moral inquiries (and disagreements) of everyday agents – than from the largely disengaged philosophical debate about moral theories (Moody-Adams, 1997). To serve the welfare of society as a whole, evaluation practice ought to engage in ethical inquiry that goes beyond matters of avoiding evaluator misconduct and ensuring participants' rights. This kind of inquiry has at least three significant dimensions. First, it involves examining how social and educational policies and programs embody norms and values that guide human interactions and therefore shape the lived reality of managers, service providers, and participants. In other words, part of "doing ethics" in evaluation involves understanding the kinds of values and moral principles embedded in programs and policies that influence the way managers relate to workers, teachers relate to parents and students, social workers relate to clients, health services personnel relate to patients, and so on. This is briefly illustrated above in the example showing how the customer in a social service agency is constructed via official agency documents, the physical arrangement of offices, and the interactions between agency personnel and service users. Second, evaluation should not only endeavor to understand and portray these matters; it also ought to engage in responsible criticism of these principles and associated ways of acting and interacting. In other words, another important part of "doing ethics" in evaluation is engaging in moral
criticism. Thus, following the example used above, we might examine the significance and consequences of the simultaneous strategic deployment of language, images, and practices associated with customer sovereignty as well as narratives of obligation, responsibility, and control. Criticism involves thoughtful exploration of the ways in which this situation can be interpreted. As the authors of the study suggest, it might be read in several ways. On the one hand, it can be linked to a perspective of organizational legitimization – an effort of JobCentre Plus to frame itself as a proper business organization in contrast to some outdated model of a public sector bureaucracy. It can also be linked to a Foucauldian perspective revealing the complex workings of governmentality, or to a broader project of constituting and positioning the users of JobCentre Plus in terms of ruling values of enterprise and self-reliance. Finally, engaging in ethical inquiry in evaluation ought to include appraising the various moral orientations of evaluation itself. It will not do to simply assert that evaluation serves the goal of social betterment without careful analysis of how that notion is given meaning in different social-political points of reference. When, for example, evaluation takes its moral-political orientation towards the public good from a deliberative model of democracy, we ought to consider ways in which that conception of evaluation can be interpreted too narrowly to sufficiently support the task of deepening democracy in social situations that reflect structural injustices. For example, a deliberative democratic framework can privilege argument and a mode of expression more typical of highly educated people and thereby silence some problem or experience incapable of expression as an argument or well-formed dispassionate speech; proceed on the assumption that there is a basis for common understanding to which we can appeal, thereby denying the realities of plural and structurally differentiated social groups; and assume that deliberation must proceed on the basis of norms of communication, thereby ruling as uncivil or out of order other forms of political communication (Young, 2000, p. 36). Likewise, when evaluation's sense of serving the social good is given shape in a neoliberal framework that emphasizes "what works," we would be wise to consider that:

Facts about the way the world is cannot tell us what we ought to do. If students responded well to cattle prods, it would not follow that they ought to be shocked. If children can learn the alphabet before entering school, it does not follow that they should. If abstinence-only sex education programs were shown to reduce the teenage pregnancy rate more than other programs, that alone would not determine that those are the programs we should use. To each of those scenarios, we can and must say, "Okay,
but how does that serve people’s well-being?’’ And to answer that question, we have to venture wide-eyed and strenuously into the bewildering complexities of the human good. (Hostetler, 2005, p. 19)
In light of the general skepticism of neoliberalism with respect to participatory democracy, we might ask whether evaluation aligned with this view is fostering a government of professionals rather than a government of citizens. If, to borrow an idea from Box, Marshall, Reed, and Reed (2001): "The reality of our social experience is a hyper-rationalized world in which democracy is equated with consumer choice … then, the problem we face is this: In what ways might we reassert a meaningful democratic context for the practice of [evaluation] in light of such social experience?" (p. 615). Evaluation is a crucial part of the public policy arena and is inherently connected to the social construction of the good society. Hence, responsible ethical inquiry in evaluation requires that we examine and evaluate the conceptions of human well-being promoted in the ways evaluation as a social practice situates, interprets, and undertakes its task of judging value within a particular moral-political orientation.
NOTES

1. It may be argued that there are grounds for a quarrel about whether evaluation is a social practice and not simply a kind of technology utilized by various kinds of professionals to judge whether intended policy and program outcomes have been achieved. However, the preponderance of evidence points to the fact that evaluation is a particular kind of community of practice (actually, several communities of practice) with its own traditions, norms, special methodological knowledge, professional associations, and institutional locations.
2. Whether this is a predictable transformation of the moral-political orientation of evaluation previously aligned with a thin version of liberal democracy is an issue worthy of further investigation but beyond my concern here.
REFERENCES

Barber, B. (1984). Strong democracy: Participatory politics for a new age. Berkeley, CA: University of California Press.
Bauman, Z. (1993). Postmodern ethics. Oxford: Blackwell.
Biesta, G. J. J. (2004). Education, accountability, and the ethical demand: Can the democratic potential of accountability be regained? Educational Theory, 54(3), 233–250.
Box, R., Marshall, G., Reed, B., & Reed, C. (2001). New public management and substantive democracy. Public Administration Review, 61(5), 608–619.
Clarke, J., & Newman, J. (1997). The managerial state: Power, politics and ideology in the remaking of social welfare. Thousand Oaks, CA: Sage.
Fischer, F. (2003). Reframing public policy: Discursive politics and deliberative practices. Oxford: Oxford University Press.
Hall, K. D. (2005). Science, globalization, and educational governance: The political rationalities of the new managerialism. Indiana Journal of Global Legal Studies, 12(1), 153–182.
Harvey, D. (2005). A brief history of neoliberalism. Oxford: Oxford University Press.
Hostetler, K. (2005). What is "good" educational research? Educational Researcher, 34(6), 16–21.
House, E. R., & Howe, K. R. (1999). Values in evaluation and social research. Thousand Oaks, CA: Sage.
Joint Committee on Standards for Educational Evaluation. (1994). The program evaluation standards (2nd ed.). Thousand Oaks, CA: Sage.
Mark, M. M., Henry, G. T., & Julnes, G. (2000). Evaluation: An integrated framework for understanding, guiding, and improving policies and programs. San Francisco: Jossey-Bass.
Miller, R. B. (1996). Casuistry and modern ethics. Chicago: University of Chicago Press.
Moody-Adams, M. (1997). Fieldwork in familiar places. Cambridge, MA: Harvard University Press.
Newman, D. L., & Brown, R. D. (1996). Applied ethics for program evaluation. Thousand Oaks, CA: Sage.
Pollitt, C. (1993). Managerialism and the public services. Oxford: Blackwell Publishers.
Power, M. (1997). The audit society. Oxford: Oxford University Press.
Raz, J. (2003). The practice of value. Oxford: Oxford University Press.
Rosenthal, P., & Peccei, R. (2004, April). The work you want, the help you need: Constructing the customer in JobCentre Plus. Management Centre Research Papers No. 026, King's College, London. Retrieved November 21, 2005, from http://www.kcl.ac.uk/depsta/pse/mancen/research
Schwandt, T. A. (1997). The landscape of values in evaluation: Charted terrain and unexplored territory. In: D. J. Rog & D. Fournier (Eds), Progress and future directions in evaluation: Perspectives on theory, practice, and methods. New directions for evaluation (Vol. 76, pp. 25–39). San Francisco: Jossey-Bass.
Sullivan, W. M. (1995). Work and integrity: The crisis and promise of professionalism in America. New York: HarperCollins.
Tsakalotos, E. (2004). Homo economicus, political economy and socialism. Science & Society, 68(2), 137–160.
Walzer, M. (1987). Interpretation and social criticism. Cambridge, MA: Harvard University Press.
Young, I. M. (2000). Inclusion and democracy. Oxford: Oxford University Press.
NEW PUBLIC MANAGEMENT AND EVALUATION UNDER DECENTRALIZING REGIMES IN EDUCATION*

Christina Segerholm

In recent decades public education has been the target for reorganization in several western democracies as well as in other countries. This is part of what is commonly referred to as "globalization." Several scholars in education have taken an interest in this concept and in the rhetoric and practices it carries (see, for example, Adick, 2002; Daun, 2003; Lindblad, Ozga, & Zambeta, 2002; Steiner-Khamsi, 2000). Part of the "globalization" discussion focuses on issues of governing (Lindblad et al., 2002), international competition (Hirtt, 2004), and evaluation, quality assessment and performativity (Ball, 2003). New Public Management (NPM) is also part of this scholarly attention (Pollitt & Bouckaert, 2004). In this text a comparison between Sweden and the U.S. is used to illustrate some of the features of NPM and evaluation in a decentralized
* This text is a revision of a presentation made at the joint Canadian Evaluation Society and American Evaluation Association conference in Toronto, October 26–29, 2005, as part of the panel session Evaluation, the "New Politics" and the "New Public Management": Dilemmas of Engagement. I am grateful for useful comments from Saville Kushner, Gunter Rochow and others in the audience.
education system. Independent of the degree of decentralization, there are many similarities but also differences in how NPM and its evaluative activities are put into practice locally and nationally. The very sense of rational improvement and a wish to steer the future in line with pre-set goals are common denominators, and perhaps the essence of societies and education in (late) modern times.
BACKGROUND

In the academic year 2000–2001, I visited the U.S. What I found out concerning the "standards reform" in public education intrigued me. I had a conception of public schooling in the U.S. as a very decentralized system, mainly governed by local school boards. But during my stay, people talked about public schooling in terms of centralization, referring not only to the standards reform but also to other issues of state control, like tests and development projects. Some years earlier I had been made aware of changes in England that were also talked about in terms of centralization. Local school authorities lost much of their power in policy and decision-making to the central power of the state, and a national curriculum was introduced. During the same period, changes in the Swedish school system have been described as decentralization, indicating a different direction of change. Let me describe this change in more detail.
SWEDISH CONTEXT

After the Second World War, there was an ongoing reform of the Swedish school system. The system had been a very complex, parallel system of different school forms attracting different social groups, and the war made it clear that there was a need to unify the nation around democratic goals. (Hitler's success in Germany provoked this discussion, and a fear of a repetition of Nazi influence was a driving force. Also, an argument for a general basic school for all had long been part of the public political debate.) After a long, and at times quite intense, political debate, the parliament decided on a reform: a common basic education for all children.1 This meant that a highly centrally governed system was in operation from the beginning of the 1960s to the middle of the 1980s. From the end of the 1970s, a decentralized type of governing has been successively implemented.
Examples of decisions that were previously made at the national level concerned:

- the organizational features of both the municipalities and the school organization;
- economic priorities in education (how much money to allocate to different activities);
- the content of each school subject in each grade;
- the amount of time allotted to each school subject (all stipulated in an extensive national curriculum, together with a strong emphasis on democratic goals like tolerance and everyone's equal value);
- national tests in the higher grades in certain school subjects to calibrate the national grading system, which was norm-related and served as the selection instrument to higher educational levels;
- the approval of textbooks by a national group.

Apart from the municipalities, there were hardly any other providers of mandatory education. Actually, providers other than the municipalities ran less than 1% of the schools. This central control of schooling was connected to the principle of equivalence and how it was perceived at that time (each child's right to a good education and to skills in citizenship independent of social class, geographical location, gender, and ethnicity – equal access and an emphasis on equality of results; Englund, 2005).

Today, things are very different. First and foremost, the state funding system has changed. Municipalities now receive a lump sum, and priorities between different public sectors as well as between different educational areas have to be made at the municipal level. Previously there was a highly specified system for the allocation of state funds in which certain funds were directed to certain activities, efforts or materials. There were, for example, particular state funds for children with special needs, for school premises, etc., calculated on the basis of a formula. The teachers were state employees; the municipalities now employ them. Secondly, municipalities are free to organize their different policy sectors as they please, but need to have political boards accountable for decisions, with administrations attached to the political boards. The internal control systems in the municipalities have been strengthened, requiring municipalities to install quality assurance/assessment2 systems and more intense auditing. In 2003, the government assigned the National Agency for Education (NAE) to inspect all public school units and municipalities once within a period of six years.
Providers other than the municipalities (private companies, co-operatives, NGOs, etc.) can easily start independent schools, tax-financed as long as they comply with the national curriculum and the Education Act and Ordinance. Their eligibility is assessed by the National Agency for Education, and NAE supervises these schools directly. There is a national curriculum, still stressing democratic values, but it is more concerned with individual rights than with social justice. When it comes to curricular content and time spent on school subjects, the national curriculum is much broader and open to local interpretation. A specified minimum number of hours is to be spent on each school subject but can be spread over the nine years in different ways. However, in each school subject the goals to be achieved in order to be awarded certain grades are specified. The degree of specification is rather low, and the goals and criteria for being awarded a certain grade need interpretation by the teachers. National mandatory tests in certain school subjects in grades five and nine are performed yearly and are meant to calibrate the grading system nationally. The grading system is now criterion-based. Grades are still the main selection instrument to higher levels in the education system.

Taken together, these changes have altered the substantive meaning of the principle of equivalence. Individual rights, like individual choice and the right to realize individual lifestyles, are now emphasized as a democratic ideal (Englund, 2005). In practice, the principle of equivalence is also now more related to all students' right to receive education of a minimum standard, and the rest is up to individual choice and capacity. All in all, this new context for public schooling in Sweden is typically described as "decentralized," requiring more decision-making by local politicians and interpretative work by local actors in general. "Decentralization"3 in a Swedish context has come to mean that resources, administrative routines and decision-making that were previously a national political concern are now to a much greater extent a responsibility of the municipalities. Interpretation of the national curriculum, course syllabi, and criteria for grading has become a concern and responsibility for teachers.
Legislation and Regulations Concerning Evaluative Activities

Along with the changes that took place during the late 1980s and 1990s, the requirement to account for quality in municipal public services was strengthened by the State (national level). The municipalities were required
to install quality assessment systems, and from 1998 they also had to make sure that municipal audits were undertaken yearly in all political sectors (SFS 1991:900; Government bill 1998/99:66). The audits examine fiscal accountability and routines for the fulfillment/implementation of decisions made, and assess efficiency. Since mandatory education (and other kinds of education) is one of the most costly sectors in the municipalities, it is common to audit it fairly extensively every year, along with social services and care for the sick and elderly. When it comes to legislation surrounding public education, there are additional rules that apply. NAE administers yearly follow-up studies, mainly collecting ratios for statistical descriptive purposes, comparisons between municipalities, etc. It carries out national comprehensive evaluations of all school subjects (student results and instruction in most cases) about every 10th year.4 NAE is also commissioned to supervise the municipalities' work with public schooling. Complaints concerning individual students' rights are investigated by NAE (for example, bullying), as well as more general complaints concerning the public school in a municipality. (One example is the lack of local school plans developed by the school boards; according to the Education Act, every municipality has to have a local school plan.) NAE may also undertake inspections of anything it sees fit, on its own initiative. As mentioned above, the agency is assigned to undertake inspections of every single public school unit in the country in a six-year cycle, along with decision-making, implementation, and quality assessment work at the central level in the municipalities. Both legal issues of compliance with the Education Act and quality issues are targeted in the new inspections. NAE has no authority to "punish" but relies on public reports and injunctions, often reported in the media. As also mentioned above, the municipal school organization is to be audited yearly. In recent years these audits have changed from dealing mostly with economic accountability to dealing also with routines for following up the implementation of decisions and with assessing the quality of services (educational practice). At school level, the national curriculum states that principals have a special responsibility to evaluate and follow up school activities, and each teacher has a responsibility to evaluate teaching. To summarize, there is a specified responsibility to evaluate, follow up, and deal with issues of quality and results at all levels in the Swedish mandatory school system, i.e. at the national, municipal, school, and classroom levels. NAE is assigned the task of both enforcing and implementing these rules (see also below) and has to supervise the entire national school system.
A QUICK COMPARISON

Going back again to my stay in the U.S. and what I learned there about what was described in terms of centralization, I started to think about how many things in the recent changes seemed fairly similar to the changes that took place in Sweden, despite the fact that in Sweden they were part of a decentralization process. The things I noted that seemed to be similar were:

- the idea of pre-specified curricular goals/objectives/targets to be achieved in different school subjects, i.e. standards in the U.S. and goals to be achieved in the Swedish national curriculum;
- the idea of using different kinds of individual measures to control goal fulfillment, i.e. yearly state tests in a number of school subjects in the U.S. and national tests in certain school subjects in grades five and nine in Sweden;
- a stress on parental choice and individual choice – in the U.S. evidenced by the setting up of charter schools and other special state regulations for public schools (private schools were already established and a normal feature in the U.S. context), and in Sweden by new legislation permitting independent schools financed by tax money and an increased amount of time set aside for students' individual choice of content;
- an increased stress on accountability – in the U.S. by aligning test results to accountability at all levels, and in Sweden by an increased stress on internal quality assessment procedures in the municipalities and schools.

Taken together, the evaluative activities described as common features of both the U.S. and the Swedish public school systems are examples of what has been called New Public Management (NPM). There is also another common feature underpinning this extremely rational rationale, and that is government by objectives and results (goal-oriented governing). This rationale rests on the idea that it is possible to reach (political) consensus on what the goals/objectives (for education) are to be, that the goals and different levels of goal fulfillment can be coherently specified and operationalized, that the results of the educational process can be accurately measured, and that results other than the pre-specified goals are of no (educational) interest (Montin, 2002). In this line of thought, little attention is given to the kinds of incentives that are needed in order for the intended educational processes to develop. Also, there is an enormous belief in the possibility of governing by identifying pre-specified results (goals). However, problems have been identified. There are difficulties in reaching consensus concerning how to formulate the goals at all levels in an organization and in a political context characterized by conflicting
political perspectives. There are difficulties in finding good, valid, and reliable methods for measuring the achievement of goals. Different kinds of efforts to measure educational results have also been shown to influence educational practice, much in the same way that tests direct teachers' instruction and students' learning processes (see, for example, Sahlin-Andersson, 1995; Segerholm, 2001; Segerholm & Åström, 2007).
IMPLICATIONS

Now, what are the implications of government by objectives and results, and of NPM, for evaluation and for public education? It would be presumptuous to pretend that I know the answers. However, I have some suggestions based on previous studies (Hanberger, Khakee, Nygren, & Segerholm, 2005; Segerholm, 2001, 2005; Segerholm & Åström, 2007):

- public education has to handle a web of evaluative systems, which is time-consuming; it is questionable whether this leads to such a degree of improvement that the resources spent on evaluative activities pay off;
- public education at local levels is partly governed by the direction, content, and form of national evaluative activities; this is particularly obvious because these activities often contain elements of self-evaluation, meaning that people in the "capillaries" of the organization take part in evaluative activities, inducing a large measure of self-reflection and self-discipline through these activities (cf. Foucault, 1993; Rose, 1996);
- evaluative activities in a decentralized model such as the Swedish one tend to become blind to results/effects/impacts other than those pre-specified as goals or as specified in the budget; that is, there are very few evaluative activities that question the values concerning educational purpose and content (this seems to hold true in the U.S. as well);
- evaluative activities in a decentralized model permeate all levels in the school organization, taking more time and energy than ever before; and
- because of that, evaluative activities are becoming more and more technical and rational, as a way for many actors at different levels to handle the requirements to evaluate without neglecting all the other activities that the school is all about (instruction, planning, personal communication, caring, etc.).

For many actors at the practice/service level, evaluations are a burden, but they cannot fend them off since they are required to undertake them. Evaluative activities at this level (school level) are most often talked
about and performed as part of the development of the school and the entire municipality (self-evaluations). And who can (or dares) be opposed to development and improvement? This makes evaluation in this kind of decentralized system a seductive device for the implementation of the "rational rationale" discussed above – it becomes internalized as the one way to maintain high quality and good education. However, I am not sure that these implications are valid only in decentralized education systems like the Swedish one. Since NPM and the goal-oriented governing doctrine are common features in many western education systems, it seems likely that similar observations are made in other national contexts. What may be of particular interest in the Swedish case is the abundant use of self-evaluations, or at least the ambition to engage actors at all levels in the school system in evaluative activities. Evaluative activities carried out in this way can prove to be more penetrating and have a deeper impact than in systems where there are fewer self-evaluations. But such issues have to be investigated further, as do local and national practices of how to "deal with" global rhetoric and reforms aimed at NPM, goal-oriented governing and competition. Finally, I find it a bit ironic that the goal-oriented governing (government by objectives and results) so popular in today's societies shares its fundamental underpinnings with a system tried and discarded in another time and another place – the five-year plans of the former Soviet Union. But these plans were of course directed to material production and not to "educational production." They also failed to construct categories in their follow-up measures that adequately matched the incentives needed to reach the objectives/targets set out in the plans (see Nove, 1989/1990 for an historical overview; see Wikipedia, n.d. for a brief description; Åstrand, 2005).
NOTES

1. This political debate went on for decades until a try-out period of 10 years with compulsory nine-year education, "enhetsskola," was decided upon. After that period the idea of a common public school for all was definitively settled, and the first national curriculum, that of 1962, was also decided (Richardsson, 1977/1980; Lundgren, 2002).
2. In Swedish, another concept is used: "kvalitetsgranskning." This concept carries no clear distinction between quality assurance and quality assessment, but seems to convey both.
3. It seems to me that the Swedish term "decentralization" captures much of the features that in the English language are also described with terms like
"deconcentration" and "devolution," which stresses the need to make clear what we mean by different concepts.
4. A third and most probably last national evaluation was carried out in 2003. NAE is now commissioned by the government to reform this expensive and time-consuming evaluation system. What seems to be of interest is to develop a cyclic system in which only a small number of school subjects are evaluated yearly.
REFERENCES

Adick, C. (2002). Demanded and feared: Transnational convergencies in national educational systems and their (expectable) effects. European Educational Research Journal, 1(2), 214–232.
Åstrand, B. (PhD in History of Ideas). Personal communication, 17 October 2005.
Ball, S. J. (2003). The teacher's soul and the terrors of performativity. Journal of Education Policy, 18(2), 215–228.
Daun, H. (2003). World system, globalisation and educational change (eight cases). Reports from the Stockholm Institute of International Education, 116, 1–55.
Englund, T. (2005). The discourse of equivalence in Swedish education policy. Journal of Education Policy, 20(1), 39–57.
Foucault, M. (1993). Övervakning och straff. [Discipline and punish.] Lund: Arkiv (in Swedish).
Government bill 1998/99:66. En stärkt kommunal revision. [A strengthened municipal audit.] (in Swedish).
Hanberger, A., Khakee, A., Nygren, L., & Segerholm, C. (2005). De kommungranskande aktörernas betydelse. Slutrapport från ett forskningsprojekt. [The significance of evaluative activities targeting municipalities. A final report.] Umeå: Umeå Center for Evaluation Research, UCER (in Swedish).
Hirtt, N. (2004). The three axes of school merchandization. European Educational Research Journal, 3(2), 442–453.
Lindblad, S., Ozga, J., & Zambeta, E. (2002). Introduction. European Educational Research Journal, 1(4), 615–624.
Lundgren, U. P. (2002). Utbildningsforskning och utbildningsreformer. [Education research and education reforms]. Pedagogisk forskning i Sverige, 7(3), 233–243 (in Swedish).
Montin, S. (2002). Moderna kommuner. [Modern municipalities.] Malmö: Liber (in Swedish).
Nove, A. (1989/90). An Economic History of the U.S.S.R. (2nd ed., reprint 1990). London, Harmondsworth: Penguin Books.
Pollitt, C., & Bouckaert, G. (2004). Public management reform. A comparative analysis (2nd ed.). Oxford: Oxford University Press.
Richardsson, G. (1977/80). Svensk utbildningshistoria. Skola och samhälle förr och nu (2nd ed.). [The history of the Swedish education system. School and society, then and now.] Lund: Studentlitteratur (in Swedish).
Rose, N. (1996). Governing "advanced" liberal democracies. In: A. Barry, T. Osborne & N. Rose (Eds), Foucault and political reason. Liberalism, neo-liberalism and rationalities of government (pp. 37–64). London: UCL Press Limited.
Sahlin-Andersson, K. (1995). Utvärderingars styrsignaler. [Signals of governance in evaluations]. In: B. Rombach & K. Sahlin-Andersson (Eds), Från sanningssökande till styrmedel. Moderna utvärderingar i offentlig sektor. [From a search for truth to a governing
138
CHRISTINA SEGERHOLM
instrument. Modern evaluations in the public sector] (pp. 71–92). Stockholm: Nerenius & Sante´rus Fo¨rlag (in Swedish). Segerholm, C. (2001). National evaluations as governing instruments: How do they govern? Evaluation, 7(4), 427–438. Segerholm, C. (2005). Coping with evaluations. Influences of evaluative systems on schoolorganizations in four Swedish municipalities. Paper presented at the AEA/CES conference in Toronto, 25–29 October 2005. Segerholm, C., & A˚stro¨m, E. (2007). Governance through institutionalized evaluation: National evaluations in the Swedish higher education system. Evaluation, 1(13), 48–67. SFS 1991:900. Kommunallagen. [The Act for Municipalities.] (in Swedish). Steiner-Khamsi, G. (2000). Transferring education, displacing reforms. In: J. Schriwer (Ed.), Discourse formation in comparative education (pp. 155–187). Frankfurt am Main: Peter Lang. Wikipedia, The Free Encyclopedia (n.d.). Economy of the Soviet Union. http://en.wikipedia.org/ wiki/Economy_of_the_Soviet_Union#Planning, accessed October 13, 2005.
EVALUATION AND TRUST
Nigel Norris
INTRODUCTION
Despite the importance of trust in social life, the concept has had little direct attention from evaluators.1 Trust is central to the seeming integrity of social processes, including, of course, the social processes we call evaluation. Evaluation depends for its success on cooperative relationships and a measure of trust.
Evaluation stands in an interesting relationship to trust. The credibility and utility of evaluation rest on trust. Loss or lack of trust is a major impetus to evaluation, and evaluation sometimes takes the place of trust. The process of evaluation requires trust, and evaluation is used to underpin or provide a warrant for trust. Insofar as evaluators have to collect evidence from others or rely on the evidential accounting of others, trust is involved and is integral to the processes and outcomes of evaluation. Even where relationships in an evaluation are highly politicised and mistrust rife, trust is still afoot. This is because some things must be taken for granted, taken on trust, for evaluation to happen. In individual instances, trust may be warranted or not, but for evaluation to be a practical possibility, for it to be realised in practice, some things must be trusted. While trust is necessary for the process of evaluation, evaluation often takes place in conditions of low trust; various forms of evaluation and evaluative mechanisms may prompt mistrust on the part of those being evaluated, and some forms of and occasions for evaluation result in the presentation of rehearsed good behaviour: careful,
stage-managed performances or presentations that accord with bureaucratic perceptions of good or procedurally appropriate practice (Norris, 2005). The intimate relationship that evaluation now has with accountability suggests that, somewhat paradoxically, evaluation may be contributing to a loss of trust in social programmes, institutions and public life more generally.
THE NATURE OF TRUST
There is growing recognition that trust is a foundational concept in the social sciences and in social life more generally. Trust has been credited with a key role in promoting and sustaining economic prosperity (Fukuyama, 1995). It has been thought of as a generalised belief that a person may safely rely on the future actions of others – 'that favours are returned and deals are kept' (Mouritsen, 2003, p. 651). Trust has been seen as a 'bet about the future contingent actions of others' (Sztompka, 1999, p. 25), and as a way of reducing the transaction costs of economic exchange (Granovetter, 1985; Uzzi, 1997; Kramer, 1999). It has been conceptualised by rational choice theorists and more widely by economists as associated with calculative behaviour and cooperation: a weighing of interests, advantages and incentives (Hargreaves Heap, Hollis, Lyons, Sugden, & Weale, 1992; Hollis, 1998). Trust has also been seen as having an emotional and social content that is affective as well as cognitive or as Becker (1996, p. 50) defined it 'as a general structure of concrete motivation, attitude, affect, and emotion'.
Trust is necessitated by uncertainty and risk. According to Seligman (1997, p. 13), 'trust is rooted in the fundamental indeterminacy of social interaction'. Were it possible, perfect knowledge would dispense with the need for trust. Drawing on the works of Georg Simmel and Niklas Luhmann, Lewis and Weigert (1985, p. 969) argue that trust reduces complexity more thoroughly than prediction and 'allows social interactions to proceed on a simple and confident basis where, in the absence of trust, the monstrous complexity posed by contingent futures would again return to paralyze action'. Trust is an artefact of normality and the bedrock of the orderliness of social life (Misztal, 2001). It has been variously thought of as an elixir for social cohesion, the 'chicken soup of social life' (Hutton, 2004; Uslaner, 2000), an indispensable condition for the elaboration of self identity and for routine social life (Giddens, 1990), as the basis for normal social interaction (Kydd, 2000) and as a condition for stable concerted actions (Garfinkel, 1963). Trust has been thought of as a key disposition for and outcome of social capital (Uslaner, 1999) and is linked to positive outcomes
for individuals, groups and communities (Field, 2003). In her influential study of American cities, Jane Jacobs illustrates how important trust is to urban life and, in particular, to the streets. This is because one of the defining characteristics of city life is the interaction of strangers – often through what Erving Goffman (1963) called civil inattention. Jacobs (1993, p. 73) describes the trust of the city street 'as formed over time from many, many little public sidewalk contacts'. It is created through mundane, untroubled day-to-day interactions and by the civic oversight that is a by-product of busy street life. Trust is critical to successful neighbourhoods and, correspondingly, the absence of trust is a recipe for disorder and degradation.
In her major study of trust, Barbara Misztal identifies three functions of trust. First, 'trust is a device for coping with the contingency and arbitrariness of social reality' (Misztal, 1996, pp. 96–97). Building on the works of Garfinkel and Goffman, she argues that our expectation of a stable social order is sustained by being able to take things for granted and on trust. Second, trust is a mechanism for sustaining the predictability, regularity and legibility of the collective order. Misztal (1996, p. 98) associates trust with Bourdieu's concept of habitus. She says: 'trust as habitus operates through rules of interaction, rules of distanciation and rules of remembering'. Third, trust is a device for 'coping with the freedom of others', and it functions to foster cooperation (Misztal, 1996, p. 99). Trust enables action without full information. It fills in informational gaps in our reasoning about others and the future, replacing detailed but bounded rational calculation with judgement, intuition and feeling. Trust means that you do not have to know everything to act. It enables individuals and organisations to take the continuity of social order for granted and so helps them deal with the future. What makes trust so important is that it is a way of coping with the complexity of the world (Luhmann, 1979, 1988). Thus trust is the backbone of cooperation, a necessity for efficient and effective social action, the basis of security and well-being and a device for dealing with future relationships.
Trust, some argue, is in decline. Lack of trust has been implicated in a waning of social and community commitments, civic engagement and social capital; loss of institutional and professional legitimacy; and reluctance to compromise in political life (Putnam, 1995; Uslaner, 1999, 2000). Distrust or loss of trust can be economically expensive and even catastrophic in its social consequences (Pagden, 1988; Gambetta, 1988). Modernity, with its increasing mobility, 'distanciation of time-space' (Giddens, 1990), and loosening of traditional forms of identity formation and social solidarity, has increased the importance of interpersonal trust relationships, while at the same time
weakening the social conditions for enduring trust (Misztal, 1996). Perhaps trust is not in decline nor has there been a dramatic loss of trust in modern social systems, but rather trust needs to be recursively remade and reaffirmed more often under conditions of flux. It is in the risky nature of trust that it may be disappointed or betrayed just as it may be misplaced or feigned for advantage. In the social conditions of modernity, trust is less taken-for-granted and this may be one reason for the dramatic explosion of evaluation and evaluative mechanisms and their institutionalisation over the last twenty-five years.
THE PERVASIVENESS OF EVALUATION
Despite recent calls to mainstream evaluation by making it an integral part of organisational life (Sanders, 2002), evaluation is in fact a pervasive activity. There are few aspects of social life that are now not touched by the compulsion to evaluate. Evaluation, it seems, is everywhere. Reflecting on the growth of evaluation in the USA, Blaine Worthen (1995, p. 29) noted that evaluation had become institutionalised at state and local levels to a much greater degree than is commonly understood. Berry, Turcotte, and Latham (2002) describe the closer alignment of state legislature evaluation units with political processes. Ryan (2002) notes the way in which the No Child Left Behind Act of 2001 institutionalises accountability and evaluative mechanisms. The reach of evaluation in Britain is such that a number of commentators have talked about an 'evaluative state' (Neave, 1988; Henkel, 1991) or the audit society (Power, 1997), or coercive accountability (Shore & Wright, 2000), while others describe evaluation as a 'mantra of modernity' (Pawson & Tilley, 1997, p. 2).
Evaluation used to be occasioned by innovation or reforms. It is now much more part of the routine monitoring and assessment of professional practices, projects, programmes, policies, organisations and institutions. Love (1991) noted the rapid growth of internal evaluation. House (1993) observed that evaluation had become too important to be left as an independent and external service and was being brought within bureaucracies and organisations. Blalock (1999) argues that in the 1990s, performance management and evaluation research have come together to improve policies and programmes and increase accountability. Similarly, Wholey (2002) argues that evaluation has a key role to play in the development of the performance measurement systems of results-oriented management. As evaluation has become institutionalised, it has been incorporated into
professional and organisational practice and internalised as reflexive processes of control. The institutionalisation of evaluation is matched by a parallel process of professionalisation. There are now numerous national and transnational evaluation associations and societies.2 There are specialist journals, special interest groups and annual conferences. Since the 1970s, various evaluation communities have developed professional standards and exemplars of good practice (Widmer, 2004; Yarbrough, Shulha, & Caruthers, 2004).
The impulse to evaluate is both moral and rational. Evaluation opens up social processes and outcomes to scrutiny and, in making public life transparent, could be seen to support democratic accountability. It is one of the ways in which society ensures that people do what they are meant to do. It is part of the processes of rational decision-making that warrants the trust invested in those who spend or receive public money and the professionals on whom the public depend. Evaluation is increasingly based on technologies that monitor individual and organisational performance, tracking progress towards objectives, recording and prompting compliance with operational procedures and best practice guidelines. Evaluation is at the heart of self-developing professional communities, learning organisations and systems and rational policy making. It is the modern way of deliberately shaping the self and social life and of managing the risks associated with bounded rationality and the unpredictability of the future.
Evaluation has been with us for a long time. It is in many senses an ordinary everyday activity. But it is now built into the fabric of institutions and institutional relations in a systematic and technical manner. It is part and parcel of the reflexivity of modernity (Giddens, 1990), where reflexivity refers to the capacity to monitor and modify action on the basis of feedback and other sorts of knowledge. Evaluation might be thought of as a guarantor of trust. It provides the reasons for trust. It might be argued that professionals and organisations that do not monitor their own performance have no means of assuring and improving quality and are thus remiss, neglectful, rudderless, old-fashioned and in need of modernisation.
IN PLACE OF TRUST
New forms of public management have called for new forms of accountability closely modelled on institutional economics, public choice theory and transaction cost analysis (Gregory, 2003; Dunleavy & Hood, 1994). Historically, accountability referred to the duty to provide an account of
the work of yourself or your office to those who had a right to know. Scott and Lyman (1968, p. 46) defined an account as a 'linguistic device employed whenever an action is subjected to valuative inquiry'. They distinguish between two types of accounts: justifications and excuses. The need for accountability arose where there was delegated authority to act on behalf of individuals or organisations. The principle was that principals should be able to know about and judge the performance of their agents. The duty to provide justifications (reasons) and excuses (depictions of the context of action) was and remains an important element of relationships of trust. But in the wake of public management reforms, accountability has taken on other meanings and associations. It is now embodied as the responsibility to meet certain standards and targets, to comply with regulations and procedures, to be open to inspection and audit and to organise internal quality control in such a way that it can be verified by others (Power, 1997).
Today, accountability is construed as the panacea, the silver bullet that will kill off the evils of public policy: poor or failing services, inefficiency, escalating costs, democratic deficits, lack of transparency and regulatory compliance, lack of responsiveness to clients and other stakeholders, provider capture and unprofessional behaviour. Thus accountability is not seen as a duty or concomitant of professional autonomy; rather, it is a disciplinary device for social control that displaces trust. In the contemporary discourse of accountability, there can be heard the shrill voice of distrust and disrespect: distrust of public institutions and the professionals that they depend on, disrespect for the values and traditions of the public sphere. Part of the explanation for this stems from the presumption that the discipline and methods of the market would be good for public policy and management. As David Marquand (2004, p. 2) has argued:
The single most important element of the New Right project of the 1980s and 1990s was a relentless kulturkampf designed to root out the culture of service and citizenship which had become a part of the social fabric. De-regulation, privatization, so-called public-private partnerships, proxy markets, performance indicators mimicking those of the private corporate sector, and a systematic assault on professional autonomy narrowed the public domain and blurred the distinction between it and the market domain.
Related to this is the way in which social actors in the substantive accountability relationships of the new public management are defined as rational, self-interested economic agents. The theory of motivation that underpins these social relations is essentially extrinsic, behaviourist and based on a view that people are driven by opportunism and the prospect of rewards and avoidance of punishments and shame. Strong collective or professional
values are not seen as a source of trust, but instead as a potential threat to management by management's objectives.
Modern accountability creates a near-inexhaustible demand for evaluation or evaluative mechanisms: people and organisations must be scrutinised. Throughout public institutions and service organisations, evaluation and various forms of surveillance are needed to monitor the performance of contractual obligations and service-level agreements and to ensure that standards and targets are met. Where it is too costly or impractical for evaluative information to be collected directly, meta-evaluation or audit is used to oversee or assay the internal evaluative processes of the organisation. The watchwords of modern accountability, namely, openness, transparency, clarity of objectives, targets and standards, indicators, monitoring and audit, are all things that should, at face value, enhance trust. But accountability and the evaluation systems it relies on produce untoward consequences and seem unable to restore or replace trust.
Prophecies about the ill winds of accountability have been with us for some time. Writing in the early 1970s, Ernest House noted that the idea of accountability was going to be around for a while, if for no other reason than it was a 'way of lassoing the wild stallion of educational spending', but he doubted if it would result in favourable changes in education, rather the reverse (House, 1972). A few years later, Barry MacDonald warned that the evaluative intent associated with the accountability movement in education would be abused in ways that would damage the work of teachers and waste millions of pounds of public money (MacDonald, 1978). More recently, Onora O'Neill, in the 2002 Reith Lectures, has drawn attention to the paradox that accountability and transparency may well undermine the trust they are supposed to engender and support. There are a number of reasons she gives for this. First, that accountability is experienced as distorting professional practice and undermining the confidence and self-esteem needed for the proper exercise of professional authority. Second, that despite the rhetoric of democracy and the public's right to know, accountability is really about political and administrative control. Third, indicators are often chosen for ease of measurement and control rather than because they are critical dimensions of good performance. Finally, O'Neill argued that the demands for transparency in public life may encourage deception and dissembling, especially where honesty is likely to result in negative consequences (O'Neill, 2002).
Accountability is intimately bound up with blaming. When blame attaches, it spoils reputation, undermines legitimacy and calls into question trust in those who are held to account. The allocation of blame is a way
of offsetting, explaining and fixing responsibility for failure, error, impropriety or crisis in social life. Blaming restores political normality and order. Reviewing public management reform in ten OECD countries, Pollitt and Bouckaert (2000, p. 174) noted that the new public management is ‘a bundle of disparate elements, but one of those elements has certainly been a process of distancing and blaming by political leaders’. The avoidance of blame is a powerful incentive for agencies and their agents to keep perceived performance in line with targets and contracts. It is also a powerful incentive for executive decision-makers to keep their distance from the services for which they are democratically accountable. The threatened or actual oversight afforded by evaluation can encourage compliance with organisational norms and contractual expectations. Effective social control has always relied on visibility and the fear of visibility, as well as the internalisation of norms and values. Visibility is an incentive to ‘good’ behaviour. Nevertheless, although the presentation of self is conditioned by considerations of situational propriety, its display may be for the purposes of authentic engagement, impression management or outright deception (Goffman, 1963, 1969). Low-trust, high-surveillance environments do not favour learning and innovation. So much energy has to be devoted to engineering and maintaining compliance with regulatory frameworks that adaptive or evolutionary change is rendered problematic. It is unsurprising then that accountability and transparency tend to induce conformity or creative compliance. People and organisations act defensively in the face of accountability systems. Accountability systems that require compliance with standards or codes of practice often produce coping behaviours that distort organisational goals and can lead to fabrications of conformity (Ball, 2003). As Zegans (quoted in Bovens, 2005) says, ‘Rule obsessed organisations turn the timid into cowards and the bold into outlaws’.
THE SOCIAL FOUNDATIONS OF EVALUATION
Programme evaluation should be more of a counterbalance (a source of countervailing authority) to the institutionalised evaluative accountability systems inherent in the management of public services. Programme evaluation needs to re-emphasise the importance of learning. Evaluation for learning needs a more social, communitarian, developmental and caring vocabulary of purpose and action than modern accountability permits. Evaluation for learning needs richer, longer term, more complex social
relationships. The development of learning systems was called for over 30 years ago by Donald Schon in his 1971 book, Beyond the Stable State (Schon, 1971). He argued that social change was leading to a loss of stability in institutions and anchors for individual identity and that the loss of the stable state made it imperative for people, institutions and society as a whole to learn about learning. Both Lee J. Cronbach and Donald Campbell, in their rather different ways, have stressed the contribution of evaluation to societal learning. Cronbach (1982, p. 8) likened the evaluator to an educator, saying that the proper function of evaluation was 'to speed up the learning process by communicating what might otherwise be overlooked or wrongly perceived'. Campbell's advocacy of social experimentation expressed a commitment to evaluation as the servant of an 'evolutionary, learning society'.3
Programme evaluation is both a process for keeping social claims honest and for learning from the experience. Evaluation contributes to trust in social programmes and policies by producing knowledge about them. It also contributes to trust in political and administrative decision-making by symbolising values of rationality and openness and making an instrumental contribution to their realisation. In effect, evaluation puts the knowledge claims of policy makers and other decision-makers to some kind of public empirical test. When evaluation underwrites trust by producing authoritative knowledge, it does so by using the theories, processes, methodologies and methods of the social sciences or science more generally. Such knowledge does not have to be the last word; it does not have to be true for all times or certain (indeed, it is provisional), but it does have to be warranted and offer more than what common sense, faith or commitment can. Much of the legitimacy and credibility of evaluation flows from its commitment to systematic inquiry and the values of science. Evaluation is favoured over other ways of determining merit, worth and value because of its commitment to public, impartial and systematic inquiry and the contribution this can make to social decision-making. Whether such faith in evaluation is or could ever be fully justified epistemologically is not what is at issue here. In conditions of cultural and value pluralism, heterogeneity of viewpoint and uncertainty, trust in evaluation depends on its processes and the knowledge it produces being seen as fair, just, credible, warranted and untainted by sectional interests. Impartiality may have a bad reputation in some intellectual and standpoint quarters, but it is difficult to see how widely different communities of interest and value would have confidence in an explicitly partial evaluation. What is at issue is not the epistemology or hegemony of impartiality, but how it is practically achieved in particular circumstances for particular groups of people: that
is, impartiality in context. Much greater attention should be given to the practical achievement of impartiality. Impartiality is typically thought of as a methodological achievement, but in evaluation it is much more than this. Impartiality is a community achievement in that its fullest realisation requires that evaluation is open to long-term critical scrutiny. Being impartial means treating all those involved in an evaluation with the same respect and consideration. Impartiality demands a commitment to inclusion, dialogue and deliberation that is alien to bureaucratic, impersonal systems of evaluation.
With the aggrandisement of the market has come an undue emphasis on economic understandings of value which impoverish the vocabulary, practice and experience of much evaluation. Markets may make for efficient economic exchange, but their social relations are too sharply focused on profit and loss to provide a sound basis for accountability. There is good reason to believe that the economic model of human motivation and action that underpins public policy and its management is misguided. Emotional and moral reciprocity, rather than crude self-interest, is as much a part of the dynamic that drives collective action and organisational behaviour as is rational self-interested calculation (Kahan, 2003). How people are treated in these respects matters. Evaluative systems that treat people impersonally and only as rational, calculative and self-serving agents are a recipe for undermining the trust and cooperation needed for the effective delivery and long-term improvement of public services.
The question evaluators of all institutional hues and methodological persuasions need to ask is: Why should people cooperate with evaluation of whatever kind?4 By cooperate I do not mean participate; many people have no choice but to participate in evaluation and evaluative systems. Participation leaves a good deal of room for strategic action, while genuine cooperation does not. Cooperation implies choice and a certain degree of common commitment. The institutionalisation of evaluation and its association with modern accountability systems mistake participation for cooperation. It invites strategic, sometimes perverse, responses to evaluative action based on negative assumptions about the motivations, dispositions, competence and actions of the agents of accountability. The higher the stakes associated with evaluation, the more probable it is that strategic responses will be defensive and deceptive, that organisational processes and goals will be distorted and that there will be discrepancies between the public show of accountability and the private reality of organisational or individual performance. Evaluation is thus apt to undermine the very conditions required to make it successful. This is what Bo Rothstein (2005) calls a social trap. It is a process of
strategic action and learning that results in a pattern of mistrust and mutual suspicion which further conditions future action. What is lacking in much evaluative activity is any sense of reciprocity between the different parties involved. Mostly the evaluation system engineers, commissioners and users of evaluation as well as evaluation theorists and practitioners take cooperation and the trust it implies for granted.5 We have seen that trust is integral to evaluation because a stable complex social order and the legibility of social life in general are conditional on trust. The ways in which trust operates to sustain the orderliness and legibility of social life, however, can be sabotaged by inattention to the consequences of using evaluation as a performance management tool and servant of accountability. The judgemental nature of much evaluation makes people feel threatened and defensive. When those subject to evaluation see it as unfair, unsympathetic to local circumstances, unproductive and unnecessary, its legitimacy is called into question and the moral force of open and honest relations is quickly weakened. In their professional standards, various evaluation associations and societies emphasise the importance of trustworthiness. Trustworthiness is associated with technical competence as well as with integrity, independence and the credibility and acceptance of evaluation findings. Technical competence is an important determinant of the trust placed in professionals. In the case of evaluation, methodological expertise and the ‘expert system’ of the discipline and profession are a source of authority and trust. What is involved here is largely impersonal trust in what Giddens (1990) has called abstract systems. While technical competence is important, there are other bases of trust to which evaluation needs to attend. Typically evaluation standards stress the need to respect and protect the rights and welfare of people involved in an evaluation, and the guidelines of the UK Evaluation Society urge evaluators to treat all parties equally in the process of evaluation and the dissemination of findings. These formal principles are important, but respecting rights is not quite the same as respecting persons, and equality of treatment does not fully account for the significance of reciprocity in evaluation. Respecting people carries with it a more affective and caring quality, while reciprocity implies something about the quality of social interaction beyond equality. Reciprocity is not a here-and-now calculation of exactly what each gives and gets. It is not just a self-balancing exchange or the interchange between rights and duties. Reciprocity has to do with longer term commitments and relationships of understanding. Management and accountability systems and the institutionalised forms of evaluation they rely on are largely
impersonal, bureaucratic and technical processes that objectify human performance and treat people as means to an end. There is little respect for persons here. Richard Sennett (2003, p. 207) says of respect that it is an 'expressive performance'. What he means by this is that 'respect doesn't just happen, […] to convey respect means finding the words and gestures which make it felt and convincing'. Reciprocal relationships are the foundation of the trust and mutual respect necessary for effective evaluation. Trust demands an engagement with people that goes beyond the formalities of their roles and beyond the realm of methodology: it demands empathy and care, an emotional and ethical as well as rational engagement.
NOTES
1. Some notable exceptions are: Symonette (2004, p. 100), who observes that evaluators need to attend to trust building 'because their roles and responsibilities often automatically engender fear and mistrust' and that 'lack of trust erodes the prospects for full access to important data and networks'; Hyatt and Simons (1999), who stress the importance of trust in mutual understanding; Taut and Alkin (2003), who mention lack of trust in evaluators and evaluation processes as important barriers to the implementation of programme evaluation; and Hansson (2006), who considers trust in relation to the evaluation of research and research production.
2. For example, there are evaluation societies, associations or networks in Canada, Germany, Finland, France, Italy, Malaysia, Sri Lanka, Switzerland, the UK and the USA. There are also evaluation associations for Africa, Australasia, Europe and Central America.
3. See, for example, Campbell and Russo (1999). Campbell says he owes the term 'evolutionary, learning society' to E. S. Dunn (1971). Economic and Social Development: A Process of Social Learning, Baltimore: Johns Hopkins Press.
4. You might answer that cooperation in evaluation is part and parcel of taking the King's shilling; it goes with the turf of employment: it is in the contract and, in any event, it would be imprudent to bite the hand that feeds you. Another answer to the question is that as either providers or recipients of public services, there is a duty to cooperate. In the market-based conditions of public management, these are probably not compelling reasons for cooperation, however.
5. There are models and approaches to evaluation that pay more attention to providing a reasonably sound and reciprocal basis for informed participation, most notably democratic approaches to evaluation. These represent important alternative models, but they are far from the mainstream of evaluative activity.
REFERENCES
Ball, S. (2003). The teacher's soul and the terrors of performativity. Journal of Education Policy, 18(2), 215–228.
Becker, L. (1996). Trust as non-cognitive security about motives. Ethics, 107(1), 43–61.
Berry, F., Turcotte, J., & Latham, S. (2002). Program evaluation in state legislatures: Professional services delivered in a complex, competitive environment. New Directions for Evaluation, 95(Fall), 73–88.
Blalock, A. (1999). Evaluation research and the performance management movement. Evaluation, 5(2), 117–149.
Bovens, M. (2005). Analysing and assessing public accountability – A conceptual framework. Paper presented to Accountable Governance: An International Research Colloquium, Queen's University Belfast, October 20–22, 2005.
Campbell, D., & Russo, M. (1999). Social experimentation. London: Sage.
Cronbach, L. (1982). Designing evaluations of educational and social programs. San Francisco: Jossey-Bass.
Dunleavy, P., & Hood, C. (1994). From old public administration to new public management. Public Money and Management, 14(3), 9–16.
Field, J. (2003). Social capital. London: Routledge.
Fukuyama, F. (1995). Trust. New York: Simon & Schuster.
Gambetta, D. (1988). Mafia: The price of distrust. In: D. Gambetta (Ed.), Trust: Making and breaking cooperative relations (pp. 158–175). Oxford: Basil Blackwell.
Garfinkel, H. (1963). A conception of, and experiments with, 'trust' as a condition of stable concerted actions. In: O. J. Harvey (Ed.), Motivation and social interaction. New York: The Ronald Press Co.
Giddens, A. (1990). The consequences of modernity. Stanford, CA: Stanford University Press.
Goffman, E. (1963). Behavior in public places. New York: Free Press.
Goffman, E. (1969). Presentation of self in everyday life. Harmondsworth, Middlesex: Penguin (first published in 1959).
Granovetter, M. (1985). Economic action and social structure: The problem of embeddedness. American Journal of Sociology, 91(3), 481–510.
Gregory, R. (2003). Accountability in modern government. In: G. Peters & J. Pierre (Eds), Handbook of public administration. London: Sage.
Hansson, F. (2006). Organizational uses of evaluations – Governance and control in research evaluation. Evaluation, 12(2), 159–178.
Hargreaves Heap, S., Hollis, M., Lyons, B., Sugden, R., & Weale, A. (1992). The theory of choice: A critical guide. Oxford: Blackwell.
Henkel, M. (1991). The new evaluative state. Public Administration, 69, 121–136.
Hollis, M. (1998). Trust with reason. Cambridge: Cambridge University Press.
House, E. (1972). The dominion of economic accountability. The Educational Forum, 37(1), 13–23.
House, E. (1993). Professional evaluation. London: Sage.
Hutton, W. (2004). 'Foreword' to O'Hara, K. (2004). Trust: From Socrates to Spin. Duxford, Cambridgeshire: Icon Books.
Hyatt, J., & Simons, H. (1999). Cultural codes – Who holds the key. Evaluation, 5(1), 23–41.
Jacobs, J. (1993). The death and life of great American cities. New York: Random House – The Modern Library (first published in 1961).
Kahan, D. (2003). The logic of reciprocity: Trust, collective action, and law. Michigan Law Review, 102, 71–103.
Kramer, R. (1999). Trust and distrust in organizations: Emerging perspectives, enduring questions. Annual Review of Psychology, 50, 569–598.
Kydd, A. (2000). Overcoming mistrust. Rationality and Society, 12(4), 397–424.
Lewis, J., & Weigert, A. (1985). Trust as social reality. Social Forces, 63(4), 967–985.
Love, A. (1991). Internal evaluation. London: Sage.
Luhmann, N. (1979). Trust and power. Chichester: Wiley.
Luhmann, N. (1988). Familiarity, confidence, trust: Problems and alternatives. In: D. Gambetta (Ed.), Trust: Making and breaking cooperative relations (pp. 94–107). Oxford: Basil Blackwell.
MacDonald, B. (1978). Accountability, standards and the process of schooling. In: T. Becher & S. Maclure (Eds), Accountability in education. Windsor, Berkshire: NFER.
Marquand, D. (2004). Decline of the public. Cambridge: Polity.
Misztal, B. (1996). Trust in modern societies. Cambridge: Polity.
Misztal, B. (2001). Normality and trust in Goffman's theory of interaction order. Sociological Theory, 19(3), 312–324.
Mouritsen, P. (2003). What's the civil in civil society? Robert Putnam, Italy and the Republican tradition. Political Studies, 51, 650–668.
Neave, G. (1988). On the cultivation of quality, efficiency and enterprise: An overview of recent trends in higher education in western Europe, 1986–1988. European Journal of Education, 23, 1–2.
Norris, N. (2005). The politics of evaluation and the methodological imagination. American Journal of Evaluation, 26, 584–586.
O'Neill, O. (2002). A question of trust: The BBC Reith Lectures 2002. Cambridge: Cambridge University Press.
Pagden, A. (1988). The destruction of trust and its consequences in the case of eighteenth-century Naples. In: D. Gambetta (Ed.), Trust: Making and breaking cooperative relations (pp. 127–141). Oxford: Basil Blackwell.
Pawson, R., & Tilley, N. (1997). Realistic evaluation. London: Sage.
Pollitt, C., & Bouckaert, G. (2000). Public management reform: A comparative analysis. Oxford: Oxford University Press.
Power, M. (1997). The audit society. Oxford: Oxford University Press.
Putnam, R. D. (1995). Bowling alone: America's declining social capital. Journal of Democracy, 6, 65–78.
Rothstein, B. (2005). Social traps and the problem of trust. Cambridge: Cambridge University Press.
Ryan, K. (2002). Shaping educational accountability systems. American Journal of Evaluation, 23(4), 453–468.
Sanders, J. R. (2002). Presidential address: On mainstreaming evaluation. American Journal of Evaluation, 23(3), 253–259.
Schon, D. (1971). Beyond the stable state. London: Temple Smith.
Scott, M., & Lyman, S. (1968). Accounts. American Sociological Review, 33(1), 46–62.
Seligman, A. (1997). The problem of trust. New Jersey: Princeton University Press.
Sennett, R. (2003). Respect. London: Penguin.
Shore, C., & Wright, S. (2000). Coercive accountability: The rise of audit culture in higher education. In: M. Strathern (Ed.), Audit cultures. London: Routledge.
Symonette, H. (2004). Walking pathways towards becoming a culturally competent evaluator: Boundaries, borderlands, and border crossings. New Directions for Evaluation, 102, 95–109.
Sztompka, P. (1999). Trust: A sociological theory. Cambridge: Cambridge University Press.
Taut, S., & Alkin, M. (2003). Program staff perceptions of barriers to evaluation implementation. American Journal of Evaluation, 24(2), 213–226.
Uslaner, E. (1999). Democracy and social capital. In: M. Warren (Ed.), Democracy and trust (pp. 121–150). Cambridge: Cambridge University Press.
Uslaner, E. (2000). Producing and consuming trust. Political Science Quarterly, 115(4), 569–590.
Uzzi, B. (1997). Social structure and competition in inter-firm networks: The paradox of embeddedness. Administrative Science Quarterly, 42, 35–67.
Wholey, J. S. (2002). Managing for results: Roles for evaluators in a new management era. American Journal of Evaluation, 22(3), 343–347.
Widmer, T. (2004). The development and status of evaluation standards in western Europe. New Directions for Evaluation, 104, 31–42.
Worthen, B. (1995). Some observations about the institutionalization of evaluation. Evaluation Practice, 16(1), 29–36.
Yarbrough, D., Shulha, L., & Caruthers, F. (2004). Background and history of the joint committee's program evaluation standards. New Directions for Evaluation, 104, 15–30.
CONTRIBUTORS TO THIS VOLUME AND THEIR CONTRIBUTIONS
EDITORS
Saville Kushner is seconded as Regional Advisor for Monitoring and Evaluation (Latin America and the Caribbean) at UNICEF, Panama. He holds the post of professor at the University of the West of England, UK, and Visiting Professor at the University of East Anglia, UK. He is a theorist and practitioner of educational evaluation.
Nigel Norris is professor of Education at the Centre for Applied Research in Education, University of East Anglia, UK. He is a theorist and practitioner of evaluation.
In their introductory chapter the field of NPM is reviewed and laid out – its form and function within models and practices of bureaucracy. Questions are raised about the implications of NPM for democracy and for democratic processes.
CONTRIBUTORS
Peter Dahler-Larsen, PhD, is professor at the Department of Political Science and Public Management, University of Southern Denmark. He is interested in social and cultural aspects of evaluation. He is president of the European Evaluation Society.
In his chapter he argues that the form of evaluation is 'constitutive' of forms of social and organizational action. Within the NPM context, this constitutive effect has certain consequences for action that are explored with an analysis of the use of performance indicator-based regimes. This is a much-elaborated version of Stake's (1986) proposition that social science has the tendency to shape responses to its own interrogations.
Leslie K. Goodyear is a Research Scientist at Education Development Center, Inc. in Newton, Massachusetts. She holds a Master of Science and PhD in Human Service Studies from Cornell University where her major
concentration was program evaluation. Dr Goodyear's areas of interest include designing and implementing qualitative, interpretive, mixed-method and participatory-democratic evaluation studies and exploring creative formats for the representation of evaluation findings to diverse audiences.
In her chapter we have an account from inside the world of NPM – but from someone who seeks to counter its worst 'depersonalizing' effects. Where NPM and its associated evaluation have a tendency to distance the evaluator from her audiences, Goodyear looks for forms of representation and reporting grounded in drama and poetry that close those gaps and focus on the empowerment of the individual and group.
Chris Husbands is Dean of the School of Education and Lifelong Learning at the University of East Anglia and has been director of the national evaluation of children's trusts for the English Department for Education and Skills.
His chapter takes a case-based approach, drawing from the national evaluation of Children's Trusts in the UK. He uses the case to analyse the shaping of government reform programmes within NPM/low-trust accountability settings. He concludes that such programmes have implications for new forms of governance, and relationships between central and local governance.
Paul Mason is a Senior Researcher at the Institute of Applied Social Studies, University of Birmingham, UK. He specializes in applied qualitative research and evaluation with children, families and communities and the agencies and services that work, or seek to work, with them.
His chapter takes a case-based approach, looking at the National Children's Fund Evaluation in England to illustrate and explore the tensions for evaluation within an NPM framework. The chapter looks at the politics of programmes as they emerge under such regimes, and raises questions about the way in which we learn from social programmes in this context.
Ron Ritchie is Dean of the Faculty of Education at the University of the West of England. He has been actively engaged in school improvement initiatives with head teachers and other school leaders for many years. He has previously written books which cover aspects of the primary curriculum, assessment, school improvement and leadership. His recent research has focused on distributed leadership in schools.
His chapter is an account written within the values-framework of NPM. Written by a reflective advocate of the School Improvement movement, the account focuses on school self-evaluation in England and its positive promises for school development and self-determination. Ritchie nonetheless
acknowledges and explores the tensions between self-evaluation and low-trust accountability/inspection regimes.
Katherine E. Ryan is an Associate Professor of Educational Psychology at the University of Illinois at Urbana. Her research interests include educational evaluation and the intersection of educational accountability and high-stakes testing.
Her chapter looks at the emergence of NPM in schooling in the USA. Ryan takes a change perspective, choosing two exemplary vignettes 30 years apart to look at the impact of accountability regimes. Her perspective on NPM is linked to globalization and to the shift of control of schooling away from communities to more abstract instantiations of power. She looks at the implications of such movements for schooling and raises questions about the evaluation response to them.
Thomas A. Schwandt is Professor of Education at the University of Illinois at Urbana-Champaign where he holds appointments in the Department of Educational Psychology and the Department of Educational Policy Studies.
In his chapter he looks at the ethics of evaluation within contexts of NPM and 'neo-liberal' market-related regimes of institutional governance. He argues that these new regimes raise questions about the moral basis of evaluation, and that the ethical practice of evaluation is one of 'moral criticism' of the social and political positioning and conduct of evaluation as well as its technical construction and relationships.
Christina Segerholm has a PhD in education and is currently associate professor at Umeå University, Department of Education. She holds a similar position at Mid Sweden University. Her main research interests are the impacts of evaluative activities at different levels in education systems, and policy on evaluation and quality assurance/assessment.
In her chapter Segerholm looks at a key feature of NPM, the complexities of relationships between central and local (devolved) authority. In comparing governance and evaluative activities in schools in Sweden and in the USA, Segerholm directs attention to the occasional effect the devolution of powers can have on the concentration of authority to political centres.