Improving Survey Response
Lessons learned from the European Social Survey

Ineke Stoop, The Netherlands Institute for Social Research/SCP, the Netherlands
Jaak Billiet, Centre for Sociological Research, Belgium
Achim Koch, GESIS – Leibniz Institute for the Social Sciences, Germany
Rory Fitzgerald, Centre for Comparative Social Surveys, United Kingdom
WILEY SERIES IN SURVEY METHODOLOGY
Established in Part by Walter A. Shewhart and Samuel S. Wilks
Editors: Graham Kalton, Mick P. Couper, Lars Lyberg, J. N. K. Rao, Norbert Schwarz, Christopher Skinner
A complete list of the titles in this series appears at the end of this volume.
This edition first published 2010
© 2010, John Wiley & Sons, Ltd

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the authors to be identified as the authors of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloguing-in-Publication Data
Improving survey response : lessons learned from the European Social Survey / Ineke Stoop ... [et al.].
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-51669-0 (cloth)
1. Social surveys–Response rate. 2. Social surveys–Response rate–Europe. 3. Public opinion research. 4. Public opinion research–Europe. I. Stoop, Ineke A. L.
HM538.I47 2010
301.072'3–dc22
2009051057

A catalogue record for this book is available from the British Library.
ISBN: 978-0-470-51669-0
Set in 10/12 Times by Thomson Digital, Noida, India
Printed and bound in the United Kingdom by TJ International Ltd., Padstow, Cornwall
Contents

Preface and Acknowledgements
List of Countries
1 Backgrounds of Nonresponse
  1.1 Introduction
  1.2 Declining Response Rates
  1.3 Total Survey Quality and Nonresponse
  1.4 Optimizing Comparability
2 Survey Response in Cross-national Studies
  2.1 Introduction
  2.2 Harmonization Models
  2.3 Contactability
  2.4 Ability to Cooperate
  2.5 Willingness to Cooperate
    2.5.1 Social environment and survey culture
    2.5.2 Households and individuals
    2.5.3 Survey design
    2.5.4 Interviewers
    2.5.5 Interviewer–respondent interaction: why people cooperate
  2.6 Nonresponse Bias
    2.6.1 What is nonresponse bias?
    2.6.2 Combating and adjusting for nonresponse bias
  2.7 Ethics and Humans
3 The European Social Survey
  3.1 Introduction
  3.2 What is the European Social Survey?
    3.2.1 Aims, history and philosophy
    3.2.2 Content
    3.2.3 Participating countries
    3.2.4 Organization and structure
  3.3 ESS Design and Methodology
    3.3.1 The central specification
    3.3.2 Quality and optimal comparability
    3.3.3 Sampling designs, procedures and definitions of the population
    3.3.4 Fieldwork and contracting
  3.4 Nonresponse Targets, Strategies and Documentation
    3.4.1 Background
    3.4.2 Requirements and guidelines
    3.4.3 Definition and calculation of response rates
    3.4.4 Contact forms
  3.5 Conclusions
  Appendix 3.1 A Contact Form as Used in ESS 3
4 Implementation of the European Social Survey
  4.1 Introduction
  4.2 Basic Survey Features
    4.2.1 Survey organization, administration mode and sample
    4.2.2 Sample size, number of interviewers and length of fieldwork period
    4.2.3 Survey costs
  4.3 Practical Fieldwork Issues
    4.3.1 Interviewers
    4.3.2 Information and incentives
  4.4 Summary and Conclusions
5 Response and Nonresponse Rates in the European Social Survey
  5.1 Data and Definitions
  5.2 Response and Nonresponse Rates in ESS 3
    5.2.1 Rate of ineligibles
    5.2.2 Response rate
    5.2.3 Structure of nonresponse
  5.3 Response Rate Changes Over Time
    5.3.1 Overview
    5.3.2 Response rate trends for specific countries
  5.4 Response Rate Differences and Fieldwork Efforts
    5.4.1 Response rate differences across countries and fieldwork efforts
    5.4.2 Change in response rates over time and change in fieldwork efforts
6 Response Enhancement Through Extended Interviewer Efforts
  6.1 Introduction
  6.2 Previous Research on Contactability
    6.2.1 Factors in establishing contact
    6.2.2 Who is hard to contact?
    6.2.3 Call patterns and strategies
  6.3 Previous Research on Cooperation
    6.3.1 Covariates of cooperation
    6.3.2 Causes of cooperation and noncooperation
    6.3.3 Attitudes towards surveys and reasons for refusal
  6.4 Sample Type and Recruitment Mode in the ESS
    6.4.1 Sampling issues
    6.4.2 Recruitment mode
  6.5 Establishing Contact in the ESS
    6.5.1 Introduction
    6.5.2 Noncontact rates
    6.5.3 Ease of contact and number of calls
    6.5.4 Timing of calls
  6.6 Obtaining Cooperation in the ESS
    6.6.1 Introduction
    6.6.2 Cooperation rates
    6.6.3 Cooperation and number of contacts
    6.6.4 Reasons for refusal
  6.7 Effects of Enhanced Field Efforts in the ESS
  6.8 Conclusion
  Appendix 6.1 Response Outcomes in ESS 1, 2 and 3 (%)
7 Refusal Conversion
  7.1 Introduction
  7.2 Previous Research
    7.2.1 Research questions
    7.2.2 How successful is refusal conversion?
    7.2.3 Which factors contribute to successful conversion?
    7.2.4 Refusal conversion and data quality
  7.3 Refusal Conversion in the ESS
    7.3.1 Efforts and effects
    7.3.2 Refusal type and refusal conversion
    7.3.3 Timing of refusal conversion attempts
  7.4 Refusal Conversion and Data Quality
    7.4.1 Refusal conversion and sample representativeness
    7.4.2 Refusal conversion and measurement error in the ESS
  7.5 Discussion and Conclusions
  Appendix 7.1 Interviewer Variance in Cooperation Rates
8 Designs for Detecting Nonresponse Bias and Adjustment
  8.1 What is Nonresponse Bias?
  8.2 Methods for Assessing Nonresponse Bias
    8.2.1 Comparing response rates across subgroups in samples
    8.2.2 Comparing respondent-based estimates with similar estimates from other sources
    8.2.3 Comparing estimates between subgroups in the obtained samples
    8.2.4 Enriching the sampling frame data with data from external sources
    8.2.5 Contrasting alternative post-survey adjustments for nonresponse
  8.3 Detecting and Estimating Bias in the ESS
    8.3.1 Post-stratification
    8.3.2 Comparing cooperative with reluctant respondents
    8.3.3 Using additional observable data collected for all target persons
    8.3.4 The study of bias using core information on nonrespondents
  8.4 Conclusions
  Appendix 8.1 Overview core variables and constructs
  Appendix 8.2 Questionnaires nonresponse modules
9 Lessons Learned
  9.1 Introduction
  9.2 Standardization, Tailoring and Control
  9.3 Achieving High Response Rates
  9.4 Refusal Conversion
  9.5 Nonresponse Bias
  9.6 Contact Forms and Fieldwork Monitoring
  9.7 Into the Future
References
Glossary
Index
Preface and Acknowledgements

Nonresponse occurs when a sample unit does not respond to the request to be surveyed. Nonresponse is generally considered to be one of the most important problems in survey research. In recent decades, many books and journal articles have been devoted to nonresponse. This book is a little different because it studies survey nonresponse from a comparative perspective. It will show how high response rates are aimed for in a cross-national survey – although not always achieved and usually requiring quite some effort – and why response strategies and response rates will differ across countries. The book draws heavily on the European Social Survey (ESS), a 30-country biennial face-to-face survey measuring changes in attitudes in Europe. Most of this book is therefore about the response from the general population to participating in a face-to-face social survey.

It is far from easy to study nonresponse in cross-national surveys from a comparative perspective. Firstly, the design and implementation of these surveys in individual countries is often too different to allow for a proper comparison of response outcomes. When interview modes differ, a survey is mandatory in some countries and not in others, and proxy interviews are allowed in a few countries only, these factors will render response processes and response outcomes noncomparable. Secondly, in cross-national studies response rates are often provided nationally and not calculated in a standardized way. Finally, information on the response process – interviewer efforts, reasons for nonresponse – is rarely available.

This is why the ESS is central in this book. It combines high quality standards (including high target response rates) with extensive documentation and meticulous recording of the fieldwork and response process. Based on ESS data, standardized response rates can be calculated and compared across different countries. A further reason for the focus on the ESS is that detailed tools and instruments have been developed and implemented which allow the fieldwork process to be closely monitored, in turn allowing the efficiency of different fieldwork strategies to be evaluated. In addition, there have been extensive analyses of ESS response processes and nonresponse bias and (follow-up) experiments, further enhancing its value as a data source.
This book starts with a short introduction in Chapter 1. The focus is on the decline in response rates in recent decades, nonresponse within the framework of total survey quality, and comparability as a pertinent quality criterion in a cross-national study. One of the main themes of the book is that differences across countries in response rates, nonresponse composition (noncontact, refusal) and nonresponse bias can hamper optimal comparability. On the other hand, fieldwork and response enhancement efforts may have to differ across countries because of national conditions and survey cultures.

The book then gives an overview of the nonresponse literature and provides a theoretical background to the later empirical discussions (Chapter 2). The focus here is on the causes of nonresponse, the results of efforts to reduce nonresponse, and auxiliary data that can provide information on the response process, nonrespondents and nonresponse bias.

As the empirical parts of this book are mainly based on the European Social Survey, detailed information on the history, aims and design of the ESS is provided in Chapter 3. The advantages and disadvantages of a standardized approach in order to achieve optimal comparability are outlined, and a brief overview is given of the different social and political climates in which ESS fieldwork takes place. Attention is paid to the methodological aims of the ESS, different sampling frames and procedures in different countries and measures that have been taken to enhance response rates. In addition, the chapter introduces the contact forms that have been developed to closely monitor the ESS fieldwork. The information from these contact forms has been used to check whether fieldwork has been carried out according to the specifications, to assess the effectiveness of fieldwork procedures and to suggest future improvements, and also to allow the measurement of bias.

The book not only presents the aims and design of the ESS, but also gives an overview of the actual implementation of the Survey in the participating countries (Chapter 4) and the main response outcomes in the first three rounds: response rates, refusal rates and noncontact rates (Chapter 5). This information forms a general background for the three empirical chapters of the book, which present results relating to the effects of extended interviewer efforts to establish contact and obtain cooperation (Chapter 6), the process and results of refusal conversion (Chapter 7) and the detection of and adjustment for nonresponse bias (Chapter 8). These three chapters include additional overviews of the relevant literature, review the usability of different types of auxiliary data and present the results of quasi-experiments. The book ends with a short summary and points for discussion in Chapter 9.

This book has three major themes. Firstly, it shows that high response rates are possible, but that the real issue is nonresponse bias. Achieving high response rates is far from easy; detecting and adjusting for nonresponse bias is an even greater challenge. Secondly, it shows that enhancing response rates in a cross-national repeated cross-sectional survey is much more difficult than in a national cross-sectional study, because efforts aimed at harmonization and optimal comparability may to some extent limit the possibilities for increasing response rates, and also because diverging response rates may threaten optimal comparability. Finally, the
book stresses throughout the importance of treating fieldwork as a controlled process, which implies close monitoring of fieldwork and the collection of detailed paradata on the fieldwork and response process through the 'contact forms'. Only in this way can response rates be compared, analysed and improved.

In order to gain most from reading this book, some prior knowledge of survey methodology is required, and an elementary knowledge of regression and multivariate analysis will be helpful for some sections.

This book is the outcome of the work of many people and the support of several organizations. Firstly, of course, we have to thank the numerous respondents of the European Social Survey. We are grateful to the interviewers in more than 30 European countries, who visited all the target persons in their homes, tried to persuade them to cooperate and recorded every visit, the outcome, reasons for refusal and the chance of future successes on detailed contact forms. The fieldwork organizations in these countries struggled with nonresponse and worked hard to make the Survey a success. The National Coordinators made great efforts to translate the ESS requirements into national best practices, and the national funding organizations advanced the substantial sums of money required to run a high-quality survey. An overview of the fieldwork organizations, funders and National Coordinators is available on the ESS website at www.europeansocialsurvey.org.

We are also grateful to the Central Coordinating Team (CCT) of the ESS for making part of the 2005 Descartes Prize money available for analysing the ESS nonresponse data. The follow-up studies presented in Chapter 8 were part of a Joint Research Activity of the ESS infrastructure project, financed by the EU under FP7. Increasingly, survey researchers are using the contact data from the ESS to test nonresponse models and further develop nonresponse theories. The input of a number of them is gratefully acknowledged in Chapters 7 and 8. We especially would like to express our appreciation and gratitude for the work of the "Leuven team", in particular Hideko Matsuo and Koen Beullens. They helped in the design and preparation of questionnaires and forms, edited data files, performed complicated analyses and came up with ideas for quasi-experiments and with additional research questions. The errors are of course all ours.

The authors are from four different organizations, each of which takes a keen interest in nonresponse and is a member of the CCT: the Netherlands Institute for Social Research/SCP (the Netherlands); K.U. Leuven (Belgium); GESIS, the German Leibniz Institute for the Social Sciences; and the Centre for Comparative Social Surveys, City University, UK. In cooperating on this book, we learned a great deal. We hope that others will learn something, and perhaps a great deal, from this book.

Ineke Stoop, Jaak Billiet, Achim Koch and Rory Fitzgerald
List of Countries

AT  Austria
BE  Belgium
BG  Bulgaria
CH  Switzerland
CY  Cyprus
CZ  Czech Republic
DE  Germany
DK  Denmark
EE  Estonia
ES  Spain
FI  Finland
FR  France
GR  Greece
HU  Hungary
IE  Ireland
IL  Israel
IS  Iceland
IT  Italy
LU  Luxembourg
LV  Latvia
NL  Netherlands
NO  Norway
PL  Poland
PT  Portugal
RO  Romania
RU  Russia
SE  Sweden
SI  Slovenia
SK  Slovak Republic
TR  Turkey
UA  Ukraine
UK  UK
1 Backgrounds of Nonresponse

1.1 Introduction

In a high-quality face-to-face study, striving for high response rates often involves serious fieldwork efforts and high fieldwork costs. This does not, however, imply that high response rates are always achievable, even with these serious efforts and high costs. Nor will it always be possible – even with great efforts, and with response rates that are generally considered to be high – to avoid nonresponse bias. Arguably, even census surveys, where participation is mandatory, still suffer from some nonresponse bias.

This bias is a greater problem in cross-national than in single-nation surveys, for two reasons. Firstly, different countries and different cultures may require – or be accustomed to – different strategies to enhance response rates. To make outcomes comparable, however, a single fieldwork strategy employed across countries might be preferable, such as insisting on face-to-face interviews in all countries. Given a strategy where many elements are harmonized across countries, high response rates might be more difficult to attain than in national studies, where a different range of fieldwork techniques can be used. At the same time, however, some elements of the fieldwork may have to differ between countries. For example, whether an address or individually named sample is used for sampling will determine the efficacy of an advance letter. In these cases differences in response rates and the substantive data collected might reflect methodological differences rather than the substantive differences that they are supposed to represent. Finally, if there are differences in the direction and amount of nonresponse bias between countries, this may pose a serious threat to comparability.

The trade-off between standardization on the one hand and the need for nationally specific approaches on the other, as well as the complex relationship between response
rates and nonresponse bias, will be a recurrent theme throughout this book. On the basis of experiences from the European Social Survey, described in detail in Chapter 3, an attempt will be made to unravel the response processes in more than 30 European countries and to show how overall response rates can be enhanced. Best practices and national successes can provide guidance on how to enhance response rates elsewhere, although it will transpire that successful strategies cannot always be transplanted with equal success to other countries. The book will also demonstrate that whilst maximizing response rates remains important for improving the representativeness of survey data, even when response rates go up or are consistently high, nonresponse bias remains a problem to contend with.

This first chapter sets the stage for this book. It introduces the idea of optimal comparability, discusses declining response rates and sets nonresponse within the framework of 'total survey error'.
1.2 Declining Response Rates

Almost two decades ago Norman Bradburn, in his Presidential Address at the 1992 meeting of the American Association for Public Opinion Research, said: 'we . . . all believe strongly that response rates are declining and have been declining for some time'. De Heer (1999a) compared response rates across countries and confirmed a decline in survey response in some countries, whereas it remained stable in others. He strongly advocated setting up a databank providing information on response rates, types of nonresponse and survey characteristics to facilitate more accurate comparisons of response rates across surveys and over time. De Leeuw and de Heer (2002, p. 52) studied response trends for a series of comparable surveys and concluded that countries differ in response rates, that response rates have been declining over the years and that nonresponse trends differ from country to country. The authors attributed the majority of these cross-national differences to differences in survey design and fieldwork strategies, especially the supervision and monitoring of interviewers.

It should be noted, however, that the empirical evidence for declining response rates is not unambiguous (Groves, 1989; Smith, 1995; Schnell, 1997; Groves and Couper, 1998; Smith, 2002; Stoop, 2005). In practice, it is not easy to compare response rates over time, firstly because nonresponse rates are sometimes not computed or publicized at all, but more often because they are not computed uniformly. Secondly, survey sampling and fieldwork procedures often evolve – sometimes considerably – over time, making it difficult to isolate the source of any change. Changes in response rates could, for example, occur because of mode switches, or because of a move to using an address sample instead of a sample of individuals, or because of the appointment of a different fieldwork agency. Thirdly, identical nonresponse rates may hide differences in composition of the achieved sample. For example, high noncontact rates in the past may be reduced by extra field efforts but not result in higher response because of an increase in refusals.
In several recent surveys, the (apparent) downward trend in response rates has been halted or even reversed. Response rates went from 46% in 1999 to above 70% in recent waves of an attitude survey in Flanders (Billiet, 2007a). Stoop (2008) gives an overview of response rates on a series of Dutch surveys and finds that after sometimes serious decreases they are now back at the original level. It should be noted that achieving such an increase was fairly costly. Furthermore, the high response rates obtained in earlier surveys could be partly due to less strict sampling and fieldwork procedures and the less strict calculation of response outcomes. On the basis of this fairly limited evidence, one could conclude that the decline in response rates can to some extent be halted and even reversed by increasing fieldwork costs and efforts. Nonetheless, as the preface to Survey Nonresponse stated: ‘Declining cooperation rates increase the cost of conducting surveys . . . [and] . . . can also damage the ability of the survey statistician to reflect the corresponding characteristics of the target population’ (Groves et al., 2002, p. xiii). The problem, of course, is that continually increasing survey costs is not an option in the long term for many surveys.
1.3 Total Survey Quality and Nonresponse

Indicators of survey quality cover a wide range of different aspects of the survey lifecycle. Originally, the focus was primarily on statistical accuracy. Nowadays, however, the importance of additional quality criteria such as relevance (to users), accessibility, interpretability, coherence and timeliness is also widely acknowledged (Lyberg, 2001; Fellegi, 2001; Lyberg et al., 2001). The European Statistics Code of Practice (http://epp.eurostat.ec.europa.eu/portal/page/portal/quality/introduction/) encompasses a series of quality aspects that are also highly relevant for surveys. It emphasizes the importance of commitment to quality and sound methodology.

With regard to nonresponse, the narrower concept of survey errors is most relevant. Survey errors determine the statistical accuracy of survey outcomes, that is the precision of the final results and the size of bias. Bethlehem and Kersten (1986) present a taxonomy of errors in sample surveys that can cause a discrepancy between the survey estimate and the population characteristic to be estimated (see Figure 1.1). This discrepancy is called the 'total error' and can be subdivided into sampling error and nonsampling error.

Figure 1.1 Components of total survey error (based on Bethlehem and Kersten, 1986, p. 13):
• Total error
  • Sampling error: selection error; estimation error
  • Nonsampling error
    • Observation error: overcoverage error; measurement error; processing error
    • Nonobservation error: undercoverage error; nonresponse

Sampling errors are caused by the sampling design and can be subdivided into selection and estimation errors. The former can be mitigated by computing design weights (Häder and Lynn, 2007); the latter can be minimized by increasing the sample size.

Nonsampling errors can be subdivided into observation errors and nonobservation errors. Observation errors are errors made during the process of obtaining or recording answers. Overcoverage is one type of observation error, and it will occur when elements not belonging to the target population – for instance, holidaymakers from abroad – are selected in an address sample and are asked to participate in a survey when the target population are the regular inhabitants of that country. Another type of observation error relates to process errors, which can occur during data entry or editing. Finally, there are measurement errors. Groves (1989) names four sources: the respondent, the interviewer, the questionnaire and the mode. The respondent may, for instance, give incorrect answers that are more socially desirable or engage in satisficing; that is, choosing the answer that requires least thought, because the respondent wants to spend as little time as possible on answering questions (Krosnick, Narayan and Smith, 1996). The interviewer may not pose the questions correctly, or may skip the required introduction, or may probe too little or too much. The questionnaire will produce measurement errors (see, e.g., Saris and Gallhofer, 2007a) when there is an incomplete route from the 'true' value held by the respondent to the response categories provided in the survey questions. Finally, different survey modes may each elicit different response patterns or error types. Telephone questions necessarily have to be short, and showcards cannot usually be used – but neither consideration applies to face-to-face surveys in the same way. Interviewers in a face-to-face survey may receive socially desirable answers (Holbrook, Green and Krosnick, 2003), and web surveys may be prone to satisficing.

Nonobservation errors can be due to undercoverage of, or nonresponse by, the target population. Undercoverage, the opposite of overcoverage, occurs when elements belonging to the target population are not included in the sampling frame. One example of undercoverage is when people with an ex-directory telephone number, only a mobile phone or no telephone connection at all cannot be sampled through the telephone book despite the stated intention to use the phone book to sample all residents within a specific geographical area.

The final component is nonresponse. Nonresponse can be subdivided into item nonresponse and unit nonresponse. Item nonresponse occurs when people do not answer a particular question or set of questions. Unit nonresponse means that people who are in the sample do not participate in the survey, because they cannot be reached, cannot participate or refuse to participate.

The overview of total error in Figure 1.1 shows that nonresponse is only one of the many potential causes of error in a survey. Nonresponse is sometimes considered, it could be argued erroneously, to be the main indicator of survey quality – this is erroneous because a survey with a high response rate but with very poor questions will never be a good survey. One reason why so much value is placed on the response rate is that it is a single figure that – in principle – is simple to compute (American Association for Public Opinion Research, 2008). One of the dangers of placing such a high value on the response rate is that it is tempting to calculate it creatively and, for instance, to maximize the number of noneligible cases, thus producing inflated response rates. Another danger of using the response rate as an indicator of nonresponse error is that it indicates the potential for bias but is not linearly correlated with it. The actual size of the bias depends not only on the response rate but also on the difference between respondents and nonrespondents on the actual variables measured in the survey. Response rates can be an indicator of survey quality, but can never provide concrete proof. We agree with Groves (2006, p. 668): 'Response rate improvement efforts should be guided by some knowledge of how groups likely to be affected by the efforts relate to key survey variables.'

Despite this, low response rates can seriously undermine the representativeness of a survey, since with low response rates there is a greater potential for bias. Increases in contact attempts and the associated increase in response rates, even when small, are therefore usually expected to lead to an increase in the representativeness of the sample. As noted earlier, there is no response rate that guarantees good quality or even the absence of nonresponse bias, although 99% would be close (Groves, 2006). Even census estimates are probably prone to bias. Platek and Särndal (2001, p. 11) discuss the misunderstandings regarding response rates in a special issue of the Journal of Official Statistics that examined the quality of surveys conducted by official statistical agencies: 'All users view high nonresponse as one of the main threats to data quality. But in assessing the effect of a stated rate of 36 percent, they risk falling back on stereotyped ideas, found perhaps in well-meaning texts, or heard from colleagues of the type "a nonresponse rate of more than 30 percent (or a similar rule of thumb) will render the survey data useless".'
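The point that bias depends on both the amount of nonresponse and the respondent–nonrespondent difference can be written compactly. The following is the standard deterministic decomposition from the nonresponse literature, added here for reference rather than taken from the chapter itself. In LaTeX notation:

\[
\operatorname{bias}(\bar{y}_r) \;=\; \bar{Y}_r - \bar{Y} \;=\; \frac{M}{N}\left(\bar{Y}_r - \bar{Y}_m\right)
\]

where \(\bar{Y}\) is the full-population mean of a survey variable, \(\bar{Y}_r\) and \(\bar{Y}_m\) are the means among (would-be) respondents and nonrespondents, and \(M/N\) is the nonresponse rate. A nonresponse rate of 36 percent therefore produces no bias at all on a variable where respondents and nonrespondents happen to have equal means, and a large bias where they differ sharply – which is why bias is a property of individual variables rather than of a survey as a whole.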
1.4 Optimizing Comparability

In cross-national surveys, the primary aim is usually to facilitate comparability between countries. Overviews of what is meant by comparability are given by Van de Vijver (2003) and Johnson (1998). The 'principle of equivalence' (Jowell, 1998) originates from the natural sciences, where experiments are run according to strictly uniform standards in order to ensure the comparability of repeat measures of the same phenomenon. A single variable might be manipulated, whilst all others are held constant, to allow the effect of this to be monitored. With social surveys the same principle needs to be maintained, otherwise conclusions cannot be reliably drawn. When strictly identical procedures cannot be implemented – due to practical or country differences – the design should ensure that comparisons between countries can still be reliably drawn.

Equivalence in cross-national studies, as argued by O'Shea, Bryson and Jowell (2003), Lynn, Japec and Lyberg (2006) and Jowell et al. (2007), is made difficult by linguistic, cultural and conceptual barriers that impede strict comparability, by country-specific differences and preferences in modes of interviewing (Skjåk and Harkness, 2002), coding and, above all, sampling, and by wide variations in response rates, interviewer training and socio-demographic classifications. Chapter 3 will describe how optimal comparability is achieved in the European Social Survey; Chapter 4 explains how this general principle has been implemented in practice.

As will be shown in Chapter 3, a persistent problem in the pursuit of optimal comparability in cross-national surveys is that it cannot – and is not expected to – always be achieved through the wholesale adoption of precisely the same methods or procedures across different countries. On the contrary, it may sometimes be preferable to facilitate variation in certain procedures precisely in order to achieve comparable outputs. In some cases, variation in design cannot be avoided. As there is no European sampling frame, for instance, optimal sampling procedures have to be devised for each individual country for European cross-national surveys (see Kish, 1994).

To obtain high response rates and minimize nonresponse bias, knowledge of and adaptation to national factors is required. Different countries have distinct survey attitudes and survey traditions, and ease of contact and reluctance to cooperate differ between cultures. Johnson et al. (2002, p. 68) suggest that social participation patterns, socio-economic opportunities and communication styles may influence cultural values and survey nonresponse. In a cross-national analysis of nonresponse in the Labour Force Survey, de Leeuw and de Heer (2002, pp. 52–3) found that noncontact rate variation was associated with differences between countries in average household size and the percentage of young children, whilst differences in refusal rates were associated with economic indicators (unemployment rate, inflation rate). Couper and de Leeuw (2003) compiled evidence that countries differ not only in response rates, but also in the composition of nonresponse, due to differences in survey modes and fieldwork procedures.

The number of requests to participate in surveys will differ across countries. In countries where surveys are rather new, higher survey cooperation may be expected. This is the scarcity ('this is the chance to give your opinion!') argument (Hox and de Leeuw, 2002). As mentioned earlier, survey conditions and traditions differ across countries. In a number of countries, in-person surveys are becoming rarer, in others
incentives are never used, and in some countries doorkeepers and concierges might seriously impede making contact with respondents. Because of this, different strategies to enhance response rates will be required. On the other hand, unnecessary variation between strategies used in different countries should be avoided (de Heer, 2000). According to de Heer (1999a, pp. 136–7), three factors may explain cross-national differences in response on the same survey:

• General design factors, such as mode of data collection, panel versus cross-section, person or household, proxy permitted or not, and substitution permitted or not.
• Practical fieldwork strategies, such as formal call scheduling, number of contact attempts, use of refusal conversion, interviewer incentive and respondent incentive.
• Factors related to survey organization, such as voluntary participation in surveys, or terms of employment of interviewers.
Some of these factors – for example, the number of contact attempts – are under the influence of the researcher in the conceptual model of Groves and Couper (1998), which will be discussed in the next chapter. Others are difficult to influence (a survey being mandatory or voluntary). Still others should be seen as more or less given (interviewer payment), although it might be possible to influence these to an extent. If factors are under the influence of the researcher, attempts can be made to minimize unnecessary differences between countries. Chapter 3 will describe the attempts made in the ESS to maximize optimal comparability.

Even where response-enhancing strategies and final response rates differ, a precise definition of outcome codes and a standardized computation of response rates are of paramount importance. This is a particular problem in cross-national surveys, in which nonresponse can produce different types and levels of nonresponse bias in different countries – different methods of computing across countries have been typical in the past. Nonresponse will not be a problem in comparative research where that nonresponse is completely random (see the discussion of bias in Section 2.6). It will be a problem when it is not random and therefore causes bias. Note that bias is a characteristic of individual variables, not of an entire survey.

Nonresponse bias can cause differences between countries even where they have identical headline response rates, as the sketch below illustrates. Differences are more likely when there are different causes of nonresponse in different countries (for example, high refusal rates in one country but high noncontact rates in another). Differences may also be more likely when there are (large) differences in nonresponse rates between countries. Major problems will occur when response rates differ between countries and nonresponse bias differs, either in size or in the variables that are biased. It should also be noted that the response rates, the composition of nonresponse and nonresponse bias in core variables are not necessarily stable between survey rounds in repeat studies. This makes the study of nonresponse in repeated comparative surveys even more complicated than in a single cross-national survey.
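A toy simulation makes this concrete. Everything in it is invented for illustration – the populations, the response propensities and the 'trust' variable – but it shows how two countries with identical headline response rates can deliver differently biased estimates:

```python
import random

random.seed(42)

def respondent_mean(pop, response_prob):
    """Mean of y among responding units; response_prob maps y to P(respond)."""
    resp = [y for y in pop if random.random() < response_prob(y)]
    return sum(resp) / len(resp), len(resp) / len(pop)

# Toy population: y = 1 if a person trusts institutions, 0 otherwise (true mean 0.50).
pop = [1] * 5000 + [0] * 5000

# Country A: nonresponse completely at random (60% response for everyone) -> no bias.
mean_a, rr_a = respondent_mean(pop, lambda y: 0.6)

# Country B: 60% response overall as well, but trusting people cooperate more -> bias.
mean_b, rr_b = respondent_mean(pop, lambda y: 0.7 if y == 1 else 0.5)

print(f"Country A: response rate {rr_a:.2f}, estimated mean {mean_a:.2f}")  # ~0.60, ~0.50
print(f"Country B: response rate {rr_b:.2f}, estimated mean {mean_b:.2f}")  # ~0.60, ~0.58
```

Both countries report a response rate of about 60%, yet country B overstates the population mean by roughly eight percentage points; comparing the two countries on this variable would partly measure fieldwork differences rather than differences in attitudes.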
Documentation of the fieldwork and the response process, together with measurement of nonresponse bias, is the only way to study and potentially solve these problems. Through standardized recording of each attempt to contact potential respondents – for example, by using contact forms (see Chapter 3) – directly comparable computation of response rates should be possible. In addition, on the basis of information from the contact forms and auxiliary data on respondents and nonrespondents, nonresponse bias can be estimated (Billiet, Koch and Philippens, 2007).
2 Survey Response in Cross-national Studies

2.1 Introduction

There is a vast amount of literature on survey response and nonresponse. The list of references at the end of this book contains a wide range of readers and articles on how to enhance response rates, how to schedule calls, the impact on response of advance letters, incentives, the type of sponsor, the topic of the survey, which people are more or less likely to respond, the differences between recruitment modes, how to measure bias, how to correct for nonresponse, and many more issues. Most of the literature is based on national surveys, experiments and meta-analyses.

The literature on nonresponse in cross-national studies is relatively scarce. This is a shortcoming for two reasons. Firstly, survey participation is likely to have a national or cultural component; this means that findings on the use of incentives in the United States, for instance, cannot automatically be transferred to Finland or Bulgaria or South Africa or Japan. Secondly, one of the main aims of cross-national surveys is to compare countries. In order to do this, methodological differences between countries that can have an impact on survey outcomes should be minimal. This means that large differences in response rates, in nonresponse composition or in the nonresponse bias should be minimized. Put simply: a fairly high response rate in all countries in a comparative study might be better than a very high response rate in some countries and a very low one in others.

This chapter gives an overview of the literature on survey participation, and describes the factors behind nonresponse from a comparative point of view. Wherever
possible, the question will be considered as to whether the results of cutting-edge experiments or seminal books will be valid in every country, and what they mean in a cross-national study.

The text starts (Section 2.2) with a discussion of a number of different models for harmonizing surveys in different countries. It then seeks to deconstruct survey response into its usual components; namely, contactability (Section 2.3), ability to cooperate (Section 2.4) and willingness to cooperate (Section 2.5). Since few persons are normally not able to cooperate, this category will only be mentioned for the sake of completeness, but will be left out of the analytical chapters of this book. Obtaining cooperation or, conversely, refusal to cooperate, will receive most attention, partly because refusal is usually the largest category of nonresponse. A second reason for the focus on cooperation is that the decision to cooperate or refuse may be influenced by the topic of the survey, whereas it is unlikely that contactability will depend on this. Refusal may therefore be more likely to cause bias. Nonresponse bias will be covered in Section 2.6.

In their seminal book on nonresponse, Groves and Couper (1998) present conceptual models of both phases of survey participation: contacting target persons and obtaining cooperation. These models have had a great impact on nonresponse theory and research. They do not quite fit cross-national surveys, however, where the same factors may mean something else in different countries or may have a different impact, and where fieldwork is tendered and much less directly under researcher control[1] than in the governmental surveys discussed by Groves and Couper. For this reason, adapted models on contactability and cooperation will be presented in Sections 2.3 and 2.5. Section 2.6 presents basic theories on bias, discusses how nonresponse bias can hamper comparability, and presents three types of auxiliary variables that can help us to learn more about bias: reasons for refusal, indicators of difficulty in obtaining cooperation and core information on nonrespondents.

The chapter ends with a small warning in Section 2.7. Surveys and interviewing are very much a human enterprise; when conducting a survey, specifications, control and meticulous preparation should be mixed with respect, enthusiasm, sympathy and compassion.

[1] The distinction between factors out of researcher control and those that are under researcher control is taken from Groves and Couper (1998, p. 30). Under researcher control assumes control over survey topic, design and implementation, and fieldwork. Factors out of researcher control are external factors (social environment, householder characteristics) that are a given. From this book it will be clear that in many cases survey design and topic, and survey implementation and fieldwork, are under the control of different parties, and that the situation is even more complicated in cross-national surveys.
2.2 Harmonization Models

As mentioned above, very little research has been published on nonresponse from a comparative point of view. Exceptions are de Heer (1999a), de Leeuw and de Heer (2002), Johnson et al. (2002), Couper and de Leeuw (2003), Lyness and Kropf (2007),
Billiet, Koch and Philippens (2007) and Billiet et al. (2007). According to Jowell (1998), one of the reasons for the scarcity of methodological research based on cross-national studies was that, at least at that time, cross-national surveys generally accepted lower standards of rigour than national surveys. Luckily, partly as a consequence of the methodological targets of the European Social Survey (see Chapter 3), this situation has been changing rapidly in the last few years. Another reason for the scarcity of comparative studies on nonresponse is that controlled experiments across countries are extremely rare. Finally, standardized monitoring and controlling of fieldwork across countries appears to be very difficult (see, e.g., EQLS, 2007).

More recently, cross-national surveys have become increasingly important and the methodological pitfalls of cross-national research are rapidly gaining attention. Before turning to nonresponse, this section discusses different harmonization models for comparative studies and their possible ramifications for the study of nonresponse.

Körner and Meyer (2005) distinguish three different ideal models of harmonization strategies for socio-demographic information in European statistics. The major distinction is between output and input harmonization. Output harmonization is the standard approach in European official statistics. Final definitions or target outputs are binding, but the methodology used to collect these results is allowed to vary across countries (although there are limits with respect to sampling, for example). Output harmonization can take two forms: ex-ante and ex-post. According to the first model, a new survey is developed to meet statistical requirements. The International Social Survey Programme (ISSP) is based on ex-ante output harmonization: the core questionnaire is supposed to be identical, but the way background variables are collected, the survey mode (face-to-face, telephone) and the instrument (stand-alone survey, drop-off) are allowed to differ. Ex-post output harmonization means that pre-existing national sources are used and converted into a harmonized format for cross-national comparisons. Champions of output harmonization – as in the European Survey on Income and Living Conditions (EU-SILC) – claim that this is the preferred strategy for comparative research because it allows better integration of a survey into the national statistical system, and in this way the 'best national source' can be used.

Input harmonization, the third strategy, assumes that all methods, concepts and procedures are standardized from scratch. Advocates claim that this is the only way of attaining optimal comparability. Input harmonization requires new, dedicated, standardized data collection in every participating country, because existing sources can never meet this requirement. Chapter 3 demonstrates how the European Social Survey is an example of input harmonization.

Lynn (2003b) approaches the issue from a slightly different perspective, distinguishing five strategies for cross-national surveys. His maximum quality approach is roughly similar to ex-post output harmonization: each country should aim to achieve a survey of the best possible quality. This will ensure national quality but may, of course, result in inconsistencies that cause difficulties when trying to make comparisons between countries. This strategy can also legitimize countries with low quality standards to do a poor job: so in a country with low response rates in general, a low response rate in a comparative study would be equally acceptable. On the other
hand, countries with high quality standards will be expected to invest heavily to keep up their standards, which may be considered unfair.

The opposite of the maximum quality approach is the consistent quality approach, in which the lowest common denominator would be the standard for each country. For instance, in order to ensure comparability, every country would be expected to realize the response rate of the country with the lowest response rate. This would force well-performing countries to do worse than usual, and possibly to use obsolete instruments (PAPI in countries where CAPI is normal). This strategy may not be enforceable: setting a low response rate as a target might be achievable, but equal nonresponse bias will be impossible to attain. These two extreme approaches are ideals rather than practical strategies.

One more realistic approach is the constrained maximum quality approach, in which a number of key aspects of the survey design are constrained and others are left to the individual countries. This approach seems similar to ex-ante output harmonization. In this model, it would be possible to enforce random sampling and use one centrally designed questionnaire, but leave the data collection mode to the discretion of the participating countries.

A second, less controlled approach is the target quality approach: this is a variant of the consistent quality approach, with one major difference – instead of taking the lowest level as a standard, the practices of the best performing countries are used. Thus if a response rate of 80% is possible in one of the countries, for example, every country should aim for this. This strategy is thus aimed at raising the standards of poorer-performing countries to a higher level. It should be borne in mind here that not all countries will be able to achieve the target, and that the efforts required to reach the target may differ substantially across countries.

The final strategy presented by Lynn (2003b) is the constrained target quality approach. Here, challenging targets are set that should ensure consistency, but the approach is focused on a number of key constraints. It can be seen as a mix between the aim of raising standards and quality in some countries as far as is achievable and that of achieving consistency on key dimensions. Chapter 3 will show how the latter strategy is implemented in the European Social Survey.

What do these strategies mean for nonresponse? Harmonization with respect to nonresponse is possible in many areas, including in relation to the calculation of response rates, monitoring of the response process, efforts at response enhancement, the way respondents are approached (face-to-face, by telephone), and the number and timing of calls. Standard guidelines for the calculation of response rates are given by AAPOR (American Association for Public Opinion Research, 2008); a simplified illustration of such a calculation is sketched below. However, even when using the AAPOR standard disposition codes, outcomes may be difficult to compare where national situations differ. With respect to eligibility, for instance, there might be challenges cross-nationally because of the use of different sampling frames and different migration patterns in different countries. For example, response rates in a country where the sample is drawn from the population register and many people temporarily move abroad as labour migrants will be difficult to compare with response rates in a country that uses an address sample and where people rarely move.
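To make the standardized calculation concrete, here is a minimal sketch of AAPOR-style outcome rates computed from final disposition counts. It is an illustration, not the ESS's or AAPOR's own code: the disposition categories are collapsed into six broad groups (the AAPOR standard distinguishes many more), and the counts are invented.

```python
def aapor_rates(i, p, r, nc, o, u):
    """Simplified AAPOR-style outcome rates from final disposition counts.

    i = complete interviews          r  = refusals and break-offs
    p = partial interviews           nc = noncontacts
    o = other eligible nonresponse   u  = cases of unknown eligibility

    Known ineligibles are excluded from every denominator, which is why
    creatively coding cases as 'ineligible' inflates all of these rates.
    """
    base = i + p + r + nc + o + u  # RR1/RR2 treat all unknowns as eligible
    return {
        "RR1 response rate": i / base,
        "RR2 response rate (incl. partials)": (i + p) / base,
        "REF1 refusal rate": r / base,
        "CON1 contact rate": (i + p + r + o) / base,
    }

# Invented counts: the same fieldwork outcome coded in two different ways.
print(aapor_rates(i=1200, p=50, r=400, nc=250, o=50, u=50))
# Reclassifying the 50 unknown-eligibility cases as ineligible shrinks
# the denominator from 2000 to 1950 and flatters every rate:
print(aapor_rates(i=1200, p=50, r=400, nc=250, o=50, u=0))
```

The first call yields RR1 = 0.600; the second, describing identical fieldwork, yields RR1 ≈ 0.615 purely through coding – exactly the kind of noncomparability that standardized disposition codes and centrally checked contact forms are meant to prevent.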
More comparability issues can (and will, in Chapter 3) be addressed with respect to
nonresponse. Sections 2.3, 2.4 and 2.5 present the major factors behind survey participation, and focus on cross-national differences and factors that may hamper an equivalent approach. These factors will be discussed in more detail in Chapter 6, which focuses on the European Social Survey. The overview presented here is more general.
2.3 Contactability

Figure 2.1 presents a conceptual model for contactability in cross-national face-to-face surveys.

[Figure 2.1 A conceptual model of contactability in cross-national face-to-face surveys. Source: adapted from Groves and Couper (1998, p. 81). The model links social environmental attributes (general daily and weekly time-use patterns, neighbourhood characteristics, general survey traditions), survey characteristics (length of fieldwork period, type of sampling frame, information on sampling units, recruitment mode), physical impediments (doorkeepers, apartment buildings, intercoms, gated communities, bad roads), interviewer attributes (training, remuneration, experience, available working hours) and socio-demographic attributes (family size, employment, children at home), via the accessible at-home pattern and the number and timing of calls, to the likelihood of contact (for an individual sample: contact with whom; for a household/address sample: contact with whom, household informant for selection).]

As in Groves and Couper’s 1998 model, the likelihood of contact ultimately depends on the at-home pattern of the target persons and the number and timing of calls. The accessible at-home pattern of target persons (‘accessible’ meaning that they are at home and the interviewer can reach them there) is related to the general social environment, their socio-demographic characteristics and
physical impediments that stand in the way of contactability. The number and timing of calls is in turn a consequence of the survey characteristics and interviewer attributes. These different factors are discussed in this section. The focus is on those aspects where countries are likely to differ (for instance, the sampling frame used) and on those factors that may mean something different in different countries (for instance, the ubiquity of telephone or face-to-face surveys). The general social environment is likely to influence the socio-demographic characteristics of sample persons, as well as the presence of physical impediments, and is thus likely to have a direct and indirect influence on the at-home behaviour of target persons. The social environment is likely to differ across countries. Issues of crime and safety in inner-city neighbourhoods, commuting in large cities, the acceptability of unannounced home visits, and the general willingness to open the door to interviewers during the evening and at weekends may determine the ease of contact. Persons who spend most of their time at home will generally be easy to contact. For contact purposes, it is generally sufficient that one responsible adult is at home, who can pass on the interview request to the target respondent. This suggests that differences in female employment in Europe can be related to contactability (see Romans and Kotecka, 2007; Stoop, 2007). In general, there seem to be substantial differences in at-home behaviour between men and women in Europe (European Communities, 2004). Swedish women spend 66% of their time at home, Hungarian women 78%. For men, the figures are 60% in Sweden and 69% in Hungary. More extreme are the differences in female employment in Europe, ranging from less than 35% in Malta to more than 70% in Norway, Sweden and Switzerland. Several other socio-demographic factors have been found to contribute to contactability. Older people and larger families are usually easier to reach, while single persons, people living in large cities and those living in apartments are less easy. It is by no means certain, however, that these factors have the same effect in every country. Another factor affecting contactability is the presence of physical impediments that can obstruct contact between interviewer and sample persons in different ways (see also Groves and Couper, 1998, p. 88). Firstly, contact with the sample person is impossible when a doorkeeper refuses entry. Secondly, door gates, dead-bolt locks and alarm systems may indicate target persons who would not easily let an interviewer into their house and are probably reluctant to be interviewed. Thirdly, apartment buildings with locked communal entrances and intercom systems make face-to-face contact in the recruitment phase difficult, preventing interviewers from showing their ID cards and a copy of the advance letter. Interviewers will also be less able to tailor their behaviour to perceived features of the target person (on tailoring, see Groves and McGonagle, 2001), and under these circumstances it is easier for target persons to refuse cooperation. These impediments also conceal whether or not a dwelling is empty, or whether the resident is at home but not opening the door. This makes it impossible to determine the eligibility of the sample unit or to distinguish between ‘target person not at home’ and ‘target person not willing to be interviewed’. The presence of such impediments will differ across countries. 
Whereas gated communities are not common in all parts of Europe, doorkeepers may be hard to get past in
major cities. Alarm systems, vicious dogs and intercoms at the entrance to apartment buildings will be more usual in some countries than in others. The general social environment, socio-demographic characteristics and presence or absence of physical impediments will determine the at-home behaviour, or rather the accessible at-home behaviour, of target persons. It should be noted that the factors in each box of Figure 2.1 may not only have different values, but also different effects in different countries. Contacting target persons in rural Russia may be quite different from rural Luxembourg; inner-city neighbourhoods in London may be more challenging than in Ljubljana; or trying to find target persons at home in Parisian suburbs may be more difficult than in the suburbs of Lisbon. In addition, living in an apartment in France may be different from living in an apartment in Bulgaria. In fact, living in an apartment in central Paris may be quite dissimilar from living in an apartment in a suburb of Paris. Contactability is not only a characteristic of target persons; it also depends on the survey design and the interviewers. In a multi-country survey, different sampling frames will generally be used out of necessity. In an individual sample, the target person has been identified in advance. Contact in this case can mean contact with that person, or with someone else in the household who is not the target person. In a country with an individual sample and a high rate of within-country migration, a high proportion of target persons may have to be followed to new addresses. The absence of forwarding addresses will therefore result in many noncontacts. Having moved may result in ineligibility when the move is to a new country, or unknown eligibility when it is unknown where the persons concerned have moved to and whether this move is temporary or permanent. In a household or address sample, a household member will often have to help the interviewer to select the designated respondent. The extra step required may result in a refusal before the target person has been identified. The type of sampling frame can therefore have a major impact on the contact rate. Other aspects of the calling procedure will also influence contactability. The longer the fieldwork period, the more calls can be made: if the length of fieldwork differs across countries, this will have an effect on the number of calls that can be made and ultimately on the contact rate. Many survey organizations restrict the number of calls to be made and prescribe specific times to call (for instance, at least one evening call). Regardless of the at-home pattern of the target person, increasing the number of calls and introducing greater variety in the timing of calls will increase the contact rate. The mode in which first contact is made and the timing of these calls is also important. Even in some face-to-face surveys, respondents may be recruited by telephone (see Chapter 3). Telephone calls are less expensive than in-person calls, and it is much easier to make evening calls by telephone than travel in the dark to distant addresses. The mode of calls should therefore be incorporated in an analysis of contactability across countries. National habits also play a role. In-person calls without prior warning are considered impolite in some Nordic countries. With regard to the timing of calls, it should be noted that evening visits may be considered less usual, or less safe in some countries, regions or neighbourhoods, especially in winter. 
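The effect of simply making more calls can be seen in a deliberately stylized model. Suppose, purely for illustration, that a target person is at home with the same probability p on every call and that calls are independent – an assumption that ignores precisely the clustered at-home patterns discussed above:

```python
# Stylized contact model: with a constant at-home probability p per call and
# independent calls, P(contact within k calls) = 1 - (1 - p)**k.
for p in (0.3, 0.5):
    for k in (1, 3, 6):
        print(f"p = {p:.1f}, {k} call(s): contact probability {1 - (1 - p) ** k:.2f}")
```

Even this toy model shows why limits on the number of calls depress contact rates, and why the gain from each extra call shrinks. In reality, at-home probabilities vary by time of day and day of week, which is why varying the timing of calls matters as much as their number.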
A complicating factor here is that there are vast differences in Europe with respect to
when people take their meals (and should perhaps not be disturbed) and when the evening starts (McCrum, 2007). In addition, the acceptability of weekend calls differs between countries. Even an identical call schedule may therefore have a different impact on the contact rate in different countries. Interviewers, of course, play a central role in the contacting phase of a face-to-face survey. The likelihood of an interviewer making contact will depend on his or her tenacity and knowledge of local situations. In a cross-national survey, interviewer training may vary, not only in terms of necessary practicalities (how to select a respondent), but also in its duration or the emphasis placed on the importance of establishing contact. Interviewers who are not used to random sampling (in those countries where quota sampling is mostly used by commercial organizations) are likely to become easily demoralized when they have to make many calls to try and reach the target person, rather than simply interviewing someone who happens to be available when they call and who conforms to the quota criteria. Remuneration differences across countries can also have a major impact. Interviewers who are paid only for successful interviews may be unlikely to make great efforts to contact people who are rarely at home. These customary national practices may be hard to change and may cause contact patterns to vary between countries.
2.4 Ability to Cooperate

Not everyone will be able to cooperate in a survey. Persons who suffer from dementia or learning disabilities will have difficulty answering cognitively demanding questions; persons who are illiterate or blind will struggle to complete mail questionnaires; persons who are deaf will find it difficult to participate in a face-to-face interview; and persons who do not speak the language in which the questionnaire is formulated will not be able to participate. Of course, survey design characteristics can be adapted to allow these persons to participate. Someone who is ill may be perfectly willing to cooperate following his or her recovery, if the fieldwork period is long enough; and blind or illiterate persons can answer questions in a face-to-face survey.

Labelling an elderly target person as mentally incapable of being interviewed may be an easy way out for an interviewer afraid of a very long interview. Setting upper age limits and excluding the very elderly may be normal practice in some countries but not in others. To ensure comparability, equal age limits (or none at all) must be uniformly prescribed for all countries. This means that interviewers in some countries may be more used to approaching and interviewing the elderly than in others.

In most social surveys, the target population excludes the nonresidential population. This means that, for instance, persons who live in institutional homes for the elderly or other types of senior housing facilities will be excluded from the sample. In countries where a high proportion of the very elderly live in these facilities, the response rate is likely to be higher than in countries where the very elderly usually live with their families. In the latter case, their nonparticipation will be recorded as nonresponse. Also, in most social surveys the questionnaire is only fielded in the main
language or the main languages spoken. People who only speak a minority language will then generally not be able to participate. Differences in population structure and in the number and types of migrants and language problems mean that the proportion of people not able to participate will differ between countries. It will not always be clear whether a person is unable or unwilling to cooperate. If a person is moving house, has a death in the family or is struggling with flooded sewers, survey cooperation is very unlikely and will have little to do with interviewer efforts to obtain an interview. Despite this, these cases are usually classified as refusals, largely because at the next visit the person may be perfectly willing to cooperate. On the other hand, someone who does not speak the language very well may use this to hide a refusal, and – as mentioned above – someone who is ill may get better and cooperate later.
2.5 Willingness to Cooperate

Figure 2.2 depicts a conceptual model for cooperation in cross-national surveys, again adapted from Groves and Couper (1998). One difference compared with the original model is that in a cross-national study much less is under the control of the researcher. This will be the case particularly where the fieldwork is conducted by different survey organizations in different countries. In this case, great efforts will have to be made to minimize house effects. Another major difference is that, as in the case of the contact model, similar factors may play a different role in different countries.

This overview does not make the distinction as presented by Groves and Couper between factors that are under the researchers’ control and factors that are not. In a cross-national study, neither the survey design nor the characteristics of the fieldwork organization and interviewers are generally under the national researchers’ control: the picture is therefore simplified. The aspects shown in the different boxes in Figure 2.2 are not exhaustive: the overview is intended to give a general idea of the additional problems that can be encountered when trying to achieve survey cooperation in a cross-national survey. The general idea of the model is that the interaction between the target person and the interviewer determines the degree of survey cooperation. This interaction is a result of the characteristics of the target persons and their social environment, the characteristics of the interviewer and the survey organization, and the survey design.
2.5.1 Social environment and survey culture
As mentioned earlier in this chapter, there are important differences in social environments in cross-national studies, and it is therefore possible that identical characteristics or design will have a different impact in different countries. Important aspects of the social environment influencing survey culture could be major events that have an impact on attitudes towards surveys, the survey sponsor or the survey topic; the economic and political environment and the trust in institutions; the
predominance of survey administration modes (face-to-face, telephone, mail, web); the perceived legitimacy of surveys; survey scarcity; and regional or neighbourhood characteristics.

[Figure 2.2 A conceptual model of cooperation in cross-national face-to-face surveys. Source: adapted from Groves and Couper (1998, p. 120). The model links the social environment (general daily and weekly time-use patterns, neighbourhood characteristics, general survey traditions, privacy regulation), households and individuals (urbanicity, household structure, socio-demographic characteristics such as sex, age, education, socio-economic position and minority ethnic group, survey attitudes and experience), survey characteristics (sponsor, type of survey organization, length and timing of the fieldwork period, type of sampling frame, recruitment/interview mode, respondent selection, advance letter, incentive, efforts such as number of calls and refusal conversion, fieldwork monitoring, length of questionnaire, topic) and interviewer attributes (training and briefing, remuneration, experience and expectations, socio-demographic characteristics, affinity with topic) to the household–interviewer interaction, which results in cooperation or refusal.]

As an example, telephone surveys are quite predominant in the Nordic countries and Switzerland, but rare in central European countries (see Figure 2.3). In former communist countries, social surveys were scarce; in small countries such as Switzerland and Luxembourg, the survey burden for individuals is likely to be much higher than in larger countries. If social surveys are rare or new, people may either be pleasantly surprised that their opinions are considered important for research and policy-making, or they may be wary of unknown interviewers who seem to be prying.

Earlier studies (see Groves and Couper, 1998) have identified urbanicity, population density, crime rates and social disorganization as social environmental influences on survey cooperation. Here, too, the problem mentioned earlier with regard to contactability arises: superficially identical neighbourhood characteristics may have
different meanings in different countries. Urbanicity and population density may reflect a lack of social connectedness in some countries or cities, but not in others.

[Figure 2.3 The proportion of turnover of telephone research as a percentage of turnover of quantitative research, by country. Countries shown: Bulgaria, Romania, France, Poland, the Slovak Republic, Hungary, Latvia, the United Kingdom, Greece, Denmark, Portugal, the Czech Republic, the Netherlands, Spain, Russia, Slovenia, Belgium, Sweden, Finland, Norway and Switzerland (in ascending order of the proportion, which ranges up to roughly 60%). Source: © Copyright 2008a by ESOMAR® – The World Association of Research Professionals. This paper first appeared in ESOMAR, published by ESOMAR.]
2.5.2 Households and individuals
Survey response is often related to background variables, although it is generally acknowledged that factors such as age, sex and urbanicity are not causes of nonresponse but, at most, correlates. When identifying advanced age as a correlate of survey cooperation, for instance, it is important to specify what ‘age’ stands for. Elderly persons may be more likely to form a single-person household, have a lower education level, be less adroit at filling in complicated forms, and have a greater distrust of strangers or a higher sense of civic duty. Younger people may be confronted with a wide range of stimuli that vie for their attention. Both of these are reasons not to cooperate. On the other hand, younger people might be more curious about efforts to seek information from them, and have more experience of standardized information and form-filling. Age could be a correlate of many different factors, which might explain why Groves and Couper (1998, p. 136) found few robust effects of age on survey cooperation. A complicating factor in a cross-national study is that the relationship between background characteristics and survey response might differ across countries. Age groups may not show the same cooperation pattern in different countries, and age generational groups (teenagers, the middle-aged) may encompass different age cohorts
in different countries. In addition, the impact of socio-demographic characteristics may differ. Older people may be more susceptible to authority in some countries than in others, have a higher status or feel more socially isolated. Evidence on background variables as correlates of survey cooperation may be found in Goyder (1987), Bronner (1988), Brehm (1993), Tourangeau and Smith (1996), Couper and Groves (1996), Campanelli, Sturgis and Purdon (1997), Groves and Couper (1998), Goyder, Lock and McNair (1992), Goyder, Warriner and Miller (2002), Schmeets and Janssen (2002), Holbrook, Green and Krosnick (2003) and Stoop (2005). Frequently mentioned variables are sex, age, socio-economic status, belonging to an ethnic minority group and urbanicity. Lower cooperation by men in face-to-face surveys may be partly due to them being less frequently approached directly by the interviewer, as they are less often at home during the day and therefore less likely to receive a contact attempt. Other suggestions included in the literature are that people from lower socioeconomic backgrounds, people with a lower education level and immigrants participate less in surveys. Results from recent German and Dutch studies (Blohm and Diehl, 2001; Feskens et al., 2007) suggest that the lower participation by immigrants may be mainly due to language problems, contact difficulties, and a trade-off between less experienced native-language interviewers and more experienced interviewers who only speak the majority language. The well-known fact that survey cooperation in urban areas is generally low could illustrate why the possible effect of background variables is so difficult to pin down. Low response rates in urban areas can be due to: interviewer staff shortages; specific characteristics of the population (more singles, less stable resident population, more students, more ethnic minorities); the presence of apartment buildings with restricted entrances; or lower social cohesion in urban areas, which might result in less trust and greater fear of letting strangers into one’s home. Lack of trust of strangers, privacy concerns and fear of government intrusion might also be indicative of social isolation and thus affect responsiveness to surveys. There are hardly any studies that analyse whether individual characteristics have a different impact on survey cooperation in different countries. One interesting exception is a survey that looked at an appeal to civic duty in which Dutch respondents reacted with amusement and incomprehension to a statement adopted from a US study: ‘I think it is my civic duty as a citizen to agree to participate when asked in most surveys that come from a government organisation’ (Couper and de Leeuw, 2003, p. 170).
2.5.3 Survey design
A central dilemma when designing and administering cross-national surveys is that, in order to attain optimal comparability, researchers try to standardize design and procedures across countries (at least when they employ input harmonization). However, this is not always feasible (due to different constraints in countries), nor always desirable (because the same procedures may have different effects in different countries) (see Section 2.2). As a result, it will usually be the case that some elements
of the survey design will be fixed in a cross-national survey, whereas others will be subject to national adaptation with greater national control. Figure 2.2 lists a series of survey characteristics that can have an impact on cooperation and which may differ, or have a different impact, in different countries. Firstly, where the fieldwork period is longer, there will be more scope for revisiting reluctant target persons. The timing of fieldwork may also have an effect on cooperation; fieldwork carried out during the Christmas period, for example, may find many target persons at home, but few who are willing to participate. The type of sampling frame (individual/household/address) can have an impact on contact (Section 2.3) and on cooperation. In an individual sample, for instance, a personal letter can be sent to the target person, and the interviewer does not have to go through the intermediate step of respondent selection. Individual sampling frames can sometimes also provide information that can be used to tailor the approach (age, sex, country of birth of target persons). The recruitment mode may also determine the cooperation rate. In general, it is assumed that refusing is easier by telephone.

The type of sponsor and the type of survey organization are generally assumed to have an impact on survey cooperation. Government and academic surveys achieve higher response rates, probably because they appeal more to a sense of civic duty and authority than commercial surveys (Groves and Couper, 1998, p. 139; Dillman, 2000, p. 20). This may result in cross-national differences when a multi-country survey is conducted by different types of organizations in different countries.

In most Western countries, advance letters are nowadays considered to be an important means of conveying information about the survey and inspiring trust in the fieldwork organization, interviewer and sponsor. Even if the target persons do not read it, a friendly, businesslike letter and a brochure describing the purpose and procedure of the survey will convey the message that this is a serious enterprise and that the people involved can be trusted. Advance letters may have an indirect effect too, as they might increase the interviewers’ confidence while seeking cooperation (Groves and Couper, 1998, p. 276; Dillman et al., 2002, p. 11). An advance letter may be less effective when it cannot be personalized and is addressed to ‘The Householder’ or ‘The Occupant’, which may reduce trust or the sometimes pleasant feeling for the prospective respondents of being specially selected (Morton-Williams, 1993, p. 61). This means that advance letters will be less effective when the survey is based on an address or household sample. Advance letters are considered to be most effective when they are received shortly before the interviewer makes first contact. A meta-analysis of advance letters in telephone studies by de Leeuw et al. (2007) showed universally positive effects. Their analysis comprised 29 studies, 21 of which were performed in the United States, four in Europe and four in Australia. It is not clear whether their results can be transferred to other countries, cultures or survey modes.

There are few comparative studies on the effect of incentives. Most researchers seem to agree that a small incentive is generally seen as a nice gesture, both by interviewers and respondents, that a large incentive is not necessarily more effective than a small one, and that incentives do not have a detrimental effect on survey quality.
Prepaid, unconditional incentives seem to work best (Warriner et al., 1996). A recent Dutch study (Feskens et al., 2008) showed that incentives did increase response rates in general, but had no effect on immigrants, a group that was underrepresented from the beginning. Most studies on incentives have been conducted in the United States, Canada and the United Kingdom, countries where providing incentives is now quite usual. In many European countries, offering incentives is far less usual. Singer et al. (1999) made it clear that their analysis of incentive-giving was not necessarily generalizable outside of the United States and Canada: ‘Because the meaning of gifts as well as monetary incentives is likely to differ between cultures, and also because we were less confident about our ability to retrieve all relevant studies of incentives in countries other than the United States and Canada, the analysis is limited geographically as well’ (p. 219). There is little evidence for the differential impact of incentives in cross-national studies. In the ESS, the practice with regard to incentives differs widely across countries (see later chapters). This is partly because incentives are more common in some countries than in others. This could be due to initially high response rates (so no incentives necessary), or to habits and traditions. There is a fear that once incentives become a part of the survey process, respondents will expect them on every occasion, and there are even fears that they might increase nonresponse bias or lead to poor data quality. There is little evidence, however, that incentives have a negative effect on survey quality, and there is little reason for concern that respondents will always expect incentives when they have once received them (Singer, Van Hoewyk and Maher, 1998; Singer et al., 1999).

Effects on response rates and survey outcomes stemming from the differences in procedures used by different fieldwork organizations are generally called ‘house effects’. House effects occur because – even when the keeping of call records, response monitoring, number and timing of calls, response calculation and so on are prescribed in great detail, and central guidelines and background papers are available – the fact that fieldwork in different countries is conducted by different organizations with their own staff, interviewers, policies and traditions is likely to have some impact on the data collected and can result in different response rates. The presence of house effects is well known in political polling, for example (see, e.g., Martin, Traugott and Kennedy, 2005). House effects are due to organizational traditions in the use of incentives and advance letters, interviewer training, refusal conversion practices and so on. Some of these differences may be due to cultural differences and be related to differences in the effectiveness of particular strategies. However, this still leaves a number of unnecessary differences that are mainly the result of different organizational traditions (de Heer, 1999a). These should be minimized where possible. Before this can be done, the presence of house effects should be identified and analysed. This requires a detailed insight into the practices and procedures of survey organizations.

A specific survey characteristic that can have an impact on response rates is the presence of penalty clauses in contracts with sponsors. In some countries this is considered to be unethical, especially if the clauses are related to achieving specific
outcomes that are dependent on target persons. There are also concerns that they can lead to fraud. Instead of specifying a penalty for not achieving a specific contact rate, for example, the clause might therefore sometimes stipulate that every target person receives at least four contact attempts. In other countries, detailed penalty clauses are seen as a useful and necessary means of making sure that fieldwork will be carried out according to the specifications.
2.5.4 Interviewers
Interviewers are key agents in obtaining cooperation in face-to-face surveys (Loosveldt and Philippens, 2004). The prevailing belief is that interviewer experience is a critical factor in securing cooperation and that interviewers who are confident about their ability to elicit cooperation achieve higher cooperation rates (Groves and Couper, 1998, pp. 200, 215). There is little evidence that socio-demographic characteristics of interviewers may be related to survey response. These are usually out of the researcher’s control, and may in any event have effects that are not comparable across countries. It may, however, sometimes be fruitful to deploy an interviewer with different characteristics than one who has previously failed: ‘. . . refusal conversion efforts . . . often entail the use of interviewers who are different in sex and age from the initial interviewer’ (Groves, 1989, p. 218). Clear prescriptions and extensive training and briefing may help to reduce differences between organizations and countries. Two differences in interviewer staff between organizations may be particularly difficult to overcome; namely, experience with random sample face-to-face surveys and the remuneration of interviewers. These factors can be of importance in both the contacting and the cooperation phases of a survey. If face-to-face surveys are relatively unknown, or interviewers are used to quota sampling, standard random sampling procedures may seem strange and forbidding. Interviewers who are inexperienced in these areas may easily drop out when they are confronted with unknown procedures and unwilling target persons. Secondly, survey interviewing is neither a well-paid nor a very high-status job. Considering the importance of the role of the interviewer, it is key that their payment reflects the efforts required. Levels of interviewer pay and the pay structure are highly likely to affect interviewers’ willingness to try and enhance their response rates. The pay rate for the survey should reflect the length and complexity of the interview, the expected difficulties in obtaining cooperation and the amount of record-keeping demanded of the interviewer. ‘Bonus’ payments for interviews achieved above a certain response rate target may have a positive effect. However, any bonus system must be perceived as being fair. Different areas in which the interviewers work can vary considerably – often unpredictably – in the challenges they pose to the interviewers. Payment systems, assignment sizes, training programmes and other practical issues are specific to each fieldwork organization. Differences between countries can be expected. No empirical studies are available on the effect of such differences.
2.5.5 Interviewer–respondent interaction: why people cooperate
A key question in nonresponse research is why target persons choose to respond. The willingness to cooperate will be due to personal, survey and interviewer characteristics, the interviewer–respondent interaction, environmental conditions and the survey climate. Most theories on survey cooperation seek in some way to unravel the costs and benefits of being interviewed. According to the rational choice theory of decision-making, an individual will weigh the costs of participation in a survey against the perceived benefits, the cognitive burden and confidentiality risks (Groves and Couper, 1998, pp. 121–5). There is not much evidence for a fully rational approach, however, with regard either to survey participation or to household finances, or indeed any other area. Contrary to the rational choice approach, for instance, is the observation that the decision to cooperate can be taken within seconds (Sturgis and Campanelli, 1998, p. 7). Survey researchers (and interviewers) are well aware of this. Dijkstra and Smit (2002) analysed the impact of the target person answering ‘good evening’ to the first introduction in a telephone interview; Groves and Benki (2006) looked at the acoustic properties of the way target persons said ‘hello’. The importance of these first impressions means that the decision to cooperate may to some extent have been taken before the survey request has been formally made.

Table 2.1 presents an overview of the different costs and benefits of survey cooperation identified in earlier national studies. These factors may partly be characteristics of the target persons and partly characteristics of the survey; they may partly result from experiences with previous surveys and they may be partly related to the interaction between the interviewer and the target person. The bottom row indicates that the length of the interview might increase the costs. Empirical findings on this issue are not clear-cut. A survey might also be considered too short (Dillman, 2000); respondents might like to have the opportunity to express their opinion fully, and talking at length about an interesting topic with a friendly interviewer may not be a cost at all. Related to this is the relationship between being busy and survey cooperation. Rather than being an impediment, it seems that busy people are in fact more likely to cooperate. This could, of course, also be a consequence of other characteristics of busy people.

The most important reason for survey cooperation could be that people like surveys, or feel that surveys are important. Is there such a thing as an attitude towards survey participation? Many nonresponse researchers have addressed these questions by measuring the general attitude towards surveys, the attitude towards particular surveys, and the impact of substantive survey characteristics such as topic and sponsor (Goyder, 1986; Hox, de Leeuw and Vorst, 1995; Campanelli, Sturgis and Purdon, 1997; Couper, 1997; Loosveldt and Storms, 2001, 2003, 2008; Rogelberg et al., 2003; Singer, Van Hoewyk and Neugebauer, 2003; Stocké and Langfeldt, 2004). No cross-national information is available here. Table 2.2 gives a summary of studies in which survey attitudes are related to expressed or actual willingness to participate, survey outcomes and data quality.
Table 2.1 Costs and benefits of survey cooperation

Cost: survey fatigue, survey saturation (Goyder, 1986).
Benefit: scarcity – this is the chance to give your opinion! (Hox and De Leeuw, 2002).

Cost: selling under the guise of a survey (‘sugging’) (Groves and Couper, 1998; De Leeuw, 2001; Stocké and Langfeldt, 2004).
Benefit: trust, reassuring effect of the interviewer (Dillman, 2000; Holbrook, Green and Krosnick, 2003).

Cost: cognitive burden, sensitive or difficult questions (Tourangeau and Smith, 1996).
Benefits: enjoyment of thinking about interesting topics; satisfaction of being part of a socially useful or academically interesting enterprise; chance to influence government policy-making.

Cost: perceived invasion of privacy (Singer, Mathiowetz and Couper, 1993; Singer, Van Hoewyk and Neugebauer, 2003).
Benefits: satisfaction of fulfilling a civic duty (De Kruijk and Hermans, 1998; Couper and De Leeuw, 2003; Loosveldt and Carton, 2002; Loosveldt and Storms, 2001, 2003); being respected and valued; receiving social validation; being socially and politically involved (Brehm, 1993; Pääkkönen, 1999; Voogt, 2004; Groves et al., 2004).

Cost: feeling of being treated disrespectfully (Morton-Williams, 1993; Dillman, 2000).
Benefits: incentive; feedback on survey findings?

Cost: fear of letting a stranger into the home (Holbrook, Green and Krosnick, 2003).

Cost (bottom row): length of survey (Bogen, 1996; Dillman, 2000; Dijkstra and Smit, 2002; Loosveldt and Storms, 2003; Stocké and Langfeldt, 2004).
Benefit: busy people (involvement, questionnaire routine) (Goyder, 1987; Zuzanek, 1999; Pääkkönen, 1999; Väisänen, 2002; Abraham, Maitland and Bianchi, 2006; Van Ingen, Stoop and Breedveld, 2009; Stoop, 2007).
Table 2.2 Survey attitudes, participation and outcomes

Smith (1984)
Relationship: reason for initial refusal, cooperation, survey outcomes.
Indicators: propitiousness – situational factor (bad timing); inclination – transient problems (family problems, work pressure), permanent attitudes (fear of inspection, unpleasant experiences with earlier surveys, general concerns about confidentiality and privacy) and personality traits (suspiciousness, misanthropy, misogyny, reclusion and paranoia).
Outcomes: propitiousness and inclination hard to separate: ‘Review of detailed interviewer comments on final refusals indicates that reasons for refusing are not always clear and that some people seem to offer a string of reasons hoping that the interviewer will accept one and leave them alone.’ (p. 486)

Hox, de Leeuw and Vorst (1995)
Relationship: survey attitudes and cooperation in follow-up survey.
Indicators: general attitude towards surveys, specific intention to cooperate in a survey similar to the one that was later presented.
Outcomes: attitudes do not predict final response behaviour very well.

Couper (1997); Campanelli, Sturgis and Purdon (1997)
Relationship: reasons for refusal, introductory remarks, survey cooperation.
Indicators: lack of interest in topic; too busy.
Outcomes: lack of interest in the topic – less likely to grant an interview, more items missing, differences in survey outcomes; too busy – little systematic variation.

Singer, Van Hoewyk and Maher (1998)
Relationship: survey attitudes and data quality.
Indicators: attitudes towards ‘surveys like this’.
Outcomes: respondents who said they would not do the survey again did consider it a waste of time or not useful; respondents who disagreed that responsible persons should cooperate provided data of poorer quality.

Laurie, Smith and Scott (1999)
Relationship: reasons for initial refusal and final participation.
Indicators: initial refusal survey-related (confidentiality, too long, waste of time); initial refusal respondent-related (too busy, not able, stressful situation, refusal by proxy).
Outcomes: initial survey-related refusal more likely to be final.

Loosveldt and Storms (2001)
Relationship: survey attitudes, doorstep reactions, expressed willingness to cooperate in the future, survey outcomes.
Indicators: Singer, Van Hoewyk and Maher (1998) questions, doorstep reactions.
Outcomes: doorstep reactions related to survey attitude; asking questions on the doorstep – negative attitude.

Rogelberg et al. (2001)
Relationship: survey attitude and expressed willingness to cooperate in future survey.
Indicators: survey enjoyment (like filling in surveys, surveys are fun); survey value (a lot can be learnt from survey information, nothing good comes from completing a survey, useful ways of gathering information).
Outcomes: survey enjoyment and survey value predict expressed willingness to participate in future surveys.

Rogelberg et al. (2003)
Relationship: behavioural intention and response behaviour in follow-up survey.
Indicators: active nonrespondents – those who had indicated in the original survey that they would definitely not complete the follow-up survey; passive nonrespondents – those who did not express this negative intention, but ultimately did not respond.
Outcomes: active nonrespondents less satisfied with the survey sponsor, less conscientious, more likely to leave the university and less agreeable; passive nonrespondents very similar to respondents.

Voogt (2004)
Relationship: reason for refusal and willingness to participate in short telephone interview.
Indicators: reason for refusal ‘not interested’; other reasons.
Outcomes: no differences in willingness to participate; ‘not interested’ more often agreed that politics are too complicated for them.

Source: Stoop (2005).
Most empirical findings, with the exception of the 1995 study by Hox, de Leeuw and Vorst, indicate that situational factors result in random survey noncooperation. However, if nonparticipation is largely determined by the topic or the sponsor of the survey, nonresponse will not be ‘random’, and therefore cannot be ignored. This substantive aversion might be compensated for by external incentives. In addition, persons who harbour a strong dislike of surveys will be more difficult to convert than persons who do not cooperate for accidental reasons. This does not necessarily lead to bias, except when survey attitudes are related to survey topics, such as trust in government.

In introducing surveys, it is considered good practice to emphasize that the topic of the survey is relevant to the interviewee, on the assumption that topic saliency is related to response behaviour (Brehm, 1993; Groves and Couper, 1998; Voogt, Saris and Niemöller, 1998; Mathiowetz, Couper and Butler, 2000). Groves et al. (2006) found that a topic would invite response if it was relevant to the respondent and if it was an agreeable topic. In a study by Groves et al. (2004), persons cooperated at higher rates in surveys on topics that were likely to be of interest to them. People who contributed to political organizations cooperated more on all topics, however, which might point more to social involvement than to topic relevance as a determinant of survey participation. Groves et al. (2006) concluded that emphasizing the topic might well result in an overrepresentation of respondents greatly interested in the topic, which would cause bias. Providing incentives might be one way to compensate for this.

Although differences in survey culture are often presented as a reason for differences in response rates between countries, no studies on national differences in perceived benefits and costs of survey participation have been conducted, nor on differences in survey attitudes. This means that these general findings can only serve as a general background in the subsequent chapters.
2.6 Nonresponse Bias

Unlike nonresponse rates, nonresponse bias is a characteristic of individual survey outcomes rather than the survey overall. In one and the same survey, one variable may be severely biased and another not at all (Groves, 2006). This complicates the study of nonresponse bias. This section discusses several definitions of nonresponse bias and explores the consequences for comparative studies.
2.6.1 What is nonresponse bias?
Nonresponse in surveys is important because it can have an effect on the precision and the bias of survey estimates (Lynn and Clarke, 2001; Bethlehem, 2002). The total nonresponse error is the difference between the outcomes of a survey with and without nonresponse. The Mean Square Error (MSE) of the actual outcome of the survey (with nonresponse) is the expected squared difference between all possible realizations of the survey and the true value (without errors). The higher the MSE, the lower is the accuracy of
the outcomes. The MSE comprises two components, the variance and the squared bias. The smaller the variance, the higher is the precision of the outcome. If nonresponse in a survey is high but is not related to the topic of the survey, the outcomes of the survey across all possible realizations will show a large variation, and the larger the variance, the lower will be the precision. If nonresponse affects the precision of a survey, this can be remedied by increasing the sample size. The bias is the type of error that affects the outcome of individual variables in all implementations of the survey design. Bias occurs when factors that cause nonresponse also affect the distribution of survey variables. Put differently, nonresponse bias occurs when there is no random process underlying response. In this case, increasing the sample size does not help.
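In symbols, writing ȳ for an estimator of the population mean Ȳ, the decomposition just described is the standard identity (stated here for reference; it is not specific to any particular survey):

\[
\mathrm{MSE}(\bar{y}) \;=\; E\bigl[(\bar{y}-\bar{Y})^2\bigr] \;=\; \mathrm{Var}(\bar{y}) \;+\; \mathrm{Bias}(\bar{y})^2 .
\]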
[Figure 2.4 The Mean Square Error (MSE) in high/low precision and high/low bias countries. Panels: (a) no bias, middle precision; (b) no bias, high precision; (c) no bias, low precision; (d) bias type 1, high precision; (e) bias type 2, high precision; (f) bias type 3, middle precision.]
Figure 2.4 presents the Mean Square Error in a number of hypothetical countries (based on Biemer and Lyberg, 2003). In some countries bias is small (a, b and c: the average of all possible implementations of the survey is precisely in the middle). In countries d, e and f, the results would be far from the true mean: here we have bias. In countries b, d and e, the precision is high, but only in country b would we obtain correct and precise results. It should be noted that variance can differ only in size, whereas bias can affect different variables and cause an estimate to be too low or too high; hence the different types of bias in the figure.

Bias can be seen as being caused by respondents differing from nonrespondents, or by survey participation being correlated with survey variables (see Groves, 2006, pp. 648–9). According to the first (stochastic) approach, nonresponse bias in a respondent mean could be expressed as:

\[
\mathrm{Bias}(\bar{y}_r) \;=\; \frac{M}{N}\,\bigl(\bar{Y}_r - \bar{Y}_m\bigr)
\]

where Bias(ȳ_r) is the nonresponse bias of the unadjusted respondent mean; ȳ_r is the unadjusted mean of the respondents in a sample of the target population; Ȳ_r is the mean of the respondents in the target population; Ȳ_m is the mean of the nonrespondents in the target population; M is the number of nonrespondents in the target population; and N is the total number in the target population. In this way, the respondent mean differs from the mean of the full target population by a function of the nonresponse rate and the difference between respondent and nonrespondent means. For example, with a nonresponse rate of 40% (M/N = 0.4) and respondent and nonrespondent means of 52 and 48, the bias of the respondent mean is 0.4 × (52 − 48) = 1.6.

Another way of formulating this is by highlighting the relationship between factors determining survey participation and key variables of the survey. Acknowledging that response rates vary greatly over different surveys and topics, it may be assumed that everyone has an unobservable ‘propensity’ (a probability, a likelihood) of being a respondent or a nonrespondent, which can be represented by ρ_i. This led Bethlehem (2002) (also mentioned in Groves, 2006) to the following definition of bias:

\[
\mathrm{Bias}(\bar{y}_r) \;\approx\; \frac{\sigma_{y\rho}}{\bar{\rho}}
\]

where σ_yρ is the population covariance between the survey variable y and the response propensity ρ, and ρ̄ is the mean propensity in the target population, taken over sample realizations, given the sample design, and over recruitment realizations, given a recruitment protocol design.

Another way of assessing the impact of nonresponse, and of distinguishing between random effects and bias, is the typology of three types of missing data from Little and Rubin (1987). Firstly, units may be ‘missing completely at random’ (MCAR). This would happen if bad sectors of a computer disk made a random part of the data inaccessible, or if interviewers who had their birthday in the fieldwork week sent their assignment back unused. MCAR nonresponse would reduce precision due to smaller final sample sizes. Precision can be increased by increasing the initial
sample size. This simple situation is, however, rather unlikely to occur in practice. A more likely scenario is that units are ‘missing at random given covariates’ (MAR). This would occur, for instance, if women responded less often than men, but where there were no differences between responding and nonresponding women. Weighting by sex will adjust the estimates but will reduce precision when weights are large. A larger sample size for women could restore precision. This type of nonresponse bias can thus be easily corrected by weighting with the relevant auxiliary variables. The worst case occurs when survey variables are related to response propensity and are ‘not missing at random’ (NMAR). This is called ‘nonignorable nonresponse’ and occurs when nonresponse is related to survey variables that are not measured for nonrespondents. NMAR occurs, for instance, when those interested in politics are eager to participate in election surveys, socially isolated persons refrain from answering questionnaires on social participation, crime victims do not open the door to an interviewer, hospital patients cannot participate in a health survey, and healthy, outdoor people cannot be contacted with the request to participate in a time use study. In this case, nonresponse will not only reduce precision, but will also increase nonresponse bias (Lynn et al., 2002b). In the NMAR case, increasing the sample size or weighting for nonresponse does not solve the problem and thus does not produce more accurate estimates. This implies that nonresponse rates are not necessarily the main problem but, rather, bias due to nonresponse. Enhancing response rates may not help in the NMAR case if the additional respondents are dissimilar from the final refusers. This may explain why a number of recent studies report that high response rates are not necessarily better. Keeter et al. (2000) compared the results of a rigorously conducted survey with a less rigorous version and found few differences. Merkle and Edelman (2002) concluded from an analysis of nonresponse in exit polls that there is no relationship between the nonresponse rate (here mainly due to refusal) and error. They feel that their results ‘buttress Krosnick’s (Krosnick,1999) conclusion that “the prevailing wisdom that higher response rates are necessary for sample representativeness is being challenged” and that “it is no longer sensible to presume that lower response rates necessarily signal lower representativeness”’ (p. 541). They also cite a case where a slight increase in response rates significantly increased bias, and end by stating: ‘This should serve as an eye-opener to those who pursue higher response rates for their own sake without considering the impact of survey error. Devoting limited resources to increasing response rates with little or no impact on survey error is not money well spent, especially when that money might be better spent reducing other sources of error (see Groves, 1989).’ Extensive analyses of nonresponse bias in the European Social Survey, and more theory, will be presented in Chapter 8. Here, it will be sufficient to highlight that nonresponse bias is not linearly related to nonresponse rates, that nonresponse bias is affected by the differences between respondents and nonrespondents, and that nonresponse is a problem particularly when response behaviour is related to core variables of the survey.
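The distinction between the three missing-data mechanisms, and Bethlehem's propensity formulation above, can be made tangible with a small simulation. The sketch below is purely illustrative: the population, the covariate x and the response-propensity functions are invented for the purpose, and the code uses only NumPy.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200_000

# Hypothetical population: a binary covariate x (say, sex) and a survey
# variable y whose mean differs by x (50 for x = 0, 55 for x = 1).
x = rng.integers(0, 2, N)
y = rng.normal(50 + 5 * x, 10)
pop_mean = y.mean()

# MCAR: response unrelated to anything.
r_mcar = rng.random(N) < 0.6
# MAR given x: the x = 1 group responds less often, but within each group
# response is unrelated to y.
r_mar = rng.random(N) < np.where(x == 1, 0.4, 0.8)
# NMAR: the response propensity rho depends on y itself ('nonignorable').
rho = 1.0 / (1.0 + np.exp(-(y - 52.5) / 5.0))
r_nmar = rng.random(N) < rho

for label, r in (("MCAR", r_mcar), ("MAR", r_mar), ("NMAR", r_nmar)):
    print(f"{label}: respondent mean {y[r].mean():6.2f} vs population mean {pop_mean:6.2f}")

# Post-stratification by x repairs the MAR estimate but not the NMAR one:
for label, r in (("MAR", r_mar), ("NMAR", r_nmar)):
    adjusted = sum((x == g).mean() * y[r & (x == g)].mean() for g in (0, 1))
    print(f"{label}: post-stratified mean {adjusted:6.2f}")

# Bethlehem's approximation: Bias(y_r) ~ cov(y, rho) / mean(rho).
print(f"NMAR bias observed {y[r_nmar].mean() - pop_mean:5.2f}, "
      f"approximated {np.cov(y, rho)[0, 1] / rho.mean():5.2f}")
```

Under MCAR the respondent mean is on target; under MAR the unadjusted mean is biased but weighting by x repairs it; under NMAR even the weighted estimate stays biased – exactly the 'nonignorable' case described above.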
2.6.2 Combating and adjusting for nonresponse bias
There are three general strategies for combating nonresponse bias.2 The first is by enhancing response rates up to a very high level (this would be the strategy of the reducers). The higher the response rates, the lower is the maximum bias that can occur. In other words, when response rates are low, bias can be very small or very large, but at high response rates bias is unlikely to be very large. High response rates will always reduce the maximum bias. Increasing response rates in the ‘missing completely at random’ (MCAR) case will have an effect on precision, but not on accuracy. The estimated value of the mean will be subject to random variations when the number of respondents is small, but there will be no bias. In the ‘missing at random given covariates’ (MAR) case, increasing response may be successful for one group and not for others. Weighting can correct for this, but is likely to increase the variance. However, no effects on bias should be expected. In the ‘not missing at random’ (NMAR) case, however, the relationship between response rates and nonresponse bias becomes complicated. If the extra respondents who are recruited differ from the original respondents, bias can be expected to reduce. If increased field efforts are most effective for those who are similar to the original respondents, bias may even increase. For instance, if response rates in large cities are low because students who frequently live in these cities do not participate, enhancing response rates by persuading more nonstudents to participate may result in an even more selective view of big-city inhabitants. As there is no linear relationship between response rates and nonresponse bias (Groves, 2006), and even in the same survey nonresponse bias can differ greatly across variables, the second strategy might be preferable. This would be to make sure specific groups – who are likely to differ from most respondents – have an equal ‘opportunity’ to participate. This could mean using different interview modes, handing out different types of incentives, calling at different times of the day and on different days of the week, fielding the questionnaire in minority languages and so on. This could be the strategy of the informed reducers: trying to increase response rates in an informed way (Groves, 2006). The third strategy is the strategy of the adjusters: to minimize nonresponse bias by converting ‘not missing at random’ (NMAR) into ‘missing at random given covariates’ (MAR) by discovering covariates of nonresponse. With auxiliary variables that explain the response process, nonresponse can be adjusted for. Several types of auxiliary data could be used for this purpose, such as information from the sampling frame, reasons for refusal, difficulty of obtaining response and asking core questions. A short typology of auxiliary variables is presented below. A more detailed discussion is presented in Chapter 8, which also provides a detailed description of how different types of auxiliary variables have been used in different weighting models in the European Social Survey.
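The claim made above, that high response rates cap the maximum possible bias, follows directly from the deterministic formula in Section 2.6.1: the absolute bias of the respondent mean is bounded by the nonresponse rate times the largest possible difference between respondent and nonrespondent means,

\[
\bigl|\mathrm{Bias}(\bar{y}_r)\bigr| \;=\; \frac{M}{N}\,\bigl|\bar{Y}_r-\bar{Y}_m\bigr| \;\le\; \frac{M}{N}\,\bigl(y_{\max}-y_{\min}\bigr),
\]

so for a binary survey variable (range 1) the absolute bias can never exceed the nonresponse rate M/N, whatever the nonrespondents look like.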
2 More specific approaches to adjusting for nonresponse will be introduced in Chapter 8.
2.6.2.1 Information from population registers and the sampling frame

Samples of individuals comprise names, addresses and usually a number of background variables such as date of birth and sex. Statistical offices in some countries (see, e.g., Schouten and Cobben, 2006) can link information from the population register to public registers and administrative records. These can provide auxiliary information that can be of great value when adjusting for nonresponse in factual surveys such as the Labour Force Survey and the Time Use Survey. Household, and particularly address, sampling frames are usually much less rich. In some cases low-level regional information can be linked to the sample units: neighbourhood characteristics, types of dwelling and aggregate information on the residents. If it is not possible to link individual information directly to the sampling frame, population data can also be used to assess the amount of bias (see Section 8.3.1).
2.6.2.2 Reasons for refusal
In many studies, the reason(s) for refusal are recorded; for instance, ‘no time’, ‘wrong subject’, ‘surveys not useful’ or ‘surveys not pleasant’. Smith (1984) distinguishes between nonresponse due to situational factors and nonresponse for more permanent reasons, and Rogelberg et al. (2003) distinguish between active and passive nonrespondents (see Table 2.2). It is generally expected that a specific refusal is more harmful than a general refusal. When studying bias, and when comparing response and cooperation across countries, the reasons for refusal can be useful auxiliary variables. There are two caveats here. Firstly, recorded reasons often comprise categories such as having ‘no time’ (bad timing, too busy, no time), it being the ‘wrong subject’ (not interested, don’t know enough/anything about subject, too difficult for me, object to subject), that ‘surveys (are) not useful’ (waste of time, waste of money) or that ‘surveys (are) not pleasant’ (interferes with my privacy/don’t give personal information, never do surveys, cooperated too often, don’t trust surveys, previous bad experience). Of course, the reasons expressed for refusal may not reflect the sample person’s ‘real’ opinion, and may just be a way of getting rid of the interviewer as soon as possible and avoiding discussions. In addition, different interviewers may have different ways of coding the reasons suggested and different field organizations may instruct their interviewers in different ways. Finally, it may be difficult to establish which reason for refusal reflected the deciding factor. Verhagen (2008) classified reasons for refusal in a Dutch attitude survey, which was complicated by the fact that the interviewer could record different reasons for refusal when the target person did not want to cooperate, and by the fact that many of the refusals were re-approached and different reasons again could be recorded if they still did not want to be interviewed. A comparison of reasons for refusal across countries should take these factors into account. A second caveat is that it may not always be clear who refuses. The refusal to cooperate may come from the target respondent or from another household member, or even from someone outside the household. This can be the parent of a younger
person, who withholds parental consent, or a household member who refuses to refer the interviewer to the right person or refuses to help in selecting the target respondents.
2.6.2.3 Difficulty in obtaining cooperation
Bradburn (1992) considers that the nonresponse problem is best resolved by: ‘. . . learning more about the characteristics of hard-to-get respondents and, with some additional efforts, even the nonrespondents. The data on these difficult respondents can then be used to do better weighting that can incorporate more information into the weighting procedure. While we may not have the resources to get high response rates across the board, we can allocate the data collection resources in a more targeted manner to learn more about the possible bias arising from low response rates.’ To identify respondent difficulty, one needs paradata – that is, information on the response process – and clear definitions of difficulty. Measuring difficulty, or willingness to cooperate, is not easy. The simplest indicator of reluctance is whether the respondent cooperated immediately (including securing an appointment for an interview) or whether more visits were required. The latter is often the case when the target respondent refused initially but was later able to be converted to cooperate. Whether an initial refusal is permanent or temporary can only be known if the case is reissued and a contact is established once again. The process of re-approaching an initial (hopefully temporary) refusal3 and asking them again to participate in the survey is generally called ‘refusal conversion’. In some countries, re-approaching households after an explicit refusal is forbidden by confidentiality laws (Däubler, 2002, p. 24). Estimates of the success rate of refusal conversion range from 10 to 20% of the contacted nonparticipants for the United Kingdom (Lynn et al., 2002b, p. 147) to 20 to 50% in an overview of the literature by Schnell (1997, p. 190). If refusal conversion is allowed, different strategies can be implemented. One option is that all refusals are re-approached. This could result in complaints from angry target persons who made it explicitly clear on the first occasion that they had no intention of ever taking part in the survey. Such an option is in any case likely to be expensive and time-consuming. A second, more common option is that specific types of refusals may receive another visit; for instance, those refusals who live on the route of a second interviewer, or those who refused an interviewer who delivered low response rates in a particular survey. Yet another option is that interviewers are instructed only to revisit those sample units where, in the opinion of the original interviewer, there is an acceptable chance of them cooperating. The latter strategy is usually the cheapest and perhaps the most common, but it may also result in a more unbalanced final sample, with increased rather than decreased error. Deploying a new interviewer is the most common practice in refusal conversion (Groves, 1989, p. 218).
3. Temporary refusals are also somewhat euphemistically called ‘initially reluctant’ sample persons. One reason for this is that it is not always permitted to re-approach refusals (and many ‘initially reluctant’ sample persons cooperate in the end).
Sometimes better trained, more experienced interviewers or supervisors are utilized. Sometimes a new interviewer is explicitly chosen because he or she is of a different sex, age or ethnic background than the first interviewer. Sometimes, a second approach may be effective simply because the target respondent is in a better mood, less busy or likes the new interviewer better. Depending on the strategy followed, the meaning of being a converted respondent will differ across countries. Another indicator of difficulty is the number of unsuccessful contacts with the respondent before obtaining cooperation. It is also possible to record and classify doorstep interaction (Campanelli, Sturgis and Purdon, 1997; Couper, 1997; Loosveldt and Storms, 2001). Additional indicators would be interviewer assessment of future cooperation, and the deployment of additional incentives to convert initial refusers. In most studies, the simplest indicator of reluctance is used; namely, whether there was an initial refusal that was overcome at a later visit. The initial refusal is often called a ‘soft’ refusal. Sometimes it will be totally clear that an initial refusal is final; this is often called a ‘hard’ refusal. In many cases, however, it will only be clear whether a refusal is soft or hard if an interviewer withdraws, returns at a later time and is then able or not able to complete an interview, or if the interview is reissued after a clear refusal and a new contact is established. The distinction between ‘soft’ and ‘hard’ refusals is difficult to quantify and very difficult to compare across countries, mainly because there will be large differences in the proportion of refusals between countries and the need and willingness to re-approach refusals, even ‘soft’ refusals. Being a ‘soft’ refusal may be more related to fieldwork strategies than to respondent characteristics. For this reason, caution is needed in treating soft refusals who are later converted as a proxy for the characteristics of final refusals. Brehm (1993, pp. 128–30) studied the relationship between increasing fieldwork efforts (more calls, sending a letter to try to persuade reluctant sample persons, trying to convert a refusal) and survey participation. The difficulty he found is that additional persuasion letters are only sent to reluctant respondents, and therefore seem to have a negative effect (as reluctant respondents more often turn into final refusers, and no persuasion letters are sent to respondents who cooperate instantaneously). As he remarks in a footnote (p. 130): ‘If one’s interest lies in how effective these techniques are . . . the persuasion letters and refusal conversions would have to be randomly assigned treatments, not treatments assigned on the basis of an initial refusal.’ As we will see throughout this book, the field of nonresponse generally suffers from a lack of experiments that allow the effect of different efforts to reduce nonresponse to be identified. Despite the attractiveness of the idea of using difficult respondents (with respect either to contact or to cooperation) as a proxy for final nonrespondents, previous studies are not very optimistic about such an approach.
Many studies suggest that difficult respondents are not necessarily similar to final nonrespondents (Stinchcombe, Jones and Sheatsley, 1981; Smith, 1983; Smeets, 1995; Voogt, Saris and Niemöller, 1998; Borg, 2000; Curtin, Presser and Singer, 2000; Keeter et al., 2000; Lynn and Clarke, 2001; Teitler, Reichman and Sprachman, 2003; Stoop, 2004, 2005; Neller, 2005; Abraham, Maitland and Bianchi, 2006; Van Ingen, Stoop and
Breedveld, 2009). An exception is Voogt (2004), who found an almost linear relationship between voter turnout and willingness to participate in a survey.
2.6.2.4 Information on core variables
Information on core variables from nonrespondents is, of course, the best variable when trying to identify covariates of nonresponse.4 Two methods can be identified in the literature; namely, the Basic Question Approach (Bethlehem and Kersten, 1985; Bethlehem, 2009) and the Follow-up Survey among nonrespondents (Hansen and Hurwitz, 1946). Elliot (1991) compares the two methods. Both methods require a high response rate (Groves and Couper, 1998) to minimize the possible effect of bias from nonresponse by the nonrespondents. Bethlehem and Kersten (1985) introduced the Basic Question Procedure, similar to the Pre-Emptive Doorstep Administration of Key Survey Items (PEDAKSI) method put forward by Lynn (2003a). This boils down to putting a small number of basic or core or topical questions to all nonrespondents. One reason why refusers may still be willing to answer a small set of questions is that this is an example of the door-in-the-face technique (see Mowen and Cialdini, 1980; Hippler and Hippler, 1986; Groves, Cialdini and Couper, 1992): preceding a modest request (the basic questions) by a large request (long interview) appears to be a good strategy for prompting refusers to at least give some information. Some drawbacks remain, however, notably that it may be difficult to decide which questions are the key survey items, especially in a multi-topic survey, and that single core questions, when asked out of the context of the survey, may measure something different from when the same question is posed as part of a battery of similar questions in a long questionnaire. Crucially, of course, some nonresponse remains. A Follow-up Survey among nonrespondents implies drawing a subsample from the nonrespondents and asking them to answer the questionnaire. In a well-conducted survey, it may be assumed that the nonrespondents will be mainly refusals who could not be converted in the main survey. Obtaining their cooperation will require very well-trained and highly motivated interviewers, and possibly larger incentives (Stoop, 2004). This has the drawbacks that the sample will generally be small (because of the costs of deploying the best interviewers, handing out larger incentives and other response-enhancing methods) and there will usually be some delay in completing the fieldwork.
4. In rare cases, information on core variables is available from the sampling frame or from registers that can be linked to the sampling frame (see Schouten and Cobben, 2006).
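The arithmetic behind the follow-up survey can be sketched as follows. Assuming, optimistically, full response in the follow-up subsample, the overall mean is estimated by weighting the respondent mean and the follow-up mean by the relative sizes of the two strata – a simplified version of the two-phase logic of Hansen and Hurwitz (1946), with invented numbers:

```python
def follow_up_estimate(n_resp, mean_resp, n_nonresp, mean_followup):
    """Combine the main-survey respondent mean with the mean from a
    (sub)sample of nonrespondents, each stratum weighted by its share
    of the original sample; assumes the follow-up subsample responds
    fully and is representative of all nonrespondents."""
    n = n_resp + n_nonresp
    return (n_resp / n) * mean_resp + (n_nonresp / n) * mean_followup

# Example: 1200 respondents (60% agree) and 800 nonrespondents, of whom
# a follow-up subsample is interviewed (40% agree).
print(follow_up_estimate(1200, 0.60, 800, 0.40))  # 0.52, not the naive 0.60
```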
2.7 Ethics and Humans
Survey researchers adhere to codes of standards and ethics. In the European Social Survey, fieldwork organizations are asked to sign the International Statistical Institute’s Declaration on Professional Ethics (1985).
Institute’s Declaration on Professional Ethics (1985). Similar codes of ethics have been developed by the American Association for Public Opinion Research (2005), the Council of American Survey Research Organizations (2009) and ESOMAR (2008b). These codes have in common the stipulations that conducting research may not be used as a guise for sales or solicitation purposes; the confidentiality of information provided by respondents must be protected; target persons must be informed about the purpose of the survey and the length of the interview; and, in general, target persons must be respected and informed. Survey ethics should guide expectations of the demands from interviewers and respondents. When studying the response process, it is all too easily forgotten that respondents and interviewers are real people, and not simply elements in a scientific experiment. Goyder (1987, p. 25) makes a sharp distinction between ‘heartless behaviourists’ who seem bent on cajoling and coaxing all these stupid ‘sample cases’ who don’t know what is good for them into responding, and those who feel that every citizen is perfectly able to make up his or her own mind on whether or not to cooperate, and should therefore be left to make their own informed choice without further interference and prodding from the interviewer: ‘In all, the profile of the uncooperative respondents identifies precisely the group who, in many sociologists’ eyes, forms the most maladjusted and missocialized segment of a society. . . . Implicitly, then, nonrespondents have been conceived as deviants too selfish or ignorant to perform their duty when approached by researchers’ (Goyder, 1987, p. 16). On the other hand, at the end of the day quite a lot of the refusals do cooperate, as we shall see in Chapters 6 and 7. This will certainly improve survey quality. What should always be ensured, however, is that target persons are treated with respect. This brings us to the second group of humans: the interviewers. They have a very tough job, are sometimes not well trained and are expected to record their approaches to target persons in terms of a few pre-coded categories that they may or may not understand. Groves (1989, p. 102), for example, discusses definitional and operational problems when a list of housing units is the frame: ‘Hours of discussion can be spent by those with years of experience in household surveys, trying to determine the appropriate status of particular living situations. In fact, however, most applications of the definition in the field are made by the interviewers or part-time staff for whom the listing operation is a small part of their full activities and who may have little experience in the task.’ The same holds for recording reasons for refusal, as differences in recording practice between countries and rounds show. Even contact forms as detailed as the ones used in the ESS can give only a superficial idea of what happened in the interaction between the target person and the interviewer, providing only partial insight into the decisive factors leading to cooperation or noncooperation. Neither interviewers nor other members of the fieldwork staff are robots (Billiet, 2007b). This should be acknowledged when interpreting the results of nonresponse analysis.
3 The European Social Survey
3.1 Introduction
The European Social Survey (ESS) is an academically driven, large-scale, European, repeat cross-national social survey that was formally established in 2001. Most of the material in this book is based on data and experience from the first three rounds of the ESS (ESS 1, ESS 2 and ESS 3). This chapter presents an overview of the survey and provides a description of its approach to handling nonresponse. Section 3.2 starts with the aims, history and philosophy of the project; it then moves on to the content of the questionnaire, the number of participating countries and their characteristics and the general organization of the project. Section 3.3 gives a short summary of ESS design intentions, and the wide range of methodological measures used with a view to ensuring that these intentions are achieved. Section 3.4 focuses on one aspect of ESS methodology, namely nonresponse, and on instruments and tools that have been developed to measure and minimize it. Special attention is devoted to the ESS ‘contact forms’, which provide data on the fieldwork process and intermediate and final response outcomes. More information on the background, design and implementation of the ESS may be found in Jowell et al. (2007) and at www.europeansocialsurvey.org.
3.2 What is the European Social Survey?
3.2.1 Aims, history and philosophy
The ESS has three main aims: firstly, to produce rigorous data about trends over time in people’s underlying attitudes, values and behaviour within and between European
nations; secondly, to rectify long-standing deficits in the rigour and equivalence of comparative quantitative research, especially in attitude studies; and thirdly, to develop and secure lasting acceptance of social indicators, including attitudinal measures, that are able to stand alongside the more familiar economic indicators of societal progress. The ESS was formally established in 2001, following five years of design and planning conducted jointly by the European Science Foundation and its members representing the main European academic funding bodies. Against a background of increasing European interdependence and integration and the wider context of globalization, there was significant support for a new pan-European general attitudinal social survey. The survey was designed to complement the behavioural and sociodemographic data generated by Eurostat and to fill a clear gap in the availability of high-quality comparative attitude data. The central aim of the ESS is therefore to provide comparative time-series trend data on underlying values. The second aim of the ESS – to improve standards of rigour in cross-national survey measurement – is also critical. When the discussions about the design of the ESS were taking place, there was a view that survey standards in the developed world were declining. It appeared that the challenges to high-quality survey measurement were often being met with a resigned acceptance that nothing could be done to address them. Examples include a rush to the use of web-based methodologies and online panels without a full awareness of the consequences, a gradual lowering of response rate targets and increasing use of quota sampling, all evidence that Europe might be ‘sleepwalking’ into an acceptance of lower survey standards (Jowell and Eva, 2009). The ESS was established in part to try and make the case for and demonstrate the importance and feasibility of higher survey standards. The third aim of the ESS is to complement the existing, mostly economic, indicators used by governments to compare countries using new attitudinal social indicators. ESS data will be expected to play a key role in this task. Work on this third aim has only recently got under way, with consideration of potential indicators for the future (see Jowell and Eva, 2009). The reliability of quantitative social scientific research is crucially dependent on the ‘principle of equivalence’ (Jowell, 1998). In the natural sciences, experiments are conducted according to strictly uniform standards in order to ensure the comparability of repeat measurements of the same phenomenon. A single variable might be manipulated, while all others are held constant, to allow the effect of this to be monitored. Social surveys attempt a similar process with human ‘subjects’. This principle of equivalence guides much of the ESS blueprint and the operationalization of the survey. As outlined in Section 2.2, various terms are used in cross-national research debates that refer to achieving equivalence (Jowell, 1998). Some crossnational scholars refer instead to ‘avoiding unnecessary variation’ (de Heer, 2000) or to a ‘constrained target quality approach’ (Lynn, 2003b). All these terms are variations on the theme of equivalence, representing different levels of methodological standardization. As noted in Chapter 2, what Lynn describes as a constrained target quality approach, where challenging targets are set to try and ensure consistency, but
where the approach is focused on a number of key constraints rather than on each individual step of the survey lifecycle (Lynn, 2003b), is the closest description from his typology that describes the ESS approach. It can be seen as a mix between the aim of raising standards and quality in some countries as far as is achievable and the aim of achieving consistency on key dimensions. However, for the remainder of this chapter, and throughout the rest of this book, we will refer to a desire for the ESS to achieve ‘optimal cross-national comparability’, since this perhaps better describes the ESS approach. This essentially involves a default position that where possible and optimal, uniform methodology is employed. At the same time it is accepted that different approaches will sometimes be needed across countries to achieve the same aim, either because an identical approach is not possible or because an identical approach would lead to lower measurement quality: ‘The extent to which it is possible to standardise the design may depend as much on the infrastructure and processes for coordination and control as on statistical considerations’ (Lynn, Japec and Lyberg, 2006, p. 15). Jowell et al. (2007) argued that in national surveys comparability is achieved by ensuring that a range of methodological issues are appropriately designed, implemented and monitored. These include ensuring that the probability of an individual’s selection in a sample should be known and not equal to zero, that cooperation and response rates do not vary greatly between different subgroups, that questions should have broadly similar meanings for all respondents so that data differences derive from differences in answers rather than interpretation, and that coding schemes are devised so that codes – rather than coders – account for differences. Cross-national studies trying to ensure appropriate harmonization in these areas are confronted with a number of additional challenges. Cultural, organizational, technical and financial barriers may undermine comparability. Language, for example, can be a serious barrier, posing major challenges to any attempt to ensure conceptual and linguistic equivalence across countries. Differing methodological ‘habits’ also cause serious problems, and allowing them to continue has seriously undermined cross-national comparability in many studies (Jowell et al., 2007). Differing response rates and their likely impact on nonresponse bias between countries in a cross-national study are also important potential sources of error. The designers of the ESS therefore set out to try to promote a methodology that would allow robust cross-national comparisons, and engaged in widespread consultation across Europe about the design of the project. The eventual blueprint for the survey (ESF, 1999) set out to mitigate some of the most serious challenges to high-quality cross-national measurement and to promote comparability across countries as its main priority. The ESS sought to promote comparability in all these areas through a combination of complete standardization in some cases, whilst allowing output-driven national variation on other occasions. For instance, the questionnaire is designed to be identical in all countries except for the language, of course, and except for questions on education and religion that are clearly country-specific. The ESS is explicitly designed to compare countries and to measure change over time. Ideally, the only feature that is designed to ‘vary’ is the measurement period. 
All other variables should remain constant, or at least be ‘controlled for’ in later analysis.
Thus variables such as the countries themselves, the samples, questions asked and the mode of interview should ideally remain the same over time, in order to allow change to be attributed to actual substantive changes rather than being methodological artefacts. Where such variables cannot be the same from the outset due to changes in the availability of sampling frames or the resources available, they should be operationalized in a way that, although different, can ultimately still allow comparisons to be reliably drawn post hoc. The ESS has aimed to implement a range of methods, from up-front uniformity of measurement through to post hoc harmonization to facilitate cross-national comparability. When changes are required or are considered desirable between different points in time, such as a large increase in response rates between rounds in a certain country or the use of a different sampling frame, the effect of these changes should be measurable, known and documented. Only then can it be ‘accounted for’ in the substantive analysis of survey data. The analyst will want to know if a change in the proportion of respondents agreeing with a statement between 2002 and 2008 reflects a real change in attitudes rather than influences such as the positioning of the question in the survey, a change in the data collection mode or agency, a large decline in the cooperation rate, a revision of the sampling frame or even the inconsistent use of a showcard.
3.2.2 Content
It is known that the topic of a survey can influence response rates. Groves, Presser and Dipko (2004) note that the topic of a survey leads to an increase in cooperation amongst those interested in that area. Surveys relating to young children, for example, often prove especially popular among parents, whilst general social surveys struggle, perhaps because they do not have such a clear focus. As a general social survey, the topics of the ESS are wide-ranging. The shifting subjects of the survey could result in differences in saliency to respondents from round to round. The specific content of the rotating modules (see Box 3.1) is not highlighted in ESS advance letters. The ESS fieldwork documents1 recommend only mentioning those topics with the most appeal to potential respondents in each country. This is intended to enhance response rates, but of course this strategy could theoretically backfire: ‘When survey introductions and survey materials emphasize the topic of the survey, they risk stimulating participation among persons whose self-interests can be served by responding and depressing participation among those who perceive no such interests. When the survey variables are correlated with those interests, differential nonresponse bias can result, varying by the stated topic of the survey’ (Groves et al., 2006, p. 735). ESS fieldwork is scheduled to take place biennially and the questionnaire takes around one hour to administer. Half the questionnaire is repeated at each round. In this core part of the questionnaire, the ESS aims to cover three broad domains. The first is people’s value and ideological orientations (their world views, including their religiosity, their socio-political values and their moral standpoints). The second is people’s cultural/national orientations (their sense of national and cultural attachment and their related feelings towards outgroups and cross-national governance). The third domain, finally, is the underlying social structure of society (people’s social positions, including class, education and degree of social exclusion, plus standard background socio-demographic variables and media usage). As Box 3.1 shows, these domains have been realized through a series of submodules.
1. See www.europeansocialsurvey.org: Fieldwork Documentation, Advance Letters.
Box 3.1 The Content of the European Social Survey

Submodules of the core questionnaire:
- Trust in institutions
- Political engagement
- Socio-political values
- Social capital, social trust
- Moral and social values
- Social exclusion
- Human values
- National, religious, ethnic identities
- Well-being and security
- Demographic composition
- Education and occupation
- Financial circumstances
- Household circumstances

Rotating module topics to date:
- ESS 1 (2002/2003): Immigration; Citizen involvement and democracy
- ESS 2 (2004/2005): Family, work and well-being; Economic morality; Health and care-seeking
- ESS 3 (2006/2007): Indicators of quality of life; Perceptions of life course
- ESS 4 (2008/2009): Attitudes to welfare; Experience and expressions of ageism
The other half of the questionnaire consists of two or more rotating modules, the topics and authors of which are determined via a round-by-round competition across Europe. These modules may be repeated in later rounds. In addition to the core and rotating modules, there is also a supplementary questionnaire that includes identical or similar questions to those included in the main questionnaires with the aim of assessing the reliability and validity of the questions, thus allowing post hoc corrections for cross-national measurement error (Saris and Gallhofer, 2007b).
3.2.3 Participating countries
Up to the third round (ESS 3), 32 countries had taken part in the ESS, with 17 of these having taken part in each round. To date, no nation has permanently exited the ESS, and new countries are still joining. In ESS 3, Bulgaria, Cyprus, Latvia, Romania and Russia joined the ESS, with more countries planning to join in later rounds. The ESS now statistically represents the attitudes of the 900 million Europeans who make up the populations of the countries included in the first three rounds. Its geopolitical coverage is comprehensive, including by ESS 3 most EU states (lacking participation in the
project only from Lithuania and Malta), as well as covering Iceland, Israel, Norway, Russia, Switzerland, Turkey and Ukraine. The entry of Russia into the project saw the inclusion of a major world power, while the entry of Turkey saw only the second non-Christian country (the first being Israel) joining the project. Both of these additions greatly increased the diversity of the populations and contexts covered by the ESS. Table 3.1 highlights the diversity of nations in the ESS. To start with, of course, there is a wide range of major languages and currencies. In addition, Europe has very large differences on other key societal measures. Average life expectancy for men, for example, varies from 59 to 80 years across the ESS countries, and there is similar variation among women. There are also large differences in national wealth. Per capita Gross National Income (GNI) ranges from just US$1 520 in the poorest participating country to as much as US$65 630 in the richest. The ESS also covers a diverse range of political, welfare, health and education systems that reflect the differing approaches and histories of the various participating countries alongside their very different family and societal structures. This extensive coverage of geographical and political Europe also means that the ESS operates over a vast range of different ‘survey climates’ and infrastructures. The diversity of countries included in the survey has created challenges that have required adaptation and clarifications as new countries have entered the project, most notably in the questionnaire (Fitzgerald and Jowell, 2008). It is equally of note that the survey traditions and infrastructures across Europe are also diverse. For example, in some countries quota sampling has traditionally dominated, in others a quasi-random system that allowed substitution has been the norm, whilst in others strictly random probability sampling without substitution was standard. As Chapter 4 will show, there is a survey infrastructure divide, with around half the ESS-participating countries utilizing computer-assisted personal interviewing (CAPI), but the remainder continuing to use paper and pencil methods of data collection. Efforts to promote harmonization have therefore had very different implications for countries participating in the ESS, with some countries more or less conducting fieldwork as they would normally do for a general social survey, while others were faced with having to make major changes to their usual approaches. In order to facilitate comparability, a harmonized methodology that aimed to reduce or minimize the impact of methodological differences was developed (see Section 3.3). A specific organizational structure was then established for the ESS to successfully implement this methodology.
3.2.4 Organization and structure
In survey research in general and in cross-national survey research in particular, organizational structure and management play a key role in promoting comparability. Whilst most discussions about achieving cross-national comparability are focused on methodology, the effective organization of cross-national projects is an important prerequisite for their implementation. The International Social Survey Programme (ISSP) is the cross-national project that most closely influenced the design of the ESS.
Table 3.1 Profiles of countries in ESS 1–3 a

Country | ESS Rounds | Population | Area (km2) | Major language | Major religion | Life expectancy (men/women) c | GNI per capita | Currency
AT Austria | 1, 2, 3 | 8.4 m | 83 871 | German | Roman Catholic | 77/83 | US$ 36 980 | Euro
BE Belgium | 1, 2, 3 | 10.5 m | 30 528 | Dutch/French/German | Roman Catholic | 76/82 | US$ 35 700 | Euro
BG Bulgaria | 3 | 7.6 m | 110 994 | Bulgarian | Eastern Orthodox | 69/77 | US$ 3 450 | Lev
CY Cyprus b | 3 | 855 000 | 9 251 | Greek/Turkish | Islam | 76/82 | US$ 16 510 | Euro/Lira
CZ Czech Rep. | 1, 2 | 10.2 m | 78 866 | Czech | Roman Catholic | 73/82 | US$ 10 710 | Koruna
DK Denmark | 1, 2, 3 | 5.4 m | 43 098 | Danish | Protestant | 76/81 | US$ 47 390 | Krone
EE Estonia | 2, 3 | 1.3 m | 45 277 | Estonian/Russian | Eastern Orthodox | 66/77 | US$ 9 100 | Kroon
FI Finland | 1, 2, 3 | 5.3 m | 338 145 | Finnish/Swedish | Protestant | 76/82 | US$ 37 460 | Euro
FR France | 1, 2, 3 | 61.6 m | 543 965 | French | Roman Catholic | 77/84 | US$ 38 500 | Euro
DE Germany | 1, 2, 3 | 82.5 m | 357 027 | German | Protestant/Roman Catholic | 76/81 | US$ 34 580 | Euro
GR Greece | 1, 2 | 11.1 m | 131 957 | Greek | Eastern Orthodox | 77/82 | US$ 19 670 | Euro
HU Hungary | 1, 2, 3 | 10 m | 93 030 | Hungarian | Roman Catholic | 69/77 | US$ 10 030 | Forint
IS Iceland | 2 | 301 000 | 103 000 | Icelandic | Protestant | 80/83 | US$ 46 320 | Krona
IE Ireland | 1, 2, 3 | 4.3 m | 70 182 | English | Roman Catholic | 76/81 | US$ 40 150 | Euro
IL Israel | 1 | 6.9 m | 22 072 | Hebrew/Arabic | Judaism | 79/83 | US$ 18 620 | Shekel
IT Italy | 1, 2 | 58.9 m | 301 338 | Italian | Roman Catholic | 78/83 | US$ 30 010 | Euro
LV Latvia | 3 | 2.3 m | 64 589 | Latvian/Russian | Protestant | 67/78 | US$ 6 760 | Lat
LU Luxembourg | 1, 2 | 467 000 | 2 586 | French/German/Luxembourgish | Roman Catholic | 76/82 | US$ 65 630 | Euro
NL Netherlands | 1, 2, 3 | 16.4 m | 41 684 | Dutch | Roman Catholic/Protestant | 78/82 | US$ 41 864 | Euro
NO Norway | 1, 2, 3 | 4.7 m | 323 759 | Norwegian | Protestant | 78/83 | US$ 59 590 | Krone
PL Poland | 1, 2, 3 | 38.1 m | 312 685 | Polish | Roman Catholic | 71/80 | US$ 7 100 | Zloty
PT Portugal | 1, 2, 3 | 10.6 m | 92 345 | Portuguese | Roman Catholic | 75/81 | US$ 16 170 | Euro
RO Romania | 3 | 21.4 m | 238 391 | Romanian | Eastern Orthodox | 69/76 | US$ 3 830 | Leu
RU Russia | 3 | 142.5 m | 17 m | Russian | Eastern Orthodox/Islam | 59/73 | US$ 4 460 | Rouble
SK Slovak Rep. | 2, 3 | 5.4 m | 49 033 | Slovak | Roman Catholic | 71/79 | US$ 7 950 | Koruna
SI Slovenia | 1, 2, 3 | 2 m | 20 273 | Slovene | Roman Catholic | 74/82 | US$ 17 350 | Euro
ES Spain | 1, 2, 3 | 44.2 m | 505 988 | Spanish/Catalan | Roman Catholic | 78/84 | US$ 25 360 | Euro
SE Sweden | 1, 2, 3 | 9.1 m | 449 964 | Swedish | Protestant | 79/83 | US$ 41 060 | Krona
CH Switzerland | 1, 2, 3 | 7.4 m | 41 284 | German/French/Italian | Roman Catholic/Protestant | 79/84 | US$ 54 930 | Franc
TR Turkey | 2 | 74.8 m | 779 452 | Turkish | Islam | 69/74 | US$ 5 400 | Lira
UA Ukraine | 2, 3 | 46.2 m | 603 700 | Ukrainian/Russian | Eastern Orthodox | 62/74 | US$ 1 520 | Hryvnia
UK United Kingdom | 1, 2, 3 | 60.7 m | 242 514 | English | Protestant | 77/82 | US$ 37 600 | Sterling

a. Number of countries in each round: Round 1, 22; Round 2, 26; Round 3, 25. Country profile information from BBC Monitoring (part of the BBC Global News Division); population and life expectancy figures from the UN (2007) and per capita GNI figures from the World Bank (2006). Religious denomination based on ESS data from most recent round (if available; otherwise from various Internet sources).
b. Turkish-controlled areas are not included in the ESS sample, but the figures presented here cover all of Cyprus.
c. Life expectancy for men/women.
The ISSP is a remarkably strong voluntary grouping of international teams, but there is no central coordination in the same way as in the ESS. The result is that the ISSP has sometimes struggled to achieve national compliance with mutually agreed best practice (Park and Jowell, 1997). By contrast, the ESS is an early example of the European Research Area at work, with central coordination funds provided by the European Commission, national coordination and national fieldwork costs met by scientific funding councils in each country, and scientific liaison costs being met by the European Science Foundation. This broad support has been beneficial for the ESS. However, national funding decisions and their associated timetables have led both to uneven participation and delays in fieldwork getting under way in some countries. Table 3.1 highlights how countries have sometimes been forced to miss a round or have joined the project late. To maximize compliance, the ESS has established a clear organizational structure based on ‘top-down’ and ‘bottom-up’ elements, with strong central design and coordination on the one hand and devolved national implementation on the other (see Figure 3.1).
[Figure 3.1 The ESS organizational structure: an organogram showing the Central Coordinating Team (CCT) – the Centre for Comparative Social Surveys, City University London (Coordinator), the Norwegian Social Science Data Services (NSD), The Netherlands Institute for Social Research/SCP, GESIS – Leibniz Institute for the Social Sciences, Universitat Pompeu Fabra, University of Leuven and University of Ljubljana – supported by Specialist Advisory Groups, the Scientific Advisory Board, the Funders’ Forum, the Questionnaire Module Design Teams, the Methods Group, the Sampling Panel and the Translation Taskforce, and linked to the National Coordinators (NCs) and survey organisations in each participating country.]
The Central Coordinating Team (CCT) takes responsibility for the design and specification of the ESS, monitors its progress and quality, and is responsible for archiving and disseminating ESS data. At the design stage and throughout each round of the project, there are opportunities for input from all those involved in the ESS. Ultimately, however, final decisions and specifications are made by the CCT. When signing up to the project, each participating country explicitly accepts this part of the design. In doing so, all those involved in the ESS prioritize cross-national comparability, sometimes at the expense of national priorities. Following the confirmation of central funding for each round, each participating country appoints a National Coordinator (NC) and a fieldwork agency. The NC is responsible for the implementation of the ESS in his or her country according to the central specification. The CCT and NCs are supported in their roles by a methods group, question module design teams (QDTs), and translation and sampling panels, as well as a Funders’ Forum and Scientific Advisory Board. The Funders’ Forum meets to coordinate funding between over 30 national funding bodies and the EC and has successfully secured funding for the project to date.
3.3 ESS Design and Methodology
3.3.1 The central specification
As noted above, the CCT writes a central specification that is updated for each round of the project. The first Specification for Participating Countries was based on the original blueprint (ESF, 1999) developed under the auspices of the ESF and thus reflected the views of the majority of its member organizations. In subsequent rounds some fine-tuning took place and clarifications were added, but by and large the Specification has remained fairly consistent (see European Social Survey, 2007b). It serves as a form of contract between the CCT and the countries participating in the project. Each national research council agrees to take part in the project on the understanding that they will endeavour to comply with the Specification. This compliance is realized through a number of steps during the preparation, execution and archiving of the survey, along with a series of monitoring exercises conducted by the CCT. It is worth briefly considering these, because they show the way in which the ESS operates and the other (sometimes competing) requirements with which the response and cooperation requirements discussed in later chapters have to compete for time, effort and resources. The Specification covers every aspect of implementing the ESS, from initial appointment of a National Coordinator in each country through to sampling, translation, data collection, coding and archiving requirements. The Specification also outlines key requirements related to achieving and calculating response rates, which will be covered later in this chapter. The document is complemented by detailed protocols and guidelines that provide much more extensive information about implementation (see the overview of the range of documentation sent to national teams in Box 3.2).
Box 3.2 ESS Protocols and Guidelines for National Coordinators in the First Three Rounds

- Specification for Participating Countries: Overall specification outlining requirements for participation and the conduct of sampling, translation, fieldwork and data preparation. Key targets and minimum standards are included here.
- Questionnaire, Supplementary Questionnaire and showcards: Source questionnaire in British English.
- Project Instructions: Source project instructions covering key issues for interviewers. The document is not designed to be translated verbatim and requires adaptation to national sampling designs, administrative procedures and so on.
- Translation guidelines and templates: Detailed guidelines on all aspects of the translation, review, adaptation, pre-testing and documentation process. Includes information on appointing appropriate translation staff through to documenting each translation decision.
- Translation ‘Asked Questions’ document: Outlines questions that arise during the translation process in each of the participating countries.
- Fieldwork checklist: Online questionnaire that asks NCs to detail the way in which they will meet the fieldwork requirements of the Specification for Participating Countries.
- Guidelines for enhancing response rates: Guidance document produced by the CCT suggesting methods that might help to increase response rates and minimize bias.
- Guidelines for monitoring fieldwork: Guidance document produced by the CCT suggesting ways in which NCs can monitor fieldwork.
- Fieldwork figures obtained in previous round and projections for current round: Document produced by the CCT outlining (where applicable) fieldwork progress in the most recent round and asking for projections from national teams for the forthcoming round. These projections are then used to check progress.
- Sampling guidelines: Detailed guidelines for NCs on how to meet the ESS sampling requirements. A summary version is available for those who have participated in earlier rounds.
- Contact forms, contact form instructions, algorithm for computing response rates: Forms that record every contact attempt to each sample unit along with instructions for their completion and an algorithm for how to compute the ESS response rate from this data.
- Data protocol: Detailed description provided by the CCT of the precise format in which data must be delivered to the ESS data archive.
- National Technical Summary: Document completed by NCs that provides a country-level technical description about the conduct of fieldwork as well as information on data preparation, country-specific items and so on.
- Postcoded variable documentation (various): Documents that describe detailed code frames and instructions for postcoding.
- ISCED and religion variable bridging documents: Documents that describe detailed code frames and instructions for postcoding.
- Event Reporting Guidelines: Document that provides detailed instructions on how to code media reported events during fieldwork.
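The official outcome codes and response rate algorithm are laid down in the contact-form documents listed above. Purely as an illustration of the general form of such a calculation – completed interviews divided by eligible sample units – the sketch below uses simplified outcome labels, not the real ESS contact-form codes:

```python
def response_rate(outcomes):
    """Completed interviews divided by eligible sample units (the gross
    sample minus ineligibles such as vacant or non-residential addresses
    and deceased sample persons)."""
    eligible = [o for o in outcomes if o != "ineligible"]
    interviews = [o for o in outcomes if o == "interview"]
    return len(interviews) / len(eligible)

final_outcomes = ["interview", "refusal", "noncontact", "ineligible",
                  "interview", "interview", "refusal", "other"]
print(response_rate(final_outcomes))  # 3 interviews / 7 eligible units ~ 0.43
```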
The protocols and documents describing all the methodological quality measures required to fulfil the Specification are available from the main ESS web site (www.europeansocialsurvey.org) and on the web site of the ESS data archive (http://ess.nsd.uib.no/). Both web sites have intranet areas that are the workbenches for the project and allow for interaction between national teams and the CCT.
3.3.2 Quality and optimal comparability
In order to achieve comparability, the CCT sets a series of targets or minimum standards that participating countries are expected to meet (Jowell et al., 2007); where these are not met, the targets are designed to act as incentives for improvement in later rounds. As noted in Chapter 2, the ESS is primarily based on an input harmonization model where data collection is designed and built from scratch, rather than relying on existing national sources and harmonizing their output later. The ESS then seeks to obtain cross-national optimal comparability through a combination of uniformity,
where both possible and desirable, and differing approaches within each country when uniformity is not optimal. The key aim is to do things in exactly the same way wherever this is both possible and likely to lead to high measurement quality across all countries. At the same time the approach demands that, where uniformity is not possible or if pursued is likely to have a detrimental impact on measurement quality, mechanisms are in place to facilitate nationally different approaches to achieving outcomes that remain comparable. This is the ESS approach to aiming for optimal comparability. There are many examples of how the ESS insists on standardization to achieve comparability. For example, it is known that there can be differences in how respondents answer survey questions due to the mode in which the questions are administered (for a recent overview, see de Leeuw, 2005). The decision was therefore taken to insist that every country use a face-to-face methodology, in order to prevent survey comparisons between countries being biased due to the use of different modes both within and across countries. There are no exceptions to this requirement, and all interviews must take place with an interviewer present in the respondent’s home.2 Another example of standardization relates to the questionnaire, although in this case there is a recognition that total uniformity is not optimal, since it could damage data quality. A hybrid approach is therefore adopted. Countries are required to ‘Ask the Same Questions’, providing respondents in every country with the same stimuli at each question through the translation process. Each country is, for example, expected to have the same number of answer codes for each question and to convey a functionally equivalent question in the target language to that specified in the English source questionnaire. At the same time, however, the ESS translation procedures (Harkness, 2007) do not insist that the question should have the same grammatical structure as the source questionnaire, or that ‘direct word equivalents’ of key terms must be used. Such an approach would be harmful to the measurement quality of the translated question, since the question formulation might be extremely long or awkward, whilst a less direct translation might better convey the same meaning to respondents, mirroring that in the English source questionnaire. These examples highlight the mix of approaches employed in the ESS whilst underlining that there is an emphasis on close harmonization where this is optimal. For information, a brief summary of key areas of the general ESS methodology is provided in Box 3.3. The ESS approach to response rates is based on three components. Firstly, targets are set in relation to both low noncontact and high response rates. Secondly, there is a series of requirements, based upon best practice and outlined in the Specification for Participating Countries, that participating countries are obliged to meet in order to try and achieve these targets. Finally, there is a requirement for each national team to use their own knowledge and best practice in order to meet the response rate targets.
2. A programme of methodological work is under way in the ESS to determine the extent of such differences and to examine whether it might be possible to correct for these in later analysis. This includes research into the feasibility of mixed-mode data collection in terms of response rate and implementation and so on, as well as research focused on the impact of different modes on how respondents answer questions (see Roberts, Eva and Widdop, 2008).
Box 3.3 General ESS Methodology – A Summary of Key Areas a

Questionnaire design, piloting and quality monitoring
The ESS used a range of design techniques to develop its core questionnaire and most of these are also used during each round when developing the rotating modules. They include expert papers by substantive specialists, multidisciplinary specialist review, consultation with ESS NCs about both intellectual and cultural concerns, the use of the Survey Quality Predictor (SQP) program to estimate reliability and validity of items, large-scale two-nation quantitative pilots, extensive pilot data analysis to examine item nonresponse, scalability, factor structure and expected correlations, and split ballot Multi Trait Multi Method (MTMM) experiments at both the piloting and main stages of fieldwork. The MTMM experiments allow both for improving questionnaire design and for post hoc amendments to correct for differential measurement error between countries.

Translation
Questionnaires for each round of the ESS have to be translated into over 25 languages, a number that grows with increasing country participation. Aware that one of the greatest failings of cross-national projects has been a lack of attention to effective translation, the ESS has sought to make this a priority. An English source questionnaire is drafted, which subsequently gets translated not only into every participating country’s first language, but also into languages that are spoken as a first language by more than 5% of the population in each country. Several countries thus have to translate the source questionnaire into multiple languages. To help with the translation process, potentially ambiguous or unclear words or phrases are annotated in the source questionnaire with a brief description of their intended meaning in the context of the module of questions. Translators are guided by a detailed protocol specifying a five-step translation procedure consisting of translation, review, adjudication, pre-testing and documentation (Harkness, 2003, 2007). It is a committee-based approach, thus helping to avoid the subjective impact of a single translator. The process is meticulously documented so that subsequent analysts may refer back to the decisions made. Countries that share a language (such as France, Belgium, Luxembourg and Switzerland) are encouraged to consult with one another during the process, but not necessarily to harmonize their finished questionnaires.

Fieldwork contracting and compliance monitoring
In order to maximize compliance with the ESS Specification, a CCT contracting team asks each NC to complete a fieldwork checklist outlining the intended fieldwork procedures. At the end of each round, a report detailing compliance with the Specification is produced for each country.
Event reporting
An innovative system of event reporting has been implemented to allow events reported in the media, which it is thought might impact on survey responses, to be recorded. The current system is already providing data analysts of the future with a key record of events that might be salient to changes during data collection. A new methodology using a ‘political claims’ approach with a detailed coding frame is being developed for the future to increase comparability.

Documentation and archiving
A key aim of the ESS has been its commitment to make data and extensive documentation available to the whole social science community simultaneously, without cost and as quickly as possible (Kolsrud, Kalgraff Skjåk and Henrichsen, 2007, p. 139). This has been achieved by making data available to all scientists at once, with no privileged access for the CCT or others closely involved with the operation of the ESS. Data have been made available for immediate download after a simple and automatic registration process that is completed in minutes. Data can also be analysed online via a user-friendly analysis tool. The ESS has been driven by a desire to make the process of data collection and provision as transparent as possible in order that data users might be able to reconstruct all the elements that could have influenced the responses. This has also had important implications for the documentation of the response process. ESS datasets themselves are complemented by a wide array of metadata and auxiliary datasets, all of which can be accessed by the data user at the touch of a button. The main and data web sites host a range of other materials and services. For instance, there is an online searchable list of all publications and other outputs that have made extensive use of ESS data. Just as importantly, meticulous details of questionnaires are available in all languages, together with sampling and other methodological protocols that may influence the reliability or validity of findings.

Data availability and usage
Combined datasets have been published to a uniform standard less than a year after fieldwork was completed. Made freely available to all via the Internet, there is no privileged prior access for any of the scholars closely involved in the project. Comprehensive documentation is also accessible to all users. As of early 2009, over 20 000 users have registered to access the datasets and more than half of these have downloaded them for more detailed analysis. There are many hundreds of outputs, including books, journal articles and papers, and the release of data from ESS 3 has allowed analysts to really start examining changes over time (Fitzgerald and Jowell, 2008).
a. Sampling, fieldwork and response are dealt with separately below.
A series of recommendations and guidelines is made available by the CCT, but in an explicit acknowledgement that a uniform approach would not be appropriate, considerable national flexibility in conducting fieldwork is encouraged, subject only to the minimum standards in the Specification for Participating Countries. In Switzerland, for example, there is widespread use of telephone contact attempts after the minimum number of face-to-face contacts have been implemented, whilst in the Netherlands an ambitious programme of respondent incentives is used. Before considering the targets, requirements and use of local expertise, it is necessary to describe some of the other ESS arrangements, notably sampling and fieldwork, because they have a bearing on efforts to minimize nonresponse.
3.3.3 Sampling designs, procedures and definitions of the population
Sampling is at the heart of the social survey process. Random probability samples provide the basis on which social surveys can be used to make inferences about the population from which they are drawn. The sampling procedures used in the ESS also dictate much of the approach to fieldwork and nonresponse issues. Sampling is also an area of ESS methodology where total standardization is not pursued, if only because there is wide variation in the available sampling frames across countries. The sampling requirements, however, are not only specified in detail but their implementation by national teams is subject to the most stringent central control of any area of the ESS prior to data collection. The final sample design has to be approved by the ESS sampling panel in advance of fieldwork to ensure it is comparable to that in other countries. This panel also advises on optimal national sampling solutions. In ESS 3 there were cases where countries were not able to provide the sampling data files required to enable the design weights, which correct for differential selection probabilities arising from the sampling methodology, to be derived. In these cases, the data files for these countries have been excluded from the combined international dataset until the required information can be provided. This sanction demonstrates the central importance attached to ESS sampling procedures and their implementation and documentation. In terms of achieving comparability across countries, there are some key sampling requirements. For example, all countries use a single definition of the target population: all persons aged 15 and over (no upper age limit) resident within private households in each country, regardless of their nationality, citizenship or language. Those living in institutions are therefore excluded from the survey despite the fact that the size and composition of the institutional population differs across countries. The ESS aims to sample individuals and not households (although in some cases households are used as a mechanism by means of which to sample individuals). All countries have to employ probability sampling, with no quota controls or substitution allowed under any circumstances. This means that in countries where this is
uncommon, researchers have had to adapt their procedures and systems specifically for the ESS.

Although these overarching sampling principles are identical for all participating countries, the actual methods by which samples are drawn differ widely. This primarily reflects the differences in sampling frames available in each country. Countries thus use a population register with samples of named individuals or, where this is not available or suitable, household and then address samples. Some of the Nordic countries have access to registers of all persons aged 15 and over, which allow individuals to be sampled directly (in some instances allowing for simple random samples). Other countries have to use address or household sampling frames and then make individual selections at the household itself. Yet others have to rely on area-based sampling with random route procedures for the final selection of addresses and households, again followed by target person selection at the household level.

The decision was taken that countries should select the best available sampling frame. This was preferable to adopting a lowest common denominator approach and trying to keep things the same across all countries (Lynn, 2003b), which would have involved all countries using random route procedures, a procedure that is best avoided wherever possible.

The result of these differing approaches is that the process and effort required to make contact with target respondents varies widely between ESS countries. The differing approaches also have implications for the likely response rate. Where samples of named individuals are used, interviewers have to follow up target persons who have moved from the address recorded in the register. If they are now in an area not being covered by an interviewer, they cannot reasonably be included in the survey, and this will depress the response rate. In countries using address samples, the fieldwork agency needs to conduct a selection of addresses, and where no list of addresses is available, the listing of addresses has to be performed by someone different from the survey interviewer to ensure the quality of the sample selection. In household and address samples, interviewers have to select an individual from within the household, creating an additional level of contact and interaction with the household not required for samples of named individuals. These differing challenges have a direct effect on the process of eliciting cooperation with the survey and the resulting fieldwork burden.

In countries where no simple random samples (SRS) could be drawn, not all individuals in the population aged 15 and over had precisely the same chance of selection. Thus, for example, the unweighted samples in some countries over or underrepresent people in certain types of address or household, such as those in smaller households. To accommodate this, design weights are computed that correct for these slightly different probabilities of selection, thereby making the sample more representative of a 'true' sample of individuals aged 15 and over in each country. The smaller the probability that a person will be included in the sample, the greater is the design weight.3
3 More information on weighting in the ESS is available at http://ess.nsd.uib.no/: Survey Documentation – Weighting ESS Data.
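To make the weighting logic concrete, the following minimal sketch computes design weights as inverse selection probabilities, rescaled to average 1 within a country. It illustrates only the general principle described above; the selection probabilities are invented, and this is not the ESS's actual weighting procedure.

    # Design weights: the inverse of each respondent's selection probability,
    # rescaled so that the weights average 1 within a country.
    # Illustrative values only -- not actual ESS data.

    probabilities = [0.0010, 0.0005, 0.0020, 0.0010]  # P(selection) per respondent

    raw_weights = [1.0 / p for p in probabilities]
    mean_weight = sum(raw_weights) / len(raw_weights)
    design_weights = [w / mean_weight for w in raw_weights]

    # A person with half the selection probability receives twice the weight.
    print([round(w, 3) for w in design_weights])  # [0.889, 1.778, 0.444, 0.889]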
In order to ensure that these differing approaches lead to comparable samples, the ESS sampling panel employed the 'effective sample size' concept (Häder and Lynn, 2007). This approach takes a simple random sample (SRS) of 1500 cases as its benchmark. The further a design moves away from the 'precision' of a simple random sample, the greater is the sample size that is required to generate the same 'effective' sample size. Countries not using an SRS need to take into account the amount of clustering employed, differing selection probabilities and the estimated response rate in order to achieve an effective sample of 1500. Small countries (with a population of less than two million) are required to have an effective sample size of 800. As will be demonstrated in Chapter 6, this requirement results in very different gross sample sizes, meaning that the total effort required to conduct the ESS varies considerably between countries.
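The arithmetic linking the effective sample size to the gross sample that must be issued can be sketched in a few lines. All figures here are invented for illustration; actual ESS design effects, anticipated response rates and eligibility rates differ by country.

    # From target effective sample size to the gross sample that must be issued.
    # Figures are illustrative, not those of any ESS country.

    n_effective = 1500       # target effective sample size (800 in small countries)
    deff = 1.4               # overall design effect from clustering and unequal
                             # selection probabilities (DEFF = n_net / n_effective)
    response_rate = 0.65     # anticipated response rate
    eligibility_rate = 0.97  # anticipated share of issued units that are eligible

    n_net = n_effective * deff                      # interviews needed: 2100
    n_gross = n_net / (response_rate * eligibility_rate)
    print(round(n_net), round(n_gross))             # 2100 and roughly 3331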
3.3.4 Fieldwork and contracting
At an early stage in its development, the ESS rejected the option of choosing a single multinational firm to conduct all fieldwork, recognizing that in any event such groupings often have a loose federal structure and may be no more likely to lead to a reduction in methodological 'house effects' than assembling a bespoke grouping for the ESS. Instead, each national funder of the ESS is responsible for selecting a fieldwork agency that is 'capable of, and has a track record in, conducting national probability-based surveys to the highest standards of rigour by means of face-to-face interviewing' (European Social Survey, 2007b). This approach means that the decision as to which agency to appoint is made in each country by those most familiar with the range of survey firms available. The CCT therefore has no direct role in this part of the process, although it is sometimes asked to assist.

It is of note that in some countries the number of firms able to meet the ESS Specification is small. In Switzerland, for example, capacity has had to be built up especially for the ESS because of the almost complete disappearance of face-to-face interviewing in commercial agencies, whereas in other countries such as France, probability sampling was rare in the commercial sector, thus limiting applicants to firms willing to take on the very different interviewer challenges that probability samples impose.

Prior to a contract being agreed between the national funders and the survey agency, a fieldwork 'checklist' must be completed by each ESS National Coordinator. This checklist asks each national team how they intend to meet the requirements outlined in the ESS Specification for participating countries.4 Many of the issues covered in the checklist will have a direct bearing on the later success of the fieldwork; for example, the expected number of interviewers who will work on the project, or the decision to use an advance letter. Where a country is either unable or unwilling to meet the specified minimum requirements, the issue is discussed with the CCT prior to
4 The checklist was introduced in Round 3. Prior to that, countries had to submit their draft contract (in English) for consideration by the CCT.
sign-off of the checklist and the start of fieldwork. Sometimes the country amends its design to bring it in line with the Specification, whilst on other occasions the CCT agrees to a deviation. The checklist is especially helpful in identifying misunderstandings about the Specification in advance of the fieldwork and for identifying strategies for local adaptation to the central guidelines.

Most of the specified requirements regarding fieldwork pertain to minimizing nonresponse and will be discussed later in this chapter. However, there are a number of measures that aim to ensure that other factors related to fieldwork do not compromise comparability. Perhaps most important is the requirement that all fieldwork must be conducted face-to-face to avoid the effects of different modes on the comparability of survey findings. As noted earlier, there is good evidence that mode effects compromise data quality (Roberts, Eva and Widdop, 2008), and a single mode was therefore insisted upon to maximize comparability. Furthermore, at the time the ESS was established there was a majority opinion that face-to-face was the optimal mode of interviewing in terms of its measurement quality and its ability to generate good response rates. The contacting and response process in most countries is therefore dominated by in-person face-to-face contact attempts, although in a minority of countries telephone contact attempts are permitted. This has important implications, since the interaction between the survey and the target person is necessarily different depending on the mode of contact being used. It is also of note that in some countries (Norway, Switzerland) face-to-face fieldwork is now quite rare and considerable effort therefore has to be put into providing this for particular projects such as the ESS. For example, the recommendation to rely on experienced interviewers would be more difficult to fulfil in these countries.

There are some other important fieldwork requirements that also have a bearing on the response process. Fieldwork is specified to start and end within a four-month period (September to December, inclusive) within each survey year. In addition to facilitating the practical arrangements for producing the final combined ESS dataset, the timetable aims to minimize the impact of events on survey responses by harmonizing the fieldwork period between countries as far as possible. Even though it was known that the fieldwork period might be challenging in some countries, the need for comparability was prioritized. Challenges include some countries being unable to prepare for fieldwork in August due to almost total office closures for holidays, while in others having to conduct fieldwork in winter and in the run-up to Christmas poses challenges for interviewers.

The Specification also requires that the workload of any single interviewer be limited to a maximum of 48 issued sampling units in order to reduce the impact of individual interviewers on the data collected. This requirement was based on evidence that interviewer characteristics might affect whether or not respondents agree to take part, thereby possibly introducing nonresponse bias, as well as affecting how respondents answer questions (Loosveldt and Philippens, 2004). It was recognized that this requirement might be challenging in some countries and that a trade-off was required between utilizing the best interviewers in terms of eliciting response and minimizing interviewer effects.
3.4 Nonresponse Targets, Strategies and Documentation

3.4.1 Background
Since it was known that contact and cooperation rates differ across countries, which can in turn lead to differing types of nonresponse bias in each country (see Chapter 2), targets were set for both contact and cooperation. The aim was to ensure the most nationally representative samples within each country whilst also facilitating optimal comparability between them. To an extent, however, these two aims were in conflict with one another, since existing evidence from national surveys suggested that some countries would be able to achieve much higher response and contact rates than others. By encouraging those countries that could achieve the highest response rates to do so, it was likely (though not guaranteed) that the representativeness of their samples would be better, potentially undermining the comparative quality of ESS data. At the same time, the alternative of asking all countries to achieve some kind of European average in terms of response and contact rates would have been counter to good science. And in any event, there was no basis for setting such a rate, since the same response and contact rates across countries would not necessarily lead to the same types of nonresponse bias.

The decision was therefore taken to set response and contact rates that were generally achieved in the countries that had the best response rates in Europe. The response rate target of the ESS was set at 70%, with a maximum noncontact rate of 3%. As noted above, such a high response rate and low noncontact rate were set in the full knowledge that in some countries they would be difficult (if not impossible) to achieve. In the United Kingdom and Germany, for example, response rates on the national 'equivalents' to the ESS (British Social Attitudes and ALLBUS) were in the 50–60% range, considerably lower than the 70% required. However, it was hoped that high rates would serve as an incentive for improvement.

Although response rates were never the sole quality criterion in the ESS, there has been an increasing awareness that response rates should not be seen as the most important indicator of survey quality, due to growing evidence that the correlation between nonresponse bias and nonresponse rate is limited (Groves, 2006). At the same time, methodological innovations aimed at enhancing response rates, such as responsive designs (Groves and Heeringa, 2006) or mixed-mode data collection (Roberts, Eva and Widdop, 2008), might prove useful in the future, but are not yet a realistic option for cross-national general social surveys such as the ESS. Mixed-mode data collection will not be possible in all European countries, and will have an impact on other quality criteria and comparability. Responsive designs often rely on being able to monitor key survey indicators for evidence of change as a result of increasing response rates. They therefore pose difficulties in a general social survey when there are multiple key indicators, some of which might be being measured cross-nationally for the first time. Furthermore, a real-time responsive design is not really possible without computerized interviewing, backed up with regular downloads of data during fieldwork and a sufficiently flexible field force that can respond to
changing demands. Many ESS countries still use paper-and-pencil questionnaires (see Chapter 4), making a responsive design especially difficult during fieldwork in around half the participating countries. Finally, clear measurements and targets for minimizing and comparing nonresponse bias across countries are not yet available, if only because nonresponse bias is a characteristic of individual variables and not of a whole survey. The CCT thus continues to specify that there should be a target response rate of 70% and a noncontact rate of 3% in the absence of an alternative fieldwork strategy for minimizing nonresponse bias.
3.4.2 Requirements and guidelines
In order to meet these exacting targets, a number of minimum requirements were put in place. Many of these aim explicitly to help countries achieve a noncontact rate of no more than 3%. These include the stipulations that fieldwork must last for at least 30 days in each country, so as to give difficult-to-reach target respondents a reasonable chance of being included in the study; that countries must make at least four contact attempts for each selected sampling unit before accepting it as nonproductive; and that at least one of these calls must be in the evening and one of them at the weekend.

In addition, there is a requirement that aims specifically to maximize the cooperation rate in the survey. This is that first contact must be made with target respondents face-to-face,5 thereby reducing refusal rates, which are known to be higher in telephone contacts.

There are also a couple of requirements that cover both minimizing noncontacts and maximizing cooperation. The first is that all interviewers must be personally briefed on the survey to ensure that they are familiar with all aspects of the administration of the survey, including the process of making contact with potential respondents, as well as how to achieve high contact and response rates. In some instances, interviewers might be using a different approach from what is usual in their work; for example, a ban on all target person substitution. In these instances, specific instruction on these aspects is critical. The second requirement is that fieldwork should be closely monitored, including producing fortnightly reports on response. This is to ensure that problems with fieldwork can be identified early and addressed where possible.

These requirements were based on a combination of experimental evidence, best practice and the practical and cost considerations that survey organizations have to consider. However, understanding cross-national differences in nonresponse is a relatively new area, and at the time the ESS rules were written little concrete and truly comparable information about the response process was available. Furthermore, it was accepted that there was evidence of differing survey climates and response rates between countries, but little was really known about the reasons for these differences (see European Social Survey, 2007b). So in addition to these universal specifications, each participating country and its survey organization were encouraged to implement
5 The only exception was for those countries using a sample of named individuals from a register that contains a sufficient proportion of telephone numbers, in which case contact attempts may be made by telephone.
a range of techniques that they believed would enhance the final response rate. The CCT made suggestions, but it was ultimately up to each NC and survey institute to agree on the optimal arrangements in order to meet the ESS requirements. This underlines the fact that it is not always possible to standardize all aspects of a cross-national survey, especially in an area such as nonresponse.

The suggestions made by the CCT for maximizing response are made available to NCs prior to each round of the survey. The guidelines include all of the ESS obligations outlined in Box 3.4. In addition, they provide other optional advice such as the sending of advance letters, the use of incentives, the detailed monitoring of fieldwork, and interviewer issues such as selection, training and monitoring. The range of areas covered underlines the sheer range of factors that were thought likely to have an impact on nonresponse. For example, a clever research design that encourages optimal calling times at evenings and weekends is going to fail if interviewer remuneration is poor or their overall workloads are too high. The ESS accordingly tried to consider this range of activities when devising the supporting documentation for NCs. Box 3.4 summarizes the key recommendations made by the CCT. The next chapter will discuss the implementation of some of these measures.

In addition to careful study design and planning, the CCT was also aware of the importance of monitoring its implementation. For its fieldwork monitoring, the ESS therefore relies on two key mechanisms. Firstly, NCs are responsible for directly monitoring, in detail, the fieldwork progress of the survey organization. Secondly, the CCT monitors fieldwork centrally for each country, but on a general rather than detailed level. The Specification for Participating Countries states: 'Fieldwork progress must be closely monitored, including producing a fortnightly report on response . . .'. NCs have to send the CCT details on the total number of completed interviews per week. These are then compared with advanced projections of fieldwork and where they deviate, more detailed information is examined.6 However, NCs themselves are strongly advised to monitor more detailed information and carry the main responsibility. Key points to evaluate are shown in Box 3.5, an extract from the fieldwork monitoring guidelines. This document and the emphasis put on monitoring are important reminders that achieving good response rates is a complex process. The fieldwork monitoring procedures aim to identify problems early, thus increasing the opportunity to correct them before the end of the process.
3.4.3 Definition and calculation of response rates
A key challenge to comparability of response rates between European surveys or between countries within a cross-national study has been the absence of a harmonized approach to calculating response rates. The American Association for Public Opinion Research (AAPOR) has provided North America with a standardized method for recording response outcomes and response rates (American Association for Public Opinion Research, 2008), but Europe has never had such a scheme of its own and this
6 This system was introduced from ESS 2 onwards.
Box 3.4 An Extract from the ESS 4 Guidelines for Enhancing Response Rates

Interviewers: selection, organization and training for response enhancement

Selecting interviewers
- Attempt to enhance response rates by selecting experienced interviewers wherever possible.

Briefing interviewers
- Include a session on doorstep introductions and encouraging participation in the ESS as part of the briefing.
- Motivate interviewers to deliver good work and boost their confidence and ability to sell the survey.

Interviewer assignments and workload
- Discuss the workload of interviewers with the survey organization to avoid conflicts of interest between surveys.
- In addition to overall ESS deadlines, set internal deadlines for when interviewers have to complete assignments. Leave sufficient time for reissues of noncontacts and refusals afterwards too.

Monitoring interviewers' progress
- During the fieldwork period, survey organizations should provide regular feedback to the NCs regarding fieldwork progress.
- During the fieldwork period, NCs must provide fortnightly reports on response progress to their CCT contact person (essential in order to comply with the Specification for Participating Countries).

Payment of interviewers
- Discuss the interviewer pay arrangement with the survey organization. The pay rates for the ESS should be attractive for interviewers, both with respect to the difficulty of the study and with respect to the pay on other studies.

Reducing the number of noncontacts
- When the progress reports on fieldwork reveal a high noncontact rate, participating countries should check whether the interviewers adhered to the specified call schedule or not. This may on occasion require that contact forms are checked on site at the survey organization by the NC team.
- Based on experiences from ESS 1 to ESS 3, it is suggested that some countries consider raising the minimum number of calls and changing the timing of the calls.

Length of the fieldwork period
- Ensure that optimal use is made of the stipulated fieldwork period. In particular, try to ensure that interviewers will work in all areas from the very beginning of the fieldwork period.

Minimising the number of refusals
Advance letters
- Use an advance letter, personalized with the individual name if possible, or the address.
- Include the letters in interviewer workpacks, and instruct interviewers to organize posting them a few days before they intend to contact the address.
- If an attempt is being made to contact a household a long time after the initial letter was sent (for example, with a reissue), then consideration should be given to sending a second letter.

Respondent incentives
- Consider using an incentive to raise response rates.
- Be aware that incentives might have an effect on nonresponse bias, as well as response rates.

Converting people who initially refuse
- Interviewers should be familiar with effective techniques to avoid refusals. In particular, countries with low (interim) response rates should try to convert as many refusals as feasible into an interview. If possible, experienced interviewers should carry out the conversion attempts.

Source: Koch et al. (2008a).
causes comparison difficulties between European surveys. It was vital for the ESS to devise and execute a standardized system that ensured harmonized response rate calculations and documentation across all its participating countries.

According to the Specification, the ESS response rate is calculated as the number of interviews achieved, divided by the number of units selected (individuals, households, addresses) minus the ineligibles. The onus is on the fieldwork agency to achieve a high response rate and to secure the maximum possible contacts in order to achieve this. Cases that can be classified as ineligible are very limited.

Categories of ineligibles depend on the type of sampling frame that is being used. The definition of ineligibles ultimately depends on the definition of the target population. For samples of individuals, cases where the respondent is deceased, the address is not occupied by the respondent (unoccupied/demolished/not yet built), the respondent has emigrated/left the country long term or the respondent resides in an institution are all considered ineligible.

For samples of households or addresses (including area-based samples), a slightly different set of cases is considered ineligible. These include cases where the address is not occupied at all and demolished premises, cases where the address is not yet built or is under construction, nonresidential addresses (e.g. those used solely for business/industrial purposes or as an institutional address – such as factories, offices or schools), addresses that are occupied but are not residential households (e.g. weekend homes) and cases where addresses are occupied by the resident household but there is no eligible respondent (e.g. no one aged 15 or over).
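The broad logic of these eligibility rules can be expressed as a small lookup, sketched below. The outcome labels are simplified stand-ins of our own, not the official ESS outcome codes.

    # Broad logic of the eligibility rules described above. The outcome labels
    # are simplified stand-ins, not the actual ESS outcome codes.

    INELIGIBLE = {
        "individual": {"deceased", "address_unoccupied", "emigrated",
                       "in_institution"},
        "address":    {"unoccupied_or_demolished", "under_construction",
                       "non_residential", "not_a_residential_household",
                       "no_resident_aged_15_plus"},
    }

    def is_ineligible(outcome: str, frame: str) -> bool:
        """Return True if a final outcome counts as ineligible for this frame."""
        return outcome in INELIGIBLE[frame]

    # Hard-to-reach cases remain eligible and therefore depress the response rate:
    assert is_ineligible("deceased", "individual")
    assert not is_ineligible("away_entire_fieldwork", "individual")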
Box 3.5 ESS 4 Fieldwork Monitoring Suggestions

Measure: Number of achieved interviews
How to use: Is the number of achieved interviews in line with the targets expected? On this basis, will (a) the required sample size be achieved and (b) will the fieldwork be completed within the time allotted?
Possible action (not exhaustive): Check with fieldwork organization about: (i) number of interviewers currently working or starting work; (ii) scheduling of interviews.

Measure: Number where no contact attempted as yet
How to use: If high, why is this? Have all the addresses been allocated to interviewers? Are there any interviewers unable to start work?
Possible action: Check with fieldwork organization that: (i) all addresses have been allocated to interviewers (if not, how can this be covered? Are more interviewers/briefings required?); (ii) all interviewers are starting work promptly.

Measure: Response rate; number of refusals; number of noncontacts (with household or respondent); number of ineligibles
How to use: Is the response rate in line with predictions? Is the refusal rate in line with predictions? Is the noncontact rate in line with predictions? Is this higher than expected? Were the initial assumptions correct? Are interviewers assessing eligibility correctly?
Possible action: Discuss early with fieldwork organization: (i) response maximization; (ii) refusal conversion strategies; (iii) number and timing of calls to reduce noncontact rate.

Source: Koch et al. (2008b).
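The kind of interim bookkeeping that supports this monitoring can be sketched as follows. The counts are invented for illustration, and real monitoring would work from the contact-form data described in Section 3.4.4.

    # Interim fieldwork indicators of the kind monitored fortnightly (Box 3.5).
    # The counts are invented for illustration.

    issued        = 3000
    interviews    = 900
    refusals      = 400
    noncontacts   = 250   # no contact yet after the attempts made so far
    ineligibles   = 80
    not_yet_tried = 700   # issued units with no contact attempt recorded

    eligible = issued - ineligibles
    print(f"interim response rate:   {interviews / eligible:.1%}")
    print(f"interim refusal rate:    {refusals / eligible:.1%}")
    print(f"interim noncontact rate: {noncontacts / eligible:.1%}")
    print(f"share not yet attempted: {not_yet_tried / issued:.1%}")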
The primary role of the reported response rate on the ESS is not meant to denote field effort (although this is often implied) but, rather, to reflect the quality of the data in terms of completeness compared to the intended universe (see also the discussion on bias in Section 2.6). So if, for example, in a country using a population register a large number of selected individuals are found to be on an opt-out register (which forbids attempts to include them in the ESS or any other survey), the response rate will be lower, reflecting the loss of completeness that this causes and possibly an associated decline in representativeness. In this case, therefore, a decrease in response rate will occur even though the field agency had no opportunity to try to include these individuals in the survey.

Other cases, such as where the address is not traceable/reachable, or where the respondent is away throughout the fieldwork period or has moved to an unknown destination within the country, where the respondent is too ill/incapacitated or mentally or physically unable to participate throughout the fieldwork period, or where the respondent cannot be interviewed in national survey language(s), are nevertheless considered to be eligible. Again, in many of these instances the field agency has limited or perhaps no control over such outcomes, yet the response rate will still be lower. A separate field response rate7 is also calculated, which better reflects the efforts made by the field agency. However, the final reported response rate refers to the 'ESS response rate', and it is this response rate that is referred to in this book. Ultimately, data analysts are interested in knowing about the completeness of the sample compared to the target population. This is what the ESS response rate provides.

Loosveldt, Carton and Billiet (2004) have argued that both 'process evaluation' and 'outcome evaluation' of contact attempt data are equally important. A methodology is therefore required that allows both these aspects to be measured and reported. This means that the evaluation of data quality not only needs to deal with the survey results obtained (e.g. response rates, comparability of distributions with known distributions in the population, amount of item nonresponse etc.), but also with each step in the process of data collection, and with each attempt to contact a selected sampling unit. Couper and de Leeuw (2003, p. 157) had similar concerns: 'Only if we know how data quality is affected by nonresponse in each country or culture can we assess and improve the comparability of international and cross-cultural data.' The ESS therefore set out to adopt methods for recording both process and outcome data to allow both types of evaluation.

Specifying methods for deriving uniform response rates on the ESS is complicated, because of the variety of different types of sampling frames used (see Section 3.3.3). With individual sampling frames, calculating response rates is simpler, because the sample can be drawn directly from an existing single source. With address and household samples, the process is more complicated because a number of intermediate steps are required before contact with the target respondent can be attempted. For example, for address samples it is usually necessary to identify households within selected addresses prior to attempting to identify target respondents.
7 The ESS also requires countries to calculate a 'field' response rate where a larger group of cases are classed as ineligible.
To record aggregate outcome data, the ESS asks all countries to complete a section in a National Technical Summary: a document designed to summarize the data collection process in each country. The NTS records the length of the fieldwork period, payment and briefing of interviewers, the use of quality-control back-checks, the use of special efforts to convert reluctant respondents, the use of advance letters, brochures and respondent incentives, and the distribution of outcome codes for the total selected/issued sample, according to a previously defined set of categories. The information collected in the NTS is discussed in detail in Chapters 4 and 5. Table 3.2 shows an example of the aggregate information on final outcome codes and the response rate from a country in ESS 2 that is included in the NTS.
Table 3.2 A response rate example from the ESS National Technical Summary for ESS 2

Breakdown of response and nonresponse, main questionnaire
  a  Total number of issued sample units (addresses, households or individuals)   3042
  b  Refusal by respondent                                                         551
  c  Refusal by proxy (or household or address refusal)                            106
  d  No contacts (after at least four visits)                                      205
  e  Language barrier                                                               65
  f  Respondent mentally or physically unable to cooperate throughout
     the fieldwork period                                                          118
  g  Respondent unavailable throughout the fieldwork period for other reasons       46
  h  Address not residential (institution, business/industrial purpose)             14
  i  Address not occupied (not occupied, demolished, not yet built)                 19
  j  Address not traceable                                                           4
  k  Other ineligible address                                                       15
  l  Respondent moved abroad                                                        87
  m  Respondent deceased                                                            10
  n  Number of achieved interviews                                                1778
  o  Interviews not approved                                                         0
  p  Records in the data file                                                     1778
  x  Number of sample units not accounted for                                       24
  Response rate, main questionnaire: (n - o) / (a - (h + i + k + l + m))        61.37%
  Number of completed supplementary questionnaires                                1778
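The calculation at the foot of Table 3.2 can be reproduced directly from the outcome counts, which also makes explicit the treatment of categories such as 'address not traceable' (eligible, and therefore left in the denominator). The key names below are our shorthand for the row labels above.

    # Reproducing the response rate at the foot of Table 3.2.
    # Key names are shorthand for the table's row labels.
    outcomes = {
        "a_issued": 3042, "h_not_residential": 14, "i_not_occupied": 19,
        "k_other_ineligible": 15, "l_moved_abroad": 87, "m_deceased": 10,
        "n_interviews": 1778, "o_not_approved": 0,
    }

    # Note that j (address not traceable, 4 cases) is NOT subtracted:
    # such cases remain eligible and stay in the denominator.
    ineligible = (outcomes["h_not_residential"] + outcomes["i_not_occupied"] +
                  outcomes["k_other_ineligible"] + outcomes["l_moved_abroad"] +
                  outcomes["m_deceased"])                        # 145

    response_rate = ((outcomes["n_interviews"] - outcomes["o_not_approved"])
                     / (outcomes["a_issued"] - ineligible))
    print(f"{response_rate:.2%}")   # 61.37%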
While this aggregate-level information is clearly of use, it fails to meet the requirement set by Couper and de Leeuw (2003) that the impact of nonresponse on survey data be clearly identifiable. For example, it fails to inform the analyst if those who were hard to contact differed in their response to the survey compared to those who were at home the first time the interviewer called. And it does not shed light on whether those who were approached by a different interviewer after an initial refusal and only then decided to take part differed from those who agreed to take part first time round. Only with such information is it possible to examine whether survey estimates are subject to nonresponse bias.

In order to provide this information, the ESS CCT set about the considerable challenge of designing 'contact forms' that would facilitate the collection of comparable information about every contact attempt with every issued sample unit in every participating ESS country. The ESS therefore has two sources of information on response rates: the NTS just described and the contact form data. There can sometimes be small discrepancies between the data derived from these two different sources, and where this occurs the contact form data is considered to provide the most accurate and transparent data.
3.4.4 Contact forms
The ESS was the first cross-national survey to capture and make publicly available call record data for both respondents and nonrespondents (Stoop et al., 2003; Blom, Lynn and Jäckle, 2008). Unfortunately, most cross-national studies do not publish much information about differential response across countries. Prior to the launch of the ESS, this information was not available on an individual level in a comparable format for such a large number of countries. In order to ensure that the ESS could provide such data, not only about the aggregate outcomes in each country but also about the different components of nonresponse by country, the CCT designed and implemented uniform 'contact forms', allowing assessment of the response process at micro-level.

The Specification for Participating Countries states: 'Outcomes of all approaches to addresses, households and individuals in the sample will be defined and recorded according to a pre-specified set of categories that distinguish noneligibility, noncontacts and refusal. Model contact forms will be produced by the CCT, for translation and use by national teams. Countries may use their own contact forms if they wish, ensuring that these collect data on all of the variables specified by the CCT' (see European Social Survey, 2007b). Thus every single attempt to contact potential ESS respondents is noted down by interviewers, providing comparable records of the response process, irrespective of the different frames and designs used in the countries.

Countries can choose either to use the contact forms provided by the ESS (input harmonization) or to use their own existing forms (output harmonization), the latter on the condition that they can provide the data the ESS requires. An annotation file specifies the final variables that are required from the contact forms in order to facilitate output harmonization. Perhaps inevitably, when countries use their own forms there can be problems related to missing variables and less standardization than when the ESS contact forms are used.
Stoop et al. (2003) have outlined the challenges of designing forms that could be used in all ESS countries. The first phase in their development was to make an inventory of contact forms used by survey organizations in Europe and the United States, drawing on experience from the International Household Survey Nonresponse Workshop (www.nonresponse.org). The second phase involved drafting a contact form for three types of sampling frames (address samples, household samples, individual samples), allowing for different methods of selecting households per address and individuals per household. Finally, a compromise had to be reached between data needs and fieldwork burden. The requirements were revisited after Round 1 of the survey (ESS 1) because the burden on field agencies was considered too high, and a slightly reduced set of data were collected in subsequent rounds. Although some national teams continue to report that the fieldwork burden associated with the contact forms is high, other teams show how data from the contact forms are used to optimize national fieldwork strategies and to analyse nonresponse bias.

There are six different contact forms: individual samples; household samples (Kish grid/last birthday selection); address samples (Kish grid/last birthday selection); and address samples where more than one household can be selected at each address. Either the Kish grid or last birthday selection is used for the selection of individuals within the target household. Each country selects the contact form that matches its proposed sample design prior to each round.

The ESS contact forms capture a range of information, including interviewer and (potential) respondent identification, as well as information about the selection procedure (type of sample etc.). In addition, the date, month, day of the week, and exact time (hours and minutes) of each visit, the mode of each visit (face-to-face versus telephone) and the result of each visit are recorded. For contact attempts that did not result in an interview, information on the 'outcome' is entered and where applicable the reason for refusal and an estimation of the likelihood of future cooperation are also noted. If the sample unit is ineligible, the reason for this is also recorded. And for every sample unit, information is also entered on the neighbourhood (e.g. type of housing, amount of graffiti etc.). Appendix 3.1 contains a contact form as used in ESS 3.

The complete forms are complex, in some cases running to eight pages. Although some simplifications were made after the first round, the forms remain a challenge in the field, especially where completing such information is not part of the usual fieldwork procedures of the survey agency. Despite these constraints, the contact forms have generally been implemented in most countries to a reasonable standard. In ESS 1, for example, 17 out of 22 countries successfully delivered a complete call record dataset, whilst 22 out of 26 did so in ESS 2 and 23 out of 25 did so in ESS 3. However, there remain a number of recurrent difficulties, including implementation burden, freedom for countries to adapt existing in-house contact records to try to meet ESS standards, the absence of translation requirements for the forms, and some evidence of uneven recording of contact attempts by telephone (Billiet and Pleysier, 2007). Blom, Lynn and Jäckle (2008) also note the large amount of missing data on variables from certain countries. Nevertheless, despite these limitations, the data that
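As a rough illustration, a single call record of the kind just described might be represented as follows. The field names are our own shorthand, not the official ESS contact-form variable names, and the values are invented.

    # A call record of the kind captured by the ESS contact forms.
    # Field names are illustrative, not the official ESS variable names.
    from dataclasses import dataclass
    from datetime import datetime
    from typing import Optional

    @dataclass
    class ContactAttempt:
        sample_unit_id: str
        interviewer_id: str
        timestamp: datetime            # date and exact time of the visit
        mode: str                      # "face-to-face" or "telephone"
        outcome: str                   # e.g. "interview", "refusal", "noncontact"
        refusal_reason: Optional[str] = None          # only where applicable
        cooperation_likelihood: Optional[int] = None  # interviewer's estimate

    attempt = ContactAttempt(
        sample_unit_id="CH-00421", interviewer_id="INT-017",
        timestamp=datetime(2006, 10, 14, 18, 40),
        mode="face-to-face", outcome="noncontact",
    )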
are available represent a major step forward for the cross-national study of nonresponse. The CCT has produced a series of papers based on these data, assessing the quality of fieldwork in each round of the ESS. More recently, the data are also being quarried by methodologists outside of the ESS team (for examples of analysis by the CCT and others, see Billiet et al., 2007; Cordero et al., 2007; Kaminska and Billiet, 2007a,b; Kreuter, Lemay and Casas-Cordero, 2007; Kreuter and Kohler, 2009). Having such data has not only facilitated detailed documentation of response and noncontact rates, but has also yielded information by country on the average number of contact attempts, the percentage of contact attempts made by time of day and day of week, the probability of contact at first call attempt by timing of first call, and the percentage of eligible sample units that refused at least once, to name just some of the issues. Later chapters will address these areas in detail. These data have also enabled the differences in fieldwork strategies across Europe to be assessed and reported back to ESS National Coordinators, data users and methodologists. Finally, the contact forms are a major source of information for research into potential sources of bias.
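To give a flavour of such analyses, the sketch below computes the contact rate at the first call attempt by the timing of that attempt, using call records shaped like the ContactAttempt sketch above. The time slots are illustrative simplifications, not the categories used in ESS reports.

    # Contact rate at the first call attempt, by timing of that attempt --
    # one of the statistics derived from the contact-form data.
    from collections import defaultdict

    def first_call_contact_rates(attempts_by_unit):
        """attempts_by_unit: {unit_id: [ContactAttempt, ...] sorted by time}."""
        tried = defaultdict(int)
        contacted = defaultdict(int)
        for attempts in attempts_by_unit.values():
            first = attempts[0]
            slot = ("evening" if first.timestamp.hour >= 18 else
                    "weekend" if first.timestamp.weekday() >= 5 else
                    "weekday daytime")
            tried[slot] += 1
            contacted[slot] += first.outcome != "noncontact"  # any contact counts
        return {slot: contacted[slot] / tried[slot] for slot in tried}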
3.5 Conclusions

The European Social Survey is a cross-national survey that aims not only to measure and chart attitudinal change over time, but also to improve the methodology of cross-national survey research. It has developed a methodology that prioritizes cross-national comparability over different national methodological traditions whilst trying to ensure that it remains sensitive to the different cultures in which it operates. This chapter has outlined how the ESS set out to ensure comparability through its methodology and organizational structure.

When attempting to understand the response rate data included in this book, it is important to remember that the ESS covers a very wide range of countries, with widely differing economic, social and geographical structures. Furthermore, they also have widely varying survey infrastructures, which places limitations on the survey procedures that can be utilized, as well as requiring very different levels of adaptation from the usual way of working in some countries. In addition, the different sampling procedures employed place very different requirements on survey organizations, ranging from sending an interviewer directly to a named target person right through to having to conduct random route procedures to a high standard.

The ESS also places great emphasis on, and has invested considerable resources in, the recording of the response process across countries in order to facilitate accurate cross-national response rate comparisons. The development of equivalent contact forms that allow comparable measurement across different sampling frames has been central to achieving this. Much of the rest of this book is based on the data collected via these forms and from the NTS, and together these allow, perhaps for the first time, a truly cross-national comparison of response across such a large number of countries. To a more limited extent, the contact forms in particular allow some dimensions of
nonresponse bias to be examined. With response rates commonly reported to be in decline and the emphasis shifting from top-line response rates to the effect of nonresponse bias, such data are likely to grow in importance over time. In the longer term, however, the burden of collecting such data can only be truly justified by the use that can be made of it. This book provides the first comprehensive attempt to do that.
Appendix 3.1 A Contact Form as Used in ESS 3
(The contact form itself is reproduced as facsimile images in the original printed volume and is not included here.)
4 Implementation of the European Social Survey

4.1 Introduction

The preceding chapter discussed the design of the ESS and the various rules and regulations governing the implementation of the survey. This chapter provides an overview of how the ESS was actually implemented in the 30 or so countries that participated in the first three rounds (ESS 1, ESS 2 and ESS 3). In Section 4.2, a number of basic survey features are described, such as the mode of data collection and type of sampling frame used. This serves as the general background for the detailed analyses on nonresponse in the later chapters of the book. Section 4.3 presents details of the fieldwork, such as the briefing of interviewers and the use of advance letters and incentives. These procedures are often seen as tools that survey researchers have at their disposal to influence response rates in surveys (Groves and Couper, 1998; de Heer, 1999a). In Chapter 5, this information is used to investigate whether differences in fieldwork efforts help us to explain differences in response rates, both between countries and between survey rounds.

The analyses in this chapter include all countries that participated in any of the first three rounds of the ESS (see Table 3.1). The only exception is Italy, which did not field the whole ESS questionnaire in ESS 2 and is therefore excluded from the empirical analyses. As a consequence, the present analyses include the 22 countries in ESS 1 and 25 countries each in ESS 2 and ESS 3. It should be noted that not exactly the same countries participated in each ESS round. The same 20 countries participated in ESS 1 and 2, while another 20 countries participated in both ESS 2 and ESS 3. Seventeen of
the 32 countries from ESS 1 to 3 took part in each round. From this, it follows that the differences in prevalence of certain survey features between ESS rounds can be the result of countries entering or leaving the ESS between rounds as well as reflecting changes within countries that have participated in more than one round.

All the information in this chapter is aggregate country-level data. The data used stem mainly from the National Technical Summaries that the countries have to provide when submitting their data to the ESS archive (see Section 3.4.3). It is one of the distinctive features of the ESS that such an effort to collect comparative data on survey implementation and fieldwork procedures was built into the design phase of the survey.1
4.2 Basic Survey Features

4.2.1 Survey organization, administration mode and sample
The ESS Specification for Participating Countries (European Social Survey, 2007b; see Chapter 3) stipulates that high-quality survey organizations should be appointed for the ESS. The selection of the survey organizations is primarily the task of the national research councils and the National Coordinators. As a result of the selection processes taking place in the various countries, fieldwork in the ESS is carried out by a somewhat eclectic mixture of survey firms, including commercial survey agencies, national statistical institutes, nonprofit organizations and university institutes.

Table 4.1 shows that in each of the first three rounds of the ESS the majority of countries appointed a commercial survey organization. Only a few countries selected their national statistical agency, a university institute or a nonprofit survey organization for fielding the ESS. However, when dealing with these figures, it is important to bear in mind that the choice that can be made in a given country is restricted by the type and number of suitable survey agencies that exist in that country. For instance, in many countries the national statistical institute does not perform contract work for others or is prohibited from conducting attitudinal surveys that examine political issues, and therefore cannot be selected for ESS fieldwork. The proportion of countries selecting a commercial survey agency increased somewhat over the first three rounds of the ESS. This is mainly due to new countries entering the ESS for the first time in Rounds 2 or 3. Among the countries that participated in several rounds of the ESS, the great majority stuck to the same survey organization.

The prescribed mode of data collection in the ESS is face-to-face interviewing. In the first three rounds, about half the countries used paper-and-pencil interviewing (PAPI), whilst the other half used computer-assisted interviewing (CAPI) (see Table 4.1). PAPI was mainly used by Central European countries, probably because
1 Groves and Couper (1998, p. 173) suggested such an effort in their monograph on household survey nonresponse.
Table 4.1 Type of survey organization, interview administration and type of sample in ESS 1, 2 and 3

                                       ESS 1   ESS 2   ESS 3
                                        (number of countries)
  Type of survey organization
    Commercial                            12      16      17
    National statistical agency            3       4       3
    University institute                   4       3       4
    Nonprofit                              3       2       1
  Interview administration
    PAPI                                  12      13      14
    CAPI                                  10      12      11
  Type of sample
    Individual                             9      13      11
      of which unclustered                 3       7       6
    Household                              9       8      10
      of which unclustered                 1       1       2
    Address                                4       4       4
      of which unclustered                 1       1       2
  Total number of countries               22      25      25
CAPI interviewing is still not so common in these countries. Only two countries switched from PAPI to CAPI during the first three rounds of the ESS.

The target population of the ESS consists of the population aged 15 years and over, resident within private households in each country. It is a requirement that the sample for the ESS is selected using strict random probability methods at every stage. However, sample designs may be chosen flexibly, depending on the available sampling frames, experiences and also the costs in the different countries (see also Häder and Lynn, 2007). A basic distinction regarding the sampling design is whether a sample of individuals, households or addresses can be drawn in a country. Another important distinction relates to whether a (geographically) clustered or an unclustered sample is selected.

In the previous rounds of the ESS, most of the countries used a sample of individuals, followed by countries using a household sample (see Table 4.1). Samples of addresses were only used in a minority of countries. The majority of countries relied on a clustered sampling design, presumably because this often helps to limit costs by reducing the travelling distances for interviewers. However, the share of countries using an unclustered design increased in Rounds 2 and 3, mainly because of new countries entering the ESS. Among the 20 countries participating in both Rounds 1 and 2, 18 countries did not change the basic features of the sampling design that they used (sample of individuals versus households versus addresses; clustered versus unclustered sample). This does not mean that the sampling designs were kept totally constant, since in a number of
countries improvements occurred using the same general design (see Häder and Lynn, 2007). Also, between Rounds 2 and 3, only two countries changed their basic sampling design.
4.2.2 Sample size, number of interviewers and length of fieldwork period
The ESS lays down a minimum 'effective' sample size of 1500 realized interviews per country (or 800 interviews in countries with populations of less than two million2). With regard to the realized (nominal) sample sizes, the vast majority of countries achieved sample sizes of between 1500 and 2500 actual interviews in all three rounds of the ESS (see Table 4.2). Analyses indicate that due to design effects the effective sample size in some countries is considerably lower than the actual sample size (Häder and Lynn, 2007). As a consequence, in some countries the nominal sample size does not correspond to an effective sample size of at least 1500 interviews as required by the Specification. Broadly speaking, there are two main reasons for these deviations. Due to budgetary constraints in some countries, the issued gross sample was smaller than necessary for the achievement of the targeted effective sample size. In other countries, the planned response rate3 could not be achieved during fieldwork and the lower response rate also brought down the number of interviews. In the majority of countries, the number of realized interviews was kept quite stable across survey rounds. In a small number of countries, however, the sample size was increased considerably in order to compensate for high design effects (Häder and Lynn, 2007).

The ESS Specification does not contain an explicit requirement regarding the number of interviewers who should be used. However, the interviewer workload is limited, which in turn creates requirements regarding the number of interviewers to be used.4 In practice, in the first three rounds of the ESS countries differ widely with regard to the number of interviewers used. Table 4.2 shows that the number of interviewers involved varied from around 50 in some countries up to around 300 or 400 in others.5 In the majority of countries, the number of interviewers was – more or less – stable across survey rounds. In a few countries, however, quite remarkable changes occurred. Whereas between ESS 1 and 2 a small number of countries cut down the number of interviewers by half, between ESS 2 and 3 some countries doubled or
2 In ESS 1 this applied to Luxembourg and Slovenia, in ESS 2 to Estonia, Iceland, Luxembourg and Slovenia, and in ESS 3 to Cyprus, Estonia and Slovenia.
3 Please note that the planned response rate was not 70% in all countries. On the basis of former experiences and results, some countries anticipated a lower target rate.
4 It is assumed that large interviewer workloads can result in large interviewer effects on the data (see Groves et al., 2004, pp. 274–8). According to the ESS Specification, no single interviewer should work with more than 48 individuals, households or addresses (gross).
5 Since the gross and net sample sizes did not vary at the same rate across countries, this also means that the average interviewer workload was quite different between countries.
Table 4.2 Number of interviews and interviewers and length of fieldwork period in ESS 1, 2 and 3

                                       ESS 1   ESS 2   ESS 3
  Number of realized interviews         (number of countries)
    Up to 1000                             0       1       1
    1001–1500                              2       3       2
    1501–2000                             10      12      15
    2001–2500                              8       7       6
    2501–3000                              2       1       1
    More than 3000                         0       1       0
  Interviews
    Min.                                1207     579     995
    Max.                                2919    3026    2916
    Mean                                1925    1901    1884
  Number of interviewers (a)            (number of countries)
    Up to 100                              5       7       7
    101–150                                7      11       6
    151–200                                6       3       6
    201–300                                1       2       4
    301–400                                0       1       2
    More than 400                          1       0       0
  Interviewers
    Min.                                  59      55      45
    Max.                                 405     308     350
    Mean                                 155     137     156
    Missing data (b)                       2       1       0
  Length of fieldwork period (c)        (number of countries)
    Up to 60 days                          3       2       2
    61–90 days                             1       5       6
    91–120 days                            7       6       5
    121–150 days                           5       6       6
    151–180 days                           4       4       1
    181 days or more                       2       2       5
  Length of fieldwork period (days)
    Min.                                  29      35      33
    Max.                                 241     225     361
    Mean                                 121     120     126
  Total number of countries               22      25      25

(a) The figures are from the ESS dataset distributed by NSD (see Figure 3.1). This means that they refer only to interviewers who realized at least one interview.
(b) In ESS 1, no information is available on Austria and Sweden; in ESS 2 this is the case for Iceland.
(c) Information from the ESS dataset, except for Iceland in ESS 2: information from the Documentation Report, Edition 3.1 was used, since information on the day/month of the interview was missing for around 70% of all Icelandic cases in the dataset.
even tripled the number of interviewers.6 In about half the cases, these changes were accompanied by a change in the survey organization appointed to administer the ESS. In the other half, however, the number of interviewers changed considerably despite the same survey organization being in charge of the ESS fieldwork. In some cases, at least, these changes were directly related to the issue of nonresponse. One country with a very low response rate in ESS 1, for example, deliberately decided to concentrate on a small number of well-trained and highly motivated interviewers as one measure to improve response rates in future rounds of the ESS.

In each round of the ESS, fieldwork should last for at least one month in a four-month period from September to December of the respective survey year. In practice, however, large differences occur across countries (see Table 4.2).7 In the first two rounds of the ESS, the length of the fieldwork period in the individual countries varied between one and more than seven months. In ESS 3, one country even needed a year to finalize the fieldwork. Taken over all three rounds, the average length of fieldwork across countries was approximately four months (see also Koch and Blohm, 2006). In each round, fieldwork took longer than the specified four months in about half of the countries.
6 Please note that these changes were not related to changes in the gross sample size.
7 The length of the fieldwork period was measured as the time span between the date of the first and the date of the last interview in each country. Needless to say, the interviewers could have made additional contact attempts before/after the date of the first/last interview in a country. For the present purpose, the potential error introduced by this seems to be negligible, however, especially since we are primarily interested in looking at differences between countries. It should also be noted that for a few countries we ignored some 'isolated' interviews with an interview date more than 10 days before/after the first/last interviews when determining the length of the fieldwork period.
In general, the fieldwork period was longer in western European than in eastern European countries. However, in a number of countries the length of the fieldwork period differed considerably between rounds. This suggests that the length of the fieldwork period is not a stable characteristic, but one that can be influenced by idiosyncratic circumstances and events. In ESS 1, for example, the sample in one country had to be fielded in several tranches over a period of several months in order to cope with budgeting problems. This resulted in a rather long fieldwork period.

Table 4.3 Average costs per interview in ESS 1, 2 and 3 (incl. VAT)^a

                                                  ESS 1   ESS 2   ESS 3
Costs per interview in euros (number of countries)
  Up to 50                                            5       6       7
  51–100                                              2       3       5
  101–150                                             2       1       1
  151–200                                             5       5       4
  201–250                                             2       2       3
  251–300                                             1       1       3
Euros
  Min.                                               18      11       7
  Max.                                              262     278     295
  Mean                                              122     123     123
Missing data^b (number of countries)                  5       7       2
Total number of countries                            22      25      25

a Information from Study Monitoring Questionnaire No. 1 (ESS 1), contract with survey organization (ESS 2) and Fieldwork checklist (ESS 3; except Germany, Ireland, Slovak Republic, where information from the contract with the survey organization was used).
b Missing in ESS 1: Finland, Greece, Ireland, Israel and Luxembourg. Missing in ESS 2: France, Iceland, Luxembourg, Sweden, Slovenia, Turkey and Ukraine. Missing in ESS 3: Bulgaria and Ukraine.
4.2.3 Survey costs

Implementing a face-to-face survey is an expensive undertaking, especially when the survey imposes such rigorous standards as are upheld in the ESS. In order to obtain comparable information on survey costs in the ESS, the average costs per interview were calculated for each country.8 Table 4.3 shows large differences in fieldwork costs between countries. In the most expensive countries, the average costs per interview added up to more than €250. This is more than 10 times higher than in the least expensive countries. Not surprisingly, fieldwork costs were higher in the northern European countries than in (some) southern and all eastern European countries. The differences in survey costs across countries largely reflect differences in per capita Gross Domestic Product (GDP). The correlation between survey costs in ESS 3 and GDP, for example, is approximately r = 0.8.

The average costs per interview across all countries were rather similar in the first three rounds of the ESS (around €120 in each round; see Table 4.3). However, at the level of individual countries the rate of change of costs varied widely. Restricting ourselves to the countries that fielded adjacent rounds of the ESS, we find an average increase in costs of 17% between ESS 1 and 2 (information from 13 countries). Between ESS 2 and 3, the corresponding figure is 16% (information from 16 countries). The maximum increase in survey costs in an individual country was 100%. This increase was related to a change in the survey organization; however, it should be noted that in a few instances countries experienced a considerable increase in costs even when using the same survey organization. In a few instances, a decrease in costs could be observed. The maximum decrease was 17% between subsequent rounds.

8 The figures were derived by dividing the total price of the survey, which had to be paid to the survey organization (including VAT), by the planned number of interviews to be realized. One should note that the actual costs may differ from the results reported here due to unforeseen difficulties that could have arisen during the implementation of the survey.
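As a worked illustration of the derivation described in footnote 8 (the two input figures below are invented placeholders, not data from any ESS country):

```python
# Illustrative derivation of the cost-per-interview figure (footnote 8):
# total survey price (incl. VAT) divided by the planned number of
# interviews. Both numbers below are invented placeholders.
total_price_incl_vat = 240_000  # euros paid to the survey organization
planned_interviews = 2_000

cost_per_interview = total_price_incl_vat / planned_interviews
print(f"{cost_per_interview:.0f} euros per interview")  # -> 120 euros
```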
4.3 Practical Fieldwork Issues

4.3.1 Interviewers

Interviewers obviously play a central role in face-to-face surveys such as the ESS. They thus have great potential to affect data quality and survey costs. Their tasks include contacting target persons, securing cooperation and conducting the interviews according to the rules of standardized interviewing. Research shows that interviewers are not equally successful at doing their job. They differ regarding the quality of the data collected (Biemer and Lyberg, 2003, pp. 156–87) and in the response rates they achieve (Biemer and Lyberg, 2003, pp. 110–11; see also Chapter 2).9 It is argued that the actual behaviour of interviewers when attempting to contact target persons and trying to secure cooperation is decisive for the response rate achieved in a survey (Groves and Couper, 1998, pp. 219–45). However, in the rest of this chapter we will not concentrate on this actual behaviour; that will be done in Chapters 6 and 7 of this book, where microdata from the ESS contact forms will be used to analyse in detail the interviewer efforts in contacting and obtaining cooperation from sample cases. This section will instead deal with a few issues that could be thought of as being antecedent variables of a sort, and which can influence the behaviour of interviewers. These aspects pertain to the experience, payment and training of the interviewers. The ESS Specification for Participating Countries lays down several rules and recommendations concerning these issues.

9 It is often difficult to distinguish to what extent these differences arise from differences between interviewers or from differences between the areas (and the target persons living in those areas) assigned to the interviewers. Research using interpenetrated sample designs has, however, shown that interviewer effects can remain strong even when area effects are controlled (Campanelli and O'Muircheartaigh, 1999).

There is a considerable body of evidence showing that more experienced interviewers tend to achieve higher response rates than those with less experience (Groves and Couper, 1998, pp. 211–14). Therefore the recommendation for the ESS is to select experienced interviewers wherever possible. If 'experience' is defined 'softly', as interviewers having worked on at least one other survey before, it turns out that in the first three rounds of the ESS the average percentage of experienced interviewers across countries was around 90% (see Table 4.4). In each round, in about half the countries all interviewers had some prior experience. The percentage of experienced interviewers for individual countries was fairly stable across survey rounds. Only in a minority of countries can substantial changes be observed. These changes were in both directions and sometimes occurred even when the survey organization stayed the same. In a meeting with the survey organizations of ESS 3, field directors from several countries emphasized that they faced major difficulties in finding good interviewers, particularly in large cities (Zabal and Wohn, 2008).

Table 4.4 Interviewer experience, payment and training in ESS 1, 2 and 3

                                                  ESS 1   ESS 2   ESS 3
Interviewer experience
  % Interviewers experienced (number of countries)
    1–49%                                             0       1       1
    50–74%                                            4       4       3
    75–89%                                            4       4       3
    90–99%                                            2       3       5
    All                                              12      11      13
  Min. (%)                                           65      49      47
  Mean (%)                                           90      88      90
  Missing data^a (number of countries)                0       2       0
Interviewer payment (number of countries)
  Per interview                                      13      13       8
  Per interview + bonus                               6       9      14
  Per hour                                            3       3       3
Interviewer training
  ESS-specific personal briefing:
  % Interviewers personally briefed (number of countries)
    None                                              1       1       1
    1–49%                                             0       0       2
    50–74%                                            0       1       0
    75–89%                                            0       0       1
    90–99%                                            3       3       1
    All                                              18      20      20
  Length of ESS-specific personal briefing sessions (number of countries)
    No personal briefing at all                       1       1       1
    Half day or less                                  6       9      10
    Half day to one day                              15      13      13
    More than one day                                 0       2       1
  Training in refusal conversion (number of countries)
    Yes                                              16      18      21
    No                                                5       7       4
    Missing data^b                                    1       0       0
Total number of countries                            22      25      25

a In ESS 2, no information is available on Spain and Iceland.
b In ESS 1, no information is available on Belgium.

Levels of interviewer pay and the pay structure can both affect interviewers' incentive to work hard in order to enhance their response rates. It is usual for survey organizations to have a standard policy concerning pay arrangements, which they are unlikely to vary significantly for particular studies. The two standard policies are to pay interviewers an hourly rate or to pay per completed interview. The latter is more popular, mainly because it makes fieldwork costs easier to control. However, the drawback of this type of remuneration is that it does not provide an incentive for interviewers to follow up potential nonrespondents; that is, persons who are difficult to reach, or hard to persuade to participate (Darcovich and Murray, 1997; de Heer, 2000). The ESS regulations on payment are framed in general terms so that they can be applied to the different types of payment systems used by the various field agencies. When the ESS countries are classified according to their basic interviewer pay scheme,10 it turns out that in all three rounds only the Nordic countries paid their interviewers for the actual number of hours worked (see Table 4.4). In these countries, the national statistical institute collected the data and the interviewers were regular employees (and were thus not working as freelancers, which is the typical arrangement in most other countries). In all the other countries, the interviewers were paid per completed interview. However, the proportion of countries paying per interview and additionally having some kind of bonus system in place increased round on round; in ESS 3, more than half the countries used some kind of bonus system. Unfortunately, we do not know whether these bonus systems were always linked to the response rates achieved or to some other quality indicator, such as adherence to time standards or completeness of interview data. Most of the countries kept their basic payment regime constant between the different ESS rounds. Between ESS 1 and 2 seven countries changed their system, and between ESS 2 and 3 six countries made a change. Most changes (10 out of 13) involved the implementation of a bonus system in addition to payment per interview. On only three occasions was a move made in the opposite direction, from having used a bonus system to not using it in the following round.

10 The classification is made according to the basic form of payment for the majority of interviewers in a country. This means that small deviations from this standard method of payment are possible; for example, when in a country classified as paying per interview a few interviewers (e.g. senior interviewers) are paid per hour, or when all interviewers receive a small proportion of their payment irrespective of the number of interviews they have conducted. A few examples from ESS 3 will illustrate this. In Switzerland, for example, the bulk of the payment was per completed interview. However, interviewers additionally received a payment per hour to compensate for their time spent contacting target persons and travelling. The higher the response rate achieved by the interviewer, the higher was the rate paid per interview. A similar scheme was used in the Netherlands. In France, most interviewers were paid per completed interview. However, a few interviewers affiliated to the survey organization received a salary. Additionally, interviewers received a bonus in certain areas and for good performance (high response rate, low travel costs). Also, interviewers received an assignment fee if they did not achieve any interviews. We classified all three countries as 'interview + bonus'.

Interviewer training is an important tool for influencing interviewer behaviour (Billiet and Loosveldt, 1988; Couper and de Leeuw, 2003). Two types of training can be distinguished: generic and specific. Interviewers usually receive some kind of generic training when they begin working for a survey organization. In addition, many survey organizations provide survey-specific training or briefing. The ESS rules, for example, require that all interviewers be personally briefed by the National Coordinator or members of the research team of the survey organization before carrying out an assignment for the ESS. In the first three ESS rounds, nearly all countries adhered to the basic requirement concerning the training of interviewers; that is, the personal briefing of all interviewers (see Table 4.4). In the vast majority of countries, all or nearly all interviewers (90% or more) received a personal briefing before they started to work for the ESS. In nearly all countries, the briefing sessions lasted one day or less; only a small number of countries had briefing sessions longer than one day. In all three rounds of the ESS, the vast majority of countries reported that their interviewers were trained in refusal conversion, either as part of the ESS-specific personal briefings or otherwise.
4.3.2 Information and incentives

The use of advance letters or respondent incentives is a common measure to improve survey participation (see Section 2.5.3). Information can provide an intrinsic motivation to participate, and a small gift can serve as an extrinsic incentive. In face-to-face surveys, a letter sent in advance of an interviewer call usually has a positive effect on the response rate (Groves and Couper, 1998, pp. 276–81; Biemer and Lyberg, 2003, pp. 109–10; Groves, 2006). It can serve several purposes, addressing a variety of issues known to affect survey participation. The general recommendation for the ESS is to use an advance letter, personalized with the individual name of the target person if possible.
Table 4.5 Advance letters, brochures and incentives in ESS 1, 2 and 3

                                ESS 1   ESS 2   ESS 3
                                  (number of countries)
Advance letter
  Yes                              20      20      19
  No                                2       5       6
Brochure
  Yes                              11      13      15
  No                               11      12      10
Incentive
  Yes                              11      14      15
  No                               11      11      10
Total number of countries          22      25      25
Although not part of the ESS Specification, it is also pointed out that it might sometimes be helpful to use a brochure or leaflet, often in addition to an advance letter. The use of such documents can help to underline the reputation and authority of the survey and also provides an opportunity to give more detailed and different types of information (including graphs and pictures). In all three rounds of the ESS, most countries sent an advance letter (see Table 4.5). With one exception, the few instances where no letter was sent concern countries that used a sample of households or addresses. Where such samples are used, the positive effect of an advance letter may be diluted, as the individual to be selected may not receive or read the letter. The number of countries using a brochure was lower than the number of countries using an advance letter, though this number did increase slightly between ESS rounds.

Offering a reward to the target persons appears to increase response rates (Groves and Couper, 1998, pp. 281–4; Singer, 2002). In particular, incentives can help motivate those target persons who are not interested in the survey topic to participate (and can thus counteract biases potentially arising from topic saliency) (Singer, 2002; Groves, 2006; Groves et al., 2006). Unconditional prepaid incentives seem to be more effective than conditional incentives paid upon completion of the interview. In addition, cash incentives appear to work better than nonmonetary incentives. The general recommendation in the ESS is therefore to consider using an incentive to raise response rates. Since it is recognized that incentives work differently in different countries (or even between different groups within a country), with different norms and cultural settings (Couper and de Leeuw, 2003), no recommendations for a specific type of incentive are given. When deciding on the type of incentives, countries are therefore allowed to take their own culture and customs into account.

Table 4.5 shows that in the first round of the ESS half the countries used an incentive, while the other half did not. In ESS 2 and 3, the proportion of countries offering incentives to respondents increased slightly. This is the result both of new countries entering the ESS that offered an incentive from the very beginning, and of 'old' countries changing their policy from not using an incentive in earlier rounds to using one in later rounds. Interestingly, no country used an incentive and then decided not to use it in a subsequent round. The types of incentives used and the implementation procedures differed across countries. Most ESS countries relied on a conditional incentive; that is, an incentive that was only delivered upon completion of the interview. Only a few countries used a prepaid incentive, given to the target person regardless of whether they decided to participate in the survey. Some countries offered a cash incentive. Other countries provided a lottery ticket or a shopping voucher. Sometimes, a donation to a charitable organization was offered. Other incentives used covered a wide range of nonmonetary gifts, such as calendars, pens or stamps. In a few countries a mix of incentives was offered to respondents, who could choose the one they liked most. In some countries, incentives were only provided in large cities (where it was most difficult to motivate target persons to participate) or were used for refusal conversion only. In other countries, the value of incentives was raised for refusal conversion purposes. In ESS 3, one country carried out a large incentive experiment aimed at obtaining empirical evidence on the question of which incentive works best (Phelps, 2008).
4.4 Summary and Conclusions

This chapter has provided an overview of how the first three rounds of the ESS survey were implemented in the participating countries. We found that some of the survey features covered were implemented differently across countries, reflecting the mix of standardization and country-specific variation that forms part of the ESS design in this area. This is the case, for example, for the mode of data collection used (PAPI versus CAPI), the length of the fieldwork period, the number of interviewers deployed or the use of incentives. Most often, this variation was within the limits set by the Specification for Participating Countries and should therefore not be seen as a problem. In a few instances, however, some countries did not adhere to the Specification for various reasons; for example, very long fieldwork periods or too few realized interviews. This points up the fact that defining procedures and setting targets is only a first step; actually implementing procedures and achieving targets is a different and often challenging task. It is probably inevitable that in such a large and complex survey as the ESS, deviations occur occasionally. As well as the differences, however, we also found many similarities in survey implementation; for instance, with regard to the personal briefing of interviewers or the use of advance letters. It would therefore seem that the approach adopted in the ESS – setting standardized requirements, complemented by some built-in flexibility for implementation – by and large worked quite well.

With regard to the stability and change between different survey rounds, the overarching picture is consistency. Most countries, for example, stuck to the same survey organization, fielded the survey in the same mode, deployed approximately the same number of interviewers and provided a personal briefing for all interviewers in each round of the ESS. However, in a number of countries noticeable changes in specific aspects of the fieldwork also took place. For instance, the number of countries using incentives or brochures increased between ESS rounds. Also, the use of a bonus system for interviewers grew in later rounds. These developments might be linked to the issue of nonresponse, and in Chapter 5 we investigate further whether or not this is actually the case.

Finally, it is worth noting that we have been able to take only a rather cursory look at the basic survey features and fieldwork procedures in the countries in the first three rounds of the ESS. A more detailed investigation would probably have yielded both more differences across countries and also more indications of change within countries over time. For instance, even if we know that nearly all countries in the ESS provided a personal briefing for their interviewers, we do not know about the issues covered in those briefings. It seems likely that there are differences between countries in this respect. Some of these differences are a good thing, and are actually recommended since they may, for example, reflect differences in the experiences of interviewers in different countries. It is likely that a briefing given to interviewers who are regular employees of a statistical institute will be different from a briefing of students who work freelance for a survey organization. Similarly, more detailed insights could reveal that countries using an incentive in several rounds of the ESS still implemented minor changes; for example, by increasing the value of the incentive or by altering the way in which the incentive was given to the target person. Presumably, such incremental changes are particularly relevant for processes intended to lead to continuous improvements over a longer period of time. In the ESS, details of the national implementation of the survey are not always available at a central level. Moreover, it was not possible to consider all of the details in this condensed overview.
5 Response and Nonresponse Rates in the European Social Survey

5.1 Data and Definitions

This chapter focuses on the central topic of our book; namely, the response rates achieved in the ESS and the prevalence of the basic types of nonresponse (noncontacts, refusals, 'not able/other' nonresponse). As in Chapter 4, all analyses are performed at country level. It should be noted, however, that both the response rate and the rates of noncontacts, refusals and 'not able/other' nonresponse are derived from data at individual level that all countries participating in the ESS are required to provide. We included in our analyses all 22 countries from ESS 1, 25 countries from ESS 2 (only Italy was left out) and 21 countries from ESS 3. For the four remaining countries from ESS 3 (Austria, Ireland, Latvia and Ukraine), the relevant information was not available in time.

At least two things are required for valid cross-national comparisons of response rates; namely, precise definitions of outcomes (e.g. definitions of eligibility, noncontacts and refusals) and a standardized procedure for calculating the response rate based upon those definitions. It is a unique feature of the ESS that it has built in this standardization from the very beginning through its contact forms (see Chapter 3, including Appendix 3.1). In what follows, data from these forms are used to provide information on response and nonresponse rates for all the ESS countries in a comparable way (see Billiet and Pleysier, 2007; Symons et al., 2008 – from which the data on response and nonresponse used in this chapter are drawn). Unfortunately, not all countries delivered a dataset containing the necessary information. For countries with no suitable call record data, we therefore report response and nonresponse rates calculated from the information provided in the National Technical Summaries (see Chapter 3), recognizing that they may not be directly comparable and need to be treated with due caution.1

1 This was the case in the Czech Republic, Denmark, France and Sweden in ESS 1; the Czech Republic, Hungary, Iceland, Slovenia, Turkey, Ukraine and the United Kingdom in ESS 2; and Estonia and Romania in ESS 3. For these countries, the relevant information stems from European Social Survey (2003, 2005, 2007a).

The ESS uses the generally agreed operational definition of response rate; namely, the number of completed interviews divided by the size of the gross sample minus the number of ineligible sampling units (see Section 3.4.3). The definition of eligibility is therefore an issue that deserves special attention in calculating response rates. Depending on what is considered to be an eligible or ineligible sampling unit, response rates can be calculated in different ways (Couper and de Leeuw, 2003; Stoop, 2005; American Association for Public Opinion Research, 2008). The target population of the ESS consists of the population 15 years and over resident within private households in each country, regardless of their nationality, citizenship or language (see Section 3.3.3). In accordance with this definition, we classified the following outcome codes from the contact forms as ineligible: respondent deceased; respondent moved abroad or to unknown destination; address not residential (institution, business/industrial purpose); address not occupied (including demolished houses or houses not yet built); and other ineligible cases.2

2 In the following analyses, we do not distinguish a category with cases of unknown eligibility. In fact, we assume that all potential cases of unknown eligibility (which probably belong mainly to noncontacts) are eligible. This is a conservative approach, which results in the estimation of the lower bound for response rates in the countries (see Smith, 2003).
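Written as a formula (with notation introduced here purely for convenience), this operational definition is:

\[ \mathrm{RR} = \frac{I}{n_{\mathrm{gross}} - n_{\mathrm{ineligible}}} \]

where \( I \) is the number of completed interviews, \( n_{\mathrm{gross}} \) the size of the gross sample and \( n_{\mathrm{ineligible}} \) the number of sampling units classified as ineligible. Because the nonresponse rates used in this chapter are computed over the same eligible base, the response, noncontact, refusal and 'not able/other' rates sum to 100%.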
The relevance of each of these codes varies according to the type of sample used. In an address sample, the likelihood of finding a target person deceased is very small, but the likelihood of the interviewer visiting a nonresidential address cannot be ignored. In an individual sample, the possibility cannot be ruled out that the person selected will die before the interviewer calls. On the other hand, it is much less likely that the interviewer will visit a business address hoping to find a resident there. This sometimes makes it difficult to compare rates of ineligibles between countries using different types of sampling frames. However, in all these cases an interview for the ESS could not be conducted, since no person belonging to the target population of the ESS was (still) living at the address visited.

Apart from this difficulty, two other notes of caution are called for. Firstly, according to the above definition, the response rate of a survey will be higher as the number of sampling units classified as ineligible increases. If interviewers – or survey organizations – are tempted to 'improve' their response rate by incorrectly classifying as ineligible sampling units that should actually be treated as regular nonresponse (e.g. as noncontacts or refusals), the comparability of response rates across countries will be endangered. The definition of ineligibles and the correct use of this definition by the interviewers during fieldwork are therefore vital for achieving reliable data on response rates.3 Secondly, a low rate of ineligibles is not necessarily an indicator of a high-quality sample. An address sample comprising a reasonable proportion of business addresses where someone might live (e.g. live-in janitors) could result in a high ineligibility rate. Excluding all business addresses in advance might be cost-effective and result in a lower ineligibility rate, but would also exclude these janitors.

3 This is why from ESS 3 onwards the ESS Specification for Participating Countries explicitly requires countries to include ineligible cases in the quality control back-checks.

Once ineligible units are excluded from the gross sample, response and nonresponse rates can be expressed as percentages of the eligible sample. This is fairly straightforward with regard to the response rate, which is simply the percentage of realized interviews. It should be noted, however, that only interviews in which the majority of applicable questions were answered by the respondent were considered to be completed interviews in our analyses. In the cases where no interview could be achieved, it is usual to distinguish three basic types according to the reasons for nonresponse: (1) 'noncontacts' – those who could not be contacted during the fieldwork period; (2) 'refusals' – those who were contacted but refused to participate; and (3) 'not able/other' nonrespondents – for example, those who were not able to cooperate due to illness or language problems.

In order to obtain such a final outcome code for every individual sampling unit in the ESS, the data documented in the contact forms had to be transformed in two respects. Firstly, the results of the various contact attempts (or calls) for each unit had to be combined to obtain a single final outcome code for each sampling unit. There are two ways to arrive at a final outcome code (Billiet and Philippens, 2004; Blom, Lynn and Jäckle, 2008); namely, to take the outcome of the last contact attempt as the final response code, or to construct a priority system of call outcomes to select the outcome with the highest priority (see Lynn et al., 2002a). In the ESS, and in this book, a combination of the two methods is used. Thus the outcome of the last contact attempt is taken as the final disposition code, except when a refusal occurred at an earlier call. In that case, the final code was 'refusal to participate', because this code has priority over other nonresponse codes, such as noncontact. When a refusal was followed by a response because of successful conversion attempts, then, of course, the final outcome became 'realized interview', because it had a higher priority in the coding procedure.

Secondly, in addition to combining the outcome codes from the different calls into one final code, the detailed coding scheme from the ESS contact forms also had to be recoded into the three broad categories of noncontact, refusal and 'not able/other'. In the category 'noncontact', all sampling units are subsumed where no contact at all was made with anyone at any call. Refusals comprise both refusals by the target person and refusals by other household members. The 'not able/other' category is made up of several diverse reasons for nonresponse. It should be noted that not all the following categories were relevant in every country:

- respondent mentally or physically unable to cooperate;
- respondent unavailable/not at home throughout the fieldwork period;
- language barrier – in other words, the interview could not be realized in one of the survey languages;
- respondent moved within country;
- partial interview/break-off;
- invalid interview;
- broken appointment;
- contact but no interview – other;
- address not traceable;
- address not attempted; and
- contact form missing.
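To make the combination rules just described concrete, the following minimal sketch applies them to simplified call outcomes (the labels and the function are illustrative stand-ins, not the actual ESS contact-form codes):

```python
# Minimal sketch of the final-disposition rules described above. The
# outcome labels are simplified stand-ins for the detailed ESS
# contact-form codes; only the combination logic is taken from the text.

CONTACT_OUTCOMES = {
    "refusal by target person": "refusal",
    "refusal by other household member": "refusal",
    "respondent ill": "not able/other",
    "language barrier": "not able/other",
    "broken appointment": "not able/other",
}

def final_outcome(calls):
    """Combine the outcomes of all calls into one final disposition."""
    if "completed interview" in calls:
        return "interview"          # highest priority, incl. converted refusals
    if any(CONTACT_OUTCOMES.get(c) == "refusal" for c in calls):
        return "refusal"            # a refusal at any call takes priority
    if all(c == "no contact" for c in calls):
        return "noncontact"         # no contact with anyone at any call
    # otherwise classify by the outcome of the last call at which
    # contact was made
    last_contact = [c for c in calls if c != "no contact"][-1]
    return CONTACT_OUTCOMES.get(last_contact, "not able/other")

# A refusal at call 2 followed by a noncontact still ends as 'refusal'
print(final_outcome(["no contact", "refusal by target person", "no contact"]))
```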
5.2 Response and Nonresponse Rates in ESS 3

5.2.1 Rate of ineligibles

Given the relevance of the rate of ineligibles for the calculation of response rates, we begin here with a brief look at the rate of ineligibles in the ESS. The Specification for Participating Countries does not detail a specific requirement with regard to the level of ineligibles. Using the definitions from the previous section, we obtain the rates of ineligibles for the 21 countries that took part in ESS 3 shown in Figure 5.1.
Figure 5.1 Rate of ineligibles in ESS 3 (bar chart; ineligibility rate in %, by country)
The average rate of ineligibles across all countries in ESS 3 was 5.4%.4 However, there were large differences between countries. The majority of countries (13) had a rate of ineligibles of 5% or less. Five countries had a rate of 5–10%, while in three countries it was more than 10%: these were Hungary (12.8%), Spain (13.9%) and Estonia (16.6%). In the given situation, it is almost impossible to ascertain the reasons for these differences in rates of ineligibles across countries. In list-based samples, for instance, differences may arise because some frames are updated more regularly than others, or because the time span between selecting the sample and the start of fieldwork differs between countries. What can be said is that there seems to be no relationship between the ineligibility rate and the type of sample used (sample of named individuals versus sample of households versus sample of addresses). For the three countries with more than 10% ineligibles, we find that in the two countries with a sample of named individuals (Estonia and Spain), 'moved abroad or to unknown destination' was the category of ineligibility mentioned most frequently. In Hungary, where a sample of households was used, 'address not occupied' was the category most commonly used.

4 It should be noted that this is an unweighted mean across countries. Here and in the following sections of this chapter, all averaged results across countries reported are simple arithmetic means.

5.2.2 Response rate

Figure 5.2 Response rates in ESS 3 (bar chart; response rate in %, by country)

The ESS sets a minimum target response rate of 70%. Figure 5.2 shows the response rates actually achieved in the countries in ESS 3. As can be seen, four countries achieved a response rate of 70% or more (Poland, Romania, Portugal and the Slovak Republic). The highest rate was attained in the Slovak Republic, with 73%. Another 10 countries obtained a rate of approximately 65–70% (Finland, Norway, Bulgaria, Slovenia, Estonia, Sweden, Hungary, Spain, Cyprus and Russia). Thus two-thirds of the countries in ESS 3 achieved, or nearly achieved, the ambitious target of a 70% response rate.
Of the remaining seven countries, two obtained a response rate of around 60% (the Netherlands and Belgium) and four countries had response rates of around 50% (Switzerland, Denmark, the United Kingdom and Germany). Only France had an even lower response rate (46%). The response rate in France was thus 27 percentage points lower than the response rate in the country with the highest rate (the Slovak Republic). Despite the general target of a 70% response rate and the aspiration to achieve comparable response rates in all ESS countries, therefore, there are considerable differences between countries in the response rates achieved.
5.2.3 Structure of nonresponse
This section examines the three main causes of nonresponse; that is, noncontact, refusal and 'not able/other'. Of the three, the ESS sets a separate target only for the noncontact rate: all countries that participate in the ESS have to aim for a maximum noncontact rate of 3%. Figure 5.3 shows the noncontact rates achieved in ESS 3. Half the countries achieved a noncontact rate below 3%. The lowest rate (0.8%) was obtained in Norway, but Poland, Sweden, Switzerland, Cyprus, the Netherlands, Bulgaria, Finland, Belgium, Hungary and Slovenia also had noncontact rates of less than 3%. Another six countries only just missed the target. Denmark, Spain, Portugal, the Slovak Republic, Germany and Russia had noncontact rates of between 3% and 5%. France and the United Kingdom had rates of around 7%. The highest noncontact rates in ESS 3 were in Romania (10.0%) and Estonia (13.1%). Good response rates do not necessarily coincide with low noncontact rates, and vice versa. Romania, for instance, had a response rate of 72%, but its noncontact rate was nevertheless high at 10%. Switzerland, on the other hand, had a noncontact rate of only 2%, but a response rate of just 50%. Obviously, the absence of a clear relationship between response rates and noncontact rates is explained by the fact that noncontacts only account for a small fraction of the nonrespondents.
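For illustration, the sketch below computes such rates from a set of final disposition codes, excluding ineligible cases from the base as described in Section 5.1 (the counts are invented, not ESS data):

```python
# Minimal sketch: outcome rates computed over the eligible sample.
# The disposition counts below are invented for illustration only.
from collections import Counter

final_codes = (["interview"] * 65 + ["refusal"] * 24 +
               ["noncontact"] * 3 + ["not able/other"] * 8 +
               ["ineligible"] * 5)

counts = Counter(final_codes)
eligible = len(final_codes) - counts["ineligible"]  # ineligibles excluded

for code in ("interview", "noncontact", "refusal", "not able/other"):
    print(f"{code}: {100 * counts[code] / eligible:.1f}%")
```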
13.1
12 10.0
% 10 8
6.6
6 4
7.2
5.0 5.0 3.3 3.3 2.6 2.7 2.7 2.9 2.9 2.9 2.0 2.2 2.2
3.8 3.9
2 0.8 1.3 0 NO PL SE CH CY NL BG FI BE HU SI DK ES PT SK DE RU FR UK RO EE
Figure 5.3
Noncontact rates in ESS 3
Figure 5.4 Response, noncontact, refusal, and 'not able/other' rates in ESS 3 (bar chart; rates in % of the eligible sample, by country)
If the total nonresponse is decomposed into noncontacts, refusals and 'not able/other' nonresponse, in all but two countries refusals make up the largest part of nonresponse.5 The average noncontact rate across all countries was 4.1%, whereas the refusal rate was 24.2% and the 'not able/other' rate was 9.0%. Figure 5.4 shows the outcome of the decomposition of nonresponse in ESS 3. France and Switzerland, the two countries with the lowest response rates, also had the highest refusal rates (in both cases slightly more than 40%). Figure 5.5 shows, not very surprisingly, that there is a negative relationship between response and refusal rates (r = −0.78, n = 21).

5 In Slovenia, the percentage of 'not able/other' was only slightly higher than the percentage of refusals. In Cyprus, however, the category 'not able/other' was by far the largest one. The reason for this is that in Cyprus approximately 25% of the sample units from the gross sample were never contacted because the fieldwork was stopped prematurely once a response rate of close to 70% had been achieved. These cases were coded as 'not able/other'.

Figure 5.5 Response and refusal rates in ESS 3 (bar chart; response and refusal rates in %, by country)

5.3 Response Rate Changes Over Time

5.3.1 Overview

With data available from the first three rounds of ESS, it is possible to check how stable the response and nonresponse rates are. Table 5.1 gives aggregate information on the ineligibility rate as well as the response, noncontact, refusal and 'not able/other' rates for each round of the ESS.6 Section 5.3.2 takes a closer look at stability and change at the level of individual countries.

6 Please note that in Table 5.1 results from all countries of each ESS round are included (except for the four 'late' countries in ESS 3). This means that differences across rounds can be the result of both countries entering or leaving the ESS between rounds and of changes among countries that have participated in more than one round.

In each of the first three rounds of the ESS, the majority of countries had a rate of ineligibles of 5% or less (Table 5.1). However, the average rate increased slightly from round to round. In ESS 2, two countries had a rate of ineligibles of more than 10%: Estonia (12.1%) and Turkey (15.4%). In ESS 3 there were three countries: Hungary (12.8%), Spain (13.9%) and again Estonia (16.6%).

After exclusion of the ineligibles, response rates across rounds can be compared (Table 5.1). The average response rate across countries in ESS 1 is 60.0%. In ESS 2 and 3 the rate increases slightly, to 61.6% and 62.8%, respectively. At the same time, the variation in response rates across countries is decreasing. Whereas in ESS 1 the difference in response between the country with the highest and the country with the lowest response rate added up to nearly 50 percentage points, in ESS 3 it was only 27 percentage points – nearly a halving of the difference. This is mainly due to a decrease in the number of countries with low response rates. It looks as if the efforts to achieve consistent response rates across countries in the ESS are achieving at least partial success. However, it should also be borne in mind that in ESS 3 the highest response rate achieved was 73%; this is lower than in ESS 1 and 2, in both of which there were countries that achieved a response rate of nearly 80%.

The bottom part of Table 5.1 summarizes the broad structure of nonresponse in the first three rounds of the ESS. In each round, refusals are the most important reason for unit nonresponse.
Table 5.1 Outcome rates in ESS 1, 2 and 3

                                       ESS 1   ESS 2   ESS 3
Ineligibility rates (number of countries)
  Up to 5.0%                              16      14      13
  5.1–10.0%                                6       9       5
  10.1% or more                            0       2       3
Ineligibility rate (%)
  Min.                                   0.0     0.0     0.1
  Max.                                   9.2    15.4    16.6
  Mean                                   3.5     4.9     5.4
Response rates (number of countries)
  30.0–39.9%                               1       0       0
  40.0–49.9%                               4       2       1
  50.0–59.9%                               4       7       5
  60.0–69.9%                               8      10      11
  70.0–79.9%                               5       6       4
Response rate (%)
  Min.                                  32.5    42.9    46.0
  Max.                                  79.5    79.1    73.2
  Mean                                  60.0    61.6    62.8
Noncontact rates (number of countries)
  Up to 3.0%                               8       7      11
  3.1–5.0%                                 7       6       6
  5.1–10.0%                                4       8       3
  10.1% or more                            3       4       1
Noncontact rate (%)
  Min.                                   0.8     0.9     0.8
  Max.                                  14.7    13.5    13.1
  Mean                                   4.9     5.6     4.1
Refusal rates (number of countries)
  Up to 10.0%                              0       0       1
  10.1–20.0%                               5       8       5
  20.1–30.0%                              11      11      11
  30.1–40.0%                               4       5       2
  40.1% or more                            2       1       2
Refusal rate (%)
  Min.                                  14.9    11.1     4.2
  Max.                                  51.2    44.0    40.7
  Mean                                  26.8    24.8    24.2
'Not able/other' rates (number of countries)
  Up to 5.0%                               8       6       6
  5.1–10.0%                                5      13       9
  10.1–15.0%                               8       5       3
  15.1% or more                            1       1       3
'Not able/other' rate (%)
  Min.                                   1.1     0.9     0.6
  Max.                                  25.0    22.7    26.4
  Mean                                   8.4     7.9     9.0
Total number of countries                 22      25      21
The average refusal rate across all countries is 26.8% in ESS 1, 24.8% in ESS 2 and 24.2% in ESS 3. The average noncontact and 'not able/other' rates are much lower, at around 4–6% and 8–9%, respectively. Although there is some variation in the relevance of the different nonresponse categories across countries, in nearly all countries refusal to participate is the dominant reason for nonparticipation in each round.7 The number of countries managing to achieve a noncontact rate of less than 3% is increasing (from eight countries in ESS 1 and seven countries in ESS 2 to 11 countries in ESS 3). At the same time, the number of countries with a rather high noncontact rate of 10% or more is decreasing (from three in ESS 1 and four in ESS 2 to one country in ESS 3). Also, the number of countries with rather high refusal rates (30% or more) is declining (from six countries in ESS 1 and 2 to four countries in ESS 3). However, the number of countries exhibiting a high rate of 'not able/other' nonresponse (15% or more) increased from one country in ESS 1 and 2 to three countries in ESS 3.

7 There are only four exceptions, where the percentage of 'not able/other' is higher than the percentage of refusals. Apart from Cyprus and Slovenia in ESS 3 (see footnote 5), specific procedural deficiencies in the Czech Republic led to a very high number of cases being placed in the 'not able/other' category in the first two rounds of the ESS. In ESS 1, the high 'not able/other' rate in the Czech Republic was due mainly to the fact that for many cases in the issued sample, no final outcome code was provided in the National Technical Summary. In ESS 2 the high rate is due to the fact that a large number of selected sample units were systematically dropped (and not used) near the end of the fieldwork.
5.3.2 Response rate trends for specific countries
The previous analyses included 31 countries that participated in at least one of the first three rounds of the ESS. Nine countries participated only in one round, and for these countries no conclusion about the stability of response rates at country level can be made (Bulgaria, Cyprus, Iceland, Israel, Italy, Romania, Russia, Turkey and Ukraine). Of the 22 remaining countries, 15 took part in all three rounds of the ESS and seven participated in two rounds. For these countries, we can investigate changes in response and nonresponse at country level. Looking for changes of 5 percentage points or more between any two rounds, it is apparent that the majority of countries (12 out of 22 countries) do not exhibit any change. Thus in Austria, Belgium, France, Germany, Greece, Hungary, Ireland, Norway, Poland, Portugal, Sweden and the United Kingdom, response rates are fairly stable. Response rates in the remaining 10 countries show some change (see Table 5.2). In five countries, we observe an increase in response rates of at least 5 percentage points. This is the case in the Czech Republic, Luxembourg, the Slovak Republic, Spain and Switzerland. By contrast, response rates in Denmark, Estonia, Finland, the Netherlands and Slovenia show a decrease of 5 percentage points or more. An increase in response is observed mainly in countries that started with rather low response rates in ESS 1, such as Switzerland (32.5%), Luxembourg (42.6%), the Czech Republic (43.3%) and Spain (51.5%). It suggests that these countries had the strongest incentive to improve and also had good opportunities for improvement, since increasing the response rate is probably easier at a lower than at a higher level of initial response. An exception in this respect is the Slovak Republic, which obtained a response rate of 62.9% in ESS 2 but still managed to raise its response rate to 73.2% in ESS 3. The increase in response rate in these five countries ranges from a rise of 7.5 percentage points in Luxembourg to as much as 17.5 percentage points in Switzerland. In the majority of these countries, the increase in the response rate was achieved primarily by reducing the main source of nonresponse; that is, the number of refusals. A decrease in response, on the other hand, mainly occurred in countries that had achieved high response rates in previous rounds. Four of the five countries with a decrease in their response rates had previously achieved a rate of approximately 70%, and one had even achieved nearly 80%. The decrease in response ranges from a reduction of 6.3 percentage points in Slovenia to 16.8 percentage points in Denmark. This means that the reductions in the response rate observed in these five countries are similar in magnitude to the increases in the other countries mentioned above. The sources of the observed decreases in response rate vary: in one country (Estonia) an increase in the noncontact rate is the main reason for the reduction; in Denmark and the Netherlands, an increase in the number of refusals is the main source; and in Finland and Slovenia, a rise in the ‘not able/other’ category is the main factor. Of course, what has been said so far provides only a broad overview of what happened in the different countries. In order to learn more about the actual processes that took place and the reasons for these processes, a more detailed look at individual countries would be necessary. 
Purely for illustration, we will focus on the two countries with the most pronounced change; namely, Switzerland, with an increase in response rate of 17.5 percentage points, and Denmark, with a decrease in response rate of 16.8 percentage points.

Table 5.2 Countries exhibiting substantial change in response rates between ESS 1, 2 and 3

                            Response rate (%)      Difference between        Main source of change
Country                   ESS 1   ESS 2   ESS 3    first and last rate       (percentage points change)
                                                   (percentage points)
Increase in response rate
  Switzerland (CH)         32.5    48.6    50.0         +17.5                Refusal -10.5; Not able/other -5.6
  Czech Republic (CZ)      43.3    55.3     –           +12.0                Refusal -8.9
  Spain (ES)               51.5    54.9    66.2         +14.7                Refusal -12.2
  Luxembourg (LU)          42.6    50.1     –            +7.5                Noncontact -4.3; Not able/other -6.4
  Slovak Republic (SK)      –      62.9    73.2         +10.3                Refusal -8.0
Decrease in response rate
  Denmark (DK)             67.6    64.3    50.8         -16.8                Refusal +14.3
  Estonia (EE)              –      79.1    65.0         -14.1                Noncontact +9.7; Refusal +7.3
  Finland (FI)             73.2    70.7    64.4          -8.8                Not able/other +5.3
  Netherlands (NL)         67.7    64.3    59.8          -7.9                Refusal +7.1
  Slovenia (SI)            71.2    70.2    64.9          -6.3                Not able/other +5.0

Switzerland is the country with the lowest response rate in ESS 1, at 32.5%. This is nearly 10 percentage points lower than the response rate of the country with the second lowest rate in ESS 1. During the preparations for ESS 1, it became clear that fielding a survey like the ESS in Switzerland would be a challenge and that a 70% response rate was not a realistic target. A major obstacle was the fact that the survey business in Switzerland relies mainly on computer-assisted telephone interviewing. Nationwide face-to-face surveys are a rare event, which meant that the survey organizations did not have a well-trained face-to-face interviewing corps at their disposal. In order to improve the response rates in ESS 2 and 3, the Swiss survey organization implemented a range of measures, including:

- Better training of interviewers.
- An elaborate call schedule – at least five face-to-face contact attempts, plus additional telephone calls for noncontacts and refusals, made from a central telephone facility (call centre), in order to arrange an appointment for a face-to-face interview.
- Respondent incentives – these were increased in value (from €6 in ESS 1 to €20 in ESS 2 and 3); different types of incentives were offered, from which the respondent could choose (cash, voucher for flowers, rail travel voucher, or donation to a charity organization).
- The use of specialist interviewers for refusal conversion efforts.

These measures were successful, with response rates of around 50% being achieved in ESS 2 and 3. Although this is still at the lower end of the country scores in the ESS, these figures are well above the results regularly achieved in Switzerland in well-controlled face-to-face surveys.8

8 In the Survey of Health, Ageing and Retirement in Europe (SHARE), for example, the response rate in the first wave, which was fielded in 2004, was below 40% in Switzerland (De Luca and Peracchi, 2005).

Denmark is characterized by a change in the opposite direction. It started with an above-average response rate of 67.6% in ESS 1, but in ESS 2 the rate dropped slightly to 64.3%. However, in ESS 3 there was a more dramatic decline to 50.8%. This decline can be explained largely by the fact that more people subscribed to an opt-out list, which makes it impossible to contact them for the purpose of a survey. If addresses are selected from the Danish Central Person Register – as was done for the ESS – persons who have opted out cannot legally be contacted. The possibility to opt out was facilitated by the installation of an opt-out alternative via the Internet in the period between ESS 2 and ESS 3. In ESS 3, 360 persons in the sample selected for the ESS had subscribed to this list. These persons could not be contacted and were counted as refusals. They accounted for around 12% of the eligible sample. This example makes clear that very specific circumstances and events can sometimes lead to marked changes in a country's response rate.
102
RESPONSE AND NONRESPONSE RATES
5.4 Response Rate Differences and Fieldwork Efforts 5.4.1
Response rate differences across countries and fieldwork efforts
What are the reasons for the differences in response rates between countries? According to the models outlined in Chapter 2, two broad groups of influencing factors can be distinguished. On the one hand, it can be more or less difficult to achieve a certain response rate target in a given country. Countries differ with respect to the contactability of their population (e.g. due to differences in labour force participation rates) or the attitudes of the population towards surveys and the willingness to participate in surveys (sometimes labelled the ‘survey climate’; see Groves and Couper 1998, p. 155 – and see the critical remarks on this concept by Smith, 2007). On the other hand, even when countries do not differ with regard to the difficulty of interviewing, they – or, more precisely, their survey organizations9 – may expend more or less effort to achieve a certain response target. As we saw in Chapter 4, countries differ with regard to interviewer training, interviewer payment schemes and the use of respondent incentives. In this section, we concentrate on this second group of factors and try to analyse whether differences in fieldwork efforts help to explain the differences in response rates across countries.10 A first, somewhat naive approach is to check whether there are differences in response rates between countries that used a particular survey procedure (e.g. providing a respondent incentive) and countries that did not. Table 5.3 shows the average response and nonresponse rates for the countries in ESS 3 that did and did not use incentives. Contrary to the naive expectation, in ESS 3 countries that used an incentive achieved a somewhat lower response rate on average than countries that did not use an incentive (–2.9 percentage points). Both their refusal and ‘not able/other’ rates were higher: the noncontact rate, however, was lower. This result is not an idiosyncrasy of ESS 3, since the results in ESS 1 and ESS 2 were fairly similar, and in fact even a little more pronounced in the ‘wrong’ direction.11 The reason for these results becomes clear if allowance is made for the fact that our data do not come from an experimental variation. It was not decided at random which countries used an incentive and which did not. In fact, precisely the contrary seems plausible; countries that expected difficulties in achieving good response rates opted to use an incentive, while countries where interviewing was expected to be less 9
It should be noted that since each country appoints a different survey organization, the effects of ‘countries’ and ‘survey organizations’ are confounded in the present analyses. 10 It should be noted that the present analysis focuses only on a number of the different fieldwork aspects described in Chapter 4. These are mainly rather general indicators for various fieldwork efforts. They do not cover the specific interviewer behaviour in contacting and motivating target persons, such as the number and timing of call attempts or the reissuing of cases for refusal conversion. Detailed analyses of these aspects (including their consequences for response rates) are provided in Chapters 6 and 7. 11 In ESS 2, countries using an incentive had a response rate that was 3.8 percentage points lower than for countries that did not use an incentive. In ESS 1, the respective difference was 5.9 percentage points.
RESPONSE RATE DIFFERENCES AND FIELDWORK EFFORTS Table 5.3
103
Average response and nonresponse rates, by use of incentives, ESS 3
Incentive Number of Response Noncontact Refusal rate (%) ‘Not able/ countries rate (%) rate (%) other’ rate (%) Yes No
13 8
61.7 64.6
3.4 5.2
24.8 23.1
10.1 7.2
difficult had less need to use an incentive. Consequently, at country level there is probably a relationship between the expected difficulty of achieving good response rates and the decision to use an incentive (or any other measure that might help to improve the response rate).12 An even more striking example relates to the length of the fieldwork period. Usually, it would be expected that the longer the data collection period of a survey, the greater would be the chance that all target persons could be reached, as longer fieldwork periods allow for repeated attempts to contact persons who are difficult to reach (Groves and Couper, 1998, pp. 272–4). In addition, people who are difficult to persuade to participate can be re-approached. However, at country level in ESS 3 there is a negative relationship between the length of the fieldwork period and the response rate obtained (Pearson’s r ¼ 0.63, n ¼ 21; see Figure 5.6). Probably, difficulties in achieving high response rates made several countries extend their fieldwork period, whereas countries with no difficulties were able to finish the fieldwork more rapidly. Of course, this negative correlation at the between-country level does not preclude a positive relationship within a country. If a country extends its fieldwork period in order to reissue difficult to reach or reluctant target persons, this will have a positive effect on the response rate. If an attempt is made to examine the relationship between fieldwork efforts and response rates more systematically, it seems reasonable not to rely on single indicators for fieldwork efforts. Differences in response rates can be caused by a multitude of different fieldwork factors. In the present situation, the effect of individual factors cannot be singled out, since we do not have data from randomized treatments and it is also not possible to apply control variables in analyses due to the small number of cases (n ¼ 21 countries). An alternative might be to construct an index drawing on several dimensions of fieldwork procedures and fieldwork efforts. This would make it possible to consider the impact of several fieldwork factors simultaneously and to explore their combined effect on the response rates. To build such an index, the eight fieldwork aspects described in Section 4.3 are used. These aspects can be subdivided into two broad categories. The first category covers five different features of the interviewers. The second consists of three aspects relating to information and incentives. We first dichotomized each of the 12
12. A similar assumption – on an individual level – is made by Brehm (1993, pp. 128–30).
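Returning to the fieldwork-length example, the coexistence of a negative between-country correlation with a positive within-country effect is essentially a point about aggregation, and can be illustrated with a small sketch. All numbers below are invented for illustration; Python is used purely as a convenient notation.

```python
# Invented illustration: extending fieldwork raises response within each
# country, yet the between-country correlation of length and response rate
# is still negative, because 'difficult' countries choose long fieldwork.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# Each country observed before/after extending its fieldwork period.
# Easy country: 60 -> 90 days; difficult country: 120 -> 180 days.
days     = [60, 90, 120, 180]
response = [68.0, 70.0, 48.0, 53.0]   # response rates (%), invented
print(round(pearson_r(days, response), 2))  # negative across observations
# Yet within each country the extension raised response (+2 and +5 points).
```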
[Figure 5.6 Length of fieldwork period and response rates in ESS 3. Scatter plot of response rate (%) against length of fieldwork period (days); points labelled by country.]
We first dichotomized each of the eight different aspects of fieldwork efforts (high efforts = 1, low efforts = 0) and then built an additive index; a minimal scoring sketch follows the list below. The following issues were included (high efforts mentioned first):
• experience of interviewers – 90% or more of all interviewers experienced versus less than 90% experienced;
• payment of interviewers – per hour, or per interview + bonus, versus per interview;
• personal briefing of interviewers – 90% or more of all interviewers personally briefed versus less than 90% personally briefed;
• length of personal briefing sessions – more than half a day versus half a day or less;
• interviewers trained in refusal conversion – yes versus no;
• use of advance letter – yes versus no;
• use of brochure – yes versus no; and
• use of respondent incentive – yes versus no.
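As a concrete illustration of this scoring scheme, the following minimal sketch builds the additive index from the eight dichotomized aspects. The aspect names and the two example countries ('AA', 'BB') are invented for illustration and do not reproduce the actual ESS 3 fieldwork records.

```python
# Illustrative sketch: additive index of fieldwork efforts.
# Each of the eight aspects is dichotomized (1 = high effort, 0 = low effort)
# and the index is the simple sum of the indicators.
EFFORT_ASPECTS = [
    "experienced_interviewers",    # >= 90% experienced
    "payment_scheme",              # per hour or per interview + bonus
    "personally_briefed",          # >= 90% personally briefed
    "long_briefing",               # briefing longer than half a day
    "refusal_conversion_training",
    "advance_letter",
    "brochure",
    "respondent_incentive",
]

def fieldwork_effort_index(country: dict) -> int:
    """Sum the eight dichotomized effort indicators for one country."""
    return sum(int(country[aspect]) for aspect in EFFORT_ASPECTS)

countries = {
    "AA": dict(experienced_interviewers=1, payment_scheme=0, personally_briefed=1,
               long_briefing=0, refusal_conversion_training=1, advance_letter=1,
               brochure=0, respondent_incentive=1),
    "BB": dict(experienced_interviewers=0, payment_scheme=0, personally_briefed=1,
               long_briefing=0, refusal_conversion_training=0, advance_letter=1,
               brochure=0, respondent_incentive=1),
}

for code, indicators in countries.items():
    print(code, fieldwork_effort_index(indicators))  # AA 5, BB 3
```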
[Figure 5.7 Index of fieldwork efforts and response rates in ESS 3. Scatter plot of response rate (%) against the index of fieldwork efforts (range 3–8); points labelled by country.]
The index scores range from 3 to 8 for the 21 countries that participated in ESS 3. This means that some countries only implemented three of the eight aspects of fieldwork efforts, whereas other countries implemented them all. Since each of the different aspects is expected to contribute to higher response rates (see Chapter 4), the hypothesis is that countries with higher efforts should exhibit higher response rates than countries with lower efforts. However, this hypothesis does not turn out to be true. Figure 5.7 portrays a negative relationship between the index of fieldwork efforts and response rates in ESS 3 (r = −0.41). A closer look at the countries with the lowest and highest scores on the measure of fieldwork efforts reveals the following. At the lower end (index of fieldwork efforts of 3 or 4), five countries achieved quite a high response rate (65% or more) with limited efforts. These are all eastern or central European countries (Bulgaria, Estonia, Hungary, Romania and Slovenia). At the higher level of efforts (index of fieldwork efforts of 7 or 8), two groups of countries can be distinguished. On the one hand, there are four countries that exerted a lot of effort but still only attained a response rate of 53% or less, all of them western European countries (France, Germany, Switzerland and the United Kingdom). On the other hand, there are five countries with a similar level of effort but a response rate of at least 60%; this group comprises four western European countries (Finland, Portugal, the Netherlands and Norway) plus Russia.
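For readers who want to reproduce this kind of country-level summary, the sketch below computes a Pearson correlation between index scores and response rates. The values are invented stand-ins, not the actual ESS 3 figures (for which the analysis above reports r = −0.41).

```python
# Illustrative: country-level Pearson correlation between the fieldwork
# efforts index and the response rate. All values are invented stand-ins.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

effort_index  = [3, 3, 4, 4, 5, 6, 7, 7, 8]           # invented index scores
response_rate = [71, 66, 68, 65, 63, 60, 52, 61, 50]  # invented response rates (%)
print(round(pearson_r(effort_index, response_rate), 2))  # negative, as in Figure 5.7
```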
It is quite evident that even when we use an index we cannot avoid the fact that the nonexperimental nature of our observations may mean that we are unable to find the expected relationship. Countries may differ with respect to the difficulty of achieving high response rates, and this difficulty is probably related to the efforts that are invested in fieldwork. Additionally, we cannot rule out the possibility that countries also differ on other, nonobserved aspects of fieldwork or that the effectiveness of certain fieldwork procedures varies across countries. In summary, there are many possible explanations for why our ‘naive’ hypothesis linking fieldwork efforts and response rates across countries is rejected.
5.4.2 Change in response rates over time and change in fieldwork efforts
In this section, we are not concerned with differences across countries but instead try to ascertain whether changes in fieldwork efforts between survey rounds can help to explain changes in response rates over time within a country. In doing this, countries are kept constant, thereby avoiding some of the obstacles we faced in the preceding section. As long as it can be assumed that the factors that can have an impact on the difficulty of achieving a high response rate in a country do not change much within a two-year period, it seems reasonable to expect that changes in fieldwork efforts between survey rounds will result in changes in response rates. With information available from three rounds of the ESS, changes both between ESS 1 and ESS 2 and between ESS 2 and ESS 3 can be examined. There are 20 countries that participated in ESS 1 and ESS 2 and 17 that took part in both ESS 2 and ESS 3. The following steps were taken:

(a) The dependent variable is the change in the ESS response rate between survey rounds at country level. Specific rates – for example, for noncontacts or refusals – are not considered separately, since most of the indicators that we use for fieldwork efforts will probably affect both types of nonresponse.

(b) The independent variable consists of an index that comprises the same dimensions of fieldwork procedures and efforts as in the preceding section (see Table 5.4). The decision to use an index and not to rely on individual variables is based on two considerations. Firstly, changes in response rates can be caused by a multitude of different factors. In order to analyse the effect of an individual factor, all the other factors ideally need to be kept constant. This is not feasible in the present situation, however, since the data are of an observational nature. Therefore, it seems appropriate to take several fieldwork factors into consideration simultaneously and try to ascertain their combined effect on the response rate. Secondly, most countries do not exhibit any change between rounds on any of the indicators of fieldwork efforts. This means that using individual variables would lead to very skewed distributions. This aspect is mitigated when an index is used.
Table 5.4 Change in fieldwork efforts between ESS rounds

                                                       ESS 1 versus ESS 2     ESS 2 versus ESS 3
Total number of countries in both rounds                       20                     17

Change in fieldwork procedures/efforts                Worse  Same  Better    Worse  Same  Better
(number of countries)

Interviewer
  Proportion of experienced interviewers (a)             5    11     4          0    13     4
  Payment of interviewers (per interview,
    per interview + bonus, per hour) (b)                 2    13     5          1    13     3
  Proportion of interviewers personally briefed (c)      2    17     1          1    15     1
  Length of ESS-specific personal briefing sessions
    (up to half a day, between half a day and
    one day, more than one day) (d)                      2    15     3          6     9     2
  Training in refusal conversion (yes/no)                2    15     3          0    14     3

Information and incentives
  Advance letter (yes/no)                                1    18     1          0    16     1
  Brochure (yes/no)                                      2    15     3          1    13     3
  Respondent incentive (yes/no)                          0    18     2          0    16     1

Index of change in fieldwork efforts
  Number of countries (worse/same/better)                6     7     7          3     5     9
  Range of index                                           −2 to +5               −2 to +3

(a) Differences of 10 percentage points or more were counted as an improvement or as a worsening, respectively.
(b) Only changes between payment 'per interview' and payment 'per interview + bonus' were observed. The introduction of a bonus system was classified as an improvement, its cancellation as a worsening. It should be noted, however, that we do not know for certain whether or not the bonus system was always related to the achievement of a certain response rate.
(c) Differences of 10 percentage points or more were counted as an improvement or as a worsening, respectively.
(d) Longer briefings were classified as an improvement, shorter briefings as a worsening.
[Figure 5.8 Difference in response rates, ESS 1–2, by difference in fieldwork efforts, ESS 1–2. Scatter plot of the difference in response rates, ESS 1 vs ESS 2 (%), against the difference index for fieldwork efforts, ESS 1 vs ESS 2; points labelled by country.]
(c) Each country is coded for the eight different fieldwork aspects according to whether a change took place between survey rounds. Such a change might, for example, relate to now using/not using a procedure such as incentives that was not used/was used in the previous round, or to increasing/decreasing the proportion of experienced interviewers between ESS rounds. An improvement was coded as +1, no change was coded as 0 and a worsening was coded as −1. Table 5.4 gives the details. The index of change in fieldwork efforts was obtained by simply adding up the eight different aspects in each country, with all being given an equal weighting.13 A short illustrative sketch of this coding follows.

13. In two instances (the proportion of experienced interviewers in Spain in ESS 2, and whether interviewers received training in refusal conversion in Belgium in ESS 1), the information was missing. In both cases, we did not exclude the countries. The respective change variables were coded as 0 (no change).
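The following is a minimal sketch of the coding in step (c); the per-aspect classifications shown for the example country are invented, not actual ESS records.

```python
# Illustrative sketch of the change index: each of the eight fieldwork
# aspects is coded +1 (improvement), 0 (no change) or -1 (worsening),
# and the index is the unweighted sum of the codes.
CODES = {"better": +1, "same": 0, "worse": -1}

def change_index(aspect_changes: dict) -> int:
    """aspect_changes maps aspect name -> 'better' | 'same' | 'worse'."""
    return sum(CODES[change] for change in aspect_changes.values())

example_country = {          # classifications invented for illustration
    "experienced_interviewers": "same",
    "payment_scheme": "better",
    "personally_briefed": "same",
    "long_briefing": "worse",
    "refusal_conversion_training": "better",
    "advance_letter": "same",
    "brochure": "same",
    "respondent_incentive": "better",
}
print(change_index(example_country))  # -> 2
```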
Figure 5.8 depicts the relationship between the differences in fieldwork efforts and the differences in response rates for the first two rounds of the ESS. According to the index, six countries reduced their fieldwork efforts between ESS 1 and ESS 2, seven countries kept their efforts the same and another seven countries increased their efforts. Of the six countries that reduced their efforts, five experienced a (slight) decrease in response rates (less than 5 percentage points) and one improved its response rate by 7.5 percentage points (Luxembourg). Of the seven countries with 'constant efforts', four experienced a slight increase in response and three a slight decline (each time less than 5 percentage points change). And of the seven countries with an increase in efforts, two saw a considerable improvement in their response rates (increases of 12.0 percentage points in the Czech Republic and 16.1 percentage points in Switzerland) and another two showed a slight rise in response rates (less than 5 percentage points). However, there are also three countries where, despite an increase in efforts, there was a slight decline in response (less than 5 percentage points).14

The positive relationship between changes in efforts and changes in response rates for ESS 1 and ESS 2 is expressed in a Pearson's correlation coefficient of r = 0.44. However, it should be noted that the strength of this correlation is heavily determined by one single country, namely the Czech Republic. According to our measure, the Czech Republic showed the strongest increase in fieldwork efforts between ESS 1 and ESS 2 (+5), and at the same time experienced a considerable increase in its response rate (+12.0 percentage points). If the Czech Republic is excluded from the analyses, the correlation coefficient is reduced to r = 0.18.

The relationship between efforts and response rates becomes even more equivocal if the results for ESS 2 and ESS 3 are taken into account (see Figure 5.9). Between these two ESS rounds, nine countries intensified their efforts, five did not change their efforts and three reduced their efforts. The three countries with reduced efforts also experienced a slight decrease in response (less than 5 percentage points). Of the five countries with constant efforts, one did not experience any change in response; another two showed a slight increase or a slight decrease, respectively (less than 5 percentage points). Of the remaining two countries, one increased its response rate considerably (Spain: +11.3 percentage points), while the other showed a sharp decline in response (Estonia: −14.1 percentage points). Of the nine countries with higher efforts, four experienced a moderate rise in their response rate (less than 5 percentage points), and one achieved a large increase (Slovak Republic: +10.3 percentage points). On the other hand, there was also one country with no increase at all and two countries with decreases of 5.3 percentage points (Slovenia) and 6.3 percentage points (Finland).

The strongest apparent counter-example for a positive relationship between change in efforts and change in response rates is Denmark; despite an increase in efforts of +2, the response rate in Denmark declined by 13.5 percentage points between ESS 2 and ESS 3. There are, however, grounds for questioning whether Denmark should really be included in this analysis. As mentioned above, almost 12% of the eligible sample units in Denmark in ESS 3 had subscribed to an opt-out list and could not be used at all for fieldwork. These cases were counted as nonresponse. This obviously led to a sharp decrease in response independently of all the efforts expended on fieldwork.

14. It is of course possible that the decline would have been greater without these additional efforts.
[Figure 5.9 Difference in response rates, ESS 2–3, by difference in fieldwork efforts, ESS 2–3. Scatter plot of the difference in response rates, ESS 2 vs ESS 3 (%), against the difference index for fieldwork efforts, ESS 2 vs ESS 3; points labelled by country.]
After excluding Denmark from the analyses, the Pearson's correlation coefficient is r = 0.30.

To summarize, we can say that there is at least some evidence of a positive relationship between changes in fieldwork efforts and changes in response rates. The fact that the evidence is far from conclusive is probably a consequence of the various deficiencies that impaired the analyses. Apart from the nonexperimental nature of our data, there are four obvious shortcomings of this approach. Firstly, none of our indicators pertaining to the interviewers captures the actual interviewer behaviour during fieldwork. Interviewer experience, interviewer payment and interviewer training all represent important antecedents of fieldwork behaviour, but not the behaviour itself. More detailed information on interviewer calling behaviour (e.g. number and times of call attempts) and strategies for motivating target persons (e.g. number and success rates of refusal conversion attempts), and on the change in these behaviours between rounds, should provide better results. At the time this chapter was written, however, this information was only available for a minority of countries in the first three rounds of the ESS. Secondly, in our approach to building an index of fieldwork efforts, the various aspects included each received the same weight. This meant, for instance, that introducing a respondent incentive in the next round of the ESS was treated in the same way as extending the length of the interviewer briefings from a half-day to a full-day event. Whether these two changes really have the same impact on the response rate (and whether these effects will be the same across countries) is not known; at best, this is an unproven assumption. However, this approach was the only pragmatic way to combine the various fieldwork aspects for our analyses. Thirdly, the measures of fieldwork procedures and fieldwork efforts are rather crude. There might be (and in fact actually were) changes in fieldwork procedures that could not be captured by our approach. With regard to incentives, for example, we only measured whether or not an incentive was actually used. However, if a country used an incentive in ESS 1 and also did so in ESS 2, but raised the value of the incentive, this was not classified as a change. Fourthly, it seems plausible that there is an interaction effect between fieldwork efforts and response rates: changes in fieldwork procedures will have a larger effect if a country has a low response rate than if that country already has a high response rate. Unfortunately, the small number of countries available meant that we were not able to differentiate in our analyses between countries with low and high response rates.

From another perspective, focusing on the change in fieldwork efforts between rounds (instead of on the consequences of changes in fieldwork efforts for response rates), an interesting result can be observed. It seems that the change in fieldwork efforts between ESS rounds is dependent on the response rate achieved in the previous round. For the two pairs of ESS rounds available (ESS 1–2 and ESS 2–3), the correlation between the change in fieldwork efforts and the response rate in the previous round is r = −0.40 and r = −0.34, respectively (see Table 5.5 and Figure 5.10). This means that countries with low response rates subsequently intensify their fieldwork efforts more than countries with high response rates. This is probably one reason why we observe a fairly strong (negative) relationship between the response rate in the previous round and the observed change in response rates between ESS rounds (r = −0.72 and r = −0.56, respectively) (see Table 5.5 and Figure 5.11).
Table 5.5 Changes in fieldwork efforts, changes in response rates and response rates in previous rounds (Pearson's correlation)

Correlations between                                   ESS 1 versus   ESS 2 versus   ESS 2 versus ESS 3
                                                          ESS 2          ESS 3       (excluding Denmark)
Changes in fieldwork efforts and changes in
  response rate                                            0.44           0.10               0.30
Changes in fieldwork efforts and response rates in
  previous round                                          −0.40          −0.34              −0.37
Changes in response rate and response rate in
  previous round                                          −0.72          −0.56              −0.62
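The pattern in Table 5.5 can be mimicked with a small invented data set, which may help clarify how the three correlations hang together. Nothing below uses real ESS figures; the data are constructed only to reproduce the direction of each relationship.

```python
# Invented illustration of the three correlations in Table 5.5. Each tuple is
# (response rate in previous round, change in effort index, change in
# response rate) for one fictitious country.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

data = [  # (previous response rate %, effort change, response rate change)
    (45.0, +4, +9.0), (50.0, +3, +4.0), (55.0, +1, +2.0),
    (60.0,  0, -1.0), (65.0, -1, -2.0), (70.0, -2, -4.0),
]
prev, d_effort, d_response = map(list, zip(*data))
print(round(pearson_r(d_effort, d_response), 2))  # effort change vs response change: positive
print(round(pearson_r(prev, d_effort), 2))        # previous rate vs effort change: negative
print(round(pearson_r(prev, d_response), 2))      # previous rate vs response change: negative
```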
[Figure 5.10 Difference in fieldwork efforts by response rate in previous round. Two scatter plots: the difference index for fieldwork efforts, ESS 1 vs ESS 2, against the response rate in ESS 1 (%), and the difference index for fieldwork efforts, ESS 2 vs ESS 3, against the response rate in ESS 2 (%); points labelled by country.]
[Figure 5.11 Difference in response rates by response rate in previous round. Two scatter plots: the difference in response rates, ESS 1 vs ESS 2 (%), against the response rate in ESS 1 (%), and the difference in response rates, ESS 2 vs ESS 3 (%), against the response rate in ESS 2 (%); points labelled by country.]
Taken together, this suggests a promising picture to the survey researcher. Countries with below-average results intensify their efforts and as a result manage to increase their response rates in future rounds. This provides some evidence that the aim of the ESS – learning from round to round with a view to future improvement – is put into practice at least to some degree. In every round of the ESS, fieldwork and interviewing procedures in all participating countries are analysed and evaluated. Each country participating in the ESS receives feedback about shortcomings and
deviations in survey procedures and targets. Strategies for improvement in forthcoming rounds are discussed and, if feasible, better or new fieldwork procedures are implemented. It seems that these evaluation and feedback procedures, which were deliberately incorporated in the management of this cross-national survey, yield some positive effects.

Figures 5.4 and 5.5 reproduced by permission of the authors of the research reports of CeSO, K.U. Leuven.
6
Response Enhancement Through Extended Interviewer Efforts

6.1 Introduction

High response rates are pursued in two ways in the European Social Survey: firstly by aiming for a high contact rate, and secondly by aiming for a high cooperation rate.1 Chapter 3 outlined the response requirements and guidelines of the European Social Survey; Chapter 4 demonstrated that the implementation of the Survey in the different countries differs in some important ways; and in Chapter 5 we saw that there are substantial differences in contact and cooperation rates across countries. This chapter focuses on interviewer efforts, how these differ across countries and whether such efforts lead to enhanced response rates. We will show that both the ease of contact and willingness to cooperate vary across countries independently of the efforts made by interviewers. This is due partly to differences in national implementations of the survey design (e.g. the sampling frame and the recruitment mode), partly to the socio-demographic characteristics of the population (e.g. the at-home behaviour) and partly to possible differences between countries in attitudes towards surveys in general and
the topics of the ESS in particular. For instance, there is some evidence that survey scarcity enhances survey cooperation (de Leeuw and Hox, 1998), which would result in higher response rates in countries where social surveys are relatively scarce (e.g. Cyprus) and lower response rates in countries where a small population has been receiving many survey invitations for a longer time (Switzerland and Luxembourg). This chapter will describe how high contact and cooperation rates are pursued, which strategies seem to be effective and which groups are likely to be brought into the survey through these strategies.

1. 'Not able' is a third factor behind nonresponse. Since there are diverse reasons why someone cannot participate in a survey (having mental or physical disabilities, not understanding the fielding languages), since in many cases there is no way of overcoming this, and since 'not able' usually accounts for only a small proportion of the nonresponse, this cause of nonresponse will be ignored in this and subsequent chapters. An overview of all response outcomes in the first three rounds of the ESS is presented in Appendix 6.1.

There is ample evidence that accessibility (ease of contact) and willingness to cooperate are independent factors (Lynn et al., 2002b; Stoop, 2005) and that people who are hard to reach may be just as willing or unwilling to cooperate as those who are always at home. Contact and cooperation can therefore be treated as independent processes in obtaining response. It has to be acknowledged, however, that there may be practical reasons why hard-to-reach respondents are also less likely to cooperate. For example, there may simply be too little time left to re-approach them to try to convert them after an initial refusal (Verhagen, 2008).

This chapter pays more attention to contacting target persons than to obtaining their cooperation. This may seem strange, because refusal is usually a much more important cause of nonresponse than noncontact (see Appendix 6.1). There are several reasons for this focus on noncontact. The first is simply that contacting target persons is the first crucial step towards obtaining response: only when contact has been made in a face-to-face survey can the interviewer invite the target person to take part in the survey, after which the target person can decide whether or not to cooperate. Secondly, although noncontact rates can be kept low (see the 3% target maximum noncontact rate in the ESS), establishing contact can require a vast amount of time and money. If this investment can be reduced through more efficient contacting procedures, more attention can be paid to obtaining cooperation or to other aspects of the survey. In addition, if target persons are contacted earlier in the fieldwork period, more time will be available for refusal conversion if they do not cooperate at the first contact. A third reason for focusing on contact is that the ESS contact forms provide a wealth of data on the contacting process (when and how calls were made). This compares with the rather limited information on what interviewers and target persons said, and why the latter did or did not cooperate. Other studies show the value of recording these doorstep interactions (Campanelli, Sturgis and Purdon, 1997; Couper, 1997; Loosveldt and Storms, 2001), but – except for reason for refusal and a judgement by the interviewer on the likelihood of future cooperation – this information is not available in the ESS. Chapter 7 addresses refusal conversion attempts in detail.

The data in this chapter were first recorded on the ESS contact forms described in Chapter 3 (see Appendix 3.1). They provide paradata in the form of complete information about the calling process and the outcomes of the individual calls.
Systematic recording of paradata is required in order to explore correlates of nonresponse (Groves and Couper, 1998) and to distinguish between contact and obtaining cooperation (Lynn and Clarke, 2001, 2002), although good paradata are rarely available (Schnell, 1997; Atrostic et al., 2001; Duhart et al., 2001). Even when
paradata are collected, the operational difficulties of ensuring they are of good quality are considerable (Bates and Creighton, 2000; Duhart et al., 2001), because the main interest is usually in the substantive survey results and it is difficult to set the same quality criteria for the contact forms as for the questionnaire. Despite the wealth of paradata in the ESS, there are several problems concerning quality and comparability across countries. Firstly, not every fieldwork organization is familiar with keeping close track of the fieldwork using contact forms. Recording and keying call or contact information is an extra burden for interviewers and fieldwork organizations, which is reflected in the variable quality of contact form data. This sometimes results in data files with missing or additional (country-specific) variables or ‘wild codes’. Secondly, it appears that interviewers may not always have completed the contact forms consistently across countries and that, for instance, more centrally designed instructions on how to code reasons for refusal and neighbourhood characteristics may be needed. Thirdly, a number of countries were unable to deliver a complete dataset because of stringent national confidentiality laws (e.g. Norway and Iceland). Several other problems emerged during the analyses, which will be referred to when the results are presented. Finally, it should be kept in mind when interpreting the contact form data that more happens in the field than can be captured in a few variables on a contact form. Ironically, perhaps, it is the countries that record the most accurate and detailed information where it is easiest to really examine the data for evidence of deviations from ESS fieldwork requirements. Where there is limited or no call record data, such deviations may never come to light. An overview of the final response outcomes of the countries participating in the first three rounds of the European Social Survey is given in Appendix 6.1. More detailed empirical results in this chapter are based on those countries for which complete and usable call record data are available from ESS 2. Depending on the kind of analysis, a varying number of between four and seven countries have to be excluded because their data cannot reliably support conclusions. These countries are the Czech Republic (CZ), Iceland (IS), Norway (NO), Slovenia (SI), Turkey (TR), Ukraine (UA) and the United Kingdom (UK). Two of these countries cannot be used at all because the call record data are missing (TR) or largely missing (IS). Two other countries (UA and UK) delivered call record data that were too incomplete to provide reliable conclusions. Norway only started recording calls after the first telephone contact had been made. The problems with the remaining two countries (CZ and SI) are less serious and do not prevent reliable conclusions being drawn in most of the analyses. These problems are extensively documented in Billiet and Pleysier (2007).
6.2 Previous Research on Contactability

6.2.1 Factors in establishing contact
Establishing contact with the target person is a necessary first step once the target person has been identified. When contact has been made, the target person may still
refuse or not be able to cooperate; without contact, however, the request to participate cannot even be made. Contactability as an important factor in the process of obtaining response was discussed in Sections 2.3 and 2.6. Noncontact, or difficulty in establishing contact with sample persons in face-to-face surveys, is a problem for several reasons. Firstly, unsuccessful calls where nobody opens the door increase survey costs and the duration of fieldwork, because subsequent calls have to be made to minimize noncontact. Secondly, final nonresponse due to noncontact reduces the sample size and in turn the precision of the results. Thirdly, the literature shows that a number of groups (single-person households, young people and the employed) are harder to contact than others (the elderly, families with small children). Sometimes this can be corrected for by weighting with known socio-demographic variables, but this results in reduced precision. In other cases, where noncontacts are people with a very active social life, people who travel a lot and people who work night shifts and sleep during the day, weighting will not be possible and nonresponse bias will occur (see Section 2.6). There is some evidence (Lynn et al., 2002b; Stoop, 2005) that the difference between the difficult and easy to contact is larger than the differences between other response groups (e.g. immediately cooperative versus initially reluctant respondents). Luckily, as has been shown, high contact rates in face-to-face surveys are achievable. This means that nonresponse bias because of noncontact can be small, although the impact of any remaining noncontacts still has to be considered, especially in the light of the potential differences between the difficult and easy to contact. This section provides a brief overview of the literature on factors behind the contacting process in the context of the model presented in Figure 2.1. The model is referenced as a background for the empirical results of the European Social Survey and therefore only those areas where the ESS can provide empirical evidence are discussed. For example, no attention will be paid to interviewer attributes, because this information was not available in ESS 1–3.2 Similarly, the effect of physical impediments will receive only minimal attention here since, apart from information on whether or not there is an intercom, there is hardly any information about this in the ESS. Instead, the focus will be on two survey characteristics; namely, sample type and recruitment mode (Section 6.4). This section focuses on those studies that provide findings on the relationship between contactability and a range of other factors, including socio-demographic attributes of target respondents, physical impediments at target respondents’ homes, accessibility related to at-home patterns and the call patterns of interviewers. Aspects of the social environment of the participating countries were presented in Chapter 3, providing useful background information concerning the differing situations in which ESS fieldwork takes place.
2. Information on interviewer attributes was collected from ESS 4 onwards.

6.2.2 Who is hard to contact?

Identifying socio-demographic groups as being hard to contact runs counter to present trends in nonresponse studies, which instead focus on the underlying factors
determining nonresponse or noncontact. With regard to noncontact, however, the relationship with socio-demographics appears to be much more direct than is the case with cooperation. Put simply, people who are members of a household where at least one person is usually at home will be easy to contact. However, the literature on noncontact is less enlightening than might be expected. Firstly, in many studies no clear distinction is made between making contact and securing cooperation. Secondly, conclusions on contactability are sometimes based on a comparison of final noncontact rates between different groups, whereas no information is presented on the efforts made to reach these groups in the first place (number and timing of calls). A final problem is that it can be hard to distinguish between contactability and eligibility. An incomplete, somewhat outdated sample of individuals (see Section 6.4) may, for instance, include many persons who no longer live at the address listed. It will not always be clear whether this should be recorded as ineligibility (incorrect address, person moved out of the country) or noncontact (where the person has moved to an unknown address and so cannot be traced). Over the years, the same factors have been identified as determining ease of contact: sex, age, household composition, life stage, labour force participation, socioeconomic status, housing situation and being a member of an ethnic minority group (Smith, 1983; Goyder, 1987, p. 84; Campanelli, Sturgis and Purdon, 1997, pp. 3–13; Groves and Couper, 1998, p. 115; Japec and Lundqvist, 1999; Blohm and Diehl, 2001; Lynn and Clarke, 2001, 2002; Lynn et al., 2002b, p. 142; Stoop, 2005; Johnson et al., 2006). Women and the elderly, for instance, spend more time at home than men and younger people, in part because they participate less in the labour market. Large families may be easier to contact simply because of the greater likelihood that at least one family member will be at home. In addition, families with small children are more likely to be at home. In several other studies, it was found that those active in the labour market (Goyder, 1987; Campanelli, Sturgis and Purdon, 1997; Lynn et al., 2002b) and those with a higher socio-economic status (Campanelli, Sturgis and Purdon, 1997; Johnson et al., 2006) are less easy to contact. However, different overlapping characteristics are sometimes hard to keep separate. Labour market position is an example: elderly people are usually easier to contact; this may have nothing to do with age, however, but may reflect the fact that they are no longer active on the labour market. Being active in the labour market may also mean that a substantial amount of time is spent on commuting, particularly in some urban areas. In that case, labour market position and regional characteristics may become confounded. Other activities may also take persons out of their homes. Stoop (2005) found that people who regularly frequented popular and classical cultural performances (including the cinema) were also more difficult to contact even after controlling for age. People may be hard to reach because they are not at home often (because they have a paid job and a busy social life) or because they are away for prolonged periods. Several studies (Schnell, 1997, p. 236; Blohm and Diehl, 2001; Schmeets and Janssen, 2002; Feskens et al., 2007) ascribe the low contact rates of ethnic minority groups to prolonged stays in the country of origin. This would be especially relevant for elderly
persons from ethnic minorities who do not have work or childcare responsibilities. Another reason for low contact rates among ethnic minority groups could be that some of them may have moved back to their country of origin without having their name removed from the population register. A complicating factor is that even within a single country, the response behaviour of different minority groups will differ (Feskens et al., 2008). Deding, Fridberg and Jakobsen (2008) found that in Denmark immigrants from Pakistan were especially difficult to contact, while refusals were particularly high among people of Turkish origin. Low contact rates in urban areas (and among ethnic minority groups) can be due to practical factors. The higher proportion of ex-directory households, or households that have only a mobile phone, in large cities can lead to a low contact rate when telephone recruitment of sample units is allowed. Practical impediments such as entryphones may hamper access to the high-rise buildings that are more commonplace in large cities. There is also some evidence that it is more difficult to find, recruit and retain interviewers in large cities. In addition, interviewers in inner-city areas are less willing to visit certain neighbourhoods or to make evening calls in those neighbourhoods (Bethlehem and Schouten, 2003). In summary, research on contactability in face-to-face surveys identifies hard-toreach and potential noncontact cases as those who are less often at home, who are away for prolonged periods, are part of (small) households where often no one is at home, who live in dwellings that are difficult to access and who live in neighbourhoods where interviewers are scarce or less willing to make calls, especially in the evening. To an extent, this confirms what is obvious and intuitive. What should be kept in mind is that, whilst noncontacts remain a small proportion of final nonresponse in surveys like the ESS compared to nonresponse derived from noncooperation, they could form a specific class of nonparticipants that could underlie bias.
6.2.3 Call patterns and strategies
Call schedules are developed to maximize the possibility of making contact. Evening and weekend calls make it possible to reach people with a full-time day job; a prolonged fieldwork period makes it possible to reach people who spend long periods abroad; a large number of calls increases the possibility of finally finding busy students at home; and telephone recruitment (see Section 6.4) enables contact to be made with people who may be difficult to contact face-to-face because they live in dwellings that are difficult to access, or in neighbourhoods where interviewers feel uncomfortable walking around in the dark. Until recently, there were few studies that analysed call data from face-to-face surveys. Earlier studies of call data (Campanelli, Sturgis and Purdon, 1997; Groves and Couper, 1998, p. 101; Purdon, Campanelli and Sturgis, 1999; Stoop, 2005) identify evenings as the best time to establish contact, but limiting call attempts only to the evening would probably have other adverse effects. For example, although it would probably make the success rate of each individual call attempt higher, thus reducing the total number of calls to each address, it would probably increase travel costs, as the time available for making evening calls is limited
and it would thus be less easy to combine trips to addresses in the same neighbourhoods. It would also mean that fieldwork would take longer, or alternatively that the number of interviewers would have to increase. A further problem is that interviewers may not be willing to work according to tightly controlled call schedules that limit them to evenings, preferring to vary their call strategy according to their own preferences and proven success strategies. It appears that interviewers generally appreciate being able to organize their own time schedule in making calls, to work during normal working hours and not to visit seemingly dangerous neighbourhoods during evening hours, despite the obvious success of evening calls. Groves (1989) feels that it is not feasible to strictly prescribe call patterns.3 The fact that individual preferences of interviewers have an effect on calling patterns and contact rates was shown by Lievesley (1983, p. 296). She found that interviewers who had another job besides their interviewing work achieved higher contact rates because: 'Interviewers with lower availability were calling at times when respondents were more likely to be at home, such as weekends and evenings.'

The best measure of contactability is the number of calls to first contact (Groves and Couper, 1998; Lynn et al., 2002b; Stoop, 2005). Once contact has been established, many additional visits to the household may take place, but these are more likely to reflect the reluctance of the target person than the likelihood of contact. Even the number of calls to first contact is an imperfect measure of contactability, as interviewers may, through local knowledge, be aware of suitable times to call on target persons and act accordingly, rather than the process being random. If interviewers develop their own calling strategies based on their knowledge of the neighbourhood, characteristics that are ascribed to 'hard-to-reach' respondents may be partly due to interviewer call strategies. For instance, if interviewers are wary of making evening calls in inner-city neighbourhoods because they are concerned about their personal security, inner-city sample households will appear as hard to reach. This is not a characteristic of the target respondents but, rather, occurs because they do not receive calls at times when the chances of contact are high. Conversely, if local interviewers know that in certain neighbourhoods most people are employed or otherwise engaged during the day, they may start calling during evening hours and reserve their mornings for neighbourhoods where many elderly people live, and use the afternoons to visit residential neighbourhoods comprising families with children. This strategy is likely to reduce the number of calls. However, it will make the number of calls required to reach a target person a less accurate measure of contactability. Clues from the first, unsuccessful call (comments from neighbours, children's bicycles in the front garden, overflowing letterboxes) may affect the timing of subsequent calls. Local knowledge, information from previous calls and interviewer
3. 'Even if such estimates of conditional probabilities were available for all call numbers (e.g. probabilities of fourth call success at different times, given all possible combinations of times for the first three calls), it is unlikely that personal visit interviewers could take advantage of this knowledge. Their times of visitations are limited by concerns about the cost of travelling to a sampling area and thus the total number of sample houses that they can usefully visit at one time. Furthermore, the complexity of call procedures based on such empirical guidance would complicate their work' (Groves, 1989, p. 99).
circumstances may determine when sample units are contacted. These individual differences may be based on sensible choices and be in line with the instructions given to interviewers by the researchers. In the end, however, such decisions end up confounding the timing of calls and accessibility in nonresponse modelling (Groves and Couper, 1998, p. 101). Response modellers are faced with the problem of an interaction between household characteristics and interviewer call strategies, which may bias the estimates of contact probabilities for individual calls. Groves and Couper (1998, p. 82) lament this – 'In short, the ideal data set would have fully randomized visit times for all sample units – a practical impossibility' – and end their treatise on the timing of calls wondering (p. 98) 'what characteristics of sample segments or neighbourhoods are related to interviewers choosing different times of day to call'.
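Since the number of calls to first contact recurs above as the preferred measure of contactability, a minimal sketch of how it might be derived from call-record paradata may be useful. The record layout (sample unit, call number, outcome) and the outcome codes below are invented simplifications, not the actual ESS contact-form coding.

```python
# Illustrative sketch: number of calls to first contact per sample unit,
# computed from simplified call records. The layout and codes are invented.
from collections import defaultdict

# (sample_unit, call_number, outcome); outcome "contact" or "no_contact"
calls = [
    ("A01", 1, "no_contact"), ("A01", 2, "no_contact"), ("A01", 3, "contact"),
    ("B02", 1, "contact"),
    ("C03", 1, "no_contact"), ("C03", 2, "no_contact"),  # never contacted
]

calls_by_unit = defaultdict(list)
for unit, number, outcome in calls:
    calls_by_unit[unit].append((number, outcome))

calls_to_first_contact = {}
for unit, unit_calls in calls_by_unit.items():
    contacted = [n for n, outcome in sorted(unit_calls) if outcome == "contact"]
    calls_to_first_contact[unit] = contacted[0] if contacted else None

print(calls_to_first_contact)  # {'A01': 3, 'B02': 1, 'C03': None}
```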
6.3 Previous Research on Cooperation

6.3.1 Covariates of cooperation
The contacting of target persons is simpler to analyse than cooperation because it is a process that is more ‘context-free’ than obtaining cooperation. Firstly, the target persons may not be aware of the many unsuccessful calls an interviewer has made before establishing contact, whereas they will notice when an interviewer tries to persuade them to cooperate, especially after an initial refusal. Secondly, specific characteristics of the survey, such as its cognitive burden, topic and sponsor, will hardly play a role in contacting the target persons, but may play a decisive role in the decision on whether or not to cooperate (see, e.g., de Leeuw and de Heer, 2002, p. 46). Partly because of these topical and context effects, there are no simple mechanisms that lead some groups to cooperate less and others more. Indeed, there is very little empirical evidence as to which socio-demographic and socio-economic factors are related to survey cooperation, and the evidence that does exist is usually weak or mixed. Owing to the absence of simple mechanisms and straightforward empirical evidence, most researchers now treat background variables as covariates of survey cooperation, not as causes, and try to unravel the underlying causes that are reflected by these covariates. Age, for instance, is often included in nonresponse analyses as an explanatory variable. When identifying age as a correlate of survey cooperation, it needs to be borne in mind that age can stand for different things: for younger people, having a youthful lifestyle; for older people, having less education, being less comfortable with filling in complicated forms, having a greater distrust of strangers or having a higher sense of civic duty. This section presents a short overview of the literature on survey cooperation, starting with socio-demographic characteristics and then looking at underlying sociopsychological factors. The focus will be on issues that are pertinent to the European Social Survey. Reasons for cooperation and refusal are the focus in Section 6.3.3. The section will focus on the blocks ‘households and individuals’
and 'household–interviewer interaction' from Figure 2.2, with some extension to the social environment as well (neighbourhood characteristics). As noted in earlier chapters, survey methodology and procedures are harmonized wherever possible in the ESS, although sample type and recruitment mode sometimes differ. The impact of these procedures and possible differences is covered in Section 6.4. Two pieces of information on the interaction between interviewer and target person are available; namely, the reasons for refusal as recorded by the interviewer and the view of the interviewer about whether the target person may or may not cooperate on a future visit. This information also plays an important role in Chapter 7, and the reasons for refusal are summarized in Section 6.6.4.

6.3.1.1 Age, sex and family composition

In an inventory of the literature, Groves and Couper (1998, p. 133) report a mixed effect of age on refusals. They find support for less cooperation from the elderly, but this effect disappears when controlling for household size. The failure to find the expected effect might be due to conflicting influences; for instance, an increased fear of victimization among the elderly, and thus less willingness to let strangers into their homes, might be counteracted by a higher sense of civic duty towards government surveys. Including neighbourhood characteristics, interviewer characteristics and topic salience in modelling did not bring about the expected age effect, however. In their multivariate analyses, Groves and Couper (1998, pp. 148, 150) even found a curvilinear effect of age on cooperation rates, whereby both young and elderly households cooperate more. This they ascribe to a higher interest in social participation among the younger households, more curiosity about efforts to seek information from them and more experience with standardized information-seeking associated with schools and jobs, and stronger norms of civic duty among the elderly.

It is sometimes assumed that women (especially older women) are more wary than men about allowing strangers to enter their homes. This would lead one to expect that women cooperate less in face-to-face surveys, and evidence for this was presented by Koch (1997). Men more often have jobs and may thus be more used to filling in standardized forms. This could be one explanation for why they more often participate in web surveys (Stoop, 2009). More often, however, response rates are lower among men than women (Groves and Couper, 1998; Stoop, 2005). This will be due partly to the fact that men are less often at home. Apart from leading to lower contact rates, this could also mean that interviewers less often have personal first contact with men, as it is more likely that a woman will open the door. Therefore, in a face-to-face study, interviewer persuasion skills might less often be directed towards men as direct recipients, and more towards women as general gatekeepers.

A complicating factor when considering the impact of gender on survey cooperation is that the decision on whether or not to participate might be a family decision, even when one specific person is the target sample unit. It may well be that other household members object to the interview (Stoop, 2005; Bates, Dahlhamer and Singer, 2008). Another complicating factor is that the 'doorkeeper', the person who
first speaks with the interviewer and in some cases has to help with the selection of the target person, may well be the decisive factor in gaining cooperation. This doorkeeper is more likely to be a woman, but will not always be the target person.

People who have small children tend to cooperate more in surveys (Groves and Couper, 1998). Larger households are easier to contact than small ones, and it may also be easier to obtain cooperation in a large household if a survey is designed so that any responsible adult can function as a household informant. Groves and Couper (1998, p. 123) mention only one exception, namely the British Expenditure Survey, which stood out because all adult members of the household had to participate rather than relying on a single informant. Single-person households usually cooperate less often in surveys. Koch (1993) found that in ALLBUS surveys with low response rates, the proportion of single-person households was lower than in surveys with higher response rates.

6.3.1.2 Education and socio-economic status

Several studies have highlighted the relationship between education level, experience with filling in forms (form literacy) and willingness to participate in a survey. Brehm (1993, p. 31) suggests that interviews are seen as tests and are thus more appealing to persons with a higher level of education. Groves and Couper (1998, p. 128) hypothesize that the better educated have profited from earlier form-filling efforts and may thus be more inclined to cooperate. They point to the consistent finding from the literature that the less-educated groups more often fail to participate in surveys, but find no such results in the governmental surveys they study. They also discuss the possible benefits of cooperating in a survey (p. 122), such as the enjoyment of thinking about new topics, and the possible costs, such as the cognitive burden incurred in comprehending and answering the survey questions. As the cognitive burden of cooperating in a survey might be less for the more highly educated, this might result in a higher cooperation rate among this group (see also Tourangeau and Smith, 1996; Holbrook, Green and Krosnick, 2003, p. 82).

Socio-economic status is a household characteristic that comprises a number of socio-demographic characteristics and is closely related to a number of others: education, income, occupation, employment and housing costs. Goyder (1987, pp. 83–5) concluded from both the literature and his own analyses that in the United States occupational socio-economic status and cooperation are strongly and positively correlated. Persons in the upper socio-economic strata are less accessible, but show a higher cooperation rate following contact. Goyder, Lock and McNair (1992) distinguished between individual socio-economic status and geographical aggregates. Home ownership and property status were positively correlated with response. Goyder, Warriner and Miller (2002) estimated socio-economic status by taking photographs of the dwellings of sampled households and having them valued by real estate agents. Their mail survey had a substantial status bias, as persons from high socio-economic strata were easier to contact (less mail undeliverable), responded earlier and responded better to follow-up mailings and reminders.
Johnson et al. (2006), on the other hand, found that at a neighbourhood level concentrated affluence was predictive of noncontact and refusal, but also that survey participation in general was lower in their RDD survey in areas of concentrated disadvantage. Most, but not all, evidence therefore points to more cooperation from people with a higher socio-economic status.

6.3.1.3 Urbanicity and ethnic minorities

The majority of studies agree that large city-dwellers cooperate less (Couper and Groves, 1996; Groves and Couper, 1998, pp. 176–87) and consider urbanicity as the main social environmental factor affecting survey cooperation: cooperation decreases as urbanicity increases. The urbanicity effect could be a consequence of three factors. Firstly, as Groves and Couper assert, urbanicity tends to be associated with higher population density, crime rates and social disorganization, which are three highly correlated indicators of social cohesion. The lower social cohesion in urban areas might result in less trust and greater fear of letting strangers into one's home. Lack of trust in strangers, privacy concerns and fear of government intrusion might also be indicative of social isolation and thus affect responsiveness to surveys (Brehm, 1993, pp. 52–6; Dillman, 2000, pp. 19–21). Secondly, the urbanicity effect could be due to the population composition of these areas. In many cases, the residents of inner-city neighbourhoods are poorer, younger and more often of a minority ethnic origin. Groves and Couper (1998) show that the effect of the urbanicity variables was reduced after adding controls at household level, while in a Dutch study (Jansma, van Goor and Veenstra, 2003) the impact of urbanicity disappeared after incorporating socio-demographic household level variables. A third factor could be that experienced interviewers are less easy to hire and less easy to keep in urban areas (Bethlehem and Schouten, 2003), and inner-city interviewers may therefore be less adroit at obtaining cooperation. As with age, therefore, it is important to identify what urbanicity stands for in order to be able to adapt field strategies or have a keen eye for nonresponse bias.

Similar conceptual problems arise with regard to low survey participation by immigrants and ethnic minority groups. Since immigrants belong to different ethnic/country of origin groups in different countries, and the response behaviour of different immigrant groups in a single country may be different, simply grouping all immigrants/ethnic minorities together will have very limited explanatory power. In the Netherlands, response rates among recent immigrants were particularly high in the past (Centraal Bureau voor de Statistiek, 1987, 1991; Bronner, 1988). Although their response rates are now fairly low, there are some indications that the cooperation rate among immigrants is now similar to the general cooperation rate provided that, if necessary, an interviewer is sent who can conduct the interview in their native language. There is inconclusive evidence as to whether the present low response rates among immigrants are due to differences in socio-economic status or to the fact that many of them live in inner-city areas (Schmeets and Janssen, 2002; Schmeets and Michiels, 2003; Feskens et al., 2007). Blohm and Diehl (2001) report on a study of the survey participation of Turkish migrants in Germany. In this
survey native Turkish-speakers made interviewing in minority languages possible. They were, however, not the most experienced interviewers, and this may have had a negative impact on response rates and on survey quality in general. It was expected that gaining access to Turkish women would be difficult, especially for male interviewers. This proved not to be the case: they were easier to contact, were more ready to cooperate and male interviewers actually obtained better results than female interviewers. The main reason for the lower response rates among elderly Turkish migrants was noncontact, possibly because they spent longer periods in their country of origin, even outside the summer holidays.
6.3.2 Causes of cooperation and noncooperation
The previous section highlighted evidence for the complex relationships between survey cooperation and socio-demographic variables. These relationships are often contradictory, because age, sex or urbanicity are only correlates of survey cooperation. The underlying causes can be related to these variables in many different ways. Looking deeper beneath the surface, a number of psychological, social and behavioural characteristics can be identified that are more likely causes of cooperation (see overviews in Groves and Couper, 1998; Stoop, 2005).

One underlying factor is the extent of social isolation in terms of social involvement or participation, interest in societal well-being, political interest and knowledge, electoral participation and involvement in voluntary work. There is ample evidence that social isolation results in lower survey cooperation and that being politically interested results in higher cooperation (regardless of the topic of the survey), as does participation in voluntary work (Groves and Couper, 1998; Pääkkönen, 1999; Groves et al., 2004; Voogt, 2004; Abraham, Maitland and Bianchi, 2006; Abraham, Helms and Presser, 2009; Van Ingen, Stoop and Breedveld, 2009). The reason for this seems clear: survey participation is a kind of social participation or voluntary activity, and surveys used to measure contact and cooperation often cover those topics that politically interested people are interested in – or at the very least they understand the need for these kinds of data to be collected. These findings are far from reassuring: if participants are socially more active than refusers, survey results will often be biased.

Another assumption is that time concerns, being busy or feeling stressed, may be an important impediment to survey cooperation, either because the interviewer asks for cooperation at an awkward moment or because the sample person says or feels that being interviewed takes too much time: 'All other things being equal, the burden of providing the interview is larger for those who have little discretionary time. Time limitations of the household should affect both contact and cooperation' (Groves and Couper, 1998, p. 122). This might be particularly true in interviewer surveys and less so in self-completion surveys, where the respondents can answer the questionnaire at a time that suits them. Of course, in interview-based surveys the interviewer can also call again at a more suitable time if the first call was not convenient. Contrary to expectations, there is no evidence, either in terms of actual (demands of work,
travelling, household chores etc.) or perceived time pressure, that busy people cooperate less (Goyder, 1987, p. 86; Groves and Couper, 1998, p. 122; Pääkkönen, 1999; Väisänen, 2002; Abraham, Maitland and Bianchi, 2006; Abraham, Helms and Presser, 2009; Van Ingen, Stoop and Breedveld, 2009).

A third underlying factor that is sometimes related to survey cooperation is lifestyle. Being a member of a youth group (Smith, 1983), going to pop concerts (Pääkkönen, 1999; Stoop, 2005), using the Internet for games and chatting (Väisänen, 2002) and going to popular cultural activities can all be seen as (soft) indicators of a youthful lifestyle that does not seem to sit well with survey participation. On the other hand, the extent to which people are part of 'mainstream culture', abiding by the law, and the presence of strong norms of civic duty seem to be positively related to survey participation (Groves and Couper, 1998, p. 33).
6.3.3
Attitudes towards surveys and reasons for refusal
Why do people participate or refuse, and is there such a thing as an underlying attitude towards survey participation? Chapter 8 will present new empirical evidence on this topic. This section summarizes some of the main findings from the literature. One simple factor behind survey participation could be that people like surveys or, alternatively, that they dislike them, considering them a waste of time and money or perhaps feeling threatened by being asked personal questions. Nonresponse researchers have addressed questions about how people feel about surveys, whether these feelings are in fact related to survey participation and why people refuse, by measuring general attitudes towards surveys, attitudes towards particular surveys and the impact of substantive survey characteristics such as topic and sponsor. They have done this by recording doorstep interactions on tape or paper forms (Campanelli, Sturgis and Purdon, 1997; Couper, 1997; Loosveldt and Storms, 2001; Bates, Dahlhamer and Singer, 2008), recording reasons for refusal, incorporating questions on surveys into surveys (Singer, Van Hoewyk and Maher, 1998; Loosveldt and Storms, 2008), conducting surveys on surveys (Goyder, 1986; Stocké and Langfeldt, 2004) and by mounting follow-up surveys among respondents whose attitude towards a survey is known from an earlier survey (Hox, de Leeuw and Vorst, 1995; Rogelberg et al., 2003). Stoop (2005) gives an overview of most of these studies.

A number of researchers have tried to distinguish those who refuse outspokenly – strongly or for very particular reasons – from those who simply say 'no' because the request came at an inconvenient time or for other transient reasons. Smith (1984, pp. 481–5) distinguished propitiousness as a situational factor from the more permanent inclination or willingness to be interviewed. Couper (1997) related reasons for initial refusal, the statements made by respondents in the introductory conversations and their answers in the subsequent interviews. He concluded that those who show a lack of interest in the topic are less likely to grant an interview and, if they do cooperate, produce less meaningful or less complete data, and differ in their substantive responses from those who do not express a lack of interest. Those who initially refuse to cooperate because they are 'too busy' do not differ systematically from willing respondents, which
might mean that 'busyness' is just a polite (and easy) way of saying 'no' or that being busy is not related to the topic of the survey. Couper's results were confirmed by Campanelli, Sturgis and Purdon (1997, pp. 4.21–4). Laurie, Smith and Scott (1999) examined refusal conversion practices in a longitudinal survey using interviewer-assisted self-completion. They showed how the initial reason for refusal significantly relates to conversion success. Respondents who gave a survey-related reason for refusal, such as 'too long', 'too complex' or 'waste of time', were less likely to be successfully converted than those who gave a respondent-related reason such as 'too busy'. Voogt (2004, pp. 41–5) studied the difference between nonrespondents who refused because they claimed not to be interested in the survey and those who refused for other reasons. There was no relationship between reason for refusal and willingness to participate in a short telephone interview (52% of those who were not interested versus 47% of those who refused for other reasons). Both groups were fairly similar, except that the 'not interested' refusers more often indicated that politics is too complicated for them.

Hox, de Leeuw and Vorst (1995) analysed the response behaviour of students who had earlier filled in a questionnaire in which their attitudes towards surveys were measured, and found that neither a general attitude towards surveys, nor the stated likelihood of cooperation in a survey similar to the one that was later presented, predicted final response behaviour very well. In a study by Rogelberg et al. (2001), willingness to participate in a future survey among students depended on two attitudes towards surveys, namely survey value and survey enjoyment, the latter factor being the more important. Rogelberg et al. (2003) further distinguished between passive and active nonrespondents in a follow-up survey among students. Passive, more or less accidental nonresponse was not based on a conscious and overt a priori decision, whereas active nonresponse was the result of a conscious decision not to respond to the survey as soon as the request to participate was received. As a result, response enhancement techniques may work only for passive nonrespondents, not for active ones. Rogelberg et al. (2003) found that passive nonrespondents were very similar to respondents. Active nonresponding students, however, were less satisfied with the survey sponsor (the university), less conscientious, more likely to leave the university and less agreeable.

Loosveldt and Storms (2001, 2003, 2008) asked questions about the meaningfulness of surveys and the credibility of survey results. Not surprisingly, a positive attitude towards surveys was related both to positive experiences with surveys in the past and to willingness to cooperate in the future. Their results also suggest a positive relationship between attitudes towards surveys and trust in the working of democratic institutions, as well as with attitudes towards voluntary work. Verhagen (2008) found that target persons in a Dutch face-to-face survey who had initially refused using the argument that they had no time, or had participated too many times in surveys, could be converted more easily than those who said they were not interested, or never participated in surveys. In this study, refusal conversion attempts were less frequent when no reason for refusal had been recorded.
This could be because a nonverbal refusal (slamming the door) may be outspoken enough not to warrant a second attempt. It could also be that refusal conversion is expected to be
much more difficult when no reasons for refusal have been recorded, either because the target persons did not give one (possibly to the vexation of interviewers; see Stoop, 2005) or because the interviewer did not record one. In all studies (with the exception of the study by Hox, de Leeuw and Vorst, 1995), it appears that survey noncooperation might be more or less ‘at random’ when situational factors are the reason for nonparticipation. These ‘random’ nonrespondents are cranky, busy, have minor household crises to deal with, simply do not feel like doing it and have no strong feelings on the value and enjoyment of surveys. They might well participate if the interviewer were to come back later or if they could have completed the interview at their own convenience. However, if nonparticipation is largely determined by the topic or the sponsor of the survey, nonresponse will be ‘not at random’, and cannot be ignored. These persons harbour a strong dislike of surveys and will be more difficult to convert than persons who do not cooperate for more transient reasons. Their nonparticipation may result in bias; for instance, when their attitude is related to survey topics such as trust in government. Their substantive aversion could be compensated for by external incentives (see Groves, Singer and Corning, 2000; Groves, Presser and Dipko, 2004).
6.4 Sample Type and Recruitment Mode in the ESS

Following the overview of the literature in the two previous sections, we now turn to the contact and cooperation efforts in the European Social Survey and their results. However, before doing this, a number of practical issues need to be discussed. Firstly, different sampling frames are used in the ESS; secondly, different recruitment (although not interview) modes are allowed (see Chapter 3). Both factors restrict the ability to make cross-national comparisons. Table 6.1 shows the sample type, the allowed recruitment mode and the availability of contact forms data in the countries of ESS 2, which is the main database for the following analyses.
6.4.1
Sampling issues
The ESS is based strictly on random sampling. The actual sampling frames used vary to reflect the different types of frame that are available across countries (see Chapter 3, and Häder and Lynn, 2007). In ESS 2, there were three different sampling frames (individual, household and address; see Chapter 4) and six versions of the contact forms: two for address samples (with Kish or birthday selection4), three for household samples (with Kish or birthday selection or a combination) and one for individual samples.
4 Kish selection grids provide a random mechanism for selecting a household in a multi-household dwelling, or a person within a household. The birthday selection method selects the household member whose birthday comes next (or, in the last-birthday variant, whose birthday has passed most recently).
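Both selection mechanisms are easy to make concrete in code. The following is a minimal sketch (ours, in Python; the function and variable names are invented for illustration, and real Kish grids use preprinted selection tables keyed to household size rather than a simple modulus):

import datetime

def last_birthday_selection(household, today):
    """Select the member whose birthday most recently passed.
    `household` is a list of (name, birth_month, birth_day) tuples."""
    def days_since_birthday(member):
        _, month, day = member
        bday = datetime.date(today.year, month, day)
        if bday > today:  # birthday has not yet occurred this year
            bday = datetime.date(today.year - 1, month, day)
        return (today - bday).days
    return min(household, key=days_since_birthday)

def kish_style_selection(household, random_row):
    """Simplified Kish-style grid: members are listed in a fixed,
    pre-agreed order (here alphabetically) and a preassigned random
    row number picks one, removing interviewer discretion."""
    ordered = sorted(household, key=lambda member: member[0])
    return ordered[random_row % len(ordered)]

household = [("Anna", 3, 14), ("Ben", 11, 2), ("Chris", 7, 30)]
print(last_birthday_selection(household, datetime.date(2010, 5, 1)))
# -> ('Anna', 3, 14): 14 March is the most recent birthday before 1 May
print(kish_style_selection(household, random_row=7))  # -> ('Ben', 11, 2)

The essential point in both variants is that the selection is fully determined by the household composition and a chance mechanism, not by whoever happens to open the door.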
Table 6.1 Sample type, allowed recruitment mode and availability of contact forms data in countries of ESS 2

Country                  Sample   Telephone      Complete contact
                         type     recruitment    forms data
                                  permitted      available
AT  Austria              HH                      ✓
BE  Belgium              IND                     ✓
CH  Switzerland          HH                      ✓
CZ  Czech Republic       HH                      ✓
DE  Germany              IND                     ✓
DK  Denmark              IND      ✓              ✓
EE  Estonia              IND                     ✓
FI  Finland              IND      ✓              ✓
FR  France               HH                      ✓
GR  Greece               HH                      ✓
HU  Hungary              IND                     ✓
IE  Ireland              AD                      ✓
IS  Iceland              IND      ✓
IT  Italy                AD
LU  Luxembourg           HH                      ✓
NL  Netherlands          AD                      ✓
NO  Norway               IND      ✓              ✓
PL  Poland               IND                     ✓
PT  Portugal             HH                      ✓
ES  Spain                IND                     ✓
SE  Sweden               IND      ✓              ✓
SI  Slovenia             IND                     ✓
SK  Slovak Republic      IND                     ✓
TR  Turkey               AD
UA  Ukraine              AD
UK  United Kingdom       AD

Countries where no, or incomplete, fieldwork data are available from the contact forms or where related problems arose will be excluded from all or part of the tables, figures and analyses in the rest of this chapter. Sample type: IND, sample of named individuals; HH, household sample; AD, address sample. Telephone recruitment: countries in which telephone recruitment was allowed from the start due to the high proportion of telephone numbers available for the whole sample (only if a sample of named individuals is used).
In the following sections, contact refers to contact either with the target person, the household or someone else who is present at the selected address. Contact thus has different layers, depending on the sampling frame being used. In an individual sample, the interviewer has to ascertain whether the person answering the door is the designated respondent and, if not, when and how the target person can be reached. If the target respondent or the entire household have
moved, they are still part of the sample under ESS rules and the survey organization/the interviewer has to follow them to their new address (unless it is certain they have moved abroad). In a household sample, the interviewer has to select a target person. This means that – except in single-person households or when the person answering the door is the target person – the interviewer has to persuade one person from the household to help select the target person and then sometimes another person to actually participate in the survey. Household samples have been identified by name in advance, so if the entire household has moved they should be followed. If just one household member has moved, he or she is no longer part of the household; this, of course, has no effect on fieldwork.

In an address sample, a household has to be selected first when there is more than one at a particular address. This is an additional hurdle compared to household samples. Contact rates will be higher in an address sample, because the household living at a particular address is by definition the household from which the target person has to be selected and there is therefore no need to follow moving households. Cooperation rates can, however, be expected to be lower in an address sample when no personalized advance letter can be sent (just one to 'The occupants of . . .') (Morton-Williams, 1993) and the interviewer has no household name to mention when they attempt contact. Also, as in a household sample, the additional hurdle of selecting a person within the household has to be overcome.

A final challenge ensuing from the sampling frame selected might be the presence of an advance opt-out system linked to the population register. A regularly updated population register is in many respects the ideal sampling frame. However, it can be the case (as it is in Denmark) that residents are allowed to indicate that their register entry may not be used for survey sampling. If the percentage of the population in the opt-out register is high, this will result in a large refusal rate, and another sampling frame then has to be used. When comparing contact and cooperation rates, it is important to bear in mind that these may differ partly because of the different sampling frames being used.
6.4.2
Recruitment mode
It is often assumed (Blohm, Hox and Koch, 2007) that refusing over the phone is easier than face-to-face. In face-to-face recruitment, interviewers can tailor their approach to the characteristics of the dwelling and to the person who opens the door. Nonetheless, in a number of ESS countries it was feared that face-to-face recruitment would be counterproductive, as it was highly unusual for interviewers, or visitors in general, to make a personal visit without a previous telephone call to arrange an appointment. Therefore, as mentioned in Chapter 3, in a number of countries telephone recruitment (though never telephone interviewing) is allowed, although only under strict conditions. This can be done where an individual sampling frame is used, and where for the vast majority of sampling units a telephone number is available. In these countries, the number of calls to previously noncontacted sample persons can be very high, as telephone calls are much easier to make than home visits. In countries where this was allowed (see Table 6.1), it was also much easier to make evening and weekend (telephone) calls,5 as it is much cheaper and easier to telephone someone than to visit them in person, especially at times inconvenient for the interviewer. In all other countries, telephone calls were allowed only after four unsuccessful personal visits.

Table 6.2 presents an overview of the percentage of telephone calls per country in ESS 2, both at the first and at all calls. The table shows great differences in recruitment modes across countries, and highlights the large amount of telephone recruitment in the Nordic countries. It also shows that in some countries where personal visits were required (AT and LU), many first calls were made by telephone, and that telephone contact is used extensively for later calls in other countries (Switzerland, the Netherlands and Slovenia). These results make clear that differences in contactability between countries can be due in part to the mode of calls.

5 The large number of telephone calls that can be made resulted in practical problems in Norway. A telephone attempt that did not result in a contact was not counted as a 'contact attempt' in ESS 2 (apparently, in ESS 1 this was different) and not entered on the contact form; an attempt was defined as a telephone call when somebody answered, or an attempt on the doorstep. In practice, therefore, there were often many unregistered calls before the first contact. It is possible that this also happened to a certain extent in other countries. For this reason, we will not report the contact results for Norway in the tables and figures, except in Table 6.2.

Table 6.2 Telephone recruitment at first and all calls (call record data), ESS 2a

                            First calls              All calls
Country                     % by phone   N           % by phone   N
AT  Austria                 54.5         3 672       41.4          9 811
BE  Belgium                 13.1         3 018       21.8          8 865
CH  Switzerland              0.1         4 863       36.7         21 862
CZ  Czech Republic           2.9         4 335        2.9          8 292
DE  Germany                 22.0         5 738       25.5          1 555
DK  Denmark                 36.2         2 420       41.2          6 513
EE  Estonia                  5.2         2 864       14.7          7 627
ES  Spain                   12.5         3 213        7.5          8 286
FI  Finland                 90.3         2 873       74.8         11 561
FR  France                   0.0         4 400        0.0          11 52
GR  Greece                   0.1         3 056        1.3          6 539
HU  Hungary                  4.3         2 462        7.3          4 827
IE  Ireland                  0.7         3 676        5.2          8 57
LU  Luxembourg              47.2         3 497       41.2          7 522
NL  Netherlands              1.8         3 006       20.4         12 487
NO  Norway (b)              84.0         2 659       65.1          7 298
PL  Poland                   0.0         2 393       11.3          4 815
PT  Portugal                 0.0         3 094        1.2          8 042
SE  Sweden                  95.1         3 000       80.4         12 726
SK  Slovakia                 8.6         2 467       15.2          4 705
SI  Slovenia                11.7         2 190       40.3          5 244

a Source: call record data (only countries with reliable call record data on mode of contact).
b Country not used in analysis of contact information about number of calls.
6.5 Establishing Contact in the ESS

6.5.1
Introduction
The ESS Specification for Participating Countries (see Chapter 3) provides a target noncontact rate of 3% or less, as well as clear guidelines on how to minimize noncontacts. The guidelines are based on the results of previous studies on contactability, as referred to in Section 6.2. The interviewer is expected to make at least four personal, face-to-face calls, ideally preceded by an advance letter to each sampling unit, before it is abandoned as nonproductive ('noncontact'). These calls should be spread over different times of the day and different days of the week. At least one of these calls should be in the evening and at least one at the weekend (see Section 3.4.2), and they should be spread over at least two different weeks. This is to allow difficult-to-contact people to be located, and to minimize noncontacts due to holidays or short absences. In addition, the fieldwork period has to be at least 30 days. As noted in Section 6.4, in some countries telephone recruitment is allowed right from the first call, whilst in others it is allowed only after four unsuccessful personal attempts.

All interviewers must be personally briefed on the survey to make them aware of the importance of all aspects of the administration of the survey. This includes the process of making contact with potential respondents, the selection of target persons at sample addresses and within sample households (see the previous section), as well as methods to obtain high contact and response rates. A final requirement is that fieldwork should be closely monitored, including producing fortnightly reports on response for the CCT. This is to ensure that problems with fieldwork can be identified early and addressed as soon as possible.

The following sections give an overview of the noncontact rates that were achieved in the first rounds of the ESS. They show how many calls were needed to establish contact and how many calls were made to sample units that were never contacted, and present information on the timing of calls in the participating countries and their success rates (i.e. whether or not they resulted in a contact), depending on their timing.
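These call-level rules lend themselves to automated fieldwork monitoring. The sketch below is ours, not part of the ESS tooling; the 6 p.m. boundary for 'evening' follows the convention used later in Section 6.5.4, and the input format is invented for illustration:

from datetime import datetime

EVENING_STARTS = 18  # 6 p.m.; see the time-slot discussion in Section 6.5.4

def slot(call):
    """Classify a call as 'weekend', 'evening' or 'daytime'."""
    if call.weekday() >= 5:          # Saturday or Sunday
        return "weekend"
    if call.hour >= EVENING_STARTS:
        return "evening"
    return "daytime"

def meets_ess_contact_rules(calls):
    """Check one case's calls against the ESS minimum: at least four
    calls, at least one in the evening, at least one at the weekend,
    spread over at least two different weeks."""
    slots = [slot(c) for c in calls]
    weeks = {c.isocalendar()[:2] for c in calls}   # (year, week) pairs
    return (len(calls) >= 4
            and "evening" in slots
            and "weekend" in slots
            and len(weeks) >= 2)

calls = [datetime(2004, 10, 4, 11, 0), datetime(2004, 10, 6, 19, 30),
         datetime(2004, 10, 9, 14, 0), datetime(2004, 10, 13, 18, 15)]
print(meets_ess_contact_rules(calls))  # True: 4 calls, evening, weekend, 2 weeks

Run over the contact-form records, a check of this kind would flag any case abandoned as a noncontact before the minimum calling pattern had been applied.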
6.5.2
Noncontact rates
Table 6.3 presents an overview of noncontact rates in the first three rounds of the ESS. The left-hand part of the table shows the countries with a noncontact rate of less than 5% in every round in which they participated. These countries came close to the target maximum noncontact rate, and many countries in this group managed to improve over time. Poland (PL) is clearly most successful at minimizing noncontacts, with a rate of just 1.3% in ESS 3. Norway (NO) also presents a good example, as it managed to
Table 6.3 Noncontact rates over time (%)

Close to target
Country   ESS 1   ESS 2   ESS 3
PL        0.8     0.9     1.3
NO        3.0     1.7     0.8
FI        1.4     2.1     2.7
GR        1.7     3.6     —
SE        4.1     2.4     2.7
CH        3.8     2.1     2.0
CY        —       —       2.2
NL        2.6     2.6     2.2
BG        —       —       3.8
IT        2.7     —       —
PT        3.1     2.7     2.9
BE        4.5     3.5     2.7
IL        3.0     —       —
DK        3.8     4.9     3.3
SK        —       4.4     3.9
IS        —       4.6     —

High
Country   ESS 1   ESS 2   ESS 3
DE        5.7     7.0     5.0
RU        —       —       5.0
UA        —       6.7     a
FR        14.7    10.1    6.3
LU        8.1     7.9     —
AT        7.1     6.9     a
IE        7.2     10.6    a
RO        —       —       3.3
CZ        6.6     10.9    —
TR        —       13.5    —

Mixed
Country   ESS 1   ESS 2   ESS 3
SI        2.3     10.2    3.1
HU        7.6     5.7     2.9
ES        13.1    8.6     7.1
EE        —       3.4     2.9
UK        3.5     10.0    11.6

a ESS 3 results from Austria (AT), Ireland (IE), Latvia (LV) and Ukraine (UA) came too late to include them in this chapter.
reduce the noncontact rate from 3.0% in ESS 1 to just 0.8% in ESS 3. The low noncontact rate of Norway is partly a result of the fact that nearly all first contacts are realized by telephone, often after many attempts (see Section 6.4). Other countries also showed progress (Sweden/SE, Switzerland/CH, Belgium/BE and the Slovak Republic/SK); but in yet other countries, initially very low noncontact rates actually increased somewhat (Finland/FI, Greece/GR and Portugal/PT). The second block comprises countries with a noncontact rate that is higher than 5% and which thus exceeds the 3% ESS noncontact rate target by a considerable margin. The results from France (FR) are reassuring, as there is a significant reduction in noncontacts between rounds. Ireland (IE), on the other hand, had serious problems with fieldwork in ESS 26 and saw its already high noncontact rate increase. The Czech Republic7 (CZ) has almost the highest noncontact rate in ESS 2 (10.9%), surpassed only by Turkey (TR). Mixed results come from countries in the bottom right of the table. The Slovenian (SI) results require a specific explanation. Slovenia achieved a response rate of 70% in ESS 2 and, for budgetary reasons, stopped when a 70% response had been attained and therefore did not make additional efforts to contact the
remaining noncontacted target persons. In Spain (ES), the situation improved across rounds, whilst in the United Kingdom it got worse.

6 Ireland also had major problems finalizing the fieldwork in ESS 3. The results came in too late to include them here, but the contact rates were even lower.
7 See also footnote 7 in Chapter 5.

Table 6.3 shows that on the whole noncontact rates are fairly stable across rounds, although there were some notable exceptions. The remainder of this chapter will focus on ESS 2 noncontact rates. As additional visual background, the noncontact rates in ESS 2 are presented in Figure 6.1. This figure shows once again that in a minority of countries the target noncontact rate of 3% or less has been achieved, that in a small majority noncontact rates of below 5% have been achieved and that the spread is fairly large. It should be noted (see also Section 5.4) that countries with relatively good response rates do not necessarily have low noncontact rates (e.g. Slovenia), or vice versa (e.g. Switzerland).

Figure 6.1 Noncontact rates in ESS 2 (deviations from the target noncontact rate of 3%; countries ordered from the highest noncontact rate to the lowest: TR, CZ, IE, SI, FR, UK, ES, LU, DE, AT, UA, HU, DK, IS, SK, GR, BE, EE, PT, NL, SE, CH, FI, NO, PL)
6.5.3
Ease of contact and number of calls
Figure 6.2 gives an overview of the ease or difficulty of contact with all sample units that were actually contacted. It presents the cumulative percentage of contacts made at the first 10 calls. The figure once again shows great diversity. In the Slovak Republic, at the top of the figure, almost 90% of those who were finally contacted required only one call to establish contact, whereas in Portugal the figure was only 45%. In all countries, at least 90% (97% on average) of the final contacts were reached after the required four calls, suggesting that as a universal rule four contact attempts is a useful target.

Figure 6.2 The number of calls to reach all contacted target persons in ESS 2 (cumulative percentage). NO, UK and UA are excluded because of incomplete data

The figure shows results for the first 10 calls only. A few countries (Switzerland, Germany, Estonia and Spain) provided space on their contact forms to record more than 10 contact attempts. In the end, however, only 1% (at most) of all contacted target persons received more than 10 contact attempts. In Germany up to 30 calls were made to some of the final contacts, and in Switzerland no fewer than 45 attempts were made to reach the target persons who were most difficult to contact.

Figure 6.3 shows whether or not the required number of calls were made to sample units that were ultimately coded as noncontacts. Countries with a small number of noncontacts in ESS 2 (Poland, for instance, had only 18 noncontacted cases) are excluded. The figure illustrates that the required minimum of four calls to all noncontacted sample units before abandoning them has not been applied everywhere. Contact rates in these countries in ESS 2 could therefore have been higher, although to a lesser extent in France and the United Kingdom, where in the vast majority of cases more than four calls were made to noncontacted units in any case.
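Cumulative curves such as those in Figure 6.2 can be computed directly from the call records. A minimal sketch (ours), assuming that for each eventually contacted case we know the call number at which first contact was made:

from collections import Counter

def cumulative_contact_curve(first_contact_call_numbers, max_calls=10):
    """Return, for call 1..max_calls, the cumulative percentage of all
    eventually contacted cases that were reached by that call."""
    counts = Counter(first_contact_call_numbers)
    total = len(first_contact_call_numbers)
    curve, reached = [], 0
    for call in range(1, max_calls + 1):
        reached += counts[call]
        curve.append(round(100 * reached / total, 1))
    return curve

# Toy data: 6 cases reached at call 1, 2 at call 2, 1 at call 3, 1 at call 5
print(cumulative_contact_curve([1, 1, 1, 1, 1, 1, 2, 2, 3, 5], max_calls=5))
# -> [60.0, 80.0, 90.0, 90.0, 100.0]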
Figure 6.3 The number of calls to final noncontacts in ESS 2 (percentage of final noncontacts receiving fewer than four, exactly four, or more than four calls). Only countries where the final number of noncontacted sample units is larger than 100 are included. Slovenia is excluded because a fair number of sampling units were not approached at all

The more calls that are made, the higher the contact rate will be, assuming of course that there is someone at home at the address, and ignoring the fact that additional calls may result in additional ineligibles; for instance, when a neighbour tells the interviewer that the dwelling is empty. Increasing the (required) number of calls seems fairly pointless in countries where the noncontact rate is already negligible after a small number of calls. In addition, increasing the (required) number of calls will not be effective if they do not result in a contact. If the number of calls to noncontacted target persons is high, this reflects high levels of interviewer effort and little success in establishing contact.

Figure 6.4 shows the relationship between the noncontact rate and the number of calls to finally noncontacted units (based on Billiet and Pleysier, 2007). At the bottom of the figure are countries with a high contact rate and thus a small (sometimes very small) number of noncontacts. Two extremes are Poland (PL) and Switzerland (CH). The few noncontacts in Poland received few calls, whilst those in Switzerland received many calls (though presumably these were mostly telephone calls). In the other countries where the noncontact rate was around the target, the number of calls to noncontacted units ranged from close to the target of four in Estonia (EE) to more than six in Greece (GR). A third group of countries achieved a noncontact rate substantially higher than the target rate of 3% (almost 5% in Denmark/DK, compared to more than 9% in Ireland/IE). Surprisingly, in these countries (with the exception of France/FR and Austria/AT), the average number of calls to these units was lower, or much lower, than required. This suggests that at least in these countries more effort directed towards contacting target persons would have resulted in a higher contact rate, and possibly in a higher response rate, too.

Figure 6.4 The average number of contact attempts to noncontacts versus the achieved noncontact rate in ESS 2. Only countries for which reliable call record data are available are included; Norway has been dropped because of nondocumented telephone contacts

Billiet and Pleysier (2007) computed the difference scores in mean numbers of calls and in response rates for all countries with call record data in ESS 1 and ESS 2. The correlation between differences in response rates and differences in mean numbers of calls is 0.46. Simple regression showed that 21% of the variation in differences in response between ESS 1 and ESS 2 is explained by the differences in mean numbers of calls between rounds (in a simple regression the explained variance is the squared correlation: 0.46² ≈ 0.21). This indicates that the investment in contacting potential respondents has a substantial effect on response rates, but it is certainly not the only explanation for the change in response rates.
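The computation itself is simple; a sketch (ours, with invented numbers rather than the Billiet and Pleysier data):

import statistics

# Hypothetical per-country differences between ESS 1 and ESS 2
delta_calls    = [0.8, -0.2, 1.5, 0.4, -0.6, 1.1, 0.0, 0.9]
delta_response = [3.0, -1.0, 4.5, 0.5, -2.5, 2.0, 1.0, 2.5]

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    varx = sum((a - mx) ** 2 for a in x)
    vary = sum((b - my) ** 2 for b in y)
    return cov / (varx * vary) ** 0.5

r = pearson_r(delta_calls, delta_response)
print(f"r = {r:.2f}; variance explained by simple regression = {r * r:.0%}")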
6.5.4
Timing of calls
According to the ESS Specification for Participating Countries, at least four calls should be made to each sample unit, of which one should be in the evening and one at the weekend. Only if no contact has been made after this should a sample unit be finally classified as a noncontact. This section will outline the extent to which these requirements were met. For this purpose, all calls to previously noncontacted target persons8 were
8 At the first call, all sample units are classified as previously noncontacted target persons. At the second call, sample units with whom contact has been made at the first call are excluded and only those sample units with whom no contact at all has been made remain. The number of sample units in the overviews in this section thus decreases as more calls are made.
categorized as 'weekday morning or afternoon calls', 'weekday evening calls' or 'weekend calls'.9 It is assumed that in countries where interviewers made many evening and weekend calls, fewer calls would be needed to make first contact with households. The results for the first four calls are presented in Figures 6.5(a) to 6.5(d).

Figure 6.5 The percentage of morning/afternoon, evening and weekend calls at the first call (a), and at the second (b), third (c) and fourth (d) calls to as-yet noncontacted units in ESS 2
9 The distinction between 'afternoon' and 'evening' is somewhat arbitrary. In our analyses, we fixed the boundary between 'afternoon' and 'evening' at 6.00 p.m.; while this is perhaps common sense in most northern and western European countries, it makes less sense in Mediterranean countries. This is not just a conceptual issue, since lifestyle and socio-economic patterns (working hours, shop opening hours) are based upon these differing perceptions of time.
Figure 6.5 presents a striking picture of the first four calls to as-yet noncontacted sample units. Overall, morning/afternoon calls prevail. In Sweden and Finland, countries where first calls were mostly telephone calls, many calls were made during the evening, although even here more than 50% were made during the morning/afternoon. In fact, the countries with the highest proportion of evening attempts at the first call extensively used the telephone. In subsequent calls, interviewers more often tried to reach sample persons in the evening or during weekends, although even at the fourth call more than 40%
of attempts were still made during the morning/afternoon in half the countries. The figure also shows that in some countries interviewers consistently stick to their preferred pattern. In Slovenia, with the exception of the first call, interviewers tend to call predominantly in the evening; in this country, the interviewers were mainly students. In France, interviewers mainly start during the day, but then rapidly move to other time slots when they have not been able to find someone at home.

Figure 6.6 gives some idea of whether the specifications for the timing of calls were adhered to. It shows whether noncontacted units received none, one (as specified) or more than one evening or weekend call. As in Figure 6.3, only those countries are included that had a fair number of noncontacted cases at the end of the fieldwork (at least 100). In France, a substantial number of sample units were not contacted despite many evening and weekend calls. In Germany (DE) and Spain (ES), on the other hand, more evening and weekend calls could have resulted in a higher contact rate.

Figure 6.6 The number of calls during evenings (a) and at weekends (b) to final noncontacts in ESS 2 (none, one, or more than one such call). Only countries where the final number of noncontacted sample units is larger than 100 are included. Slovenia is excluded because a fair number of sampling units were not approached; UA and UK are excluded because of incomplete call record data

Table 6.4 presents the contact rates for the first three calls depending on their timing (morning/afternoon, evening and weekend calls). At the first call, in many countries either evening calls (AT, BE, DE, DK, FR, GR, HU, IE, LU, NL, PT, SE and SK) or weekend calls (DK, EE, ES, GR, HU, PL and SE) more often result in contact than daytime calls. There are only a few countries where both evening and weekend calls perform substantially (more than 3 percentage points) better than morning/afternoon calls at the first attempt (DK, GR, HU and SE). In only one country (CH) are morning/afternoon calls the most successful at the first attempt, and even then only on a par with weekend calls. It is also important to realize that even if evening and weekend calls do not have higher contact rates than morning and afternoon calls, they are likely to bring in people who might otherwise not have been reached, such as those who are employed full-time.

The situation changes at the second and third calls to as-yet noncontacted units. Contact rates are lower at subsequent call attempts, and in some cases calls in the morning/afternoon actually give the best result. Of course, it has to be acknowledged that the number of target persons at the second and third call is smaller, because many have been reached at the first or second call (see Figure 6.2), and also that at later calls interviewers may use information from earlier calls in order to select a good time to call. They will also do this when an appointment has been arranged, but this requires a prior contact. The results suggest that some countries that had low contact rates in ESS 2 would have had higher rates had they followed the rules and called at the prescribed times, but this is not uniformly the case for all countries.

Table 6.4 Contact rate (%) by time of day, previously noncontacted sample units, ESS 2a

          ------- Call 1 -------    ------- Call 2 -------    ------- Call 3 -------
Country   ma  ev  wk  dif  pref     ma  ev  wk  dif  pref     ma  ev  wk  dif  pref
AT        57  68  59  11   ev       46  46  52   6   wk       38  46  45   8   ev
BE        61  70  63   9   ev       52  58  50   8   ev       46  50  46   4   ev
CZ        69  68  72   4   wk       39  38  39   1   ma/wk    27  17  22  10   ma
CH        49  48  49   1   ma/wk    54  53  52   2   ma       41  43  44   3   wk
DE        62  69  64   7   ev       59  54  50   9   ma       50  50  55   5   wk
DK        60  69  68   9   ev       53  53  42  11   ma/ev    41  44  43   3   ev
EE        61  59  67   8   wk       59  49  59  10   ma/wk    58  52  63  11   wk
ES        66  67  72   6   wk       51  59  50   9   ev       43  48  42   6   ev
FI        67  70  60  10   ev       54  53  38  16   ma       41  42  33   9   ev
FR        46  51  46   5   ev       43  35  42   8   ma       44  34  37  10   ma
GR        60  72  69  12   ev       50  58  56   8   ev       42  45  57  15   wk
HU        69  75  76   7   wk       52  54  49   5   ev       54  55  57   3   wk
IE        63  80  60  20   ev       37  53  48  16   ev       37  48  34  14   ev
LU        75  82  75   7   ev       45  52  32  20   ev       33  30  25   8   ma
NL        51  67  53  16   ev       43  56  46  13   ev       36  48  34  14   ev
PL        82  84  90   8   wk       63  65  60   5   ev       64  61  58   6   ma
PT        40  57  43  17   ev       32  40  40   8   ev/wk    46  47  37  10   ev
SE        54  66  61  12   ev       44  53  73  29   wk       34  45  50  16   wk
SI        78  80  78   2   ev       57  50  58   8   wk       56  39  33  23   ma
SK        84  88  84   4   ev       63  62  68   6   wk       40  47  46   7   ev

a ma, morning/afternoon; ev, evening; wk, weekend; dif, maximum difference in contact rate between time slots; pref, time slot with highest probability of contact.
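The per-slot rates, the 'dif' column and the preferred slot in Table 6.4 follow mechanically from the call records; a sketch (ours, with an invented input format, not the original analysis code):

from collections import defaultdict

def contact_rates_by_slot(calls):
    """`calls` is a list of (slot, contacted) pairs for one country and
    one call number, where slot is 'ma', 'ev' or 'wk' and contacted is
    True/False. Returns the per-slot rates, the maximum difference and
    the best-performing slot(s), as in Table 6.4."""
    attempts, successes = defaultdict(int), defaultdict(int)
    for slot, contacted in calls:
        attempts[slot] += 1
        successes[slot] += contacted
    rates = {s: round(100 * successes[s] / attempts[s]) for s in attempts}
    dif = max(rates.values()) - min(rates.values())
    best = max(rates.values())
    pref = "/".join(s for s in ("ma", "ev", "wk") if rates[s] == best)
    return rates, dif, pref

toy = [("ma", True)] * 57 + [("ma", False)] * 43 \
    + [("ev", True)] * 68 + [("ev", False)] * 32 \
    + [("wk", True)] * 59 + [("wk", False)] * 41
print(contact_rates_by_slot(toy))
# -> ({'ma': 57, 'ev': 68, 'wk': 59}, 11, 'ev'), the Austrian first-call pattern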
6.6 Obtaining Cooperation in the ESS

6.6.1
Introduction

The previous section focused on contact attempts, or calls. From this point of view, a successful call is one in which the interviewer establishes contact with the target person or a household member. The target person can still refuse to cooperate, however, or be unable to be interviewed. This section focuses on cooperation, or on the outcomes of the contacts. Of course, nothing can be said about the willingness to cooperate of target persons who have not been contacted.

In order to enhance cooperation rates, the ESS has provided a series of recommendations and guidelines to minimize refusal (see Section 3.4.2). These recommendations relate to interviewer training, sending advance letters, using incentives and converting people who initially refuse. Fieldwork also has to be closely monitored, including producing fortnightly reports on response. This is to ensure that problems with fieldwork can be identified early and addressed where possible. The following sections give an overview of cooperation rates that were achieved in the first rounds of the ESS, show how many contacts were required to secure a final interview and give an overview of the reasons for refusal that were recorded. More information on refusal conversion will be given in Chapter 7.
6.6.2
Cooperation rates
Table 6.5 shows the cooperation rate in the first three rounds of the ESS; Figure 6.7 highlights the results from the second round. Here, the cooperation rate is defined as CR = I/(ES − NC − NA), where CR is the cooperation rate, I is the number of interviews, ES is the eligible sample, NC is the number of noncontacts and NA is the number not able/other. The last category is subtracted from ES to make countries more comparable and to focus on refusal as a cause of noncooperation.10

The average cooperation rate is around 70%, but there is wide variation across countries. The high cooperation rate in the new EU countries is striking. In some countries cooperation rates vary over time. There is a drop in Estonia, Finland, the Netherlands, Hungary and Ukraine (in the latter two countries, the 'not able/other' rate decreases sharply, accompanied by an increase in the refusal rate; see Appendix 6.1) and Denmark (related to the increase in the number of target persons who opted out from the population register – see Sections 5.3.2 and 6.4). In other countries (Czech Republic, Portugal, Slovak Republic, Spain and Switzerland) cooperation rates increase over time. It cannot be concluded from this overview that target persons in Cyprus and Estonia are much more cooperative than those in France or Switzerland, because what is shown here are the final cooperation rates and not, for instance, the number of contacts required to obtain cooperation: that will be presented in the next section.
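Expressed in code, the cooperation rate used here (AAPOR COOP3) and the COOP1 alternative mentioned in the footnote are straightforward; a minimal sketch (ours), with variable names following the formula in the text:

def cooperation_rates(interviews, eligible_sample, noncontacts, not_able_other):
    """COOP3 subtracts the 'not able/other' cases from the base;
    COOP1 keeps them in, so it can only be lower or equal."""
    coop3 = interviews / (eligible_sample - noncontacts - not_able_other)
    coop1 = interviews / (eligible_sample - noncontacts)
    return coop3, coop1

# Toy example: 1400 interviews from an eligible sample of 2200,
# with 120 never contacted and 80 not able/other
coop3, coop1 = cooperation_rates(1400, 2200, 120, 80)
print(f"COOP3 = {coop3:.1%}, COOP1 = {coop1:.1%}")  # COOP3 = 70.0%, COOP1 = 67.3%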
6.6.3
Cooperation and number of contacts
Whether or not people cooperate is down to a combination of personal characteristics, survey characteristics, situational factors and the interaction between interviewers,
10 This is the AAPOR cooperation rate COOP3. The cooperation rates will of course change when the AAPOR formula 1 is used: COOP1 = I/(ES − NC). The overall pattern (high cooperation rates in new EU countries, low cooperation rates in Switzerland, France and Italy) remains the same. See American Association for Public Opinion Research (2008).
Table 6.5 Cooperation rates in the first three rounds of ESS (%)
Country: AT Austria; BE Belgium; BG Bulgaria; CH Switzerland; CY Cyprus; CZ Czech Republic; DE Germany; DK Denmark; EE Estonia; ES Spain; FI Finland; FR France; GR Greece; HU Hungary; IE Ireland; IL Israel; IS Iceland; IT Italy; LU Luxembourg; LV Latvia; NL Netherlands; NO Norway; PL Poland; PT Portugal; RO Romania; RU Russia; SE Sweden; SI Slovenia; SK Slovak Republic; TR Turkey; UA Ukraine; UK United Kingdom
ESS 1
ESS 2
ESS 3
69.2 69.9
67.7 69.9
38.9
52.5
72.3 71.7 71.3 55.1 94.3
68.3 64.8 74.2
83.3 60.9 72.3 87.5 68.6 75.9 52.3 82.8 81.5 74.3
60.3 77.8 52.8 82.5 82.2 73.9 76.9
67.5 57.3 77.8 75.3 73.6 53.1 71.4 77.0
56.7 48.7 53.9
58.9
72.2 72.3 78.7 71.8
68.8 71.9 79.5 79.7
76.9 82.5
74.9 82.0 73.4 67.8 80.6 60.0
64.3
79.7 64.2 71.4 81.1 77.7 80.3 74.5 74.0 80.4 83.3 71.6 66.2
those opening the door at target addresses and of course the target person themselves. We will focus here on the number of contacts as an indicator of interviewer efforts to obtain cooperation. In the simplest case, the target person cooperates as soon as they receive the request to do so from the interviewer. Slightly more effort is required when an appointment is made at the first contact and the interview is conducted at a
second contact. Additional efforts may turn a first (soft) refusal into a final participant. Figure 6.8(a) shows the number of contacts between the interviewer and the address/household of the target respondent, including the contact at which an interview was actually obtained. Figure 6.8(b) shows the number of contacts with all sample units where there was ultimately a final refusal from the target respondent or other household members (proxy refusals).

Figure 6.7 Cooperation rates in ESS 2 (countries ordered from the highest cooperation rate to the lowest: EE, CZ, GR, SI, HU, UA, PT, PL, FI, SE, IE, SK, DK, NO, BE, NL, ES, TR, AT, DE, UK, LU, IS, CH, FR)

Figure 6.8(a) reflects the effort required to achieve an interview and does not demonstrate ease of cooperation, since it only includes final respondents (refusals are excluded). When interpreting this figure it needs to be borne in mind that in Norway, Finland and Sweden most first calls were made by telephone (see Table 6.2). This means that the outcome of the first contact is at best an appointment, because the interview can never be conducted over the telephone. Telephone recruiting was also allowed in Denmark (see Table 6.1), but the percentage of first calls that were made by telephone was only 36% (Table 6.2). In Austria and Luxembourg, first contacts were supposed to have been face-to-face, but around 50% were actually made by telephone. Here, too, one expects (and finds) a large number of second contacts to conduct the interview. If we exclude AT, DK, LU, NO, FI and SE, countries in which around half or more of the first calls were made by telephone, the Netherlands (NL) stands out as a country where the vast majority of respondents or their households had to be contacted at least twice before an interview was obtained. Note that in the Netherlands later calls could have been made by telephone. Other countries that required high levels of effort are Switzerland (CH), Belgium (BE) and Germany (DE). On the other hand, in countries such as Greece (GR), Ukraine (UA) and Portugal (PT), there was rarely more than one contact with the interviewer.
Figure 6.8 (a) The number of contacts in all cases where an interview was obtained (including contacts with other household members) in ESS 2. (b) The number of contacts for all sample units where the final outcome was a refusal from the target respondent or other household members in ESS 2

Figure 6.8(b) gives a rather different view of efforts to gain cooperation.11 It depicts the number of contacts with those target persons who ultimately refused (including those with household members who may have refused on their behalf or before respondent selection). In a number of countries, around 80% of these were only contacted once (Czech Republic, Portugal, Luxembourg and Italy), whereas in other countries a large number of contacts took place without this resulting in an interview. This may, of course, reflect a variety of different pathways. For instance, a low number of contacts might reflect successful contact at the first attempt, which then results in a high number of refusals but few if any refusal conversion attempts. On the other hand, a high number of contacts may reflect the need to make contact at the household or address level before being able to approach the target respondent directly, or it might reflect a large number of refusal conversion attempts. We will discuss this in more detail below, giving examples from different countries to illustrate this.

11 The Slovak Republic (SK) and Ukraine (UA) are excluded here because the contact forms for refusals are less complete.

Why does the number of contacts with respondents and refusals differ so much across countries? Figure 6.9 can help to explain this. It compares the percentage of 'one contact only' with respondents (which could mark the 'easy' countries) with the percentage of 'one contact only' with refusals (which could mark the 'low effort' countries). It should be noted that each extra contact can turn a refusal into a respondent: refusals with one contact only might have been converted into respondents with two contacts had they been revisited. Contacts with respondents and contacts with refusals are therefore not independent variables.

Figure 6.9 The percentage of 'one contact only' respondents and refusals in ESS 2
We saw earlier (Section 5.2.3) that there are substantial differences in refusal rates across countries. These differences could thus partly explain the pattern in Figure 6.9. Here, we see five groups of countries that are summarized in Table 6.6.

Table 6.6 The relationship between contact success, cooperation and refusal conversion efforts

Telephone contacts (SE, FI, NO, AT, LU, DK): Most first calls made by telephone, so a second call was almost always needed to conduct an interview; large differences between countries in the percentage of refusals with whom only one contact was made.
'Difficult' and 'high effort' (NL, CH): Cooperation at first contact highly unlikely; high proportion of refusal conversion attempts.
'Easy' and 'high effort' (GR): Cooperation usually at first contact, and a high proportion of refusal conversion attempts.
'Easy' and 'low effort' (CZ, IT, PT): Cooperation usually at first contact, but a low proportion of refusal conversion attempts.
Intermediate (BE, DE, FR, PL, ES, EE): Around half of interviews occurred at first contact, but around half of refusals were never re-approached.

The first group (SE, FI, NO, AT, LU and DK) comprises those countries that made most of their first calls by telephone.12 As mentioned above, this will minimize the percentage of 'one call only' respondents, because a telephone contact can at best result in an appointment, followed by an interview at a second or subsequent contact. There are, however, wide differences within this group in the percentage of refusals with whom only one contact was made. In Sweden and Finland, most of the refusals were contacted more than once, whilst in Luxembourg and Denmark around 80% of refusals at the first contact were not recontacted. Norway and Austria lie somewhere in between.

12 Note that in cases where no telephone number was available, the first contact obviously had to be face-to-face.

The second group comprises the Netherlands and Switzerland. In these countries it was highly unlikely that a respondent could be interviewed at the first contact, and highly unlikely that there was only one contact with refusals, reflecting the intensive refusal conversion programmes implemented in both countries. These countries could be called 'difficult' and 'high effort'. Greece is the only country that can be described as 'easy' and 'high effort'. The cooperation rate in Greece was very high, the majority of interviews took place at first contact, and refusals in Greece were usually re-approached. The next group, 'easy, low effort', comprises the Czech Republic, Italy and Portugal. Here, too, the majority of interviews occur at the first contact. Refusals, however, are rarely revisited. The remaining group is a kind of intermediate cluster comprising Belgium, Germany, France, Poland, Spain and Estonia. Here, approximately half the interviews took place at the first contact, while just over half the refusals were not re-approached. It is important to bear in mind that the terms 'difficult' and 'effort' should not be taken too literally. We do not know how hard the interviewers tried at each contact, nor how experienced and well-trained the interviewers were. Furthermore, we do not present information on other efforts, such as the use of incentives or advance letters.
6.6.4
Reasons for refusal
This section focuses on the reasons given for saying 'no' to a request by an interviewer to participate in the ESS. As explained in Section 6.3.3, there is empirical evidence that people who refuse to participate for situational reasons are much easier to convert at a subsequent contact than people who object for survey-related reasons. The reasons for refusal are recorded on the contact forms in each ESS round, but we will focus here on the ESS 2 data that have also been used in most other sections of this chapter. Interviewers coded reasons for refusal according to a pre-specified list. Note that there was no script for interviewers to explicitly ask for a reason for refusal, and in some cases interviewers may therefore have inferred the reasons from their interactions with the respondent.

Those who refused at the first contact in ESS 2 gave a total of more than 20 000 reasons for refusal. Some of these refusers were later interviewed following refusal conversion attempts (see Chapter 7). The interviewers were able to record up to five different reasons for refusal at each contact and to do this for up to three repeat refusals. At first contact, interviewers recorded one or more reasons for refusal for 92.7% of all initial refusers. For most refusers only one reason was noted (71.5%). In a few countries only one reason was ever recorded, suggesting a different approach to refusal recording. Reasons for refusal at second and subsequent contacts are not presented here.

Table 6.7 gives an overview of reasons for refusal across countries. Five main categories of refusal are given: no interest, bad timing, never do surveys, privacy concerns and a negative attitude towards surveys. The first four categories do not necessarily represent a negative attitude towards surveys and it is possible that some, particularly those who mention 'no interest' or 'bad timing' as a reason, may be willing to participate on a future visit. This is less likely in the case of an explicitly
negative attitude towards surveys: this aggregate variable is derived from four specific reasons for refusal that could be recorded by the interviewer: 'waste of time', 'waste of money', 'I don't trust surveys' and 'previous bad experience'. If at least one of these reasons was recorded by the interviewer, this is counted as a negative attitude towards surveys.

Table 6.7 Reasons for refusal at first contact in ESS 2 by country (%)

          No        Bad     Never do  Privacy  Negative  Total  Mean number  Number of
Country   interest  timing  surveys            attitude         of reasons   initial
                                                                 reported     refusals, N
AT        54        13      10         7        16        100    1.5          1 078
BE        41        25      10         6        19        100    1.9            854
CH        42        27      13         8        11        100    2.5          2 164
CZ        46         7      10        17        20        100    1.3            611
DE        45        20      27         5         2        100    1.2          2 334
EE        58        37       2         2         2        100    1.4            485
ES        45        13      10         7        24        100    1.6            841
FI        46        21      12         5        16        100    1.5            700
FR        50         7      10         2        31        100    1.0          1 082
GR        56         6      13        11        14        100    2.7            527
HU        40        29       8        10        13        100    1.0            381
IE        50        14      15         9        12        100    1.2            795
LU        60        12       7         3        19        100    1.1          1 137
NL        45         9      10        19        16        100    2.6          1 375
NO        62        15      11         4         9        100    1.4            763
PL        43         6       8        12        30        100    2.1            503
PT        56        12       4         6        22        100    1.3            553
SE        52         7      12         8        21        100    1.3            637
SI        35        28       7         8        22        100    1.0            416
SK        47        15       7        12        19        100    1.7            652
UA        25         8      13        18        36        100    1.7            867
UK        43        25      11        10        10        100    2.2          1 315

Mean      47        16      10         9        17        100                 20 070
R         0.233     0.089   0.452     −0.262    0.047

More than one reason for refusal could be recorded at each contact in most countries. Denmark is not included because the recording of reasons for refusal is not reliable. R: correlation between particular reasons for refusal and refusal rates at country level. Negative attitude comprises a number of different reasons for refusal (see text).
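The aggregation rule just described is simple to implement; a sketch (ours, with shortened reason labels):

NEGATIVE_ATTITUDE_REASONS = {
    "waste of time", "waste of money", "don't trust surveys",
    "previous bad experience",
}

def has_negative_attitude(recorded_reasons):
    """True if at least one of the four survey-attitude reasons was
    recorded by the interviewer at the refusal."""
    return any(r in NEGATIVE_ATTITUDE_REASONS for r in recorded_reasons)

print(has_negative_attitude({"bad timing", "waste of time"}))  # True
print(has_negative_attitude({"no interest"}))                  # False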
The average number of reasons for refusal coded at the first contact varied from 1 in France and Hungary to 2.7 in Greece. This makes comparisons of reasons for refusal across countries rather difficult. For this reason, the total number of reasons coded is set at 100%, but it is important to bear in mind that this is based not only on a varying number of reasons recorded at the time of the initial refusal, but also on a varying number of refusals between countries (from just 381 in Hungary to 2334 in Germany; see the last column of Table 6.7).

There are wide differences in the reasons for refusal between countries. In nine of the 22 countries in Table 6.7, 50% or more of the reasons recorded for refusal were 'not interested' in the survey. In Slovenia and Ukraine, however, 'no interest' represents just a third and a quarter of the reasons, respectively. Bad timing was recorded relatively often (in 20% or more of cases) in Belgium, Estonia, Finland, Germany, the United Kingdom, Hungary, Slovenia and Switzerland. 'Never do surveys' stands out as the second most popular reason for refusal in Germany (27%), but is never recorded by more than 15% in any other country. Privacy is recorded as the reason for refusal for almost a fifth of all reasons given in the Czech Republic, the Netherlands and Ukraine, but far less often in other countries, and it accounts for just 2% of the refusals in Estonia and France. A negative attitude towards surveys, presumably the most problematic category, is recorded as a reason for refusal in 30% or more of all refusals encountered in Ukraine, Poland and France. In a number of other countries this accounted for a fifth or more of all refusals (the Czech Republic, Spain, Portugal, Sweden and Slovenia). However, it was rarely recorded in Germany and Estonia.

The most important reason for refusal coded at the first contact is 'no interest', which can perhaps be seen as a kind of 'easy escape' by the respondent, and it is therefore not surprising that this is by far the most popular category (see Table 6.7). It should be borne in mind that 'no interest' can have at least two different meanings: the target person may have no interest in participating in surveys, or he or she may have no interest in the subject of this particular study. We do not know which was the deciding factor when the interviewer coded 'no interest'; this is a weakness in the current categories provided on the contact form.

The wide variation across countries in the reasons recorded for refusal raises questions as to the relationship between these reasons and the final refusal rates. The strongest predictor of higher refusal rates at country level is the argument 'I never participate in surveys' (r = 0.452). 'No interest' is moderately correlated with final refusal (r = 0.233), but 'bad timing' is only weakly related (r = 0.089). This is understandable, since respondents who argue that the timing of a visit is not suitable are presumably more likely to participate if a visit is made at a more convenient time. This topic will be further investigated in Chapter 7. Privacy concerns are moderately but negatively related to the refusal rates (r = −0.262). This is rather strange, since it means that countries where privacy concerns are recorded more often than in other countries have lower refusal rates.
6.7 Effects of Enhanced Field Efforts in the ESS

Whereas the previous sections have mainly focused on efforts, this section discusses the aggregate results of those efforts. The first question is therefore to what extent the large number of interviewer calls prescribed, and the sometimes even larger number of calls made, in the ESS ultimately increased response rates compared to not having made such efforts. The second question is whether early respondents differ from late respondents with respect to socio-demographic variables and core variables of the survey. In other words, does making more calls and being persistent actually make a difference in the survey results?

Figure 6.10 shows that additional efforts to include more target persons do indeed have a large effect on response rates. Stopping after one or two calls would mean that a much smaller sample would be achieved, or that a much larger gross sample would be required to end up with the same number of interviews.13 In Greece (GR), the initial response rate after two calls is around 65% and the final rate 79%. This represents a relative increase of 22% due to additional calls. In the Netherlands, on the other hand, the final response rate of 64% is more than three times as high as the response rate of 20% after two calls. However, the key question is whether each additional interview counts, not only in terms of increasing response rates but also in terms of making the survey results more representative.
Figure 6.10 The effect of an increased number of calls on response rates in ESS 2 (response rate, in %, per country after 1 or 2 calls, 3 calls, 4 calls, all calls excluding refusal conversion, and all calls including refusal conversion)

13 Of course, if only one or two calls were made, the calling pattern could change drastically, which could result in higher contact and response rates (for example, only calling in the evening). See also Curtin, Presser and Singer (2000, p. 426).
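The relative increases quoted for Greece and the Netherlands follow from a simple ratio of the final response rate to the response rate after two calls. A minimal sketch, using only the two pairs of rates quoted in the text:

# Response rates (%) after two calls and after all calls, as quoted above
rates = {
    "GR": {"after_2_calls": 65.0, "final": 79.0},
    "NL": {"after_2_calls": 20.0, "final": 64.0},
}

for country, r in rates.items():
    gain = (r["final"] - r["after_2_calls"]) / r["after_2_calls"] * 100
    print(f"{country}: +{gain:.0f}% relative increase due to additional calls")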
Figure 6.11 (a) Background characteristics at first, second, third, fourth and more calls (unweighted, not cumulative) in ESS 2; panels: urban areas, male, 15–24 years, 65 years and over, active on the labour market, higher secondary and tertiary education. (b) Substantive survey outcomes at first, second, third, fourth and more calls (unweighted, not cumulative); panels: admit immigrants, ethnic threat, religious involvement, political interest, trust in political institutions, social trust. Countries shown: BE, CH, DE, DK, EE, ES, FR, IE, NL, PT. The scales are described in Appendix 8.1.

The answer to this question can be gleaned from Figures 6.11(a) and 6.11(b). These figures present the survey outcomes for those respondents who cooperated at the first call, the second call, the third call and the fourth or later calls. It should be borne in mind that these results are not cumulative. No distinction is made between calls before establishing contact, contacts, and unsuccessful calls after first contact has been established. This means that the effect of increasing the contact rate and increasing the cooperation rate cannot be distinguished here, for a very practical reason: it is only for those target persons who gave an interview that the substantive information is available that allows comparisons to be drawn.
The results in Figure 6.11 are given only for those countries where complete information was available, and exclude countries where the first calls were mainly made by telephone. Furthermore, only those countries are included where additional calls resulted in a considerable increase in response rates. In Greece, for instance, a much smaller change in outcomes is expected than in the Netherlands, because in the former country the majority of the interviews were realized at the first two calls, whereas in the Netherlands this applied only to a minority.

Figure 6.11(a) clearly shows the relationship between urbanicity and nonresponse (see Sections 6.2 and 6.3). As expected, urban respondents were interviewed at later calls in half the countries included in this analysis (Belgium, Switzerland, Germany, Spain, France and Portugal). The relationship is muddled in other countries, however, and operates in the opposite direction in Ireland. The pattern is less clear for sex. In Spain, Ireland and Portugal, a probable underrepresentation of men is compensated for by making more calls. In other countries, such as the Netherlands, additional calls have no such effect, and in yet other countries the pattern is erratic. The relationship becomes clearer again for age. Although we do not compare our results with population statistics, and we therefore cannot say here whether our results improve representativeness, it is clear that those who were interviewed at later calls are generally a little less likely to be very young (15–24 years in ESS terms), and much less likely to be in the oldest age group (aged 65 or over). As was found in the literature cited in Section 6.2, and probably related to age too, early respondents are much less likely to be active on the labour market than late respondents. Finally, the education level increases with the number of calls, quite substantially in some countries (e.g. Spain). It should be borne in mind that these are univariate relationships: it could well be that differences in labour market position and education level between early and late respondents are a direct consequence of the overrepresentation of the oldest age group among the early respondents.

Figure 6.11(b) presents the results for a number of core substantive variables in the ESS. These changes are discussed again in Chapter 7 (in terms of the impact of refusal conversion on these outcomes) and in Chapter 8 (in terms of identifying possible nonresponse bias). It can be seen that the differences between calls are smaller than for the background variables, although the direction of change is often similar across countries. Furthermore, it should be noted that, if core variables give different means across calls, this could be because early respondents simply differ from late respondents, or because the different socio-demographic composition of the respondents at different calls produces the effect. This is related to the distinction between Not Missing at Random (NMAR) and Missing at Random (MAR) given covariates, discussed in Section 2.6. These issues will be analysed in detail in Chapter 8. Here, we merely show that early and late respondents do differ. Early respondents are slightly more willing to admit immigrants and less likely to see them as a threat, and are somewhat more religiously involved. Contrary to expectation, early respondents in a number of countries are slightly less interested in politics than late respondents. The difference between calls with regard to trust in political institutions and social trust is generally very small. This reinforces the point that nonresponse bias is item-specific rather than applying equally to all variables in a survey.
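The comparisons underlying Figure 6.11 amount to grouping respondents by the call at which their interview was obtained and comparing group means. A minimal sketch of this computation, assuming a hypothetical respondent-level table; the variable names and values are illustrative and are not the actual ESS data files.

import pandas as pd

# Hypothetical respondent-level data: the call at which the interview was
# obtained, one background variable and one core variable
resp = pd.DataFrame({
    "calls_to_interview": [1, 1, 2, 3, 4, 6, 2, 5],
    "age":                [71, 64, 45, 38, 29, 33, 57, 41],
    "political_interest": [3, 2, 2, 3, 1, 2, 3, 2],
})

# Collapse calls into the categories used in Figure 6.11: 1, 2, 3 and 4 or more
resp["call_group"] = resp["calls_to_interview"].clip(upper=4)

# Mean of each variable per call group (not cumulative, as in the figure)
print(resp.groupby("call_group")[["age", "political_interest"]].mean())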
6.8 Conclusion

On the basis of the information presented in the previous sections, we can conclude that the ESS strategy of prescribing at least four calls to noncontacted units and recommending re-approaching initially reluctant target persons was successful, because it resulted in response rates that were higher, and in some countries very much higher, than without these efforts. It was also shown, however, that a cross-national comparison has to take many factors into account to evaluate the real impact of these efforts. Firstly, there are the differences in sampling frames and recruitment modes, which can have an effect on the response outcomes. Secondly, the ease of contact and the willingness to cooperate differ across countries (Sections 6.5 and 6.6). For this reason, extra efforts are less necessary or will have a smaller impact in some countries than in others. In the Slovak Republic, for instance, almost 90% of those who were finally contacted required only one call to establish contact, whilst in Portugal the figure was only 45% (see Section 6.5.3). In Greece, similarly, the proportion of respondents who cooperated at the first contact was 86%, whilst in the Netherlands this figure was around 13% (see Figure 6.9). In addition, the efficacy of evening and weekend calls is not the same across Europe (Table 6.4). In some countries these calls are really essential, whilst in others their effect is more limited. This suggests that in future, efforts should be made to tailor the ESS rules to country-specific situations, whilst always ensuring that the aim is to improve representativeness. Such efforts must of course be evidence-based, using the results of contact form data. For instance, some countries might need to put more effort into weekend calls, while in others evening calls should be the priority.

The combined evidence perhaps suggests that conducting a survey is easier and less expensive in the newer EU countries than in other countries. This could be due partly to the fact that surveys such as the European Social Survey are a newer phenomenon in the newer member states, and that people appreciate being invited to participate in something special and interesting, unlike in older EU countries where some survey saturation can be expected. Furthermore, the possibility cannot be excluded that survey participation rates will decline as survey scarcity decreases in these new EU countries.

A more operational conclusion of this chapter is that we need to improve the quality of the contact form data. Some countries had to be left out of the analysis because data were missing or incomplete. It is also recommended that the recording and coding of reasons for refusal be more standardized, if only because of the large differences in the number of reasons recorded and some lingering doubts about the equivalence of the categories across countries. This will require clearer guidelines and targeted interviewer training. The same probably holds for the final disposition code 'not able/other', which shows some unexplained differences between countries.

This brings us to another issue that was raised at the beginning of this chapter: the tables and figures presented in the previous sections can only give an indication of what is really happening in the field. To explain differences between countries properly, they should be supplemented with local expert knowledge, something that was done only infrequently when preparing this book. A related issue is that ways need to be found to make the contact form data available during the fieldwork. Only in this way can this information be of direct, practical use to the local fieldwork teams.

The most important conclusions of this chapter are that response rates can be enhanced, that contact form data can help to evaluate field strategies, and that extra field efforts will bring in respondents who differ from respondents who are easy to contact and immediately willing to cooperate.
What we do not yet know is the effect of increased response rates on bias, and how much nonrespondents differ from respondents. These topics will be discussed in the next two chapters.
Appendix 6.1 Response Outcomes in ESS 1, 2 and 3 (%)

[Table: for each participating country (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GR, HU, IE, IL, IS, IT, LU, LV, NL, NO, PL, PT, RO, RU, SE, SI, SK, TR, UA, UK), the percentage of sample units ending as an interview, a noncontact, a refusal, or 'not able/other' in each of ESS 1, ESS 2 and ESS 3.]
Figures 6.1, 6.2, 6.4 and Tables 6.1, 6.2 and 6.4 reproduced by permission of the authors of the research reports of CeSO, K.U. Leuven.
7 Refusal Conversion

7.1 Introduction

One of the basic principles of the European Social Survey (ESS) is to standardize, where possible, all the steps in the survey process and to reduce unnecessary differences between countries in research design and fieldwork procedures in order to optimize the substantive comparability between countries. As already discussed in Chapter 3, in order to realize this objective a Specification for Participating Countries has been developed (European Social Survey, 2007b), which imposes some important elements that are necessary to reduce and evaluate nonresponse error. These include target response rates, response rate enhancement procedures, response rate calculation and documentation rules, and the definition of field outcomes. The minimum target response rate is 70%, and in order to reach this ambitious target the fieldwork specifications include a number of measures: selecting a fieldwork agency experienced in probability sampling methods; using experienced interviewers where possible; interviewer training; personal briefing sessions for all interviewers (including a session on doorstep introductions and persuasion skills); and the reissuing of all 'easy-to-convert' refusals and as many 'hard' refusals as possible (Billiet et al., 2007). It is important to remember, however, that the setting of standards and challenging targets does not always guarantee that they will be met (Park and Jowell, 1997).

In this chapter, the focus is on one strategy that has been implemented in the ESS; namely, the attempt to convince target persons to cooperate after an initial refusal. We call this refusal conversion. Apart from leading to higher response rates per se, refusal conversion has also been used to study nonresponse bias (Groves and Couper, 1998, p. 49).
The rationale behind this is that reluctant respondents are often treated as proxies for final nonrespondents (e.g. Smith, 1984). This use of refusal conversion cases will be discussed in Chapter 8. In this chapter, we focus on the response enhancement consequences of refusal conversion and related issues. Using the ESS, we will examine the (optimal) organization of refusal conversion attempts, the differences between cooperative respondents and reluctant respondents in terms of social background variables, and the emerging concerns regarding privacy and refusal conversion in some countries.

Refusal conversion is often necessary when random probability sampling without substitution is used as the sampling method for a survey. In the ESS, the requirement to implement refusal conversion complements the rule that at least four contact attempts should be made before abandoning a sampling unit as a noncontact, in order to maximize the contact rate (see Chapter 6). The idea of contacting nonparticipants in a renewed attempt to interview them might appear to be an impossible task. However, refusing to cooperate in a survey is not always a permanent state. Someone may refuse to take part in a particular survey in certain circumstances, but in other circumstances the response may be positive (Loosveldt, Carton and Billiet, 2004, p. 73). The person may be busy when the interviewer arrives, or may be feeling unwell or irritable, and therefore refuse. In many cases, a later visit at a better time, or perhaps by a different interviewer, might generate a more positive response.

In their conceptual framework for survey cooperation, Groves and Couper (1998) consider the decision to participate in a survey interview to be heuristic: most respondents do not expend a great deal of cognitive effort on deciding whether or not to participate. A respondent's decision to refuse is not well substantiated, and it can be regarded as changeable. From this point of view, the assignment of the refusal to an experienced interviewer who differs in sex and age from the previous interviewer, or the use of a 'cooling off' period, is recommended (Groves, 1989; Groves and Couper, 1998). It appears that consistent hardcore refusers are probably a rather small part of the total group of refusers (Billiet, Koch and Philippens, 2007, p. 126), and some empirical evidence even suggests that there is no hardcore group of adamant refusers who never cooperate in surveys at all (Schnell, 1997, p. 190). The ethics of refusal conversion will be discussed at the end of this chapter.
7.2 Previous Research

7.2.1 Research questions

The amount of previous research on refusal conversion in face-to-face surveys is relatively small. It is therefore necessary to consider research related to refusal conversion in various other modes. In a research report on nonresponse in the 2002 National Surveys of America's Families (NSAF), Triplett (2006) stated that refusal conversion is a standard practice in most US survey organizations for telephone surveys. The more direct consequence of refusal conversion, according to Triplett, is that a substantial portion of the final datasets for telephone surveys consists of converted cases. Refusal conversion is probably standard practice in telephone surveys or in web/postal surveys because repeat contact attempts to nonrespondents are fairly easy and cheap. In face-to-face surveys, by contrast, refusal conversion is a relatively expensive technique for increasing response, and perhaps compares poorly to the use of respondent incentives or extended interviewer efforts to make contact (Burton, Laurie and Lynn, 2006, p. 459). Many fieldwork agencies involved in face-to-face surveys appear to be disinclined to practise refusal conversion.

Apart from the large extra survey costs related to refusal conversion attempts in face-to-face interviews in some countries, the process is complicated by privacy and data collection concerns. This is particularly the case when individual named samples are employed for sampling. In some countries where such a sampling frame is available, it is forbidden to re-approach selected people who refuse to cooperate: those who explicitly state that they do not want to cooperate must be deleted from the sampling list. Examples of this in the ESS include Sweden, Norway and Belgium. In the latter country, for example, the researcher may obtain random addresses from population registers, but is obliged to inform the selected units about their inclusion in the sample and to give them the opportunity to refuse. Individuals who refuse formally at that stage must be removed from the sampling list. Refusal conversion is only possible at later stages, when a refusal is not explicit. Interviewers therefore have to be trained to distinguish between such 'hard' and 'soft' refusals.

Several studies have sought to evaluate the success rates of refusal conversion efforts from different angles. Some studies simply specify the proportion of all re-approached refusals who eventually decide to cooperate, or the proportion of all re-approached refusals who refuse for a second time (Triplett, 2002; Retzer and Schipani, 2005). Others compare the characteristics of the total sample before and after refusal conversion efforts, most frequently examining whether the sample composition changes on social background variables such as sex, age and geographical composition (Lavrakas, Bauman and Merkle, 1992; Keeter et al., 2000; Triplett et al., 2002; Retzer, Schipani and Cho, 2005). The findings of these studies paint a mixed picture: some report significant differences with respect to demographic, behavioural and attitudinal variables once reluctant respondents are included (Stinchcombe, Jones and Sheatsley, 1981), but others do not find substantial differences (Smith, 1984; Lynn et al., 2002b).

Some studies go further and assess the impact of refusal conversion attempts on nonresponse bias by comparing survey estimates of subjective (attitudinal) variables with and without the converted refusals (Burton, Laurie and Lynn, 2006, p. 459). This has been done for telephone surveys (Curtin, Presser and Singer, 2000; Triplett et al., 2002; Triplett and Abi-Habib, 2005) as well as face-to-face surveys (Lynn and Clarke, 2002; Lynn et al., 2002b; Stoop, 2004; Billiet et al., 2007). These studies interpret differences between cooperative respondents and reluctant respondents, or differences in the characteristics of samples before and after including the reluctant respondents, as indications of bias. The variables on which these studies focus are suggested by theories and previous research on nonresponse.
They therefore tend to focus on the subjective variables and constructs that are meaningful in the context of substantive research (Curtin, Presser and Singer, 2000; Billiet et al., 2007) rather than on socio-demographic variables. To an extent, therefore, this represents a different emphasis from the usual efforts to identify and correct for nonresponse bias using poststratification weighting according to demographic variables (see Chapter 8).

Several studies seek to understand the factors that explain variation in the success of refusal conversion attempts (Dong and Fuse, 2004; Fuse and Dong, 2005). A number of different questions have been asked. Is it better to use the same interviewer or a new interviewer (Groves, 1989, p. 218)? Is the success of refusal conversion dependent on the elapsed time between the initial refusal and the refusal conversion attempt (Triplett, Scheib and Blair, 2001)? What is the effect of incentives on the outcome of refusal conversion attempts (Groves et al., 1999; Kropf, Blair and Scheib, 1999; Stoop, 2005)? Are refusal conversion attempts more successful if interviewers pose only a short questionnaire rather than the full-length original questionnaire (Triplett et al., 2002)? Which approach is more effective: a standard approach made as if there had been no previous contact attempt, or an approach in which the previous refusal and the reason for it are mentioned? Owing to the extra costs involved, researchers have also studied the cost implications of refusal conversion in the light of what has been gained (Triplett, 2002).

Response propensity and measurement error have received significant attention in recent years, including the development of a theoretical framework by Olson (2007). To a certain extent, this stems from interest in the cost benefit of refusal conversion (Miller and Wedeking, 2004). Are additional data collected by spending additional funds on refusal conversion of the same quality as the original sample of cooperative respondents? Do additional respondents contribute to better representativeness of the sample on the relevant variables that are measured? The reason for asking these questions is the suggestion that respondents who are difficult to persuade to participate may lack the motivation to think carefully when providing answers (Groves and Couper, 1998, p. 271; Olson, 2006; Olson, Feng and Witt, 2008). In turn, it is assumed that reluctant respondents are more likely to 'satisfice' rather than 'optimize' when answering questions (Krosnick, Miller and Wedeking, 2003; Miller and Wedeking, 2003; Kaminska and Billiet, 2007a; for mail surveys, see Kaminska, Goeminne and Swyngedouw, 2006).
7.2.2 How successful is refusal conversion?
There is some empirical evidence that refusal conversion is successful in increasing response rates. Some studies have found that up to 40% of people who initially refused to participate will subsequently complete an interview if they are re-contacted (Biemer et al., 1988). However, such a large increase in response appears unusual and is generally only found in telephone surveys. A more common finding in a number of telephone studies in the United States is an average conversion rate of between 13% and around 24% of those who initially refused (Kropf, Blair and Scheib, 1999; Triplett, 2002, p. 27). Evidence from two US surveys shows that there has been a sharp increase in the number of final respondents obtained via refusal conversion over the last couple of decades. Between 1996 and 2002, the number of respondents obtained via refusal conversion in the American National Election Study (NES) increased substantially (Miller and Wedeking, 2006). Between 1980 and 1992, 2% or less of the final respondents were obtained via refusal conversion; between 1996 and 2000, this proportion was around 15%. Given that the overall response rate in 1996 and 2000 was lower than in previous years, the number of reluctant respondents as a proportion of all respondents increased substantially. A similar trend was observed in the US Survey of Consumer Attitudes, where the number of reluctant respondents as a proportion of all respondents increased from around 7% in 1979 to just under 15% in 1996 (Curtin, Presser and Singer, 2000).

The Dutch Amenities and Services Utilisation Survey (AVO), conducted in 1999 among approximately 10 000 households, produced some important insights (Stoop, 2004, 2005). In this face-to-face survey, the number of reluctant respondents who finally cooperated was just over a fifth (22%) of the original refusers; the overall response rate increased by around 8 percentage points because of the refusal conversion efforts. The final response rate was 66%. The field agency put considerable effort into refusal conversion. This included a range of measures such as personalized advance letters (where possible), sending a brochure, insisting on face-to-face contact attempts, detailed fieldwork control measures and reissues to new rather than the original interviewers (Stoop, 2004, p. 27). Further analysis revealed substantial differences in the success of refusal conversion according to the socio-demographic characteristics of the sampled households. The percentage of converted refusals was highest in the urban conglomeration in the west of the Netherlands (the Randstad, incorporating the cities of Amsterdam, Rotterdam, The Hague and Utrecht) and lowest in the more rural central eastern region of the Netherlands. In part, these differences reflected the original level of nonresponse, with refusal conversion being higher in areas where the original response rates had been lowest. However, this pattern was not always observed: single males, for example, had the lowest initial response rates, but the number of converted refusals among this group remained small (Stoop, 2005).

Although there is some evidence that refusal conversion is a good strategy for enhancing response rates, it is not unproblematic. Firstly, interviewers might be more willing to accept refusals when they know there will be a subsequent refusal conversion phase. Secondly, the success of refusal conversion will obviously be smaller when initial response rates are high. This could mean that more initial efforts have been made in high-response countries, which is likely to result in less refusal conversion because the remaining refusals will be 'harder'. It could also be that there is little room for refusal conversion for those respondents who were contacted late, when fieldwork is nearing its end (Verhagen, 2008): it is even possible that efforts to convert refusals may interfere with efforts to reach noncontacted target persons. This provides yet another reason to study the effect of refusal conversion in the ESS more closely in this chapter.
7.2.3 Which factors contribute to successful conversion?
Two key issues in refusal conversion that have been studied in some detail are the elapsed time between the initial refusal and the subsequent conversion attempt, and the effect of incentives at the refusal conversion stage. Less research has been devoted to other areas, such as the impact of assigning refusal conversion to a different interviewer from the one who received the original refusal, the number of refusal conversion attempts, or the effect of providing or withholding information about the original refusal from a new interviewer. We will return to some of these issues later when discussing results from the ESS.

7.2.3.1 Elapsed time

A review of nine national telephone surveys, all conducted by the Survey Research Center of the University of Maryland, United States (between spring 1995 and summer 2000), found that the refusal conversion rate was at its lowest during the first six days after the initial refusal occurred (Triplett, 2002, p. 26). The success rate increased somewhat when the call centre waited seven days before attempting to make contact again, and remained fairly stable until the thirteenth day. Waiting between 14 and 17 days did improve the refusal conversion success rate, however. Thus, waiting a little more than two weeks seemed optimal, but after 18 days the refusal conversion rate began to decline. Some differences in optimal timing were found depending on whether self-refusals or proxy refusals were re-approached. Among proxy refusals, the situation was somewhat different: in these cases, a re-contact attempt appeared to be optimal one week after the initial refusal by the proxy. It was also observed that successful refusal conversion takes almost five prior calls (Triplett, Scheib and Blair, 2001; Triplett, 2002, pp. 26–7).

Data from other sources paint a more complex picture. Using data from the 2001 California Health Interview Survey (CHIS) RDD telephone survey, Edwards et al. (2004, p. 3440) found that the refusal conversion rate increased steadily up to about three weeks after the initial refusal. However, when they tried to replicate this for a shorter screening interview using an experimental design in a new RDD telephone survey (CHIS 2003), there was no evidence that the refusal conversion rate increased steadily over three weeks. The study produced conflicting results, which suggests that there might not be a clear rule that applies to such efforts across surveys.

7.2.3.2 Incentives and refusal conversion

Kropf, Blair and Scheib (1999, pp. 1081–4) studied the effect of alternative incentives on cooperation and refusal conversion in the National Omnibus Telephone Survey. They found that 27.5% of initial refusers who were offered a conditional incentive (a US$5 donation to a charity of their choice) became respondents, compared to 21.8% of those not offered any incentive. However, this difference was not significant because the total number of re-approached refusals was small.
Stoop (2005) reports the effect of incentives in a small follow-up survey among a sample of 350 initial refusers in the 1999 Amenities and Services Utilisation Survey (AVO 1999; see above). Compared to the regular AVO, the response burden of the follow-up survey was substantially lower: only a subset of questions was asked, respondents could choose between several modes, and interviewers received around €25 per target person to spend on incentives. The combination of these measures led to a final cooperation rate of more than 70%. Of those who refused to cooperate in the follow-up survey, a third agreed to be interviewed by telephone after a call-back from a telephone unit. Interviewers were encouraged to record their successful strategies as a source of inspiration for their fellow interviewers, and focus groups consisting of interviewers discussed the reasons for success in the follow-up survey. Interviewers referred to the higher than usual payment of the interviewers, the multi-mode character of the survey, and the possibility of giving (monetary) incentives as the keys to success (Stoop, 2005, pp. 148–54).

7.2.3.3 Assigning refusal conversion cases to new interviewers

A common strategy in fieldwork administration is the reassignment of reluctant cases to more senior interviewers or supervisors. In these instances, it is common for these interviewers to mention the prior visit. Sometimes the refuser will be asked what concerns they have about taking part, a question that may be awkward when asked by the person who received the initial refusal. It can, however, help the new interviewer to understand the reasons for the initial refusal and then address these concerns. This alone will lead to some refusal conversion (Groves and Couper, 1998, p. 291). Unfortunately, there is little solid research that examines the effect of reassignment and the effectiveness of more experienced interviewers. It would be helpful if experimental research were conducted in this area in future.

7.2.3.4 Contact attempts and reasons for refusal

Of course, even if a new interviewer makes a visit to attempt refusal conversion, there is no guarantee that the refuser will be at home or that they can be persuaded to take part. Lind, Johnson and Parker (1998) found that the use of two refusal conversion attempts significantly increased the odds of participation, whilst Laurie, Smith and Scott (1999) showed that the initial reason given for the first refusal is strongly related to the potential for later conversion success. As noted in Chapter 6, this is potentially problematic, since those giving anti-survey reasons are less likely to be randomly scattered among the refusers than those giving more personal, situational reasons (see Section 6.3.3). This suggests that refusal conversion might be more successful at recruiting refusers who are similar to cooperative respondents, and therefore that it fails to tackle bias.
7.2.4 Refusal conversion and data quality
Refusal conversion can impact upon two key data quality characteristics, namely nonresponse bias (see Section 2.6) and measurement error (Section 1.3) (Olson, 2006). Below, we will present some descriptive findings about the effect of refusal conversion on socio-demographic distributions. The second aspect of data quality deals with the effect of refusal conversion on measurement error in terms of increasing satisficing, item-nonresponse and other similar indicators of poorer data quality. It is, of course, possible that these two aspects might be in conflict in some cases. The sample achieved might become more representative, but the data quality might decline as more reluctant respondents are included in the sample.
7.2.4.1 Effect of refusal conversion on socio-demographic distributions

In a rare example of a study in which the socio-demographic characteristics of reluctant and cooperative respondents were compared, Stoop (2005, pp. 216–17) found that converting refusals actually worsened the sample structure according to the socio-demographic characteristics of the known population. For example, the percentage of single males, who are generally characterized by high nonresponse rates, became an even smaller part of the sample after refusal conversion. And whilst those with a higher level of education had originally been overrepresented (23.4%) compared to the official statistics (22.4%), once reluctant respondents were included they actually became underrepresented (20.9%), with the sample being even less representative of this group than before refusal conversion. It appears that the interviewers did not succeed, did not try or were not instructed to convert highly educated sample persons.

In the study of the Index of Consumer Sentiment survey between 1979 and 1996 (Curtin, Presser and Singer, 2000), converted refusals were disproportionately respondents of lower socio-economic status (whilst those who were more difficult to contact were disproportionately of higher socio-economic status). As in the study by Stoop, men were found to be less likely to be successfully converted after a refusal than women. Differences in the propensity to be converted were also found according to race: nonwhites were less likely to end up as converted refusals. It is not clear from this study whether refusal conversion makes the sample less representative according to these background variables.

In the nonresponse analysis in the NSAF (see Section 7.2.1), Triplett (2002) found that converted refusals were less likely to live in families that had children and more likely to live in families that contained larger numbers of older adults. Refusal conversion did not increase the proportion of non-US citizens interviewed.

These findings on socio-demographics and refusal conversion are not generalizable. In an analysis based on the 1988 National Election Survey (NES), Brehm found that the effect of refusal conversion on the percentages of respondents who are married, working or male was small and negligible when compared with the effect of increasing the number of calls, which had a far greater impact.
The only noticeable (though still small) effect of refusal conversion was on the age of the respondents; the mean age of the final sample obtained (45.1) would have been somewhat lower (44.8) without refusal conversion, suggesting that disproportionately larger numbers of older respondents were converted (Brehm, 1993, p. 162).

7.2.4.2 Effect of refusal conversion on measurement error

As noted earlier, the data quality among reluctant respondents also needs to be considered. As early as 1963, Cannell and Fowler (1963, p. 263) found that reluctant respondents provided poorer-quality data. They attributed this effect mainly to lower respondent motivation. Their validity study was partly administered through a self-completion questionnaire and partly through face-to-face interviews. In the self-enumerative condition, three groups were distinguished: early return (cooperative respondents), late return (after a second mailing if they had not returned the first one after seven weeks) and finally those who had to be contacted by telephone or personal interview in the case of nonresponse to the second mailing. The 'reluctant respondents' (late return) provided less accurate reports on past events than the early respondents (Cannell and Fowler, 1963, pp. 259, 262). It was not possible to distinguish the effects of motivation and mode on accuracy among the respondents who did not respond to the second mailing in the self-enumerative condition, since these respondents were then approached by telephone or face-to-face.

Bradburn (1984) discussed the issue more generally, suggesting a possible negative effect of interviewer persistence on response behaviour, later called 'satisficing' by Krosnick and Alwin (1987). Satisficing cannot be measured directly, but six possible actions have been identified: selecting the first response alternative that seems reasonable without considering the others; agreeing with all the statements regardless of their content; nondifferentiation when using rating scales; repeatedly answering 'don't know'; giving socially desirable answers; and mental coin-flipping (Krosnick, 1991; see also Triplett, 2002, p. 20). Sometimes it is also possible to rely on the interviewer's evaluation of the effort made by the respondent in order to detect satisficing. Another way of measuring satisficing is to examine the number of activities reported in the case of factual questions about past events (Triplett et al., 1996; Triplett, 2006), or even to look at the correspondence between responses and real facts, as in the case of validity studies (Cannell and Fowler, 1963; see above). Apart from such validity studies, there is no way of knowing whether a response reflects a true score or satisficing. It is therefore recommended not only to use multiple indicators but also to construct a latent variable (Kaminska, Goeminne and Swyngedouw, 2006). Several of these indicators have been used to test the hypothesis that reluctant respondents are prone to satisficing because they are less motivated, but the number of studies available is currently limited. Satisficing means that respondents have opinions but put less cognitive effort into answering the questions, omitting some steps in the cognitive process of question-answering (Krosnick and Alwin, 1987).
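Two of the indicators listed above, repeatedly answering 'don't know' and nondifferentiation on rating scales, are straightforward to operationalize. A minimal sketch, assuming a hypothetical battery of five 0–10 rating items in which 88 codes 'don't know'; the item names, codes and answers are illustrative only.

import numpy as np
import pandas as pd

# Hypothetical answers of four respondents to a five-item rating battery
df = pd.DataFrame({
    "item1": [5, 7, 88, 4], "item2": [5, 2, 6, 4], "item3": [5, 9, 88, 4],
    "item4": [5, 1, 7, 4], "item5": [5, 8, 6, 4],
})

# Indicator 1: share of 'don't know' answers per respondent
dk_rate = (df == 88).mean(axis=1)

# Indicator 2: nondifferentiation, measured here as the within-respondent
# standard deviation across the items (0 = identical answers to every item)
nondiff = df.replace(88, np.nan).std(axis=1)

print(pd.DataFrame({"dk_rate": dk_rate, "nondifferentiation": nondiff}))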
Blair and Chun (1992) found support for the hypothesis that converted refusers are more likely than cooperative respondents to say 'don't know' or to refuse to answer questions. In addition, they discovered that interviews with reluctant respondents were of significantly shorter duration than those with cooperative respondents. They found that these differences between reluctant and cooperative respondents were consistent across three different surveys. They did not find higher rates of primacy or recency effects among converted refusals. However, the alternative hypothesis for satisficing, positing that reluctant respondents might have less knowledge about the research topics, appeared not to be supported by the data. The differences between reluctant and other respondents were consistent across three general population RDD surveys, despite widely varying subject matters (Blair and Chun, 1992).

Triplett et al. (1996, pp. 21–2) compared reluctant and willing respondents in a time-use study that was cognitively difficult and therefore provided a good opportunity to study the relationship between reluctance (motivation) and cognitive effort. The study had a large number of converted refusals (n = 1112), a sizeable number of which were proxy refusals (n = 412). It was found that converted self-refusals showed significantly higher item-nonresponse than both cooperative respondents and converted proxy refusals. A similar result was found for the number of activities reported for a 24-hour period: there were statistically significant differences between the reports of initial cooperators and the converted self-refusals, with the latter providing fewer reports (Triplett, 2002, pp. 22–3). Triplett also compared differences in reporting between converted refusals and cooperative respondents among proxy respondents, assuming that proxy reporting is cognitively more difficult than reporting about oneself. For the proxy sample, the differences between converted refusers and cooperators were not as clear as those reported in the self-sample (Triplett et al., 1996; Triplett, 2002, p. 22).

In the 2000 National Election Study (NES), Miller and Wedeking (2006) found a strong indication that satisficing is more likely among reluctant respondents. According to the findings, reluctant respondents took significantly less time to answer the questions, showed less interest, less cooperation and more suspicion, and were more likely to make negative comments about the survey. Reluctant respondents were also more likely to select the 'don't know' option and to use the 'mental coin-flipping' response strategy. A limitation of this study, however, is that converted self-refusals cannot be distinguished from converted proxy refusals.

Kaminska, Goeminne and Swyngedouw (2006) sought evidence of satisficing in a mail survey in which response differences between early and late respondents were analysed. Respondents were distinguished according to whether they responded to the initial mailing or to the first, second or third reminder. They found that respondents to the initial mailing scored lower on some measures of satisficing compared with respondents who only participated after a reminder.

The problem of measurement quality among reluctant respondents has not only been examined for attitudinal questions. Studying 17 behavioural questions, Olson (2007) found that the correlations between nonresponse and measurement error were not always negative. For many items, especially financial aid items, the correlation was positive, implying that a higher cooperation propensity went together with more measurement error, as measured by mismatches with administrative records. In this study, the relationship between nonresponse propensity and measurement error is item-specific: it can be positive as well as negative, and it depends on possible common causes of nonresponse and measurement errors.
7.3 Refusal Conversion in the ESS

7.3.1 Efforts and effects
Chapter 3 described the different ESS guidance documents that are available to National Coordinators (NCs) in each round, one of which outlines a variety of possible refusal conversion techniques. It is recommended that refusal conversion attempts be made by a different interviewer from the one who received the original refusal, and that attempts be made to convert all 'soft' refusals and as many 'hard' refusals as feasible. It is fair to say that the ESS specifications on refusal conversion are quite general, and there is no reference to using the data collected on the contact forms in order to plan and conduct refusal conversion during fieldwork. No clear definition is given of soft refusals, but it is suggested that these relate to the mood of the refuser at the time of the initial survey request and the specific circumstances at that time; for example, the interviewer calling during a family meal. The suggestion to reissue all soft refusals is in line with theory and follows the best practice identified earlier in this chapter. If the aim is to increase the response rate, this is clearly the optimal strategy, since soft refusals should be easier to convert than hard refusals.

Table 7.1 summarizes refusal conversion in ESS 2. It shows that the effect of refusal conversion is a combination of the initial refusal rate, the number of re-approached initial refusals and the success rate of the conversion attempts. It is clear that the need, and possibly also the opportunity, for refusal conversion is much higher in some countries (the Netherlands and Switzerland, with initial refusal rates of 47%) than in others (11.2% in the Czech Republic). The final effect on response rates is shown as an increase in percentage points. This is computed by multiplying the percentage of initial refusals by the proportion of these refusals re-approached, and then by the success rate of the refusal conversion attempts (see the worked example below Table 7.1). The effect on response rates increases when more initial refusals are re-approached and when the success rate of refusal conversion is higher. Taking the Netherlands as an example, we see that the percentage of initial refusals was reduced from 47% to 29%, a decrease of 18 percentage points; correspondingly, the response rate increased by 18 percentage points.

The second factor determining the effect of refusal conversion is the percentage of initial refusals that are re-approached. In Greece, a country with a low initial refusal rate, more than 90% of the initial refusals were re-approached. In the Netherlands, the initial refusal rate was very high; here, almost 90% of the initial refusals were called on again. Hardly any initial refusals were re-approached in Denmark, France, Ireland, Luxembourg or Norway, countries with initial refusal rates varying between 20% and 35%.
Table 7.1 Initial refusal rate, number of refusal conversion attempts, and success rate of refusal conversion in ESS 2

Country                 (1) Initial refusals    (2) % of initial       (3) Success rate of       (4) Effect on
                            N          %        refusals re-approached refusal conversion (%)    response rates(a)
AT  Austria              1078       29.8              9.9                     1.9                     0.06
BE  Belgium               854       29.4             40.7                    24.7                     2.96
CH  Switzerland          2164       47.0             77.3                    10.4                     3.78
CZ  Czech Republic        611       11.2              7.9                     8.3                     0.07
DE  Germany              2334       41.4             48.4                    43.5                     8.73
DK  Denmark               575       24.9              2.1                    16.7                     0.09
EE  Estonia               485       19.3             67.6                    61.0                     7.95
ES  Spain                 841       41.4             35.1                    26.4                     3.84
FI  Finland               700       24.5             38.7                    18.5                     1.75
FR  France               1082       26.1              2.8                    40.0                     0.29
GR  Greece                527       17.3             92.4                     4.7                     0.75
HU  Hungary               381       16.9             11.3                    39.5                     0.76
IE  Ireland               795       21.6              0.1                     0.0                     0.00
LU  Luxembourg           1127       34.9              0.9                    20.0                     0.06
NL  Netherlands          1375       47.0             87.8                    43.3                    17.89
NO  Norway                602       28.7              2.1                    38.5                     2.33
PL  Poland                364       21.6             27.6                    36.7                     2.19
PT  Portugal              481       19.2             13.0                    20.8                     0.52
SE  Sweden                432       21.4             32.2                    10.2                     0.71
SK  Slovak Republic       391       27.2             40.0                    40.2                     4.38

Column (4) = [(1, %)/100] × [(2)/100] × (3). (a) Expressed as percentage point differences relative to the initial refusal rate.
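As a worked check of column (4), the Netherlands row of Table 7.1 can be recomputed from columns (1)–(3); the small deviation from the printed 17.89 reflects rounding of the published inputs.

# Column (4) of Table 7.1 for the Netherlands: the percentage-point gain in
# the response rate is (initial refusal rate/100) x (share re-approached/100)
# x (conversion success rate)
initial_refusal_rate = 47.0  # (1), % of all sample units
share_reapproached = 87.8    # (2), % of initial refusals re-approached
success_rate = 43.3          # (3), % of re-approached refusals converted

gain = (initial_refusal_rate / 100) * (share_reapproached / 100) * success_rate
print(f"{gain:.2f} percentage points")  # 17.87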
The final factor is how successful the conversion attempts are. Here too, we see large differences between countries. In France, Hungary, the Netherlands and the Slovak Republic, around 40% of the initial refusals decided to participate after all. In France, this represented 40% of the 2.8% of initial refusals who were re-approached; in the Netherlands, it represented 43.3% of the 87.8% of re-approached initial refusals. The exercise in Estonia was fairly successful: 61% of the 67.6% of re-approached initial refusals cooperated. Two other high-effort countries (Switzerland and Greece) had low success rates: 10.4% of the 77.3% of re-approached initial refusals cooperated in Switzerland, and only 4.7% of the 92.4% in Greece. It should, however, be borne in mind here that the initial refusal rate was very high in Switzerland and rather low in Greece. This means that in Greece in particular, intensive efforts were directed towards bringing in relatively few additional respondents. The combination of countervailing factors that explain the effect on response enhancement is expressed in the moderate country-level (Pearson's) correlation coefficient between the percentage of initial refusals that are re-approached and the final success rate (r = 0.26). The success rates in some countries considerably exceed the 14–24% generally seen in previous studies. However, allowance has to be made for the fact that some countries seem to direct their refusal conversion attempts towards only a few cases (possibly the most promising), while others re-approach a substantial number, and sometimes almost all, of the initial refusals. In the latter case, a high rate of refusal conversion is obviously much more impressive than in the former.

Figure 7.1 summarizes the refusal conversion efforts and successes and highlights the large differences across countries. At one end of the continuum, an extraordinarily high percentage of refusals were reissued in Greece (92.4%), the Netherlands (87.8%) and Switzerland (77.3%). This is perhaps surprising in Greece, since the target response rate had already been exceeded, whereas in Switzerland this strategy was undoubtedly inspired by the high number of refusals and the consequent low response rate. At the other end of the continuum, virtually no effort was made to convert sampling units who refused. In Ireland, Luxembourg, Denmark and France, fewer than 3% of the initial refusals were reissued for a refusal conversion attempt. The case of Greece also nicely illustrates the combination of factors that affect response enhancement via refusal conversion; although a very high proportion of the initial refusals were re-approached, the success rate was very low. However, this is not a serious problem for Greece, since the initial refusal rate was already very low, at 17.3% (see Table 7.1).

The last column of Table 7.1 shows the increase in response rates due to refusal conversion in ESS 2. In absolute numbers, the effect is limited in the majority of countries. In the first round of the ESS, only five countries (out of 21) had more than 100 reluctant respondents – that is, converted initial refusals – in their sample (AT, CH, DE, NL and UK). In ESS 2, there were again five countries (out of 24) with more than 100 reluctant respondents (CH, DE, NL, SK and EE). In ESS 3, there was an increase, with nine countries (out of 25) obtaining more than 100 converted refusals (BE, CH, DE, ES, FR, UK, NL, NO and SE).
Later in this chapter we will pay special attention to Germany and the Netherlands, each of which obtained more than 400 converted refusals. Switzerland, which had the lowest response rate in ESS 1, is the country with the next largest number of converted refusals (more than 140 converted refusals in each round).
Figure 7.1 The outcome of refusal conversion attempts as a percentage of all initial refusals in ESS 2 (per country: no attempt; attempt, no contact made; attempt, no interview; attempt, completed interview)

Figure 7.2 offers an alternative way of looking at the effect of refusal conversion on response enhancement. It shows the increase in the overall response due to refusal conversion. It is again quickly apparent that, with a few exceptions, the impact of refusal conversion on final response rates is minor. Substantial changes in response rates (+3 percentage points or more) are observed in just five countries (CH, DE, EE, NL and SK). There is almost no effect on the response rate in 10 other countries (AT, CZ, FR, GR, HU, IE, IT, LU, PT and SE). In the six remaining countries, a small increase of between 2 and 3 percentage points is observed. Two other countries could not be included in the table because of data problems; however, they would also appear to show improvements of more than 3 percentage points (SI and UK).1 The call record data for ESS 3 (available for 19 countries) suggest that the impact of refusal conversion on response rates has increased compared to earlier rounds (see Figure 7.3).

1 Cooperative and reluctant respondents cannot be clearly distinguished because of inconsistencies in the case identification code. Not all interviews in the main file have call record data, and not all interviews detected in the call record data are identified in the main data file.
Figure 7.2 The effects on final response rates of refusal conversion in ESS 2 (per country: response rate before refusal conversion, plus the increase after refusal conversion)
Figure 7.3 The effects on final response rates of refusal conversion in ESS 3 (per country: response rate before refusal conversion, plus the increase after refusal conversion)

In ESS 3, at least 10 countries obtained an increase in the response rate of more than 3 percentage points. The countries with the largest increases in response rates due to refusal conversion were once again the Netherlands (NL) and Germany (DE), with increases of 13 and 10 percentage points, respectively. Other successful countries include Spain (ES), Sweden (SE) and Slovenia (SI), all with increases of more than 5 percentage points. Note that in Spain refusal conversion appears to be part of a more concerted effort to increase response rates, since the initial refusal rate was lower in ESS 3, yet refusal conversion still accounted for a higher proportion of the final response rate than in earlier rounds. In Sweden, on the other hand, the initial refusal rate in ESS 3 was higher than in ESS 2, and the increase in refusal conversion as a proportion of the final response rate appears to reflect efforts to prevent the response rate from decreasing compared to earlier rounds. Without refusal conversion in ESS 3, five countries would have achieved response rates of less than 50%. In the end, only France obtained less than 50% response, and here the number of initial refusals re-approached for refusal conversion was very low.
7.3.2 Refusal type and refusal conversion
The success or failure of refusal conversion efforts is rarely discussed in terms of the relative propensity of different socio-demographic groups to allow themselves to be persuaded to take part. This is probably because there are rarely sufficient numbers of converted refusals or experimental designs that allow such questions to be answered in detail. Instead, the focus tends to be on the decisions made by survey organizations and interviewers, and how these affect the number of initial refusals that are eventually converted. The key questions are therefore as follows: What kind of initial refusals are re-approached for conversion attempts? What strategy is used for deciding which initial refusals to re-approach? What information, if any, from the contact forms is used to inform such a decision?

ESS contact forms include both the reasons for refusal and the interviewer's opinion as to the likely success of a refusal conversion attempt. Using data from ESS 2, we will examine whether the elapsed time between the initial refusal and the conversion attempt matters, whether asking a different interviewer to attempt refusal conversion is more successful than using the same interviewer, and whether using interviewer estimations of future cooperation makes a difference. The remainder of this section draws on an analysis by Beullens, Vandecasteele and Billiet (2007). Firstly, however, we will discuss an important definition in relation to refusal conversion.
7.3.2.1 Defining reluctant respondents

In the earlier sections on refusal conversion and measurement error, a distinction was made between cooperative and reluctant respondents, as well as those who remain as final refusals. The literature sometimes also refers to 'soft' and 'hard' refusals. It is therefore important to be clear as to the meaning of these categories and to try to apply them in the same way across countries. One approach to measuring reluctance distinguishes between hard and soft refusers. This approach categorizes respondents on a one-dimensional scale according to their degree of willingness versus resistance to survey participation. This scale is then used in 'continuum of resistance' models. Another, more qualitative approach assumes that all sample members have the potential to respond, but that nonrespondents or refusers drop out for various reasons.
This approach therefore defines a 'classes of non-participants model' (Lin and Schaeffer, 1995). It is therefore important to be able to look at the different categories of reluctant respondents; for example, according to whether the refusal was obtained from the respondent or by proxy. In the European Social Survey, the contact form provides information about the reason for refusal (see Section 6.6.4) and whether the refusal was communicated by the target person, by a proxy or even before the target respondent was identified. Moreover, supplementary information is provided by the interviewer; namely, his or her estimation of the future cooperation probability of the target respondent. In the remainder of this chapter, reluctant respondents will be defined empirically according to one or more of the following indicators:
• refusal by sample unit or by proxy;
• interviewer estimation of future cooperation (see Appendix 3.1);
• reason for refusal; and
• number of refusals before cooperation was obtained.
Only one of these indicators is directly related to the subjective judgement of interviewers (estimation of future cooperation), although the reasons given for refusal are recorded by the interviewer and are not directly asked as a question in all cases. These categories have been applied to the samples from Germany and the Netherlands, since both these countries have a large enough number of converted refusals to allow more detailed distinctions to be drawn. Since Germany uses an individual named sample while the Netherlands uses an address sample, there are of course some differences regarding the estimation of future cooperation. In the German case, this information is only completed in cases of refusal by the target respondent (the selected sample unit from the individual named sample). In the Netherlands, interviewers provided an estimate not only when a refusal was obtained from the target respondent, but also in the case of proxy refusals and refusals before respondent selection. The categories used to indicate potential future cooperation are ‘will definitely not cooperate’, ‘will probably not cooperate’, ‘will probably cooperate’, ‘will cooperate’ and ‘no information’ (in the event of a missing value). In the Netherlands, there should always be an estimate, and the last category is therefore not applicable except in cases where interviewers did not provide an estimation.
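To make the empirical definition concrete, the sketch below derives the last indicator (the number of refusals recorded before cooperation was obtained) from call-level records and labels cases accordingly. It is a minimal illustration under a simplified contact-form layout; the column names (`case_id`, `outcome`) and the sample records are hypothetical and do not correspond to the actual ESS contact-file variables.

```python
import pandas as pd

# Hypothetical call-level contact records; 'case_id' and 'outcome' are
# invented stand-ins for the actual ESS contact-form variables.
calls = pd.DataFrame({
    "case_id": [101, 101, 102, 103, 103, 103],
    "outcome": ["refusal", "interview", "interview",
                "refusal", "refusal", "interview"],
})

def classify(case):
    # Count the refusals recorded before the case ended in an interview
    n_refusals = (case["outcome"] == "refusal").sum()
    if n_refusals == 0:
        return "cooperative"
    return ("reluctant (easy to convert)" if n_refusals == 1
            else "reluctant (hard to convert)")

print(calls.groupby("case_id").apply(classify))
# 101    reluctant (easy to convert)
# 102                    cooperative
# 103    reluctant (hard to convert)
```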
7.3.2.2 Conversion success as a function of estimation of future cooperation in the German sample

Table 7.2 reports the probability for Germany that an initial refuser will be re-approached and that the approach will be successful. The refusals are divided into categories according to type of refusal and estimated cooperation probability. The table first distinguishes between refusals by target respondent and refusals by proxy.
Table 7.2 Probability of conversion attempt and conversion success, by interviewer assessment, and refusal by target versus proxy in Germany (logistic regression) in ESS 2

                                                       N     Reissuing         Conversion success probability (b)
                                                             probability (a)   Same interviewer   New interviewer
Refusal by target person
  'Will definitely not cooperate'                     1402   0.46              0.28               0.40
  'Will probably cooperate', 'will cooperate'
    or 'will probably not cooperate'                   338   0.62              0.61               0.74
  No estimation of cooperation rate                    227   0.52              0.66               0.77
Refusal by proxy                                       262   0.62              0.37               0.51

(a) n = 2229; G² = 43.23; df = 3; p < 0.0001.
(b) n = 1136; G² = 128.55; df = 4; p < 0.0001.
If the target person refused to cooperate, the interviewer attempted to evaluate the future cooperation probability. The original categories used in this estimation, 'will definitely not cooperate', 'will probably not cooperate', 'will probably cooperate' and 'will cooperate', are collapsed into two categories.² A combination of both determinants results in one variable containing four categories: 'target respondent will definitely not cooperate', 'target respondent will probably cooperate', 'no estimation of cooperation rate' of the target respondent, and 'refusal by proxy'. As conversion success may depend on the interviewer, the probability of successful refusal conversion is shown separately by type of interviewer (same or new). Overall, we find that if the interviewer predicts future cooperation or if a refusal was obtained by proxy, the reissuing probability is higher. Cases where the interviewer predicts that the respondent will definitely not cooperate are the least likely to be re-approached and have a low conversion success probability. This suggests that interviewers can distinguish between hard and soft refusals, or that the survey organization has developed an effective reissuing strategy. On the other hand, the possibility cannot be excluded that taking converted refusers as a proxy for 'real' refusers may cause the nonresponse bias in the ESS to be underestimated. This issue is discussed further in Chapter 8. One striking observation is the low success rate of conversion by the same interviewer after a proxy refusal. We have no explanation for this.
² This categorization is used because of the rather low number in the 'will (probably) cooperate' category in the German sample.
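The following sketch shows the general shape of the two logistic regressions summarized in Table 7.2: one for the probability that an initial refusal is reissued, and one for conversion success among reissued cases only. The data frame is synthetic and the variable names are invented for illustration; this is not the authors' actual estimation code.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500

# Synthetic initial refusals: interviewer prognosis and who refused
refusals = pd.DataFrame({
    "definitely_not": rng.integers(0, 2, n),  # 'will definitely not cooperate'
    "proxy_refusal":  rng.integers(0, 2, n),
})
# Simulated reissue decision: pessimistic prognoses are reissued less often
refusals["reissued"] = rng.binomial(1, 0.62 - 0.16 * refusals["definitely_not"])

# Step 1: probability that an initial refusal is re-approached
reissue_model = sm.Logit(
    refusals["reissued"],
    sm.add_constant(refusals[["definitely_not", "proxy_refusal"]])
).fit(disp=0)

# Step 2: conversion success among the reissued cases only
sub = refusals[refusals["reissued"] == 1].copy()
sub["new_interviewer"] = rng.integers(0, 2, len(sub))
sub["converted"] = rng.binomial(1, 0.30 + 0.20 * sub["new_interviewer"])
success_model = sm.Logit(
    sub["converted"],
    sm.add_constant(sub[["definitely_not", "new_interviewer"]])
).fit(disp=0)

print(reissue_model.params, success_model.params, sep="\n")
```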
7.3.2.3 Reasons for initial refusal and probability of success in the German sample

It is possible to relate the reasons for refusal (see Section 6.6.4) to the interviewer's assessment of the likelihood of future cooperation. A correspondence analysis for Germany of the relationship between the estimated probability of future cooperation and the reasons for refusal resulted in three clusters: (1) target persons who are 'not interested', or for whom the interview comes at a bad time, generally have high estimated future cooperation rates; (2) a relatively low future cooperation probability is attributed to respondents who refuse for privacy reasons, because they never cooperate in surveys, because they have cooperated too often in the past or because they do not trust surveys; and (3) proxy refusals are categorized under 'other' reasons, since the respondent's own position is not clear. Although we do not have information about how German interviewers went about making their assessments of target respondents' future cooperation, these findings suggest that the reason for refusal noted by the interviewer might feed into their estimate of future cooperation prospects. 'Soft' reasons for refusal, such as 'bad timing' or 'not interested', seem to be interpreted by the interviewer as an opportunity to revisit the sample unit successfully. On the other hand, interviewers might see straightforward motives such as 'interference with privacy' or 'don't trust surveys' as clear indicators of future unwillingness to participate.
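Correspondence analysis of a refusal-reason by cooperation-estimate table can be carried out from first principles with a singular value decomposition, as sketched below. The contingency counts here are invented purely for illustration; they are not the German ESS 2 data.

```python
import numpy as np

# Invented contingency table: refusal reasons (rows) by the interviewer's
# estimate of future cooperation (columns: will / will not cooperate)
N = np.array([[120, 40],   # 'not interested'
              [ 90, 15],   # 'bad timing'
              [ 20, 85],   # 'interference with privacy'
              [ 10, 70]])  # 'never do surveys'
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)

# Matrix of standardized residuals, then its SVD; reasons with similar
# cooperation profiles receive similar coordinates
S = np.diag(r ** -0.5) @ (P - np.outer(r, c)) @ np.diag(c ** -0.5)
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
row_coords = (np.diag(r ** -0.5) @ U) * sv  # principal row coordinates
print(np.round(row_coords[:, 0], 3))  # 'soft' vs 'hard' reasons separate on axis 1
```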
7.3.2.4 Conversion success as a function of estimation of future cooperation in the Dutch sample

As we saw in Table 7.1, the reissue probabilities of refusals are much higher in the Netherlands (87.8%) than in Germany (48.4%). The probability that Dutch refusals will be reissued seems to be related to the assessment of the cooperation probability made by the initial interviewer. However, the difference is relatively small: 85% of the sample units where the first interviewer thought a conversion attempt would definitely not succeed were re-approached, compared with 89% where the interviewer felt the respondent probably would cooperate. This 4 percentage point difference compares with 16 percentage points in Germany, highlighting a quite different strategy in the latter country of concentrating on the softer refusals. Table 7.3 shows that conversion success in the Netherlands was related to three variables: (1) whether or not there was a refusal before selection of the target person; (2) whether or not the case was reissued to a new interviewer; and (3) the estimated cooperation rate at first refusal. The original categories used in the estimation of future cooperation are collapsed in the same way as in Table 7.2. The clearest observation is that refusal conversion is much less successful in cases where the initial refusal came before the target person was selected.
Table 7.3 Probability of conversion attempt and conversion success, by interviewer assessment in the Netherlands (logistic regression) in ESS 2

                                                       N     Reissuing         Conversion success probability (b)
                                                             probability (a)   Same interviewer   New interviewer
Refusal by target person                               914
  'Will definitely not cooperate'                      345   0.85              0.26               0.48
  'Will probably cooperate', 'will cooperate'
    or 'will probably not cooperate'                   556   0.89              0.24               0.45
  No estimation of cooperation probability              13   0.77              0.57               0.78
Refusal by proxy                                       262
  'Will definitely not cooperate'                       64   0.78              0.27               0.50
  'Will probably cooperate', 'will cooperate'
    or 'will probably not cooperate'                   103   0.92              0.25               0.47
  No estimation of cooperation probability              14   0.92              0.59               0.80
Refusal before selection of target person              251
  'Will definitely not cooperate'                       91   0.85              0.17               0.35
  'Will probably cooperate'                            170   0.94              0.15               0.32
  No estimation of cooperation probability              15   0.93              0.44               0.67

(a) n = 1371; G² = 14.72; df = 4; p = 0.0053.
(b) n = 1207; G² = 44.83; df = 5; p < 0.0001.
For example, in those cases where a new interviewer was engaged and cooperation was seen as probable, only 32% were converted, compared with 45% where the interviewer had received a refusal directly from the target person. The table also shows that there is little difference between cases where refusals were received directly from the target person or by proxy. As in Germany, when refusals were reissued to another interviewer the conversion was more successful. However, the success of refusal conversion is only weakly related to the interviewer's assessment of cooperation made at the time of first refusal, whereas in Germany this relationship was quite strong. This is primarily because interviewers in Germany were better at converting the 'will probably cooperate' cases than in the Netherlands. Only in the few cases within each category where interviewers had not estimated future cooperation was
there a higher success rate, but the number of cases is too small to enable conclusions to be drawn.

7.3.2.5 Reasons for initial refusal and probability of success in the Dutch sample

The relationship between the interviewer's estimation of further cooperation and the reasons for refusal as recorded by the interviewer (either from the target person or the proxy) is broadly in line with the German correspondence analysis findings. When reasons such as 'no time' or 'not interested' are given by the target person, Dutch interviewers estimate that the target person will probably cooperate in the future, as do their German counterparts. Similarly, when target persons state that surveys are a 'waste of time', that they 'never do surveys' or that it 'interferes with my privacy', Dutch interviewers, like their German colleagues, conclude that the target person will 'definitely not' cooperate in the future.

7.3.2.6 Number of attempts and new interviewer

Another key difference between the Dutch and German experiences relates to the number of attempts made to convert initial refusals, a criterion that has been used to differentiate between refusals that are 'hard' and 'easy' to convert. The problem with hard refusals is that they cannot really be identified in advance. In the Netherlands, a substantial proportion of those who were expected to be hard refusals did eventually cooperate. In the Dutch sample, 293 respondents agreed to participate after one refusal and 230 respondents were converted after two refusals. In Germany, only 37 respondents were converted after a second refusal. It therefore appears that in the Netherlands a concerted effort was made to convert hard refusals. This might explain some of the differences found in measurement error between various categories of reluctant respondents reported later in this chapter (Figure 7.5). In Germany, therefore, 'hard' refusals were less likely to be re-approached. As a result, reluctant respondents who were converted may well differ from the final refusals, since they appear primarily to have been drawn from the 'softer' refusals. This may cause problems for the valid estimation of nonresponse bias. The data from the Netherlands and Germany can also be used to provide some indication of the relative success of using the same or a new interviewer for refusal conversion. In both countries, it is clear that reissuing refusals to a new interviewer is much more successful than using the same interviewer again. In the other countries with more than 100 reluctant respondents (Switzerland, Estonia and the Slovak Republic), new interviewers were also more successful than the original interviewers, although in Estonia and the Slovak Republic the usual procedure was to reissue to the same interviewer. These findings are what would be expected on theoretical grounds,
since it is anticipated that more experienced interviewers will often be used for refusal conversion attempts.
7.3.3 Timing of refusal conversion attempts³
We will now consider the impact of elapsed time between the initial refusal to cooperate and the refusal conversion attempt, using ESS call record data. Once again, the analysis is focused on ESS 2 data from the Netherlands and Germany, countries in which extensive refusal conversion efforts resulted in substantial numbers of reluctant respondents. In the empirical analysis presented here, only the first refusal conversion attempt is taken into account. This means that the 230 Dutch and 37 German respondents who were converted after a second refusal are considered as final refusals for the purpose of the analysis of timing. It should be noted that the ESS contact form in use at the time did not allow a distinction to be made, in cases where the same interviewer attempts to convert a refusal, as to whether this was the interviewer's decision or whether it was based on an instruction from the survey organization. Firstly, we try to estimate the elapsed time between the initial refusal and refusal conversion attempts, and then we try to assess which timing conditions are most likely to produce a successful outcome.

7.3.3.1 Elapsed time between initial refusal and first conversion attempt: strategic constraints

The fieldwork conditions that determine the possible elapsed time between initial refusal and refusal conversion attempt can be divided into three classes: (1) practical barriers, such as the time left until the fieldwork deadline and the deployment of a new interviewer; (2) pragmatic indicators of the extent to which sample units can be considered as hard refusals (an indication that the target person 'will definitely not cooperate in the future', as estimated by the initial interviewer); and (3) other information about the contact procedure that may be used as a selection variable for refusal conversion (time until first contact and reasons for refusal). The expected predictors for the elapsed time between an initial refusal and the conversion attempt may accordingly also be classified into three groups:

(A) Selection criteria used by the interviewer and/or survey organization: these include the reasons for refusal ('bad timing', 'not interested' and reasons that are categorized as a 'negative survey attitude' – see Table 6.5) and knowledge about the contactability of the target person, measured by the number of visits until first contact.
³ The analyses of the elapsed time between any refusal and the conversion attempt, and the timing factors that affect the likelihood of success, are largely based on a paper by Beullens, Billiet and Loosveldt (2009b).
(B) Pragmatic criteria, such as the estimation of future cooperation ('will definitely not cooperate' versus the other categories) and refusal by the target person (instead of 'proxy' or 'before selection').

(C) Factors determined by the fieldwork organization, such as the introduction of a new interviewer (c1) and the number of days until the expiry of the fieldwork deadline (c2).

Considering elapsed time as a dependent variable in a regression context implies that the variances of the residuals in an ordinary least squares regression decrease as the fieldwork progresses. This may seem obvious, since at the end of the fieldwork period the opportunities to prolong the elapsed time since the initial refusal are rather restricted compared with the beginning of the fieldwork. The variance of the residuals also decreases in the case of a new interviewer, because this decision is taken at a later stage in the fielding period. Therefore, a weighted least squares regression is proposed in order to obtain more efficient estimates (Beullens, Billiet and Loosveldt, 2009b).⁴ The estimates of the regression are reported in Table 7.4. On average, the elapsed time between the initial refusal and the re-approach is 38 days in Germany and 53 days in the Netherlands, even though the countries had broadly similar fieldwork periods.⁵ The parameter estimates should be interpreted as deviations from these mean elapsed time intervals. For example, in Germany, if 'bad timing' is recorded at the initial refusal, the elapsed time before the refusal conversion attempt decreases by 4.58 days. This is to be expected, since this reason is an indication that a successful interview is likely to be secured at a more suitable moment. The effects of the reasons for refusal are not significant in the Netherlands. The contactability of the target person does not seem to lead to a shortening of the period between initial refusal and the refusal conversion attempt. The interviewer's projection of future cooperation does, however, have a significant effect on the elapsed time, although this operates in different directions in the two countries. In the Netherlands, in cases where interviewers recorded that the refuser 'will definitely not cooperate', there was an extended period of almost 13 days before a refusal conversion attempt was made. In Germany, by contrast, the effect operated in the opposite direction and the elapsed period was actually shorter by around four days. In Germany, the period before a conversion attempt was made was also slightly longer when it was the respondent him- or herself who had refused, as opposed to a proxy refusal. Also in Germany, the effect of employing a new interviewer increased the period before a refusal conversion attempt by over two weeks, whereas this pattern was not observed in the Netherlands.
⁴ The WLS regression is denoted by the following model:

$w_i^{1/2}(\text{Elapsed time}) = w_i^{1/2}(b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p + e)$  (1)

where $w_i$ refers to the inverted predicted squared residuals obtained from the OLS variant of this equation.
⁵ The fieldwork period in Germany was from 26 August to 16 January, whereas in the Netherlands it was from 11 September to 19 February.
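The two-step estimator described in footnote 4 can be sketched as follows: fit OLS, model the squared residuals on the same covariates, and refit with weights equal to the inverted predicted squared residuals. This is a minimal illustration on simulated data with heteroscedastic noise; the variable names are illustrative only, not the ESS contact-file fields.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400
days_to_deadline = rng.uniform(10, 120, n)
new_interviewer = rng.integers(0, 2, n)
X = sm.add_constant(np.column_stack([days_to_deadline, new_interviewer]))

# Heteroscedastic noise: more slack in the elapsed time early in fieldwork
elapsed = (5 + 0.4 * days_to_deadline + 15 * new_interviewer
           + rng.normal(0, 0.15 * days_to_deadline))

ols = sm.OLS(elapsed, X).fit()
# Model the squared residuals on the same covariates ...
aux = sm.OLS(ols.resid ** 2, X).fit()
# ... and weight by the inverted predicted squared residuals (footnote 4)
weights = 1.0 / np.clip(aux.fittedvalues, 1e-6, None)
wls = sm.WLS(elapsed, X, weights=weights).fit()
print(ols.params, wls.params, sep="\n")
```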
Table 7.4 The elapsed time (in days) between initial refusal and conversion attempt (weighted least squares regression) in ESS 2

                                                         DE Germany    NL Netherlands
Selected refusals for conversion programme, N                  1138              1204
Intercept (unconditional mean)                                37.94             53.37

A. Selection criteria used by the interviewer
   Reasons for refusal
   • Bad timing                                               −4.58              1.44
   • Not interested                                            2.82              1.90
   • Negative survey attitude (a)                               n/a              0.62
   Visits until first contact (log)                            0.84              0.01

B. Pragmatic criteria
   Estimation: 'will definitely not cooperate'                −4.40             12.77
   Initial refusal by target person (=yes)                     7.46              1.03

C. Decisions by fieldwork authorities
   New interviewer (=yes) (c1)                                18.34              3.22
   Time (days) until fieldwork deadline (c2)                   0.41              0.39
R²                                                             0.43              0.26

(a) Rarely recorded in Germany; see Table 6.5.
***p < 0.001; **p < 0.01; *p < 0.05.
Reproduced from Beullens, K., J. Billiet & G. Loosveldt (2009b) The effect of the elapsed time between initial refusal and conversion contact on conversion success: evidence from the 2nd round of the European Social Survey. Quality & Quantity, DOI 10.1007/s11135-009-9257-4.
Perhaps the most influential factor is the number of days remaining until the end of fieldwork, with the elapsed time becoming shorter the later the initial refusal is received during fieldwork. For each day by which the deadline comes closer, the elapsed time reduces by 0.39 days in the Netherlands and by 0.41 days in Germany. We have distinguished between three main factors that affect the length of the elapsed time between the initial refusal and the refusal conversion attempt (see the subdivisions in Table 7.4). Their respective contributions to the explained variance of the elapsed time are presented in Table 7.5, with the effect of each category shown as an addition to category A. The coefficients indicate that the remaining fieldwork period has the most impact on the elapsed time in both countries. The pragmatic criteria, in terms of the estimation of future cooperation by the original interviewer, are also important in both countries, though they operate in different directions. It therefore appears that the elapsed time between the initial refusal and the refusal conversion attempt can be an important tool in developing a refusal conversion strategy. The Dutch strategy of waiting longer (almost 13 days more) before the refusal conversion attempt for hard refusals is a good example of this, although this was not observed in Germany (Beullens, Billiet and Loosveldt, 2009b).
Table 7.5 The coefficient of determination (R²) of the elapsed time between initial refusal and follow-up attempt, regressed by different clusters of covariates in ESS 2 (a)

Model covariates      DE Germany    NL Netherlands
(A)                   0.01          0.03
(A) + (B)             0.02          0.11
(A) + (c1)            0.14          0.03
(A) + (c2)            0.34          0.17
(A) + (B) + (C)       0.43          0.26

(a) Categories are explained in Table 7.4.
Reproduced from Beullens, K., J. Billiet & G. Loosveldt (2009b) The effect of the elapsed time between initial refusal and conversion contact on conversion success: evidence from the 2nd round of the European Social Survey. Quality & Quantity, DOI 10.1007/s11135-009-9257-4.
7.3.3.2 Effect of the elapsed time on conversion success

A refusal conversion programme can tell us more than simply whether or not a successful interview was achieved. Some additional ineligible cases will be identified as a by-product of these procedures. A new interviewer may discover that the initial refusal should be classified as 'not able to cooperate' because of language problems; or new reasons for refusal may be given, and the new interviewer may give a different estimate of success at a future contact. Here, we focus on the question of whether the conversion attempt resulted in an interview or in a second refusal, and again explore the relationship with elapsed time. Logistic regression analysis has been applied to ESS contact form data to examine the effect of elapsed time on conversion success; in other words, on the ratio between 'conversion' into an interview and a 'double refusal'. The number of converted plus double refusals in the two countries (i.e. all of the initial refusals) is shown at the top of Table 7.6. The global (unconditional) success rates are also presented at the top of the same table.⁶ As noted earlier in this chapter, the rather scarce previous research we have found does not indicate a clear relationship between the length of the elapsed period before the refusal conversion attempts and final outcomes. Triplett, Scheib and Blair (2001) and Triplett et al. (2002) found that conversion success grows rapidly during the first seven days but stabilizes somewhat thereafter. As expected, the success curves differ between refusals by target persons and by proxies, and between hard and soft refusals (as estimated by the initial interviewer). Allowing interaction terms for these two variables therefore seems appropriate.
⁶ As we are focusing on the elapsed time between the first refusal and the outcome of a subsequent refusal conversion attempt, a second refusal is always considered an unsuccessful attempt, even when it turns into cooperation on a later occasion.
Table 7.6 Success in refusal conversion procedure (logistic regression) in ESS 2 (a)

                                                              DE Germany    NL Netherlands
Selected refusals for conversion programme, N                        923              1126
Unconditional success rate (%)                                     49.19             25.89
Reasons for refusal
• 'bad timing'                                                     0.719             0.887
• 'not interested'                                                 0.914             0.527
• Negative survey attitude (b)                                       n/a             0.844
Interviewer estimation: 'will definitely not cooperate'            0.257             1.020
Initial refusal by target person (=yes)                            3.000             1.259
Visits until first contact (log)                                   1.051             0.923
New interviewer (=yes)                                             1.284             1.462
Estimate for performance by follow-up interviewer
  (higher values correspond to better cooperation
  rates, see Appendix)                                             2.054             4.221
Time until fieldwork deadline at initial refusal                   1.020             1.020
Elapsed time since previous refusal (days)                         1.031             1.020
Elapsed time since previous refusal (days)²                       0.9997             1.000
Elapsed time (days) × interviewer estimation:
  'will definitely not cooperate'                                  1.010             0.990
Elapsed time (days) × refusal by target (=yes)                     1.000             1.010
R² (Nagelkerke) in the complete model                               0.28              0.20
R² (Nagelkerke) in the simple model                                 0.10              0.03

(a) Logistic regression with the ratio conversion/double refusal as dependent variable, the elapsed time between initial refusal and conversion attempt as predictor, and a number of exogenous variables as covariates (odds ratios).
(b) Rarely recorded in Germany; see Table 6.5.
***p < 0.001; **p < 0.01; *p < 0.05.
Reproduced from Beullens, K., J. Billiet & G. Loosveldt (2009b) The effect of the elapsed time between initial refusal and conversion contact on conversion success: evidence from the 2nd round of the European Social Survey. Quality & Quantity, DOI 10.1007/s11135-009-9257-4.
Because of the possible curvature in the relationship between the elapsed period and conversion success, quadratic terms⁷ are also used in the logistic regression. The dependent variable is the logarithm of the ratio 'conversion/double refusal'.⁸ Furthermore, additional covariates should be included in the model, since the elapsed time seems to be determined by some factors that are not recorded on the ESS contact forms. These covariates include all the explanatory variables shown in Table 7.4, as well as an indication of the performance of the follow-up interviewer based on interviewer-specific response rates (refusal conversion rates excluded) (see Appendix 7.1).

⁷ In order to avoid multicollinearity problems, all contributing variables in multiplicative terms have first been mean-centred.
⁸ The dependent variable is modelled as follows:

$\log\left(\frac{p_{\text{conversion}}}{p_{\text{double refusal}}}\right) = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p$  (2)
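A minimal sketch of a model in the spirit of equation (2), with a quadratic elapsed-time term and mean-centred interaction terms as described in footnote 7. The data frame, its columns and the simulated effects are hypothetical stand-ins for the ESS contact-form file, not the authors' estimation code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 800
df = pd.DataFrame({
    "elapsed":        rng.uniform(1, 90, n),  # days until the conversion contact
    "definitely_not": rng.integers(0, 2, n),
    "target_refusal": rng.integers(0, 2, n),
})
# Arbitrary simulated outcome: longer elapsed time -> higher conversion odds
df["converted"] = rng.binomial(
    1, 1.0 / (1.0 + np.exp(1.2 - 0.02 * df["elapsed"])))

# Mean-centre the variables entering product terms (footnote 7)
for col in ["elapsed", "definitely_not", "target_refusal"]:
    df[col + "_c"] = df[col] - df[col].mean()

model = smf.logit(
    "converted ~ elapsed_c + I(elapsed_c ** 2)"
    " + elapsed_c:definitely_not_c + elapsed_c:target_refusal_c"
    " + definitely_not + target_refusal",
    data=df).fit(disp=0)
print(np.exp(model.params))  # odds ratios, as reported in Table 7.6
```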
The inclusion of these variables should account for occasional interference from selection bias, strategic decisions or other circumstantial factors. The importance of including the covariates alongside the elapsed time variables is assessed by the difference in explained variance (Nagelkerke R²) between the (complete) model with covariates and the (simple) model without covariates (i.e. with only the elapsed time variables in the model). Table 7.6 reports the parameter estimates. In the previous section, the elapsed time variable was operationalized as the elapsed time between the initial refusal and the first conversion attempt, regardless of whether or not contact was made. This choice can be justified because the elapsed time is considered the outcome of a decision taken by the survey organization or the interviewer; the initial refusers themselves have no say in it. Conversely, when modelling the conversion likelihood, the initial refusers cannot be ignored, because they are important (if not the most important) contributors, and they are (normally) not expected to be aware of any previous attempts to contact them for refusal conversion where no contact was actually made. For this reason, the elapsed time used as an independent variable in the logistic regression denoted in equation (2) (in footnote 8) is constructed as the time interval between the initial refusal and the subsequent contact with the sample unit, irrespective of the result (conversion or double refusal) (Beullens, Billiet and Loosveldt, 2009b). Since the dependent variable is the ratio between successful conversion and a second refusal among all re-approached initial refusers, the parameters express the increase or decrease in this ratio. Parameters that are not significantly different from 1 indicate that there is no effect on the ratio as a consequence of belonging to a category of a predictor or, in the case of a continuous (metric) predictor, as a consequence of a unit change in this predictor. Parameters of less than 1 indicate a negative effect. Controlling for all effects included in the complete model, the elapsed time between the initial refusal and the subsequent realized contact has a positive effect on the conversion likelihood: increased elapsed times result in better conversion rates. This may indicate that reluctance to cooperate in the survey can be expressed as a time-dependent decreasing serial correlation. In Germany, the only significant negative effects on conversion success were for cases where the interviewer noted, at the time of the initial refusal, that the target person would 'definitely not cooperate', and a very weak negative effect of the elapsed time since the previous refusal. The strongest positive predictors for success were when the initial refusal was a self-refusal (by the target person), and when the follow-up interviewer had higher cooperation rates. The finding that self-refusals are more easily converted than proxy refusals in Germany is surprising and was not expected. This relationship operates in the same direction in the Netherlands, but is no longer significant. The only significant negative effect in the Netherlands was for cases where the interviewer had recorded 'not interested' at the time of the initial refusal. The largest positive effect in the Netherlands was when interviewers who had performed very
well in the past were selected for refusal conversion attempts. This relationship was almost twice as strong as in Germany. The effects of the elapsed time between the conversion attempt and the prior refusal are small, but differ significantly from 1 because the time variable is measured in days and the standard errors are small. Controlling for all effects included in the complete model, an increase in the elapsed time between the initial refusal and the next actual contact has a positive effect on the conversion likelihood: increased elapsed time between the initial refusal and conversion attempts results in better conversion rates in both countries. In Germany, there seems to be some evidence for a degressive increase in conversion success, given the significant and negative quadratic effect of elapsed time; this suggests that the influence of the elapsed time stabilizes after a while, bringing further increases in the success rate to an end. Consistent findings across the two countries are observed with respect to the distance to the fieldwork deadline at the initial refusal. Those refusing at a late stage of the fieldwork seem to be less willing to participate at a renewed cooperation request, although of course the refusal conversion attempt probably comes sooner for them than for early refusals. As might be expected, better interviewers convert relatively more refusals than poorer performing interviewers. However, one of the shortcomings of the ESS contact forms is that the contact data cannot tell us whether the decision to attempt refusal conversion, and its timing, depends on the survey organization or on the interviewer (Beullens, Billiet and Loosveldt, 2009b). Particular attention should be paid to the difference in explained variance between the simple models, with only elapsed time as a predictor, and the complete models. The Nagelkerke R² values in the simple model are 0.10 in Germany and only 0.03 in the Netherlands. Stepwise inclusion of covariates suggests that the change in the amount of explained variance is due mainly to the inclusion of a new versus the original interviewer. The net effect of using new interviewers on success rates is not significant in the complete model, although new interviewers seem to be more successful. It has to be borne in mind that the decision to pass refusal conversion to new interviewers normally increases the elapsed time. When estimating the effect of the elapsed time on conversion success, it is important to be aware of the circumstantial factors and tactical choices that coincide with the elapsed time (Beullens, Billiet and Loosveldt, 2009b).
7.4 Refusal Conversion and Data Quality

High response rates may not always lead to better data quality. An overview of studies on refusal conversion and data quality suggests two key questions. The first is whether refusal conversion improves the quality of the obtained sample in terms of representativeness vis-à-vis the total universe from which the sample was drawn. The second is whether refusal conversion increases measurement error compared with the data collected before refusal conversion.
7.4.1 Refusal conversion and sample representativeness
Before examining whether the sample distributions are closer to known population distributions after refusal conversion, the first step is to establish whether refusal conversion makes any difference at all to the socio-economic distributions. Five countries in ESS 2 had more than 100 reluctant respondents obtained via refusal conversion, sufficient to allow this question to be examined: Switzerland (165), Germany (492), Estonia (200), the Netherlands (523) and the Slovak Republic (105). Two countries had about 7% reluctant respondents in the final achieved sample (Switzerland and the Slovak Republic); the other three countries had more than 10%. The response distributions for sex, age, education, level of urban environment and job activity are compared before and after refusal conversion. Previous research has sometimes found that distributions of age, sex and education were altered by refusal conversion. We also look at 'single males',⁹ because the distribution of this variable has also been found to be affected by refusal conversion (see Stoop, 2004, 2005). Table 7.7 shows the differences in distributions for a number of social background variables between the samples of initial respondents and the complete samples after refusal conversion. There are very few statistically significant differences between the samples obtained before and after refusal conversion. In three of the five countries (Switzerland, Germany and the Slovak Republic), there are no statistically significant differences at all. Even allowing for nonsignificant differences of more than 1 percentage point only adds a few more examples. The proportion of males in the Netherlands drops by 2.2 percentage points, showing that refusal conversion increased the number of females in the sample; among the cooperative respondents 55.3% were female, but among the converted refusals this rose to 62.9%. It appears to be easier to persuade, or re-approach, women than men to cooperate in the survey after a prior refusal, which is in line with the previous studies discussed earlier. The proportion of less-educated respondents also declines after refusal conversion, by 1.0 percentage point, although this difference is not significant. Previous studies have found statistically significant differences in the Netherlands on this variable (Stoop, 2005). In Estonia, the percentage of respondents with a lower-secondary education is 2.6 percentage points smaller in the final sample after refusal conversion. This suggests that reluctant respondents are more likely to have completed higher-secondary education. Finally, in Germany the proportion of urban respondents increases after refusal conversion. We will return to these results in Section 8.3.2. The next key question is whether the samples after refusal conversion are closer to the population distribution than before refusal conversion. Unfortunately, we only have reliable statistics for age and gender distributions.¹⁰
⁹ 'Single male' is operationalized as a male over 25 years of age without a partner.
¹⁰ These statistics are available on the ESS data web site (http://ess.nsd.uib.no/index.jsp?year=2005&country=BE&module=documentation) and in a report on weightings (Vehovar and Zupanic, 2007). The distributions of education level are not comparable with our education variable.
Table 7.7 Differences in background variables between samples of initial respondents and complete final samples in ESS 2 (a)

Difference final – before conversion (%-points)

Variable                        CH            DE        EE        NL            SK
                                Switzerland   Germany   Estonia   Netherlands   Slovak Republic
Mean age (b)                    0.04          0.58      0.22      0.08          0.25
% 15–24 years                   0.2           0.7       0.5       0.1           0.4
% >65 years                     0.1           0.8       0.8       0.3           0.2
% male                          0.1           0.9       0.7       −2.2          0.1
% single man (>25 years)        0.1           0.3       0.4       0.8           0.6
% single man (<34 years)        0.3           0.4       0.5       0.4           0.4
% lower education               0.1           0.0       0.7       −1.0          0.1
% lower secondary education     0.2           0.6       −2.6      0.6           0.7
% higher secondary education    0.4           0.6       0.6       0.3           0.6
% higher education              0.0           0.0       0.2       0.2           0.2
% never paid job                0.0           0.5       0.4       0.7           0.0
% ever job                      0.1           0.5       0.5       0.9           0.0
% rural                         0.1           0.4       0.8       0.6           0.4
% urban                         0.3           1.1       0.1       0.9           0.1

(a) Differences that exceed the 95% confidence limits computed in the final sample are in bold.
(b) Difference in mean age in years.
REFUSAL CONVERSION AND DATA QUALITY
191
In Germany, Estonia and particularly the Netherlands, the samples before refusal conversion are closer to the known population distributions. This is because women, who were already overrepresented in the initial samples, are more willing than men to cooperate after an earlier refusal. The results are mixed for respondents aged between 15 and 24: the final samples are closer to the population figures for this category in only two of the five countries. In four countries, the final sample for the over-65 group is closer to the population figure. In summary, the results of refusal conversion in countries with a reasonable number of converted cases (100+) are generally disappointing in terms of improving the socio-economic representativeness of the final sample. In some countries, it made no statistically significant difference with regard to key demographics. The limited evidence available also suggests a mixed picture in terms of whether the direction of the change brought the final sample closer to known population statistics.
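The comparison reported in Table 7.7 amounts to contrasting a category's share among the initial respondents with its share in the full sample after conversion, and checking whether the difference exceeds confidence limits computed in the final sample. The sketch below shows one plausible version of that check; the counts are invented for illustration and the test is a simple normal approximation, not necessarily the procedure used for the table.

```python
import numpy as np
from scipy.stats import norm

# Invented counts: initial (cooperative) respondents and converted refusals
n_initial, n_converted = 1800, 500
males_initial, males_converted = 810, 190

p_before = males_initial / n_initial                    # share before conversion
n_final = n_initial + n_converted
p_final = (males_initial + males_converted) / n_final   # share in the final sample

diff = p_final - p_before
se_final = np.sqrt(p_final * (1 - p_final) / n_final)   # SE in the final sample
print(f"difference: {100 * diff:+.2f} %-points")
print("outside 95% confidence limits:", abs(diff) > norm.ppf(0.975) * se_final)
```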
7.4.2 Refusal conversion and measurement error in the ESS¹¹
Do reluctant respondents reduce data quality by ‘reluctantly’ and carelessly providing data after being converted into respondents? As discussed earlier in this chapter, previous research has examined the accuracy of reports made by reluctant respondents (Cannell and Fowler, 1963; Olson, 2006; Olson, Feng and Witt, 2008), as well as satisficing (Krosnick and Alwin, 1987; Triplett, 2002; Olson, Feng and Witt, 2008) in comparison to cooperative respondents. Since the ESS does not often contain reports of factual past behaviour or events that are verifiable using external data sources, our focus is on satisficing as an indication of lower motivation. There are a number of good indicators for satisficing that can be derived from the ESS, such as an agreeing response style and frequent use of ‘don’t know’. In addition to these respondent measures, it is also possible to use the interviewer evaluations provided at the end of the questionnaire. One of the questions, for example, is ‘Did the respondent try to answer to the best of his/her ability?’ Three other questions in the interviewer questionnaire can also be used to assess the extent of response difficulty (or, conversely, ‘easiness’) experienced by the respondent.
7.4.2.1 Indicators of satisficing, low motivation and response difficulty

Is it possible to find examples of higher measurement error among the converted refusals compared with the cooperative respondents? To answer this question we rely mainly on studies by Kaminska and Billiet (2007a,b), which analyse the data from ESS 1 and ESS 2.
¹¹ The work on measurement error among reluctant respondents was carried out by Olena Kaminska as part of a PhD project at the Gallup Research Center (Lincoln, Nebraska, United States), under the supervision of A. McCutcheon, and in cooperation with J. Billiet.
Data from Germany and the Netherlands are once again used because the large number of reluctant respondents in these samples (more than 400) makes it possible to distinguish between two different kinds of reluctant respondents: those who were easy and those who were hard to convert. The distinction between these two groups of converted refusals is based upon the investment that was needed in order to persuade those respondents who initially refused (Billiet et al., 2007, p. 146). Easy-to-convert refusals are original refusers who decided to cooperate after one new attempt. Hard-to-convert refusals are refusers who decided to participate only after several attempts, or after special incentives had been used in order to persuade them. The analysis is limited to 'self-refusals', since previous research suggests that proxy refusals do not differ from cooperative respondents in terms of data quality (Safir et al., 2002; Triplett, 2002).¹² There are 428 converted self-refusals in Germany and 473 in the Netherlands in ESS 2. Cooperative respondents (immediate respondents) are compared with the reluctant respondents using the interviewers' evaluation of the respondents' task performance and other, 'objective' measures of satisficing. Box 7.1 shows the indicators that were used and describes how they were measured (see also Greenleaf, 1992a,b; Baumgartner and Steenkamp, 2001).

7.4.2.2 Subjective indicators

Before examining these objective measures of satisficing, we will first consider the interviewers' evaluations of the respondents they interviewed. Their evaluation of the respondent's effort was recorded by means of a set of questions completed by the interviewer (ideally) shortly after the interview took place (see Box 7.1). The items measure the interviewer's view of the frequency of requests for clarification, the frequency of reluctance to answer questions and the frequency with which the respondent appeared to understand the questions. A latent variable 'response easiness' with good internal consistency¹³ was measured using this information. The scale is scored in such a way that respondents achieve higher values on a derived 10-point scale the less often they ask for clarification or appear to find the questions difficult, and the more they appear to understand the questions. Conversely, the lower the scores on the scale, the more difficulty they had in answering the questions. Another variable measures the interviewer's perception of respondent effort during the interview.
¹² The classification of respondents as reluctant respondents is not self-evident, since it is possible that some converted refusals are actually cooperative (mainly in the case of proxy refusal, but also because of interviewer behaviour), and even that some initial respondents are reluctant. For this reason, several operationalizations were tried out. Kaminska (2009) used the criterion of proxy or self-refusal and some questions in the interviewer's evaluation of the respondent, and applied latent class analysis in order to classify the respondents into initial respondents and reluctant respondents. The criterion of 'converted refusal' then plays a less crucial role in the definition of 'reluctance'.
¹³ The three indicators have factor loadings in excess of 0.63 in the two countries that are considered in this section. The internal consistency is sufficient, since Cronbach's alpha is 0.70 in DE and 0.65 in NL.
Box 7.1 Objective Indicators of Satisficing and Interviewer Evaluations Available from the ESS

Satisficing indicators (measurement):
• Straight-lining: the number of identical response categories selected among a large set of items with different response scales.
• Easy answering: the number of extreme and middle responses given to questions using 11-point response scales.
• Agreeing: the number of times respondents agree or strongly agree with Likert-type items (5-point scales varying from 'agree strongly' to 'disagree strongly').
• Don't know (DK): the number of 'don't know' answers, counted among all items except those used for straight-lining, agreeing and easy answering.

Response easiness (interviewer questions; response categories: Never, Almost never, Now and then, Often, Very often, Don't know):
• Clarification needed: 'Did the respondent ask for clarification on any questions?'
• Reluctant to answer: 'Did you feel that the respondent was reluctant to answer any questions?'
• Understand questions: 'Overall, did you feel that the respondent understood the questions?'

Effort (interviewer question; same response categories):
• Best ability: 'Did you feel that the respondent tried to answer the questions to the best of his or her ability?'
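The objective indicators in Box 7.1 are simple counts over blocks of items. The sketch below computes one plausible operationalization of each, per respondent, from hypothetical response matrices; the exact item sets and counting rules used in the ESS analyses may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Hypothetical response matrices (rows = respondents)
likert = pd.DataFrame(rng.integers(1, 6, (5, 10)))    # 1 = agree strongly ... 5 = disagree strongly
scale11 = pd.DataFrame(rng.integers(0, 11, (5, 8)))   # 11-point (0-10) scales
dk_flags = pd.DataFrame(rng.integers(0, 2, (5, 12)))  # 1 = 'don't know' given

indicators = pd.DataFrame({
    # straight-lining: how often the single most-used category was repeated
    "straight_lining": likert.apply(lambda row: row.value_counts().max(), axis=1),
    # easy answering: extreme and middle responses on the 11-point scales
    "easy_answering": ((scale11 == 0) | (scale11 == 5) | (scale11 == 10)).sum(axis=1),
    # agreeing: 'agree' or 'agree strongly' on the Likert items
    "agreeing": (likert <= 2).sum(axis=1),
    # don't know: DK answers over a separate block of items
    "dont_know": dk_flags.sum(axis=1),
})
print(indicators)
```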
his/her ability’) and ‘low-effort’ respondents (‘never, almost never or only now and then answering to the best of his/her ability’). It should be borne in mind that the answers to these questions are subjective interpretations on the part of the interviewer. In addition, the interviewer had prior information about the respondents’ earlier refusal(s) that were recorded on the contact form. This means that these subjective indicators may have been influenced by the
kind of respondent (cooperative, reluctant), with interviewers perhaps giving reluctant respondents more negative ratings. Some previous research (Loosveldt, 1999) has shown that interviewer evaluations correlate with more objective measurements of data quality. However, analyses of ESS 3 data suggest that the reliability and validity of this interviewer assessment instrument have to be questioned. Interviewers appear to have an overall opinion about the respondent that they use to classify their response behaviour, and may not all follow the same logic when they assign an evaluative score to respondents. There are indications that interviewers sometimes base their evaluations on irrelevant respondent information (Beullens, Symons and Loosveldt, 2009c). It is thus possible that there is a substantial interviewer effect. As Figure 7.4 shows, in ESS 2 fewer respondents are classified as 'high effort' in the Dutch sample than in the German sample. This does not necessarily mean that the quality in the Netherlands is lower than in Germany, since this evaluation depends on the expectations of the interviewers. In the Netherlands, 4.3 times more respondents are considered 'high-effort' respondents among the cooperative respondents than among the hard-to-convert reluctant respondents (p < 0.01), and the number of 'high-effort' respondents is 3.2 times higher among the 'easy to convert' refusals than among the 'hard to convert'. This is in line with what was expected from satisficing theory. The differences operate in the same direction in the German sample but are far less pronounced.

Figure 7.4 The percentages of respondents making a 'high effort' according to the interviewers' evaluations, by kind of respondent, in the Netherlands and Germany in ESS 2 [bar chart: Germany – cooperative 77.1%, easy to convert 72.8%, hard to convert 54.6%; the Netherlands – cooperative 69.8%, easy to convert 51.4%, hard to convert 16.2%]. The sizes of the subgroups are as follows: cooperative respondents – DE 2370, NL 1358; easy to convert – DE 455, NL 293; hard to convert – DE 37, NL 229
The mean scores on the 10-point response easiness scale are shown in Figure 7.5. The 'hard to convert' reluctant respondents score significantly lower than the cooperative respondents and the easy-to-convert reluctant respondents in the two countries (p < 0.001), which means that they appear to have more difficulty with the interview according to the interviewers.

Figure 7.5 Mean scores on the latent variable response 'easiness', by kind of respondent, in the Netherlands and Germany in ESS 2 [bar chart: Germany – cooperative 8.22, easy to convert 8.26, hard to convert 6.81; the Netherlands – cooperative 7.99, easy to convert 7.83, hard to convert 6.73]. The sizes of the subgroups are as follows: cooperative respondents – DE 2370, NL 1358; easy to convert – DE 455, NL 293; hard to convert – DE 37, NL 229

This is in line with the literature, which suggests that response quality might be lower among reluctant respondents recruited via refusal conversion. However, this conclusion may be incorrect because of differences in the way interviewers code their assessment of respondents, perhaps assuming that reluctant respondents have more difficulty. The data structure of the interviewers' evaluations of respondents' effort and response easiness (or difficulty) is multilevel, since subsets of respondents are evaluated by the same interviewers, which might result in a high intraclass correlation.¹⁴ Interviewers' evaluations may play a role in the measurement of both the independent variable (kind of respondent) and the dependent variables (respondent effort and response difficulty). Moreover, in a number of cases new interviewers are deployed to visit refusals, which might also result in inter-interviewer differences. In order to investigate whether the reluctant respondents are really less engaged respondents, regardless of the differences in evaluation between interviewers, multilevel regression was applied, with the interviewer specified as a higher-level variable. In the models, the original five-point effort variable and the 10-point response easiness scale are the dependent variables. Due to skewness, a logarithmic transformation was applied.¹⁵

¹⁴ This analysis is inspired by the PhD project by Olena Kaminska, who investigated the relationship between reluctance and measurement error (satisficing in particular) for ESS 3 (see Kaminska, 2009).
¹⁵ $\text{Effort}' = \ln\left(\frac{5}{6-\text{Effort}}\right)$ and $\text{Difficult}' = \ln\left(\frac{11}{\text{Difficult}+1}\right)$. We wish to thank Koen Beullens for testing the models.
Table 7.8 Parameter estimates of the effects of kind of respondent (predictor) in fixed effects models (Model 1) and random effects models (Model 2) on interviewer evaluations of respondents' effort and response easiness in ESS 2 (a)

                                              N      Model 1:                 Model 2:
                                                     95% confidence limits    estimate
Models for 'Effort'
DE  Cooperative (intercept)                  2323    1.4249, 1.4522           1.4459
    Soft refusal (contrast to intercept)      449    −0.0649, −0.0038         −0.0139
    Hard refusal (contrast to intercept)       34    −0.2962, −0.0671         −0.1160
NL  Cooperative (intercept)                  1356    1.3438, 1.3908           1.3473
    Soft refusal (contrast to intercept)      293    −0.2849, −0.1733         −0.2188
    Hard refusal (contrast to intercept)      230    −0.8503, −0.7256         −0.6807

Models for 'Response easiness'
DE  Cooperative (intercept)                  2323    1.5658, 1.6160           1.6111
    Soft refusal (contrast to intercept)      449    −0.0276, 0.0975          0.0532
    Hard refusal (contrast to intercept)       34    −0.4741, −0.0575         −0.1390
NL  Cooperative (intercept)                  1356    1.4454, 1.5085           1.4748
    Soft refusal (contrast to intercept)      293    −0.0669, 0.0832          −0.0110
    Hard refusal (contrast to intercept)      230    −0.5122, −0.3454         −0.3796

(a) Higher values correspond to more effort and an easier interview.
****p < 0.0001; ***p < 0.001; **p < 0.01; *p < 0.05.
Two models were fitted using linear mixed models (multilevel modelling), with the transformed 'effort' variable and the transformed 'response easiness' variable respectively specified as the dependent variables. The kind of respondent ('cooperative', 'easy to convert' or 'hard to convert') was specified as the fixed factor variable. In the second model, the interviewer was included as a random factor variable. Both models also include the age and education level of the respondent as fixed-effects factors (covariates), though these are not shown in Table 7.8. The two models allow a focus on the effect of kind of respondent before (Model 1) and after (Model 2) inclusion of the random factor (the interviewer). Higher values correspond to more effort and more response easiness (or lower response difficulty). The expectation is that the parameters of the reluctant respondents should be negative.
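A sketch of this multilevel setup using a linear mixed model with a random interviewer intercept, applying the logarithmic transformations from footnote 15 to synthetic data. The variable names and the generated data are illustrative stand-ins for the ESS files, not the authors' code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 600
df = pd.DataFrame({
    "interviewer": rng.integers(0, 40, n),                # higher-level unit
    "resp_type": rng.choice(
        ["cooperative", "soft_refusal", "hard_refusal"], n),
    "effort": rng.integers(1, 6, n),                      # original 5-point rating
    "easiness": rng.integers(0, 11, n),                   # derived 10-point (0-10) scale
})
# Logarithmic transformations from footnote 15
df["effort_t"] = np.log(5 / (6 - df["effort"]))
df["difficulty_t"] = np.log(11 / (df["easiness"] + 1))

# Model 2: kind of respondent as fixed factor, interviewer as random intercept
m2 = smf.mixedlm("effort_t ~ C(resp_type, Treatment('cooperative'))",
                 data=df, groups=df["interviewer"]).fit()
print(m2.summary())
```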
The parameter estimates for 'effort' suggest that the contrasts between cooperative and reluctant respondents hardly change once the effect of the interviewer is introduced in Model 2. All Model 2 estimates remain within the confidence intervals of Model 1, although the contrasts of interest tend to weaken. It seems that interviewers do not base their assessment on prior information about the case, but instead make a reasonable effort to reflect on the interview. In Germany, in particular, the contrast between hard refusals and cooperative respondents (which was initially significant at α = 0.0023) seems to reduce to a less differentiating level. The contrast under Model 2 (−0.116) is only significant at α = 0.0344. It should be noted that the random interviewer effects (intraclass correlation) account for 28% (DE) and 22% (NL) of the variance in the case where no fixed effects are involved, reducing to 23% and 19% when all fixed effects (individual-level covariates) are included. With regard to response easiness (or difficulty), the inclusion of random effects attributable to interviewers also tends to mitigate the contrast between hard refusals and cooperative respondents, suggesting that some interviewers use prior information about the case (for example, about the prior refusal) more than others when making their final post-interview assessment. Again in Germany, the Model 2 estimate for hard refusals is only significant at α = 0.15 (p = 0.0124 under Model 1). Intraclass correlations are 0.24 (DE) and 0.16 (NL), irrespective of the inclusion of fixed effects.

7.4.2.3 Objective indicators

Do we find differences between reluctant and cooperative respondents on the more objective indicators of satisficing (see Box 7.1)? Firstly, it is necessary to consider the relationships between the more objective indicators and the 'subjective' interviewer evaluations. We rely here on a one-way ANOVA of these relationships using data from the first round of the ESS (Kaminska and Billiet, 2007b). The indicators are measured as proportions of specific response types for a preselected set of ESS items. Table 7.9 shows the relationships between the objective and subjective indicators. In the Netherlands, negative interviewer evaluations were clearly related to greater 'easy answering' (the number of extreme and middle responses given to questions using 11-point response scales) (F = 17.8; p < 0.01). This relationship was not found in the German sample. The expected relationships between negative interviewer evaluations and providing 'straight-line responses' were found in both countries, but were weaker in the Netherlands (F = 2.27; p = 0.10) than in Germany (F = 6.51; p < 0.01).
Table 7.9 Relationships between objective indicators of satisficing and interviewer evaluation in ESS 1

Interviewer evaluations linked to . . .    NL Netherlands    DE Germany
Easy answering                             Expected          Not found
Straight-line responses                    Expected          Expected
Agreeing                                   Unexpected        Unexpected
Don't know (DK)                            Not found         Expected

Expected: relationship found in the expected direction. Not found: relationship not found. Unexpected: relationship found, but in the unexpected direction.
In terms of the mean number of answers given to the four different satisficing categories by cooperative, easy-to-convert and hard-to-convert respondents, there are almost no significant differences16 on the questions that were chosen for analysis. Only two of the indicators show significant differences, and even then not always in both countries. This is the case for 'don't know' (DK) and 'agreeing'. For DK, the differences are expressed as percentages and the test is performed by logistic regression. The results are shown in Figure 7.6, which demonstrates much more pronounced differences in Germany than in the Netherlands in ESS 2. The differences remain after controlling for age, education and the interviewer's evaluation of the respondent's effort.

[Figure 7.6: bar chart of the percentages of 'DK' answers (values between 21.9% and 37.8%) by kind of respondent in Germany and the Netherlands in ESS 2. The sizes of the subgroups are as follows: cooperative respondents – DE 2370, NL 1358; easy to convert – DE 455, NL 293; hard to convert – DE 37, NL 229.]
16 It is recommended to look at the differences as measures of possible measurement error and not at the absolute mean frequencies or percentages in the three groups. This is because we do not know the extent to which the mean numbers or percentages as such really reflect satisficing or 'true' answers. We do, however, know that the differences between the three categories that operate in the expected direction are indications of measurement error among the reluctant respondents. But even then, these indicators are ambiguous. The DK indicator might be the most valid measurement of reduced motivation, although DK may also be a true value for some respondents and not an indication of satisficing.
In terms of agreeing, the differences did not operate in the expected direction – in other words, cooperative respondents were more likely than reluctant respondents to agree, although this difference was only significant in the Netherlands for hard-to-convince refusals (Kaminska and Billiet, 2007a).

In summary, with the exception of 'don't know' (DK) answers, there are virtually no indications that converted refusals are satisficing more than the initial respondents in the Netherlands and Germany. However, the interviewers' evaluations of respondents' effort and response difficulty differ between cooperative and 'hard to convert' reluctant respondents.17

A final reflection deals with the unexpected results concerning 'agreeing'.18 The finding that cooperative respondents in the Netherlands seem to be somewhat more likely to endorse the survey items than reluctant respondents, especially the hard-to-convert among them, demands further reflection. We expected the opposite, and we also expected low-effort respondents to be more likely to agree, but this was also not the case. According to the theory, satisficing respondents do not search their memories for all possible arguments; they skip the retrieval and judgement stages of question-answering and agree with a statement as soon as they have found one reason to agree; when they fail to find such a reason, they disagree (Krosnick, 1991). How, then, are we to interpret the finding that cooperative respondents are somewhat more likely to agree with the statements? One interpretation is that the cooperative respondents who more readily agreed to participate in the survey are also more willing to agree with statements (survey items). This is in line with the finding that acquiescence is a fairly stable characteristic of respondents over time (Weijters, 2006; Billiet and Davidov, 2008). This would mean that a response style such as acquiescence must be distinguished from satisficing. Style is a trait, or a characteristic, of the respondent that does not depend on the questionnaire or on actual motivation. Of course, certain questions are needed in order to be able to measure style in the case of acquiescence – agree–disagree items and a balanced set of items (Billiet and McClendon, 2000) – but the trait may still be there even if it cannot be measured. Satisficing, by contrast, is a temporary state that is subject to change if motivation changes.
17 The conclusion can be affected by the interviewer's prior knowledge, insofar as we cannot completely control for it.
18 The same unexpected result was obtained with ESS 1 data.

7.5 Discussion and Conclusions

A number of ethical aspects of survey research were put forward in Chapter 2 (Section 2.7), including the need to respect target respondents. Target persons should not be harassed and should be aware that their cooperation is voluntary. The question could be asked as to whether refusal conversion – that is, re-approaching target persons who have refused to participate on an earlier occasion – is not irreconcilable with this principle. Some countries, or survey organizations in some
countries, are very reluctant to attempt refusal conversion because they feel that 'no means no' and that reissuing a refusal could be considered harassment. On the other hand, empirical evidence suggests that many initially reluctant target persons – as initial refusals are also called – do not feel harassed by a second request. One piece of evidence is that, as discussed in this chapter, the success rate of refusal conversion is high in a number of countries. A second piece of evidence comes from Dutch research by Stoop (2004, 2005), where an extensive programme of refusal conversion was followed by re-approaching persistent refusals in a follow-up study. Of those approached in this follow-up study, 70% cooperated, some of them without question, hesitation or difficulty. A third piece of evidence comes from an overview by Schnell (1997, p. 190), who concludes from his study of the empirical literature that there is no empirical evidence for the existence of a hard core of adamant refusers who never cooperate in surveys. He attributes the belief in such a hard core of refusals to a fundamental attribution error, as observers tend to ignore situational factors that might explain why persons do not respond in a particular case. The decision to participate in a survey is often taken on the spur of the moment (see Chapter 2), which could mean that at the second attempt the negative first decision might easily turn into a positive one. Finally, the ESS nonresponse surveys show that many nonrespondents are willing to cooperate to a certain extent. It could therefore be assumed that making great efforts to interview specific persons (repeatedly travelling to their home address, sending a new interviewer following an initial refusal) is not always seen as harassment, but sometimes as a sign of the seriousness of a study and of a real interest in the opinion of the target person.

This chapter has looked at the use of refusal conversion as a tool to improve response rates and data quality in final survey samples. The use of refusal conversion as a tool to detect bias is covered in the next chapter. After an overview of findings from previous research, most of the chapter was devoted to empirical findings from the ESS.

Previous research provides evidence that refusal conversion accounts for an increasing proportion of finally obtained survey response. There is also some evidence of this in the ESS, with a larger number of countries achieving over 100 converted refusals in ESS 3 compared to earlier rounds (for the percentages, see Table 7.1). The success rate of refusal conversion in the ESS differs across countries. This is partly a consequence of the initial refusal rate, partly of the percentage of initial refusals who are re-approached, and partly of the selection of whom to re-approach. Some countries reissued almost all refusals, while others reissued none, with a range of 3% to 92%. The mean reissue rate of 30% therefore hides vast differences in the use of this technique. In the end, however, the impact of refusal conversion on final response rates is small in most ESS countries.

Previous research suggests that there needs to be a time lag between the initial refusal and a refusal conversion attempt in order to increase the probability of successful conversion, although the pattern is far from consistent across all surveys. Data from the ESS provide further evidence that longer elapsed times are helpful in
increasing the success of refusal conversion, and provide further weight to the idea that in many instances a refusal to participate in a survey is a time-dependent phenomenon. In addition, analysis of ESS contact form data clearly confirms that it is better to use a new and more experienced interviewer in the conversion attempt, since they have a higher likelihood of success.

There is clear evidence that refusal conversion can be used as a tool for increasing response rates and obtaining more cases during survey fieldwork. However, it is far from clear that this is the most effective way to achieve such an increase. More research on the relative benefits of increased contact attempts versus refusal conversion is required.

What undermines the case for refusal conversion is the evidence of its impact. Previous research presents an inconsistent picture, but the limited evidence that is available suggests that the impact of refusal conversion on the representativeness of the achieved sample is questionable; refusal conversion sometimes fails to make the sample more representative. Data from the ESS suggest that refusal conversion makes little significant difference to the profile of the final achieved sample, and that when it does there is variability in terms of whether it makes the samples more or less representative. However, this analysis is limited to no more than five countries and often just two, and is therefore limited in its generalizability. What this perhaps does reflect is the greater likelihood that fieldwork organizations and interviewers will focus reissue efforts on the soft refusals. This makes sense if the goal is to increase the overall response rate, since the conversion probability is significantly higher for this group than for hard refusals. However, such a decision may perhaps result in the conversion of more cases that are similar to already cooperating respondents, which may minimize the ability of refusal conversion to reduce nonresponse bias. Future research that looks into the impact of focusing refusal conversion on hard refusals would be welcome, and may suggest a more cost-effective use of refusal conversion resources.

In terms of data quality, previous research suggests that those who refuse in person but are then converted into respondents provide poorer-quality data than initially cooperative respondents. However, this was not found universally and there are also some examples of the opposite pattern. Data from the ESS show virtually no evidence of satisficing (with the exception of providing 'don't know' answers, which by itself is not a convincing indicator of satisficing), and the data from more subjective indicators of response difficulty appear unreliable. The ESS therefore suggests that converted refusals, at least those who are being converted at present, do not provide poorer-quality data.

Unfortunately, there are two serious problems from the perspective of comparative cross-national research that make all the analysis referred to difficult. The implementation of and amount of effort put into refusal conversion varies considerably from country to country, and the success rate also demonstrates substantial differences. This makes comparisons of obtained samples and discussion about adjustments for nonresponse bias using refusal conversion information difficult if one wishes to compare all countries.
In the final analysis, it appears that a new approach to refusal conversion may need to be adopted in an era when reducing nonresponse bias, rather than simply increasing the overall response rate, is considered the ultimate goal. The current emphasis on attempting to convert the most promising cases therefore needs to be reassessed. Research into the impact of converting the hard refusals should be the next step.
Appendix 7.1 Interviewer Variance in Cooperation Rates

Interviewer variance seems to be an important factor in explaining the cooperation (and contact) rate in face-to-face household surveys (Pickery and Loosveldt, 2002). Some interviewers are better at realizing unit response than others. Biemer and Lyberg (2003, pp. 149–87) relate this variation in response rates to the skills and tailoring abilities of the interviewers. The interviewer's age and sex, experience, voice, accent, style, expectations and preferences, attitude and confidence are believed to be predictors of this interviewer variance. The following equation models this interviewer variance:

$$\log\left(\frac{p_{\mathrm{cooperation}}}{p_{\mathrm{refusal}}}\right) = \gamma_{00} + \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{bmatrix} + e_{ij}$$

The subscripts 'cooperation' and 'refusal' refer to the fieldwork outcomes per sample unit, excluding the refusal conversion outcomes. Ineligible sample units, final noncontacts and other nonresponse cases are also excluded. The overall response success is indicated by $\gamma_{00}$; the interviewer-specific deviations are measured by the vector containing $b_1, b_2, \ldots, b_k$.

Table 7A.1

                               DE Germany     NL Netherlands
N                              4575           2729
Number of interviewers (k)     203            77
Fixed effects
  Intercept (s.e.)             0.05           0.02
Random effects
  Intercept (s.e.)             0.59 (0.10)    0.22 (0.06)
  Residual (s.e.)              0.96 (0.02)    0.98 (0.03)
s.e., Standard error.

A serious drawback is that interviewers are usually
assigned to a specific local area, in order to reduce travel costs and simplify fieldwork coordination. As a consequence, it is not clear whether observed differences between interviewers can be attributed to the interviewers themselves (skills, experience, motivation, etc.) or to characteristics of the particular neighbourhood in which they were active (socio-demographics, concentration of apartments, presence of vandalism, litter, safety, etc.). It is advisable here to control interviewer effects for 'area' effects (Campanelli and O'Muircheartaigh, 1999). Variables that enable such features to be controlled for are included on the ESS contact forms, but due to the relatively high proportion of missing codes, they unfortunately had to be omitted from the analysis.

The estimates indicated in Table 7A.1 suggest the presence of interviewer variance with regard to cooperation rates. The residual term in the random effects confirms the appropriateness of binomial variation within each interview.

Tables 7.2 and 7.3 reproduced by permission of the authors of the research reports of CeSO, K.U. Leuven.
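If the random-intercept estimates in Table 7A.1 are read as variances (an assumption; the table does not say), they can be translated into intraclass correlations on the latent logistic scale, where the level-1 residual variance is fixed at π²/3. A minimal sketch:

```python
# Sketch: latent-scale intraclass correlation for a random-intercept logit
# model, using the Table 7A.1 random-intercept estimates and ASSUMING they
# are variances rather than standard deviations.
import math

logistic_residual = math.pi ** 2 / 3  # fixed level-1 variance, ~3.29

for country, var_u in {"DE": 0.59, "NL": 0.22}.items():
    icc = var_u / (var_u + logistic_residual)
    print(f"{country}: latent ICC = {icc:.3f}")
# DE ~ 0.15, NL ~ 0.06: under this reading, a larger share of the variation
# in cooperation propensity lies between interviewers in Germany.
```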
8 Designs for Detecting Nonresponse Bias and Adjustment

8.1 What is Nonresponse Bias?

In the foregoing chapters, we have shown that there are major differences in fieldwork processes and response outcomes between ESS countries. The variation in response rates decreased from round to round because countries that initially had the highest response rates had difficulty maintaining them, while countries that initially had very low response rates obtained higher response rates in later rounds. The substantial differences in response rates between countries raise questions about bias, and especially about differences in bias between countries. This chapter will present the results of studies on bias in the ESS, and will focus on methods for detecting and adjusting for nonresponse bias, including the use of auxiliary data. It is partly based on analyses that are published in Billiet et al. (2009) and Matsuo et al. (2009).

Nonresponse is a major threat to the validity of survey research because it can produce bias in the results. As shown in Chapter 2, nonresponse bias is determined by two factors, namely the nonresponse rate and the differences between respondents and nonrespondents (e.g. differences in means and variances):

$$\bar{y}_r - \bar{y}_n = \frac{m}{n}\left[\bar{y}_r - \bar{y}_m\right] \qquad (8.1)$$
In this expression, $\bar{y}_n$ refers to the total sample mean, $\bar{y}_r$ indicates the respondent mean, $\bar{y}_m$ is the nonrespondent mean and m/n is the nonresponse rate.1 Theoretically, the biasing influence of nonresponse is eliminated under two conditions: either the nonresponse rate is zero (m = 0: there are no nonrespondents) or there are no differences between respondents and nonrespondents on the statistic of interest (Couper and de Leeuw, 2003, p. 166).2

In cross-national studies, differences in country means and variances could be due in part to biased results in one or more countries.3 Formula (8.2) represents the effects of nonresponse on the estimates of the difference in means between country 1 and country 2 (or between rounds 1 and 2 of a survey; see de Leeuw and de Heer, 2002, p. 45):

$$\bar{y}_{1r} - \bar{y}_{2r} = (\bar{y}_{1n} - \bar{y}_{2n}) + \frac{m_1}{n_1}\left[\bar{y}_{1r} - \bar{y}_{1m}\right] - \frac{m_2}{n_2}\left[\bar{y}_{2r} - \bar{y}_{2m}\right] \qquad (8.2)$$

The difference in estimated bias between two countries is then (Groves, 1989):

$$B(\bar{y}_1 - \bar{y}_2) = \frac{m_1}{n_1}\left[\bar{y}_{1r} - \bar{y}_{1m}\right] - \frac{m_2}{n_2}\left[\bar{y}_{2r} - \bar{y}_{2m}\right] \qquad (8.3)$$

It is often implicitly assumed that bias is stable across countries or subgroups. Such an assumption cannot be made a priori, however, since the hypothesis of comparable response rates is very unlikely to be sustained (see Chapter 5). In addition, a number of the topics in the ESS (social participation, political interest and involvement, civic duties) have been found to correlate with survey participation in earlier research (see Section 6.3.1). It cannot be assumed that this correlation is similar in different countries. Therefore, we have to be aware of the possible impact of nonresponse error, not only for simple descriptive statistics such as country means and differences, but also for the estimation of correlations between variables (Couper and de Leeuw, 2003, p. 166) and for the estimation of variances that are used to estimate standard errors. As a working hypothesis, it is to be expected that nonparticipation in the ESS will cause biased estimates.

Section 8.2 presents a series of methods to assess nonresponse bias. In Section 8.3, some of these methods will be used to find out if there is nonresponse bias in the ESS and if this bias is identical across countries.
1 Groves provides a more general expression that takes account of the idea that everyone has an unobserved 'propensity' to be a respondent or nonrespondent. The sample-based expression (8.1) does not have an expected value equal to the population expression but, rather, includes a term involving the covariance between the nonresponse rate on the one hand and the difference between respondent and nonrespondent means on the other (Groves, 2006, p. 648).
2 Note that in expression (8.1) no distinction has been made between the components of nonresponse, such as the refusal rate and the noncontact rate. If one distinguishes these crucial components, formula (8.1) takes the following form (Groves and Couper, 1998, p. 12; Heerwegh, Abts and Loosveldt, 2007):

$$\bar{y}_r = \bar{y}_n + \frac{m_{ref}}{n}\left[\bar{y}_r - \bar{y}_{m_{ref}}\right] + \frac{m_{nc}}{n}\left[\bar{y}_r - \bar{y}_{m_{nc}}\right] + \frac{m_{oth}}{n}\left[\bar{y}_r - \bar{y}_{m_{oth}}\right] \qquad (1b)$$

3 Bias can, of course, also have an impact on comparing survey estimates over time.
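To make formula (8.1) and the decomposition in footnote 2 concrete, here is a small numerical sketch (all figures are invented for illustration; in practice the nonrespondent means are unobserved):

```python
# Sketch: nonresponse bias per formula (8.1) and its decomposition (1b).
# All numbers are invented for illustration only.
n = 3000                               # gross sample size
m_ref, m_nc, m_oth = 700, 250, 50      # refusals, noncontacts, other
m = m_ref + m_nc + m_oth               # total nonresponse

y_r = 0.62                             # respondent mean
y_ref, y_nc, y_oth = 0.50, 0.55, 0.58  # (unobservable) nonrespondent means

# Formula (8.1): bias = (m/n) * (y_r - y_m), y_m the overall nonrespondent mean
y_m = (m_ref * y_ref + m_nc * y_nc + m_oth * y_oth) / m
bias = (m / n) * (y_r - y_m)           # ~ +0.035

# Formula (1b): the same bias split into its three components
for name, m_k, y_k in [("refusal", m_ref, y_ref),
                       ("noncontact", m_nc, y_nc),
                       ("other", m_oth, y_oth)]:
    print(f"{name:10s}: {(m_k / n) * (y_r - y_k):+.4f}")
print(f"total bias : {bias:+.4f}")
```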
8.2 Methods for Assessing Nonresponse Bias

Groves sets out five methods for assessing nonresponse bias in household surveys. These are: (1) comparing response rates across subgroups in samples; (2) comparing respondent-based estimates with similar estimates from other sources; (3) comparing estimates between subgroups in the obtained samples; (4) using enriched sampling frames with data from external sources; and (5) contrasting alternative post-survey adjustments for nonresponse (Groves, 2006, pp. 654–6). The philosophy behind these approaches is that by obtaining estimates for the missing observations, it is possible to detect bias and adjust the survey estimates for nonresponse bias.
8.2.1 Comparing response rates across subgroups in samples
The easiest approach is to compare response rates across subgroups. This is a rather indirect method, which tends to rely on assumptions derived from a small number of variables. If the null hypothesis (no difference in response between subgroups) is not rejected, the researcher assumes that there is no nonresponse bias (Groves, 2006, p. 654). Note that in most cases comparison of response rates between subgroups is only possible for a small number of variables that are available for all target persons (sex, age, urbanicity and region). This approach assumes that the only systematic source of nonresponse stems from differences between these subgroups and that other variables only produce random nonresponse (the Missing at Random (MAR) assumption; see Section 2.6.1). However, there is no guarantee that nonresponse does not vary systematically with other more relevant variables for which no information is available in the sampling frames.

Another, more specific problem in a cross-national context concerns the differences in sampling designs (see also Section 6.4). This method is most relevant in countries that use individual named samples (population registers) that contain the relevant information about each sample unit. When these individual named samples are used, subgroups can be compared according to background variables such as sex, age category, municipality and region, or some combination of these. This method cannot be used for household and address samples, and its usability is therefore limited for cross-national surveys based on different types of national sampling frames.
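A minimal sketch of this subgroup comparison, assuming a gross-sample DataFrame `frame` (from an individual named sample) with hypothetical columns `sex`, `age_group` and a 0/1 `responded` indicator:

```python
# Sketch: comparing response rates across frame subgroups. `frame` is a
# hypothetical gross-sample DataFrame; this is possible only where the
# sampling frame records these variables for every target person.
import pandas as pd
from scipy.stats import chi2_contingency

for var in ["sex", "age_group"]:
    print(frame.groupby(var)["responded"].mean())
    # Test H0: response does not differ between the subgroups of `var`
    table = pd.crosstab(frame[var], frame["responded"])
    chi2, p, _, _ = chi2_contingency(table)
    print(f"{var}: chi2 = {chi2:.1f}, p = {p:.3f}")
```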
8.2.2 Comparing respondent-based estimates with similar estimates from other sources
A second method for assessing nonresponse bias consists in comparing respondent-based estimates with similar estimates from other more accurate sources (Groves, 2006, p. 655). This procedure is used in post-stratification (see below). Bias is defined as the amount of deviation between the 'true' population distributions and the distributions in the obtained sample. In practice, information is only available for
a limited number of variables from sources such as population statistics. It is optimal when the joint distribution of these variables is available. The post-stratification weight of each cell of this joint distribution is then simply the percentage of the population divided by the percentage in the sample in this cell. When only the joint distribution is available for (n − 1) variables and the marginal distribution for the nth variable is known, for example, weights can be calculated that best fit both distributions using iterative proportional fitting. This procedure is called raking; the resulting weights are termed 'raking ratios' (Brackstone and Rao, 1979; Kalton and Kasprzyk, 1986; Bethlehem, 2009). This method can indirectly give an impression of the total bias for a large number of variables by comparing the population-based weighted sample with the unweighted sample, and has been used in the ESS (Meuleman and Billiet, 2005; Vehovar, 2006; Vehovar and Zupanic, 2007). Its advantage in a cross-national context is that, unlike the previous method, no information at the individual level is required in order to compare respondents and nonrespondents. A limitation is that even for a basic demographic variable such as education, equivalent measurements in surveys and official statistics are not always available (Hoffmeyer-Zlotnik, 2005, pp. 223–40).
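A compact sketch of raking on two margins (sex by age); the sample table and population margins are invented:

```python
# Sketch: raking (iterative proportional fitting) of a sample sex x age
# cross-table to known population margins. All numbers are invented.
import numpy as np

sample = np.array([[180.0, 220.0, 150.0],   # male:   15-34, 35-54, 55+
                   [210.0, 260.0, 180.0]])  # female: 15-34, 35-54, 55+
pop_rows = np.array([0.49, 0.51]) * sample.sum()        # population sex margin
pop_cols = np.array([0.32, 0.34, 0.34]) * sample.sum()  # population age margin

w = sample.copy()
for _ in range(100):
    w *= (pop_rows / w.sum(axis=1))[:, None]  # scale rows to the sex margin
    w *= pop_cols / w.sum(axis=0)             # scale columns to the age margin
    if np.allclose(w.sum(axis=1), pop_rows):  # both margins now fit
        break

raking_ratios = w / sample  # cell-level weights: the 'raking ratios'
print(raking_ratios.round(3))
```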
8.2.3 Comparing estimates between subgroups in the obtained samples
A third method focuses on variation within the survey data itself (Groves, 2006, p. 655). This method covers a range of techniques, some of which are very easy to implement while others require more effort and more funds.

The simplest and most straightforward technique is comparison of estimates from early and late respondents to mail or web surveys following several requests and reminders to participate, as part of the 'Total Design Method' (Dillman, 1978, 2000). It is assumed that the late respondents who react only after several repeat requests are more comparable to the final nonrespondents than the early respondents who react at the first or second request. An advantage of this technique is that data are available from all questions in the survey and not just about a small number of background variables. However, it is based on the weak assumption that late respondents can be used as a proxy for final nonrespondents. A variant of this method involves planning a survey in several phases, to allow comparison of respondents from the first phase with those from the final dataset (Curtin, Presser and Singer, 2000). This is a particularly useful method for telephone surveys, where late respondents, those who required many calls to establish contact, can be classified as difficult to contact.

Another type of comparison between groups of respondents is discussed by O'Muircheartaigh and Eckman (2007), who developed a subsampling design in which the fieldwork organization uses all means to try to obtain a very high response (over 95%) among a subsample from a total sample that is, say, five times larger. The remainder of the total sample is treated in the usual way. Differences between estimates of the subsample, which might be considered free of nonresponse bias, and estimations of the regular sample are then
used to draw conclusions about the direction and amount of bias in all variables in the survey.4

Another variant that is applicable for face-to-face surveys is the study of converted refusals (Smith, 1984; Burton, Laurie and Lynn, 2006; see also Chapter 7). The underlying assumption is that with less field effort the reluctant respondents would have been final refusals, and that with even more field efforts additional refusals could have been converted (Lin and Schaeffer, 1995; Groves and Couper, 1998). The theory that the reluctant respondents can be used to draw inferences about the likely responses from the final nonrespondents has received support from some (Voogt, 2004), but has been refuted by others (Stoop, 2004, 2005). This method offers no direct information about the nonrespondents in a survey. It can be more useful when additional information about the refusals is available (for example, the reason for refusal, or the strength of the refusal). A weakness of this method is that it only produces data related to the refusal component of nonresponse, and sheds no light on the noncontact component.

The final variant of the 'subgroup approach' is to collect information on core variables, including some nondemographic variables, from the nonrespondents directly. This approach was described in Section 2.6.2 under the heading 'Information on Core Variables'. With information on core variables, the Not Missing at Random (NMAR) situation can turn into a MAR situation (see Section 2.6.1). Three ways of collecting information on core variables from nonrespondents have been found in the literature. Firstly, additional information from nonrespondents can be collected during regular quality control back-checks; for instance, when checking whether the target persons are real refusals. A very small number of 'crucial' questions are asked as part of this process. The idea behind this method is that reluctant respondents are often ready to cooperate when the burden of the questionnaire is very small. A second method, proposed by Bethlehem and Kersten (1985, p. 292), is the 'Basic Question Procedure' (see also Bethlehem, 2009). This method was used by Voogt (2004, p. 165) in his study of nonresponse bias in election research in the Netherlands. Refusals were asked to answer a small set of key questions that were included in the weights used to correct nonresponse bias. Related to this is the 'PEDAKSI' (Pre-Emptive Doorstep Administration of Key Survey Items) approach described by Lynn (2003a). It implies that the survey interviewer, having made contact with a target person, asks a small number of key survey questions as soon as it becomes apparent that an interview is not going to be achieved at that visit to the address, even though it might still be possible to achieve the interview at a later visit. This is done using a Key Item Form (KIF) (Lynn, 2003a, p. 241).5 In Section 8.3.4 this approach will be called a doorstep questionnaire survey (DQS).
4 Clearly, the intensive subsample could be affected by sampling error, but it might be possible to calibrate this sample and then use it as a 'gold standard' for adjusting the complete sample.
5 In a large-scale field test of PEDAKSI in the British Crime Survey, a survey with an 83% response rate, a KIF containing 13 questions, six of them on crime, was completed by 6% of the final respondents and 25% of the nonrespondents (Lynn, 2003a).
A third option, which is more difficult to realize, is to organize a new survey with both respondents and nonrespondents from the original survey after a short time period. This method was described nicely by Elliot (1991, pp. 38–40), who stated: ‘One of the most widely used sample-based weighting methods is to follow up all or, more usually, just a sub-sample of initial nonrespondents to a survey, and then weight the respondents to the follow-up to represent all the initial nonrespondents. The method is most commonly used in postal surveys where it is often difficult to obtain a satisfactory response.’ In the follow-up survey, either the original questionnaire can be used, or a shorter version in order to increase the probability of response. The latter is more usual (see Stoop, 2005). It can be targeted at nonrespondents only, or at both nonrespondents and respondents. In the latter case, it is possible to control for differences between context (there is usually a time gap), possible mode effects and effects of question context when a shorter questionnaire is used. In Section 8.3.4 this approach will be called a nonresponse survey (NRS). Both DQS and NRS require that a substantial number of ‘final nonrespondents’ take part. The method also requires that responses to the core questions co-vary substantially with nonresponse.
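A minimal numeric sketch of the sample-based weighting Elliot describes, under the (strong) assumption that follow-up respondents are representative of all initial nonrespondents; every figure is invented:

```python
# Sketch: weighting follow-up respondents to stand in for all initial
# nonrespondents (two-phase approach). All figures are invented, and the
# follow-up respondents are ASSUMED representative of all nonrespondents.
n_resp = 1800        # phase-1 respondents
n_nonresp = 1200     # phase-1 nonrespondents
n_fu_resp = 280      # nonrespondents interviewed in the follow-up survey

w_fu = n_nonresp / n_fu_resp  # each follow-up case represents ~4.3 nonrespondents

y_resp, y_fu = 0.62, 0.51     # (invented) group means on a core question
y_hat = (n_resp * 1.0 * y_resp + n_fu_resp * w_fu * y_fu) \
        / (n_resp * 1.0 + n_fu_resp * w_fu)
print(f"adjusted mean = {y_hat:.3f}")  # 0.576 versus the unadjusted 0.620
```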
8.2.4 Enriching the sampling frame data with data from external sources
In some situations, it is possible to match the individual records in a sampling frame with individual records from other sources, providing much more information than was available in the sampling frame alone (Groves, 2006, p. 654). The possibility of finding differences between respondents and nonrespondents on a larger number of variables offers the chance to detect some variables that genuinely co-vary with both nonresponse and the target questions in a survey, and also offers the possibility of providing more effective weighting coefficients, because more relevant variables than in post-stratification studies are used in the predictive models of response propensities. It is usually not possible to apply this method in general surveys, because individual records with extra information that match the sample records are mostly not available, or fall under very strict rules for privacy protection. Nevertheless, this method has been used in some specific studies (see, e.g., Lin and Schaeffer, 1995; Schouten and Cobben, 2006). In face-to-face surveys, interviewers can collect additional information about each eligible target person, including both respondents and nonrespondents. This involves, for example, the interviewer observing and recording information about the specific dwelling and the neighbourhood in general. This information is one type of ‘paradata’. Paradata comprise information that is collected as part of the process of conducting a survey and can in principle be recorded or collected for all cases, respondents and nonrespondents. It is fairly easy to collect paradata in the context of web surveys (Heerwegh, 2003). In face-to-face surveys, paradata can consist both of information collected as part of the normal conducting of the survey (e.g. records of calls) and of special observational data that interviewers collect about all target
persons. Maitland, Casas-Cordero and Kreuter (2008) used information collected during the process of recruiting sample households to participate in the National Health Interview Survey (NHIS) to find out whether this kind of paradata could be used to adjust for nonresponse. The paradata variables were the effort involved in making contact with the household, occurrences such as barriers to contact and inability to locate a household, and reasons for not participating in the survey. It was found that none of the paradata variables were correlated strongly enough with the survey variables of interest (e.g. response outcomes) to be usable for nonresponse adjustment. One of the reasons for weak correlations might be measurement error related to the collection of paradata.

Kreuter and Kohler (2009, pp. 215–18) used the call record data from ESS rounds (see Chapter 3, including Appendix 3.1), analysed the contact sequences and explored whether this information might be useful for nonresponse adjustment. Examples of indicators derived from contact histories are the number of contact attempts, the proportion of noncontacts, the characteristics of contact sequences that might be related to cooperativeness or contactability, the number of different episodes within a sequence, patterns within contact histories, and a distance measure through optimal matching of several characteristics. The results raise doubts about the possibility of using sequence indicators as nonresponse adjustment variables due to low correlations with the survey outcome variable. In this study too, the problem of measurement error in the call record data is mentioned as an explanation for low correlations (Kreuter and Kohler, 2009, pp. 222–3). The use of the ESS paradata method is discussed in Sections 8.3.2 and 8.3.3.

A weaker but more generally applicable variant of the enrichment method consists of matching records at aggregate level with the individual-level records in the sample. When applying this method, it is recommended that aggregate statistical information be taken from the smallest possible area, such as the neighbourhood or sampling stratum from which all cases, respondents and nonrespondents, are drawn (Smith, 2009). It is likely that some relevant individual characteristics of the target persons correlate with contextual information at the neighbourhood level, such as social or ethnic composition and the physical condition of the neighbourhood and the housing stock (see Chapter 6). The use of this type of contextual information falls well within even the strictest privacy regulations. The main weakness of this method compared with individual matching is that correlations with the relevant individual variables are not very strong. In addition, the availability and accessibility of these aggregate data will differ across countries, making cross-national applications difficult.
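Returning to the paradata checks described above, the core of such an analysis is simple: correlate contact-history indicators with a survey variable among respondents. A sketch, with the DataFrame `resp` and all column names hypothetical:

```python
# Sketch: screening paradata indicators for use as nonresponse adjustment
# variables. `resp` is a hypothetical respondent-level DataFrame with
# contact-history indicators derived from contact-form records.
indicators = ["n_contact_attempts", "prop_noncontacts", "n_episodes"]
for ind in indicators:
    r = resp[ind].corr(resp["political_interest"])
    print(f"{ind:20s} r = {r:+.3f}")
# If, as both studies cited above found, these correlations are weak, the
# indicators will do little to reduce nonresponse bias as adjustment variables.
```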
8.2.5 Contrasting alternative post-survey adjustments for nonresponse
Different post-survey adjustment methods can provide different sets of alternatively corrected estimates, all of which aim to measure the same population parameters. These different estimates can be compared with each other and with the unadjusted sample data (Groves, 2006, p. 656). We will briefly present four classes of adjustment methods: weighting, extrapolation, imputation and modelling (Voogt, 2004, p. 133).
Weighting involves assigning each observed element, in this case each respondent, an adjustment weight. Weighting adjustments are derived from auxiliary information available for the whole population or for the total gross sample (nonrespondents included). Broadly speaking, there are three ways of constructing weights: population-based, sample-based and probability of response weights (Brehm, 1993, pp. 118–19).

Post-stratification (PS), or stratification after selection, is a well-known and frequently used population-based weighting technique that can improve the precision of estimates in the event of full response where only sampling fluctuations occur (Bethlehem, 2002, p. 277). PS may be defined as the use of stratified sample estimators for unstratified designs. The idea behind PS is to divide the population into homogeneous strata according to a number of variables with known population distributions. If all elements within a stratum resemble each other, stratum estimates are not very biased. In order to increase the precision, strata should ideally be constructed using auxiliary variables that have a strong relationship with the target variable that is of interest in the study (Bethlehem and Kersten, 1985, p. 295; Gelman and Carlin, 2002, pp. 291–2). Post-stratification is not very effective when the target variables do not show a sufficiently strong relationship with the weighting variables (Bethlehem and Stoop, 2007, p. 123; Loosveldt and Sonck, 2008). Two problems may occur when weighting is carried out; namely, the presence of strata without observations in the sample, and a lack of adequate population information in the auxiliary variables. Bethlehem and Keller (1987) developed a general weighting technique based on linear models in order to avoid these problems. Post-stratification can also be effective in reducing nonresponse bias (for a useful overview, see Bethlehem, 2009, pp. 250–3). One of the sometimes overlooked consequences of PS is that the sample variance increases, which has an effect on the standard errors. An application of post-stratification in ESS 2, and its effects on the sample variances, is discussed in Section 8.3.1.

Sample-based weighting adjustments are based on information in the sample about differences in nonresponse according to classes of respondents. The weighting factors are the inverse of the response rates in each of the classes, and are applied in such a way that the distribution of classes of respondents is made identical to the distribution of classes of nonrespondents and respondents in the population.

Probability of response weighting adjustments, or propensity weighting, produce weights that are derived from the likelihood of obtaining a response from each class of respondents (Brehm, 1993, p. 118). The response propensities are often estimated within logistic multivariate regression models, with the probability ratio 'response/nonresponse' as dependent variable. This approach is only useful when information about relevant covariates is available for both respondents and nonrespondents (Lee and Valliant, 2008). An application of probability of response weighting adjustments is discussed in the context of a survey among nonrespondents in Section 8.3.4.

Extrapolation is based on the idea that certain groups of respondents are more like the nonrespondents than others. These groups are used to estimate values on variables for the nonrespondents. Several methods of extrapolation are possible, depending on
the assumptions made about the similarity of the groups of respondents to nonrespondents (Voogt, 2004, p. 134). This method builds on the method of comparing estimates between cooperative and reluctant respondents, but it goes a step further because of the extrapolation (see, e.g., Potthoff, Manton and Woodbury, 1993).

Although mainly developed to correct for item nonresponse, imputation can also be used for unit nonresponse. Imputation of missing values in a number of variables is based on a set of variables (covariates) that are common to both respondents and nonrespondents. Missing values from nonrespondents are substituted by estimates (Voogt, 2004, pp. 134–5). There are many imputation methods, such as unconditional and conditional mean imputation, hot-deck imputation, regression imputation and multiple imputation (see Kalton and Kasprzyk, 1986; Little and Rubin, 1987). As in other approaches, the strength of this method depends on the covariance between the variables measured for both respondents and nonrespondents and the variables for which missing values must be substituted for the nonrespondents. The weakness is obvious: there are generally only a few variables with known values for all selected sample units that correlate strongly with the target variables in the survey.

The final correction method is the development of models of response probabilities in order to correct for nonresponse bias (Gelman and Carlin, 2002). Some scholars use models that are based on Bayesian methods; others use (logistic) regression models or log-linear models to model response probabilities (Rizzo, Kalton and Brick, 1996; Voogt, 2004, p. 135). Post-stratification has also been used as a basis for modelling (Gelman and Carlin, 2002). Other approaches are response propensity modelling (Rosenbaum and Rubin, 1983; Laaksonen and Chambers, 2006), calibration estimation (Deville and Särndal, 1992), extended use of auxiliary data (Hidiroglou and Patak, 2006) and the introduction of instrumental variables and jack-knife methods in variance estimation with replicate weights (Knot, 2006). The information needed for applying the model-based approach can be obtained from the samples themselves (respondents and nonrespondents), from the geographical context and from auxiliary variables in the population, or from other reliable sources.

Applying and comparing the effects of different survey adjustment techniques is expected to give a better idea of the amount of nonresponse bias. According to Groves (2006, p. 656), the strength of this technique is that '. . . when the alternative estimators are based on very different assumptions about the nature of the nonresponse and are similar in magnitude, the researcher can have more confidence in the conclusions from the survey. If they differ, the researcher has some reason for caution.' The weakness is the lack of an unambiguous gold standard, because each of the adjustment schemes is based on a number of untestable assumptions. This means that, if different adjustments yield different estimates, it is not clear which is the best or 'true' estimate, due to the absence of an external benchmark (Groves, 2006, p. 656). In other words, we can never be sure whether the distributions of the target variables represent the population values better after weighting. We can only assume that if different adjustment methods using different types of auxiliary data provide comparable estimators, weighting will improve the results.
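A sketch of contrasting two such adjustments — response-propensity weighting and post-stratification — on the same data; `gross` and all column names are hypothetical, and the PS weights are assumed precomputed:

```python
# Sketch: contrasting alternative post-survey adjustments of one estimate.
# `gross` is a hypothetical gross-sample DataFrame with covariates known
# for every sample unit (respondents and nonrespondents alike).
import numpy as np
from sklearn.linear_model import LogisticRegression

X = gross[["age", "female", "urban"]].to_numpy()
is_resp = gross["responded"].to_numpy().astype(bool)

# (a) Propensity weights: inverse of the modelled response probability.
p_hat = LogisticRegression().fit(X, is_resp).predict_proba(X)[:, 1]
w_prop = 1.0 / p_hat[is_resp]

# (b) Post-stratification weights, assumed precomputed from population
# sex/age/education cells (see Section 8.3.1).
w_ps = gross.loc[is_resp, "ps_weight"].to_numpy()

y = gross.loc[is_resp, "target_item"].to_numpy()
for label, w in [("unadjusted", np.ones_like(y)),
                 ("propensity", w_prop),
                 ("post-strat", w_ps)]:
    print(f"{label:10s}: {np.average(y, weights=w):.3f}")
# Similar estimates under different assumptions raise confidence in the
# survey results; diverging estimates are a reason for caution.
```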
8.3 Detecting and Estimating Bias in the ESS6

In the first three rounds of the ESS, four strategies for bias detection and estimation were implemented. These were adjusting the sample to the population using post-stratification weighting, comparison of cooperative with reluctant respondents, using information from observable data available for all target persons, and collecting core information from nonrespondents. We will present examples and discuss the strengths and weaknesses of each of the approaches from a cross-country comparison point of view.
8.3.1 Post-stratification
Post-stratification, or PS, has two functions in the assessment of nonresponse bias. Firstly, it facilitates the study of the effects of PS weights on the distributions of the PS variables and on many other variables in the sample. On the basis of certain assumptions, this provides an impression of the amount of bias in the sample, and of the variables that are most sensitive to nonresponse bias. Secondly, the weighted samples, again under certain assumptions, are considered as being (partially) corrected for nonresponse bias. A complete report on PS weighting from ESS 1 and 2 is provided in the studies by Vehovar (2006, 2007) and Vehovar and Zupanic (2007). This section presents an overview of how PS weighting was implemented in the ESS and how the effects can be evaluated.

To evaluate the effect of post-stratification weighting, differences between unweighted and weighted means were computed. There are three important caveats here. Firstly, differences between weighted and unweighted means may be due to other factors besides nonresponse: the sampling procedures, the use of incorrect population statistics, or differences in the measurement of post-stratification variables in the population statistics and in the surveys. This means that PS might well overestimate the amount of nonresponse bias. Secondly, nonresponse effects cannot be separated into the effects of noncontact and refusal. And finally, in this section we do not use the unweighted samples but the samples weighted using the so-called 'design weights'.7

Ideally, weights should be based on variables that co-vary strongly with the target variables of the survey and with the probability of response. In practice, the choice of weighting variables for which the population distribution is known is very limited. In this case the weighting variables were sex, age and education.8 These three variables are commonly used in post-survey adjustments. Population distributions for all ESS countries are given in the survey documentation submitted by the National Coordinators (see European Social Survey, 2003, 2005, 2007a). The PS
6 Parts of this section have been published previously in a shorter version in Billiet, J., Matsuo, H., Beullens, K. and Vehovar, V. (2009). Non-response bias in cross-national surveys: designs for detection and adjustment in the ESS. ASK. Society. Research. Methods, 18, 3–43.
7 The design weights correct for deviations from the equal probability design (EPSEM), and thus for differences in sampling design across countries. See also Section 3.3.3.
8 Other relevant PS variables that might be used include urbanicity or context variables related to geographical environment. Such data are, however, not available in the ESS or cannot be used due to privacy restrictions.
approach used to estimate nonresponse bias and to correct for nonresponse is based on strict post-stratification or on the raking method. The latter has been used when post-stratification was impossible because no joint distribution for the three population variables mentioned was documented in reliable population statistics (see Section 8.2). In 10 of the 24 countries involved in ESS 2, the raking method was applied (Vehovar and Zupanic, 2007). Age was grouped into three categories (15–34, 35–54, and 55 and older); sex, of course, into two. Education is more problematic than sex and age in a cross-national survey. This is firstly because in a number of countries the joint distributions of age and sex with education were not available from population statistics. In addition, differences between countries in the approach used to code education into ISCED created further difficulties.9

8.3.1.1 Weight calculation and the impact on stratification variables (ESS 2)

PS weights are computed by dividing the cell proportions in the multivariate table (sex × age × education) in the population by the corresponding cell proportions in the obtained sample (Rässler et al., 2008, p. 375). They have a value of 1 when the sample cell proportion is identical to the population proportion; are in the range 0 < w < 1 when the sample proportion is higher than the proportion in the population; and are > 1 when the sample proportion is lower than expected (in the population). The final weighted sample (W2) is the sample weighted by the design weights multiplied by the PS weights. It is possible to use this product of both weights, since it is very likely that both are independent. The weighted samples reflect the population distribution of the stratification variables sex, age and education. The unweighted sample (W1) is the sample weighted by the design weight only.

What is the impact of weighting on the distributions of the three post-stratification variables? Small deviations indicate that the distributions fairly represent the population distributions for sex, age and education. Large deviations point to strong over- or underrepresentation of certain categories and result in weighting factors that deviate strongly from 1. Table 8.1 gives an overview of the impact of post-stratification by presenting the ratios between the final weighted (W2) frequencies and the (W1) frequencies for each separate post-stratification variable. Of particular interest are the entries in Table 8.1 that diverge seriously from 1 (equal proportions). 'Serious' deviations are defined here as ratios equal to or higher than 1.33 or lower than its inverse (between 0 and 0.75).10 Higher values (≥ 1.33) point to an underrepresentation in the sample, while lower values (0 < x < 0.75) suggest an overrepresentation.
9 For details, see Vehovar (2007, p. 338). During the analysis of ESS 3 data, some problems in previous PS weightings were detected. Since it is our aim to assess the ways in which bias estimation has been applied in past ESS rounds, we report the way it was done in ESS 2, with some reservations based on additional checks that were done afterwards (Vehovar, 2008). In the preliminary report on weighting for the three previous ESS rounds, Vehovar found that the distributions reported by the NCs are not always correct and optimal presentations of the population. It was therefore proposed to investigate the possibility of changing the source for the weightings. It will be investigated whether the Labour Force Statistics are preferable as a 'gold standard' over the population statistics. This research is still in progress.
10 These ratios are independent of the absolute size of the cell frequencies themselves.
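Before turning to Table 8.1, here is the cell-weight computation from the previous paragraph as a minimal sketch; `pop`, `smp` and `df` are hypothetical:

```python
# Sketch: post-stratification weight = population cell proportion divided by
# the (design-weighted) sample cell proportion. `pop` and `smp` are
# hypothetical dicts mapping (sex, age_group, educ) cells to proportions.
ps_weight = {cell: pop[cell] / smp[cell] for cell in pop}

# Final weight (W2) per respondent = design weight x the PS weight of the
# respondent's cell; the two weights are treated as independent.
df["final_weight"] = df["design_weight"] * df.apply(
    lambda row: ps_weight[(row["sex"], row["age_group"], row["educ"])], axis=1)
```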
Table 8.1 Ratios of (marginal) percentages in the estimated population distribution per country (W2 sample (weighted by final weight) divided by W1 sample (design weight only)) in ESS 2a

                          Sex              Age                       Educationb
Country                   Male    Female   15–34   35–54   55+       Low     Middle   High
AT Austria                1.02    0.98     0.89    1.29    1.45      0.52    2.30     1.64
BE Belgium                0.98    1.02     0.82    0.82    1.89      0.93    0.93     1.17
CH Switzerland            1.04    0.96     1.07    1.03    0.97      1.58    0.93     0.72
CZ Czech Republic         1.02    0.98     1.20    0.89    0.94      2.61    0.56     1.30
DE Germany                1.02    0.98     1.12    1.06    1.03      1.42    0.91     0.85
DK Denmark                1.00    1.00     1.00    1.00    1.00      1.00    1.00     1.00
EE Estonia                1.10    0.93     1.13    1.06    0.84      2.58    0.49     0.39
ES Spain                  0.96    1.04     1.91    0.56    0.52      0.98    1.06     1.05
FI Finland                1.04    0.96     1.03    0.97    0.92      1.06    1.09     0.89
FR France                 1.00    1.00     1.31    1.03    0.89      0.92    5.29c    0.40
GR Greece                 1.11    0.91     1.30    0.91    0.87      1.09    0.89     0.88
HU Hungary                1.12    0.91     1.13    1.00    0.91      1.45    0.74     0.50
IE Ireland                1.14    0.89     1.29    0.97    0.76      1.02    0.93     1.03
IS Iceland                1.09    0.93     1.09    0.91    0.73      1.17    4.44c    0.33c
LU Luxembourg             0.94    1.07     0.73    1.10    1.43      0.64    1.37     1.14
NL Netherlands            1.16    0.86     1.36    0.98    0.76      0.79    1.52     0.83
NO Norway                 0.94    1.06     1.14    0.88    1.03      1.50    1.24     0.57
PL Poland                 0.98    1.02     0.95    1.03    1.04      1.02    1.03     0.88
PT Portugal               1.17    0.88     1.13    1.06    0.89      1.03    0.93     0.89
SE Sweden                 1.00    1.10     1.10    0.94    0.83      0.52    2.09     1.00
SI Slovenia               1.04    0.96     1.03    1.00    0.91      1.09    0.90     0.87
SK Slovak Republic        0.96    1.04     0.97    1.10    1.04      2.71    0.48     0.85
UA Ukraine                1.22    0.87     1.30    0.95    0.82      3.14c   0.69     0.27c
UK United Kingdom         1.04    0.96     1.22    1.29    0.53      0.46    2.71     1.19

a Based on the W1 and W2 cell frequencies in table 18 in the report by Vehovar and Zupanic (2007, p. 39). Some countries have been left out because of incomplete data.
b A three-category education variable was used, based on ISCED 1997 (Vehovar, 2007, p. 338). The categories are as follows: Low, not completed primary education, primary or first stage of basic, lower secondary or second stage of basic; Middle, higher secondary; High, post-secondary non-tertiary, first stage of tertiary, second stage of tertiary. The International Standard Classification of Education (ISCED 1997) was designed by UNESCO and is an instrument suitable for assembling, compiling and presenting statistics on education both within individual countries and internationally.
c The large ratios in the distribution of education might – at least partly – be ascribed to different categorizations in the reported population statistics and in the applied ISCED 97 coding.
It is clear that the marginal distributions of the variable 'sex' do not differ strongly between the population and the sample. Deviations with respect to age are more frequent, but are small and not significant. Most notable are those in the Spanish sample (ES), where the youngest category is seriously underrepresented (the weighting
factor for the 15–34 age group is 1.91) and the older age categories are overrepresented. The oldest age category is also overrepresented in the United Kingdom, but in Belgium respondents older than 55 years are strongly underrepresented in the sample. There are other countries with rather high deviations in age distribution (Austria, the Netherlands, Iceland and Luxembourg), but at the end of the day the deviations are rather modest. Generally, it appears that the size of the youngest age category (15–34 years) is underestimated in the samples while the oldest is more often overrepresented. It should be borne in mind that age can be related to contactability and cooperation (see Sections 6.2.2 and 6.3.1). These factors may have operated differently across countries. The finding that the youngest age categories are more often underrepresented is in line with other studies on the contactability hypothesis; older respondents are easier to contact.

Education seems to be much more related to nonresponse, as there are serious deviations between sample and population in no fewer than 15 countries. The less-educated are seriously underrepresented in the samples of eight countries (CH, CZ, DE, EE, HU, NO, SK and UA) and considerably overrepresented in four countries (AT, LU, SE and UK). With the exception of Austria, the number of more highly educated people is generally seriously overrepresented (CH, EE, FR, HU, IS, NO and UA). The proportion of the group educated to a 'middle' level was more variable, being seriously underrepresented in some countries (AT, FR, IS, LU, NL, SE and UK) and overrepresented in others (CZ, EE, HU, SK and UA). The largest deviations between sample and population are observed for the middle category of education in France (FR) and Iceland (IS), and for the high level of education in Iceland and Ukraine (UA). It is important to bear in mind that differences in categorization between the original ISCED 97 coding and population statistics as reported in the ESS documentation might be responsible for these large (and possibly artificial) deviations with regard to education, and that the nonresponse bias is therefore being overestimated.

The most important conclusion from a cross-national point of view is that there is no stable pattern of overrepresentation or underrepresentation in the categories of the post-stratification variables that applies to all countries. Since all the samples were random probability samples, and since they are comparable over sample designs because the design weights were always applied in the computations, the deviations can reasonably be assigned to nonresponse for the majority of countries. This finding suggests that there is no universal cross-national relationship between nonresponse and background variables.

8.3.1.2 Post-stratification weighting and variance inflation11
This section is based on Vehovar (2008).
218
DESIGNS FOR DETECTING NONRESPONSE BIAS
estimation of the standard errors (Little and Vartivarian, 2005). Weights themselves are estimates (R€assler, Rubin and Schenker, 2008, p. 375). Whether or not weights should be used depends on whether the reduction in bias outweighs the loss in precision (Sturgis, 2004; see also Section 2.6.1). The estimated variance in the weighted sample is usually, but not always, inflated by the variation of the weights (Little and Vartivarian, 2005). The coefficient of variation (CVw) and the variance inflation factor (VIF) are used for evaluating this effect of weighting on the estimate of the variance. These statistics are calculated separately for the unweighted samples (W1) and for design weight combined with the post-stratification weight (W2, the final weight). As above, the latter is defined as the product of the sex/age/education weighting coefficient times the coefficient of the design weight. The increase in sample variance is one of the most important consequences of weighting, since it has implications for the rejection of the null hypotheses of statistical analyses. The estimate of the increase of the sample variance (ignoring design weights) is based on the well-known Kish (1965) formula for the coefficient of variation of the weight variable: CV 2 ðwÞ ¼
S2 w 2 w
ð8:4Þ
Here, CV^2(w) expresses the ratio between the elementary variance of the weight variable w and the square of the arithmetic mean of the same weight variable w. The larger the variation around the mean value of the weight variable, the larger is the coefficient of variation. The variance inflation factor (VIF) is directly related to the coefficient of variation:

VIF = 1 + CV^2(w)    (8.5)
VIF expresses the increase in the sample variance of a weighted sample in comparison with the sample variance (with the same sample size) where there would be no need for weights. According to this expression, the minimum value of VIF is 1.0 in the case of zero variation of the post-stratification weights. The consequence of weighting is an increase of the sample variance (unless VIF has its minimum value):

Var(\bar{y}_w) = Var(\bar{y}) \cdot VIF    (8.6)
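Equations (8.4)–(8.6) can be computed directly from a vector of final weights. A minimal sketch in Python (illustrative only; the weights below are simulated, not ESS data):

```python
import numpy as np

def variance_inflation(weights):
    """Kish's approximation: CV^2 of the weights, eq. (8.4),
    and the variance inflation factor VIF = 1 + CV^2, eq. (8.5)."""
    w = np.asarray(weights, dtype=float)
    cv2 = w.var() / w.mean() ** 2   # S_w^2 / wbar^2
    return cv2, 1.0 + cv2

# simulated final weights (design weight x post-stratification weight)
rng = np.random.default_rng(2)
w = rng.lognormal(sigma=0.4, size=1500)
cv2, vif = variance_inflation(w)
print(f"CV^2 = {cv2:.3f}, VIF = {vif:.3f}")
# by eq. (8.6), confidence intervals widen by a factor sqrt(VIF)
print(f"CI expansion: {np.sqrt(vif) - 1:.1%}")
```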
In countries with large design weights, the total VIF would be somewhat higher. Roughly speaking, the increase in the sample variance due to clustering – that is, the design effect – is generally around 1.2–1.5 for this type of survey, but can also be higher than 2 or even 3 for some variables that are related to the neighbourhoods that were used as primary sample units (PSUs) (Vehovar, 2007, p. 341). Variance inflation factors for the final weights in ESS 2 are presented in Figure 8.1. Countries at the top have a design weight of 1, so their variance is inflated only by the post-stratification weights. In ESS 2, the VIF values are relatively moderate; in the countries with the largest VIF, the categorization of education may have played a role. Confidence intervals increase with the square root of VIF; in practice, the expansion of the confidence interval is rarely larger than 10%. More details on this topic are to be found in Vehovar (2007, pp. 352–3).

[Figure 8.1 Variance inflation factors for final weights in ESS 2 (VIF scale 1–5; countries listed from top to bottom: FI, PL, SI, ES, BE, NO, DK, DE, GR, CH, PT, NL, IE, SE, LU, AT, SK, EE, UK, HU, CZ, IS, UA, FR). Reproduced by permission of Acco Publishing Company, Belgium.]

8.3.1.3 The effects of the weightings as an indication of nonresponse bias

In the PS approach, the size of the weights is assumed to give an indication of the amount of nonresponse bias. This holds only if the differences between the weighted and unweighted estimates can be attributed solely to nonresponse. This is a risky assumption, because we do not know about other potential sources of bias, such as noncoverage bias, fieldwork errors, processing errors and measurement errors. The PS results provide an indication of the upper limit of the nonresponse bias insofar as nonresponse is correlated with age, education and sex.

To evaluate the effect of weighting, the target variables of the survey have to be identified. In this case, the target variables are the attitudinal variables and values in the ESS core module as well as the variables in the rotating modules (see Chapter 3). Vehovar (2008, p. 344) included no fewer than 45 items in his study of the deviations between W1 and W2 samples in ESS 2. The items were selected based on their importance, relevance and appeal for the concepts they seek to measure. They were
drawn from the following sections: media (three items); social trust (three); politics (12); well-being (three); religion (three); economic morality (six); family and life–work balance (six); socio-demographic profile (five); and human values (five). There are different measures of bias that can be used to assess the effect of weighting. Bias is defined here as the difference between the estimates in the unweighted and the weighted sample:

bias(\bar{y}) = \bar{y} - \bar{y}_w    (8.7)
The standardized bias (Sbias) compares the nonresponse bias with the standard error of the estimate – that is, the sampling error:

Sbias = bias(\bar{y}_w) / se(\bar{y}_{SRS})    (8.8)
In this expression, the standard error is calculated based on the assumption of a simple random sample (SRS), which of course underestimates the sample variance because it ignores the design effect and the VIF.12 In this section, we will also use the absolute standardized bias (ASbias). Sbias can be positive or negative; ASbias ignores the sign. We will also present the average ASbias, where ASbias is either presented per item averaged across countries, or per country averaged across items. We will focus on averages only to summarize the 1125 bias estimates: estimates for 45 items in 25 countries. To evaluate the size of the Sbias, the usual 5% level of significance can be used, with the value t = 1.96 as the benchmark.

Table 8.2 presents the six of the 45 items for which the average ASbias across all countries was significant.13 The number of household members seems to have a large bias, but – as can be seen in Table 8.3 – the direction of the bias is not identical across countries (and the bias is extremely large in Austria). The other items with a high bias reflect interest in politics and society, and two of them reflect attitudes towards immigrants. These findings about items that are sensitive to nonresponse bias are comparable with previous research on ESS 1 data, in which it was found that reluctant respondents are more likely to have negative attitudes towards immigration and a low level of interest in politics (Billiet et al., 2007, pp. 153–5). The relationship between survey participation and interest in politics has been found in earlier studies, among others by Voogt (2004) and Groves et al. (2004).

12 Sbias is a rather conservative estimate of nonresponse bias (Vehovar, 2007, p. 346). See also Annex 2 of Vehovar and Zupanic (2007), available online at http://mi.ris.org/uploadi/editor/1169212272appendix2.xls
13 Of course, even the ASbias value of 2.077 for NWSPPOL does not truly signal statistical significance, for the following reasons: it is only the average of the country ASbiases, and the properly inflated standard errors are not used.
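As a concrete illustration of equations (8.7) and (8.8), the standardized bias of a single item can be computed as follows (a minimal sketch; the variable names are ours, not those of the ESS data files):

```python
import numpy as np

def standardized_bias(y, w):
    """Sbias of one item: unweighted mean minus weighted mean, eq. (8.7),
    divided by the SRS standard error of the mean, eq. (8.8)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    bias = y.mean() - np.average(y, weights=w)
    se_srs = y.std(ddof=1) / np.sqrt(len(y))
    return bias / se_srs

# ASbias is simply abs(standardized_bias(y, w)); averaging it over the
# 45 items (per country) or the 25 countries (per item) gives the
# summary measures reported below.
```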
Table 8.2 Estimates of ASbias of the six (out of 45) items with significant values of ASbias in ESS 2

ASbias (average across countries)   Item      Content
2.077                               NWSPPOL   Newspaper reading, politics/current affairs on average weekday
2.082                               IMWBCNT   Immigrants make the country a worse or better place to live
2.189                               IMBGECO   Immigration bad or good for the country's economy
2.477                               NWSPTOT   Newspaper reading, total time on average weekday
3.504                               HHMMB     Number of household members
3.646                               POLINTR   How interested in politics

ASbias, absolute standardized bias.
A second way to summarize the results of the PS approach is to count the number of items with an ASbias larger than 1.96. Table 8.3 presents this count for each country, as well as the average ASbias and the item with the largest Sbias. There are wide differences between countries. The largest ASbias is observed in Austria (AT), in the household size of the respondent (HHMMB). The country with the largest number of biased items is Iceland (IS), with no fewer than 37 items (out of 45) with an Sbias larger than 1.96. At the other end of the scale, we find six countries with no items where the ASbias is larger than 1.96: Germany (DE), Spain (ES), Finland (FI), Poland (PL), Portugal (PT) and Slovenia (SI). Four of these countries had response rates of over 70%. The relationship between the amount of bias and the obtained response is, however, disrupted by Estonia (EE), which has a very high response rate of 79.3% but still has 18 items with an ASbias larger than 1.96. It should be noted that Estonia changed its education coding between ESS 2 and ESS 3, which might point to a possible problem in ESS 2 with the education coding that forms the basis for the PS weights. The response rate in the sample of Greece (GR) is nearly as high (78.8%), but the average ASbias is small and only five items have an ASbias larger than 1.96.

Figure 8.2 compares the information from Table 8.3 with the response rates in each country. This is a three-dimensional presentation, with the average ASbias on the vertical axis, the response rate on the horizontal axis and the number of items with ASbias larger than 1.96 in each country expressed in the size of the bubbles. The figure suggests that response rates matter: the Pearson correlation between the average ASbias and the response rate is -0.29, and between the number of items with ASbias > 1.96 and the response rate it is -0.26.14 Country samples with higher response rates are more likely to be characterized by smaller nonresponse bias.

14 If the outlier Iceland is not included, the correlation between absolute average standardized bias and response rate at country level decreases in magnitude to -0.16.
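A bubble chart of this kind is easy to reproduce; a matplotlib sketch with largely made-up values (only the Iceland figures are documented in the text; "XX" and "YY" are hypothetical countries):

```python
import matplotlib.pyplot as plt

# (response rate %, average ASbias, number of items with ASbias > 1.96)
countries = {"IS": (51, 4.9, 37),   # documented in the text
             "XX": (62, 1.5, 8),    # hypothetical
             "YY": (75, 0.3, 0)}    # hypothetical

fig, ax = plt.subplots()
for name, (rr, asbias, n_items) in countries.items():
    ax.scatter(rr, asbias, s=40 + 20 * n_items, alpha=0.5)  # bubble size ~ item count
    ax.annotate(name, (rr, asbias))
ax.set_xlabel("Response rate ESS 2 (%)")
ax.set_ylabel("Average absolute standardized bias")
plt.show()
```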
Table 8.3 An overview of bias per country in ESS 2

Country   Average ASbias   Maximum Sbias   Item      Content                                        Number of items with ASbias > 1.96
AT        2.6              28.69           HHMMB     Number of people living in HH                  17
HU        1.3              3.28            HHMMB     Number of people living in HH                  10
PT        0.4              1.42            HHMMB     Number of people living in HH                  0
SI        0.4              1.25            HHMMB     Number of people living in HH                  0
BE        0.7              2.87            POLINTR   How interested in politics                     1
DE        0.0              0.06            POLINTR   How interested in politics                     0
EE        2.1              8.01            POLINTR   How interested in politics                     18
FR        2.0              5.99            POLINTR   How interested in politics                     15
GR        0.7              3.26            POLINTR   How interested in politics                     5
CZ        1.6              5.52            NWSPTOT   Newspaper reading politics/current affairs     10
DK        1.7              6.16            NWSPTOT   Newspaper reading politics/current affairs     14
LU        1.2              4.40            NWSPTOT   Newspaper reading politics/current affairs     8
NL        0.9              2.79            NWSPTOT   Newspaper reading politics/current affairs     4
UA        1.7              4.20            NWSPTOT   Newspaper reading politics/current affairs     12
SE        1.4              3.82            WMCPWRK   Women be prepared to cut down paid work        11
SK        2.3              6.55            WMCPWRK   Women be prepared to cut down paid work        18
CH        1.2              3.59            IMBGECO   Immigration good or bad for country            7
ES        0.0              0.02            PRAY      How often pray apart from religious services   0
FI        0.4              0.16            CTZHLPO   Citizens spent some free time helping others   0
IE        1.2              3.93            RLGATND   How often attend religious services            9
IS        4.9              13.02           PPLFAIR   Most people try to take advantage of you       37
NO        1.6              4.28            IMBGECO   Immigration good or bad for country            16
PL        0.2              0.48            GINCDIF   Government should reduce income differences    0
UK        2.2              6.55            TVTOT     TV watching, total time on average weekday     15

Source: Vehovar and Zupanic (2007, p. 31). Sbias, standardized bias; ASbias, absolute standardized bias.
8.3.1.4 Does post-stratification on sex, age and education affect substantive findings?

This is the key question for users of ESS data, who are not interested in the methodological detail but, rather, in the implications of weighting for estimated statistics, such as means of latent variables or constructs, correlations, regression parameters or standard errors. Good candidates for evaluating the effect of weighting on substantive findings are the distributions of two latent variables: political competence and perceived ethnic threat (for an overview of the three underlying variables of each latent variable, see Appendix 8.1). Table 8.2 showed that political interest, one of the factors of political competence, has the highest average ASbias across countries. Two of the variables that are part of perceived ethnic threat are also significantly biased (IMWBCNT and IMBGECO; see Table 8.2). The means of the two latent variables in the W1 (design weight) and W2 (PS weight) samples are shown in Figures 8.3 and 8.4.
[Figure 8.2 The absolute average standardized bias (vertical axis) plotted against response rates in ESS 2 (horizontal axis, %); bubble size indicates the number of items with ASbias > 1.96 per country. Reproduced by permission of Acco Publishing Company, Belgium.]
[Figure 8.3 Mean scores per country on the latent variable ‘political competence’ (0 = low to 5 = high), weighted by design weight (W1) and PS weight (W2) in ESS 2; countries ordered left to right: PT, CZ, FR, ES, BE, EE, PL, FI, SI, SK, UK, GR, LU, NL, SE, IE, HU, CH, NO, UA, DE, AT, IS, DK.]
[Figure 8.4 Mean scores per country on the latent variable ‘perceived ethnic threat’ (0 = high perceived threat to 10 = low), weighted by design weight (W1) and PS weight (W2) in ESS 2; countries ordered left to right: GR, PT, CZ, EE, HU, SI, SK, UK, FR, AT, BE, UA, DE, NL, NO, DK, PL, CH, ES, IE, FI, LU, SE, IS.]

As we would expect, because of the small post-stratification weights in a number of countries, the two estimates are not very different, but there are some exceptions. For political competence (Figure 8.3), Estonia, Austria, France, the Czech Republic, the Slovak Republic, Denmark, Hungary and Ukraine show a change larger than 5% in mean score values. The W2 score is lower in these countries (except for Austria), which means that political competence is probably overestimated because of nonresponse bias. With respect to perceived ethnic threat (Figure 8.4), three countries show large effects (>5% change in mean score), namely Iceland (IS), the United Kingdom (UK) and France (FR). These are countries that in Figure 8.2 showed a large (absolute) standardized bias and at the same time large numbers of biased items. In Iceland and France, people feel more threatened by immigration after weighting for nonresponse, which is in line with earlier studies on the relationship between nonresponse bias and attitudes to immigration (see Billiet et al., 2007). In the United Kingdom, however, the perceived threat is higher before weighting.

Social researchers are usually less interested in descriptive statistics such as means and proportions and more interested in the comparison of explanatory models. We therefore compared a substantive explanatory model for these two latent variables in Estonia (political competence) and Iceland (perceived ethnic threat), the two countries with the largest bias in the attitudinal latent variables, with Germany, which has the smallest bias. The response rate of Germany (52%) is comparable with that of Iceland (51%), but is much lower than the response rate of Estonia (79%). The effects of variance inflation were taken into account in these models. The conclusions of these comparisons can be summarized fairly simply. With regard to political competence, there are small differences in the models between the W1 and W2
samples in Estonia and no differences in Germany. The substantive interpretations of the parameters of the model in the W1 and W2 samples are identical. With regard to the second latent variable, perceived ethnic threat, there are no differences in the conclusions for Iceland, despite the fact that Iceland is a country with a larger variance inflation factor in the W2 sample and has a much smaller response rate than Estonia. In the German samples, there are again no differences between the W1 and W2 models.

A smaller sample was required in Iceland than in most other countries, because it has a small population (see Section 3.3.3). The lack of significant results could be caused by this small sample size. We will therefore present detailed results from another country here, namely the United Kingdom, which has a response rate comparable to Iceland's and also a rather large bias in the variable perceived ethnic threat, together with a large variance inflation factor (VIF) of 2.38 for the PS weights in the W2 sample. The results are presented in Table 8.4. Although there is a large bias in the perceived ethnic threat items, there is not much difference between the regression models in the W1 and W2 samples. The major difference relates to the effect of sex, which is much lower and no longer significant at the 0.05 level. In other words, without the post-stratification weights the conclusion would have been that men have slightly more positive views about the consequences of immigration than women. Once the weights are applied, minimizing ‘bias’, this conclusion no longer applies. The effect of ever having had a job is stronger in the final weighted sample (W2), but is still not significant.

In the examples analysed, the explained variance is rather low, especially for the latent variable perceived ethnic threat. The explained variance is even lower when we include only the three PS variables (age, sex and education) in the models. For example, with regard to political competence, R2 drops from 0.24 to 0.20 in Germany. It is even lower than 0.20 in the case of perceived ethnic threat, and even falls below 0.10 in the United Kingdom. This is an indication that the PS variables are only weakly related to the target variables in the ESS. In other words, because of the weak relationships, these specific post-stratification weights remove only a small portion of the bias related to sampling and nonresponse.

In summary, serious differences before and after post-stratification weighting were not found in the countries where we expected the strongest effects of nonresponse bias. Minor differences did not lead to fundamental changes in conclusions about the parameters in regression models. Does this mean that there is almost no nonresponse bias in the ESS, or should we instead conclude that the assumptions underlying the post-stratification method, and its weaknesses, are responsible for the failure to detect bias in target variables, and then adjust for it? It is plausible that the answer to both questions is a partial ‘yes’. On the one hand, ESS sampling and data collection are very well prepared and standardized wherever possible. This may be a reason for the fairly small amount of bias uncovered. However, the PS approach depends on the amount of bias reduction in the target variables, which in turn depends on the strength of the correlation between the PS variables and the target variables. It is therefore possible that we do not really detect bias with this approach, given the three PS variables that were used and in the light of the problems with the education variable in the ESS.
Table 8.4 A comparison of explanatory regression models for perceived ethnic threat in unweighted (W1) and post-stratification weighted (W2) samples in the United Kingdom in ESS 2

                             Unweighted sample (design weight = 1)        Final weighted sample
Explanatory variables        Coeff.   Adj. SE   t-value   Prob.           Coeff.   Adj. SE   t-value   Prob.
Intercept                    3.042    0.333     9.13      <0.0001         2.998    0.461     6.50      <0.0001
Sex: Male (Ref.: Female)     0.279    0.118     2.36      <0.01           0.181    0.149     1.21      ns
Age                          0.000    0.004     0.09      ns              0.003    0.005     0.69      ns
Education (Ref.: Higher)
  Lower                      0.513    0.436     1.18      ns              0.457    0.859     0.53      ns
  Low secondary              1.578    0.143     11.01     <0.0001         1.612    0.202     7.98      <0.0001
  High secondary             0.845    0.189     4.48      <0.0001         0.859    0.178     4.84      <0.0001
Urban                        0.052    0.061     0.85      ns              0.024    0.076     0.32      ns
Active                       0.044    0.139     0.32      ns              0.231    0.176     1.31      ns
Ever job (Ref.: Never)       0.817    0.248     3.29      <0.001          1.100    0.312     3.52      <0.001
Job control                  0.010    0.022     0.46      ns              0.036    0.028     1.29      ns
R2                           0.09                                         0.10

ns, not significant. Coefficients are unstandardized.
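A comparison of this kind – the same linear model estimated without and with the final weights – can be sketched with standard tools. A minimal illustration, assuming a data frame df with the outcome, the predictors and a final-weight column (these names are ours, not the ESS variable names):

```python
import statsmodels.api as sm

def fit_w1_w2(df, outcome, predictors, weight_col):
    """Fit the same linear model unweighted (W1) and with final weights (W2)."""
    X = sm.add_constant(df[predictors])
    y = df[outcome]
    m_w1 = sm.OLS(y, X).fit()
    m_w2 = sm.WLS(y, X, weights=df[weight_col]).fit()
    return m_w1, m_w2

# m1, m2 = fit_w1_w2(df, "ethnic_threat", ["male", "age", "educ_mid"], "w_final")
# Comparing m1.params with m2.params (and m1.bse with m2.bse) shows whether
# post-stratification weighting changes any substantive conclusion.
```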
8.3.1.5 The post-stratification approach: concluding remarks

In ESS 2, neither large nor systematic differences are observed in most countries between the national samples and the population structure as regards sex and age. The picture with regard to the education structure is different, as in most countries at least one education category was underrepresented in the national sample compared to the population structure. In six country samples, the middle-educated are underrepresented, and in another eight samples the proportion of less-educated persons is considerably lower in the sample than in the population at large. In these countries, the weights based on age, sex and education essentially push the estimates in the direction of the characteristics and attitudes associated with the middle or less well-educated sections of the population.

A number of general problems have been identified with regard to post-stratification: (1) no distinction can be made between nonresponse bias and sampling bias; (2) post-stratification assumes MAR (missing at random) within each combination of the stratification variables, and where this is not the case because of nonrandom missingness within these classes (NMAR), there can still be a serious undocumented bias; (3) the size of the bias in the target variables can be seriously underestimated when the correlation between these variables and the post-stratification variables is low; and (4) when a weighted sample is compared with the unweighted one, there is, strictly speaking, no guarantee that the adjusted sample better reflects the population distribution of the target variable. Bethlehem (2002, pp. 277–8) shows that the PS estimator is biased when there is a relationship between response probabilities and values of the target variable within each stratum. The MAR assumption is usually taken for granted when post-stratification is used in public opinion and market research, and the possibly low correlation between PS variables and target variables is generally ignored. The alternative would be special modelling for each separate variable, but in practice explicit non-MAR mechanisms for missing data are seldom employed. Owing to the uncertainty about the missing-data mechanism, it seems that high response rates are one of the best ways to prevent incorrect estimates due to nonresponse bias (Vehovar, 2007, p. 355).

Post-stratification also has a number of practical problems. The PS estimator may be biased in itself when the source (or ‘gold standard’) does not accurately reflect the real population distributions. This happens, for example, when the documentation in the population statistics (e.g. the 2001 census) is outdated and no longer reflects the actual distribution at the time of the survey (Vehovar, 2007, p. 355). It is also possible that the measurements of some PS variables in the source (‘gold standard’) and in the survey do not correspond. This happened with the education variable in some ESS countries. Level of education is the most effective PS variable, but the joint distribution of precisely this variable with the other demographics in the population is not always
available with sufficient precision, and the measurement of this variable is not comparable across countries (Hoffmeyer-Zlotnik, 2005).

In this section, we found indications of bias in variables related to politics, immigration and media use (see Table 8.3). The direction of the bias is generally consistent across countries and is also consistent with the anticipated effect of the middle and less well-educated segments, but there are exceptions. In general, at the aggregate (country) level there is a moderate positive correlation (about 0.26) between nonresponse level and nonresponse bias.

The main weakness of the application of post-stratification weighting is the weak relationship between the variables underlying the weights (sex, age and education) and the target variables. This casts doubt on the usefulness of this approach for the ESS. Its usability could be improved if we were able to include more auxiliary information in the weights at the level of the area in which the target respondent lives. This raises the question of whether other variables, such as urbanicity or geographical region, perform better when they are included in PS weighting. Population statistics in which the age, sex and education variables are crossed with urbanicity are, however, not available for all countries.

Despite these limitations, there are also positive aspects. Bethlehem (2002, p. 279) reports, on the grounds of his own practical experience, that nonresponse often seriously affects estimators such as means and totals, but less often affects estimates of relationships between variables. The finding of only minor differences in regression parameters before and after post-stratification weighting seems to support his conclusion. To end on a positive note, a major advantage of the post-stratification approach is that it can be applied to all country samples in the same way. The results provide at least some idea about the existence and direction of bias. As will be shown in the following sections, other techniques are far more difficult or even impossible to implement in a standardized way across countries.
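For reference, the basic post-stratification computation discussed throughout this section – rescaling design weights so that the weighted sample matches known population shares per sex/age/education cell – can be sketched as follows (a minimal illustration with pandas; pop_share must come from the ‘gold standard’ population statistics, and the column names are ours):

```python
import pandas as pd

def poststratification_weights(df, pop_share):
    """df: one row per respondent, with columns 'cell' (sex x age x education
    stratum) and 'design_weight'. pop_share: Series of population proportions
    per cell. Returns the final weight W2 = design weight x PS coefficient."""
    cell_totals = df.groupby("cell")["design_weight"].sum()
    sample_share = cell_totals / cell_totals.sum()
    ps_coef = pop_share / sample_share      # >1 where a cell is underrepresented
    return df["design_weight"] * df["cell"].map(ps_coef)
```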
8.3.2 Comparing cooperative with reluctant respondents15
A second approach to assessing bias in the ESS is to use information from those initial refusals who are later ‘converted’ into respondents. The call record data from the ESS (see Chapter 3) contain information on interviewer behaviour, the outcomes of each call attempt and the final outcomes for each sample unit. Fieldwork call record data are subsequently merged with the main data files. This means that complete information is available for nearly16 all questions, both for cooperative respondents who participated at the first request and for reluctant respondents (also called ‘converted refusals’).

15 This section is largely based on the report on refusal conversion in ESS 2 (Beullens, Vandecasteele and Billiet, 2007).
16 We have found that item nonresponse is somewhat higher among reluctant respondents. This is in line with the expectation that reluctant respondents are somewhat more susceptible to measurement error because they are more likely to satisfice (see also Section 7.4.3 in Chapter 7).
Table 8.5 The number of cooperative and reluctant respondents in five countries in ESS 2

Country                Cooperative respondents   Reluctant respondents
CH Switzerland         2059                      175
DE Germany             2378                      494
EE Estonia             1789                      201
NL Netherlands         1358                      526
SK Slovak Republic     1407                      105
Even though it is not certain that the latter are more similar to final refusals than the cooperative respondents are, it is likely that they are more similar to final refusals than to the final noncontacts. Since the study of reluctant respondents necessarily requires some assumptions to be made, we will talk here about ‘traces’ of nonresponse bias, in recognition of the fact that this method does not provide a complete picture of nonresponse bias (see also Billiet et al., 2007).

The effect of refusal conversion efforts, expressed as a percentage of the initial refusals that are converted, ranges from 2% to 41% across countries in ESS 2 (see Table 7.1), clearly demonstrating the large differences in refusal conversion practice between countries. This has serious consequences for the usefulness of refusal conversion for bias detection and adjustment in a cross-national context, because it is not possible to estimate bias in a comparable way in all countries; only country samples that contain a substantial number of converted refusals can be used to analyse the differences between cooperative and reluctant respondents. This section analyses the differences between cooperative and reluctant respondents in the five ESS countries where the number of reluctant respondents exceeded 100: Switzerland, Germany, Estonia, the Netherlands and the Slovak Republic17 (see Table 8.5).
8.3.2.1 Methodological decisions

The reluctant respondents are compared to cooperative respondents on a number of background, attitudinal and media use variables. The survey questions that showed a rather large average absolute standardized bias according to the post-stratification approach are included here as well (see Appendix 8.1). We first focus on differences in survey outcomes between cooperative and reluctant respondents, before studying the relationship between some of these variables and the kind of respondent (cooperative/reluctant) using multivariate logistic regression models.

17 Slovenia also had a large number of converted refusals, but owing to defective identifications in the main file and the contact forms file, these data could not be analysed.
8.3.2.2 Differences between cooperative and reluctant respondents

Table 8.6 shows the bivariate differences between cooperative and reluctant respondents. Some variables are not included because no significant differences were found in any of the countries. Nonsignificant outcomes in individual countries are also excluded. In Switzerland, for example, there were only three variables where reluctant and cooperative respondents differed. Taking Germany as an example of how to read the table, it can be seen that men make up a larger proportion of cooperative respondents (48.3%) than of the reluctant respondents (42.7%). This means that women were more likely to agree to an interview than men during the refusal conversion phase. This does not mean that women have a greater reluctance to participate than men; it is possible that women are simply more prepared to participate after an initial refusal, or that more women than men were re-approached after an initial refusal in Germany. It is not possible to reject one of these two possible explanations, since the sex of the sampled cases who refused is not recorded in Germany.

The sex of the target persons seems to play a significant role in survey participation in Germany and the Netherlands. Women in these countries make up an even larger proportion of the reluctant respondents than of the cooperative respondents. This is unfortunate, since in Germany the proportion of women was about right among the cooperative respondents,18 while in the Netherlands females were already overrepresented.19 Are Dutch women more willing than men to participate after an initial refusal, or are more women than men re-approached for refusal conversion in the Netherlands? Unlike in Germany, it is possible to exclude the ‘differential re-approach’ hypothesis, since the contact data for the Dutch sample contain information about the sex of the sampled persons who refused. Of both the men and women who refused, 90% were re-approached, resulting in 35% successful interviews among men versus 50% among women. It is clear that further distortion of the Dutch sample in the direction of a serious overrepresentation of women is caused by a greater readiness of women to participate in a survey after an initial refusal.

It can be concluded from Table 8.6 that there are significant differences in background variables between cooperative and reluctant respondents, but never in all of the countries, and the direction of the relationship is often not consistent. Some of the attitudinal indicators (see Appendix 8.2) are related to cooperation, but here too the effects are not consistent across countries. In some countries there is a weak relationship between the type of respondent and their attitudes towards societal and political institutions. Among reluctant compared to cooperative respondents, for example, political participation is higher in Estonia and the Slovak Republic, and civil obedience is greater in the Netherlands.

18 The general population of 15 years and older in Germany consists of 48.5% men and 51.5% women. The sample of cooperative respondents reflects this distribution very well.
19 The general population of 15 years and older in the Netherlands consists of 49.1% men and 50.9% women.
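The bivariate comparisons reported in Table 8.6 are standard chi-squared and t tests. A minimal sketch of how such a comparison can be run with scipy, on hypothetical arrays rather than the ESS microdata:

```python
import numpy as np
from scipy.stats import chi2_contingency, ttest_ind

# categorical variable: counts of men/women among cooperative and reluctant
table = np.array([[1148, 1230],   # cooperative: men, women (hypothetical counts)
                  [210, 282]])    # reluctant:   men, women
chi2, p_chi2, dof, _ = chi2_contingency(table)

# quasi-metric variable, e.g. life satisfaction (0-10), two groups
rng = np.random.default_rng(1)
coop, reluct = rng.normal(7.8, 2, 1358), rng.normal(7.4, 2, 523)
t, p_t = ttest_ind(coop, reluct, equal_var=False)
print(f"chi2 = {chi2:.2f} (p = {p_chi2:.3f}); t = {t:.2f} (p = {p_t:.3f})")
```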
Table 8.6 Cooperative (C) versus reluctant (R) respondents: bivariate significant differences (p < 0.05) in ESS 2a
Explanatory variables (see Appendix 8.1)   CH Switzerland   DE Germany   EE Estonia   NL Netherlands   SK Slovak Rep.
                                           C       R        C      R     C      R     C      R         C      R
N                                          2059    175      2378   492   1789   200   1358   523       1407   105
48.3 45.9 2.4
2.7
2.61 28.5 39.6 31.9
63.1
42.7 49.3
44.7
37.1
2.4 25.7 32.8 41.5
56.8
31.5 39.7 28.8
20.0 47.5 32.5
27.5 40.5 32.3
15.0 44.0 41.0
12.9 56.7 30.4
9.8 62.9 27.3
50.4 89.3 16.3
65.0 94.5 10.5
90.7
93.8
42.1
45.8
3.9
3.2
21.8 64.6 13.6
9.9 75.3 14.9
5.0
5.7
61.1
50.9
Background variables: % Males; Mean age; Mean number of household members. Urbanicityb: % Countryside and village, % Town and small city, % Large cities and suburbs. Level of educationb: % Lower and lower secondary, % Upper and post-secondary, % Second stage of tertiary. Labour market status: % with paid job, % who ever had job, % ever unemployed. Religion: Religious involvement (0–10). Health: Self-reported health status (% good)
CH Switzerland
Satisfaction and integration: Satisfied with own life (0–10); Social isolation (0–10); % Feeling safe
3.9
8.1
Media use: Minutes daily TV watching; Minutes daily radio listening; Minutes daily newspaper reading; Internet use (no access to every day = 0–7)
a Only χ2-values or t-values that are significant at the 0.05 level are shown.
b χ2 applied to the complete table of the variable.
4.3
7.9
4.7
5.0
5.4
5.6
7.0
6.7
76.1
70.8
115.1 84.4 32.5
123.0 92.6 35.9
5.9 0.6
5.5 0.7
5.2
5.6
5.4
5.6
7.8 4.4
7.4 4.6
119.3 128.7
3.0
3.7
1.0
1.3
61.5
43.7
Attitudes: Admit immigrants in country (0 = positive, 10 = negative); Perceived ethnic threat (0 = high, 10 = low); Political participation (0–5); Civil obedience (0–10)
No significant differences were found between cooperative and reluctant respondents for the variables relating to trust in political institutions, political competence, satisfaction with institutions and social trust. These variables are therefore not reported in Table 8.6. A relatively consistent picture is found for the indicators relating to satisfaction, social integration and (subjective) safety. Well-integrated, happier persons make up a larger proportion of the cooperative respondents than of the reluctant respondents. Significantly higher scores for life satisfaction are found among cooperative respondents in three countries, while those who are socially isolated are more likely to be reluctant respondents in the Netherlands. In Germany and the Slovak Republic, cooperative respondents feel safer walking in their neighbourhood after dark than reluctant respondents, with a particularly large difference among Slovakian respondents. The other variables analysed, including satisfaction with the government, being a member of a group that is discriminated against and managing comfortably on the household income, are not related to survey participation and are thus not included in the table. Finally, reluctant respondents make more intensive use of the media. This is observed in Germany (TV, radio, newspaper), the Netherlands (TV) and Estonia (Internet).

8.3.2.3 Detection of nonresponse bias in multivariate logistic regression models

When the ‘traces of nonresponse bias’ are studied in a multivariate context, the most dominant relationships emerge, while more spurious ones disappear. Table 8.7 presents the results of a logistic regression model.20 The dependent variable is the odds ratio between the two types of respondent (reluctant versus cooperative). Effect coding has been used for categorical explanatory variables with more than two categories (Gupta, 2008).21 The odds ratios express changes in probability ratios between a category of the dependent variable and the reference category, not changes in probabilities. For example, an odds ratio of 0.797 for men means that the ratio ‘reluctant/cooperative’ is 0.797 times as large for men as for women.22
20 The variables were selected through a forward selection process. Backward and stepwise selection both lead to the same outcome.
21 A ratio of 1 means that the predictor has no effect at all. For categorical variables, the larger the deviation from 1, the greater the effect of a particular category on the dependent variable. For quasi-metric/interval variables, the parameters indicate the change in the probability ratio (odds) for one unit change in the predictor variable. Parameters between 0 and 1 indicate a decrease in the ratio, reflecting a greater propensity for the explanatory variable to drive initial cooperation. Parameters larger than 1 indicate an increase in the ratio and therefore a propensity for the explanatory variable to drive cooperation at the refusal conversion stage. See also www.ats.ucla.edu/stat/mult_pkg/faq/general/effect.htm
22 It is possible to compute probabilities and changes in probabilities compared to a reference on the basis of the odds ratios (Allison, 1999, pp. 11–14).
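The conversion mentioned in note 22 is mechanical. A small helper (the baseline value in the example is hypothetical; only the odds ratio of 0.797 comes from the model):

```python
def apply_odds_ratio(p_ref, odds_ratio):
    """Turn a reference-category probability and an odds ratio into the
    probability for the other category."""
    odds = p_ref / (1.0 - p_ref) * odds_ratio
    return odds / (1.0 + odds)

# If, hypothetically, 20% of women were reluctant respondents, an odds ratio
# of 0.797 for men would imply a share of about 16.6% among men:
print(round(apply_odds_ratio(0.20, 0.797), 3))  # 0.166
```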
Table 8.7 Logistic regression estimates (odds ratios) for reluctant versus cooperative respondents in ESS 2
Explanatory variables (see Appendix 8.1). Background variables: Male (= yes); Age; Number of household members. Urbanicity: Countryside and village; Town and small city; Large cities and suburbs. Level of education: Lower and lower secondary; Upper and post-secondary; Second stage of tertiary. Labour market status: Paid job (1 = yes); Ever job (1 = yes); Ever unemployed (1 = yes); Good health (1 = yes); Comfortable income (1 = yes); Religious involvement (0–10). Attitudes: Perceived ethnic threat (0–10); Trust political inst. (0–10)
CH Switzerland
1.177
DE Germany
EE Estonia
0.797 1.019
0.706
0.634 0.816 1.312
0.677 1.293 1.143
1.019
1.863 1.357
SK Slovak Rep.
0.584
NL Netherlands 0.614
0.682 1.758 0.834
0.817 1.243 0.984
0.425
1.694 0.589 1.296
1.887 1.334
1.136 1.101
Table 8.7 (Continued) Explanatory variables (see Appendix 8.1)
Satisfaction and integration: Satisfied with life (0–10); Social isolation (0–10); Feeling safe (= yes). Media use: TV watching (minutes/day); Internet use (no access to every day = 0–7)
DE Germany
EE Estonia
SK Slovak Rep.
0.911
1.096 1.067
0.464
1.065
NL Netherlands
0.862 1.097
1.001 1.051
*** p < 0.001; ** p < 0.01; * p < 0.05; (ns) = not significant at the 0.05 level. Only statistically significant parameters are reported (p < 0.05) for all variables, except for categorical variables, where all parameters are reported when one of the categories has a significant effect on the response variable.
Political participation (0–10) Civil obedience (0–10)
CH Switzerland
In Switzerland, the model reduces to one single parameter. Target persons from larger Swiss households are initially more reluctant to participate but are more often persuaded to cooperate at the refusal conversion stage. Resistance to immigration and satisfaction with one's own life – which differed in the bivariate comparisons in Table 8.6 – are no longer significant in the logistic regression.

In Germany, converted refusals are significantly more likely to be women than men, and are also more likely to be older, live in large cities, engage in Internet surfing and have a history of unemployment, and less likely to participate in politics. This is again rather different from the bivariate comparisons, with some new predictors becoming significant in the logistic regressions and others dropping out.

As in Germany, women in Estonia are more likely to become converted refusals, although this was not found in the original bivariate comparisons. Whilst having a paid job was also found to be a predictor of being a converted refusal, living in a village and ever having been unemployed were associated with a greater likelihood of being a cooperative respondent.

In the Slovakian sample, the odds of being a converted refusal rather than a cooperative respondent increase somewhat with age. Furthermore, respondents who have attained a middle level of education, are more religious or feel comfortable with their household income are also somewhat more likely to be converted refusals than those who do not have these characteristics. Ever having had a job, as well as feeling safe walking in the neighbourhood after dark, both make respondents less likely to be converted refusals.

The Dutch case seems to be the most complex of the five countries. Here, the likelihood of being a converted refusal increases when the respondent is female, has an average education level, watches more television and surfs more frequently on the Internet. Converted refusals are also more likely to have ever had a job and to feel healthier than initially cooperative respondents. Respondents in the Netherlands who see immigrants as a threat are also more likely to be converted refusals than those who feel less threatened. This is in line with earlier findings comparing reluctant and cooperative respondents (Billiet et al., 2007). What is surprising are the effects of trust in political institutions, civil obedience and greater participation in political organizations: respondents who trust political institutions, obey the law and participate in political organizations are somewhat more likely to be converted refusals who initially refused. According to the literature presented in Section 6.3.2, we would have expected them to be more likely to cooperate immediately. The effects of feeling socially isolated and dissatisfaction with one's own life are in line with what might be expected from previous studies.

Summing up, we find that the type of respondent, in terms of ‘reluctant’ versus ‘cooperative’, is related to socio-demographic variables, attitudinal indicators and some other characteristics. These effects remain after controlling for other background variables. It is possible that these reluctant respondents are similar to those who never agreed to do an interview (final refusals), although there is no definitive evidence for this assertion. The indications of bias are not the same across the different countries. The differences between countries could be artefacts
of differences in the refusal conversion practices of the survey organizations and of interviewer behaviour and decisions in the field. This would mean that the converted respondents are not comparable across countries, because in one country all refusals are re-approached, whereas in others a selection is made on the basis of information collected at earlier contacts. Some types of refusal may be prioritized when the survey organization, the fieldwork supervisor or the interviewer selects cases for refusal conversion that are more likely to cooperate than others. This may result in an overrepresentation of ‘soft’ refusals among the converted refusals, who are then not representative of all final refusals. There is certainly evidence of some variation in these procedures between countries participating in the ESS (see Chapter 7).

The differences between converted refusals and cooperative respondents are relatively small. This might indicate that nonresponse bias is not really problematic, but it is also possible that the converted refusals represent the ‘softer’ refusals and are therefore a poor proxy for the final refusals. This could be caused by an overrepresentation of easy-to-convert (‘soft’) refusals among the converted refusals. For this reason, we will now try to distinguish between the reluctant respondents that were easy and those that were hard to convert (see Section 7.3.2).

8.3.2.4 Differences in soft and hard refusals among reluctant respondents

What are the characteristics of easy and hard-to-convert refusals, and what proportion of the reluctant respondents were hard to convert? The best candidates for answering this question are the target persons from the Netherlands and Germany, since both countries had enough successful refusal conversions to generate sufficient numbers of reluctant respondents. On the basis of the analyses in Section 7.3.2, two different criteria are identified for separating easy and hard-to-convert refusals: in the Netherlands, whether reluctant respondents had refused once (easy) or more than once (hard); and in Germany, the interviewer's assessment of the likelihood of future cooperation (see also Table 7.2). As more efforts were made to achieve refusal conversion in the Netherlands than in Germany (for instance, in the latter country repeated refusals were not re-approached), it is to be expected that the differences between cooperative and reluctant respondents are greater in the Netherlands than in Germany.

The German results are presented first, in Table 8.8. There is no significant difference in education level, feeling safe after dark, or the amount of time spent listening to the radio each day, and these variables are therefore not shown in the table. The estimates from those reluctant respondents who were earlier classified as ‘will definitely not cooperate’ were generally more different from the initially cooperative respondents than those of the other groups of reluctant respondents. They were more often older, female and less healthy, and spent more time watching TV and reading newspapers. These results suggest that re-approaching soft refusals to try to enhance response rates may not be enough on its own to reduce nonresponse bias, because these soft refusals do not represent all refusals.
Table 8.8 Traces of bias suggested by refusal conversion and interviewer assessment of future cooperation, Germany, in ESS 2a

                                            Cooperative    Reluctant respondents
Explanatory variables                       respondents    All      Probableb   Definitely not   No estimation   Proxy refusal
N                                           2378           492      136         205              83              68
% Males (χ2 = 14.41; df = 4)                48.3           42.7     48.5        36.6             39.3            55.8
Mean age (t = 5.37)                         46.1           49.5     47.6        51.4             50.9            46.0
Mean number of household members (t = 2.68) 2.6            2.4      2.4         2.5              2.3             2.7
Urbanicity (χ2 = 23.57; df = 8)
  % Countryside and village                 28.5           25.7     27.8        27.7             25.7            13.3
  % Town and small city                     39.6           32.8     30.2        34.1             31.3            36.2
  % Big cities and suburbs                  31.9           41.5     42.0        38.1             43.0            50.5
Good health (% yes) (χ2 = 17.38; df = 4)    63.1           56.8     63.7        48.4             62.9            61.2
Religious involvement (0–10) (t = 4.86)     3.1            3.6      3.0         3.5              4.0             2.2
Media use
  Minutes daily TV watching (t = 2.69)      115.1          123.0    121.0       128.1            115.4           121.2
  Minutes daily newspaper reading (t = 4.03) 32.4          35.9     30.7        40.5             32.0            38.7

*** p < 0.001; ** p < 0.01; * p < 0.05.
a Only variables with significant differences between kinds of respondents are shown in the table.
b Will probably cooperate, will cooperate, will probably not cooperate.
Table 8.9 Traces of bias due to refusal using the number of refusals, bivariate differences, Netherlands, in ESS 2

Explanatory variables (see Appendix 8.1)            Cooperative   Refused once        Refused more than once
                                                                  (easy to convert)   (hard to convert)
Background variables
  % Males (χ2 = 17.48; df = 2)                      44.7          42.7                30.2
Level of education (χ2 = 8.77; df = 4; prob = 0.07)
  % Lower and lower secondary                       12.9          9.4                 10.3
  % Upper and post-secondary                        56.7          65.4                59.7
  % Second stage of tertiary                        30.4          25.2                30.0
Attitudes
  Perceived ethnic threat (0–10) (t = 9.70)         5.2           5.6                 5.6
  Civil obedience (0–10) (t = 3.41)                 5.4           5.4                 5.7
Satisfaction and integration
  Satisfied with own life (0–10) (t = 16.19)        7.8           7.6                 7.3
  Social isolation (0–10) (t = 5.43)                4.4           4.4                 4.8
Media use
  Minutes daily TV watching (t = 7.00)              116.4         128.8               126.0
  Minutes daily newspaper reading (t = 3.02)        34.4          32.5                39.2

*** p < 0.001; ** p < 0.01; * p < 0.05.
As noted in Chapter 7, the reissue probabilities are much higher in the Dutch sample of refusals than in the German sample, and far more Dutch refusals were reissued a second time; 44% of the reluctant respondents in the Netherlands refused twice or more before they were persuaded to cooperate (Beullens, Vandecasteele and Billiet, 2007, p. 23). Table 8.9 compares those who refused once (easy to convert) and those who refused repeatedly (hard to convert) with the cooperative respondents who never refused. Of the cooperative respondents, 44.7% were men, compared to just 30% of those who had refused more than once.
Table 8.10 Multinomial baseline logit estimates (b) and odds ratios of belonging to soft and hard refusals versus cooperative respondents (reference) with respect to background, attitudinal and media use variables, the Netherlands, in ESS 2

                                       Refused once (= soft)    Refused twice (= hard)
Explanatory variables                  b         Odds ratio     b         Odds ratio
Male                                   -0.130    0.878          -0.928    0.395
Single                                 -0.291    0.748          -0.496    0.609
Level of education
  Lower and lower secondary            -0.202    0.817          -0.401    0.670
  Upper and post-secondary             0.1       1.260          0.115     1.121
  Second stage of tertiary             -0.029    0.971          0.286     1.331
Minutes watching television/day        0.003     1.003          0.002     1.002
Minutes reading newspaper/day          -0.001    0.999          0.008     1.008
Perceived ethnic threat                0.078     1.081          0.156     1.169
Trust in political institutions        0.078     1.081          0.145     1.156
Social isolation                       -0.003    0.997          0.112     1.119
Satisfied with own life                -0.093    0.911          -0.204    0.815
Max-rescaled R2                        0.076
-2 log likelihood                      2786.151

*** p < 0.001; ** p < 0.01; * p < 0.05.
The hard to convert obey the law more, are less satisfied with their life, feel more socially isolated and spend more time reading newspapers than the other groups.

Table 8.10 reports the significant covariates of a multinomial regression model. The odds ratios of being ‘soft’ (refused once) and ‘hard’ to convert (refused twice) relative to being a cooperative respondent (the reference) are presented. The odds for a man of being a ‘refused twice’ rather than an initially cooperative respondent are only 0.395 times the corresponding odds for a woman. Put differently, the ratio between being a ‘cooperative’ and a ‘refused twice’ respondent is 2.532 (1/0.395) times higher for males than for females (153% higher). Among all respondents, men are thus less likely to be ‘hard’ reluctant respondents. Does this mean that women are less likely to cooperate in the survey than men? Or does it simply mean that women are much more inclined to participate in the face of repeated and insistent requests from interviewers?

It is striking that in most cases the parameters are only significant for the ratio ‘refused twice/cooperative’, and not for the ratio ‘refused once/cooperative’. This means that the effect of the background variables on cooperation is most pronounced when cooperative respondents are compared with the hard-to-convert reluctant respondents. They may therefore be the most informative about the final refusals, especially since in the Netherlands nearly all refusals are re-approached. The
following variables are significant in the model for one or both of the groups: sex, household size (single versus multiple-person), education level, TV watching and newspaper reading, perceived threat from immigration, political trust, social isolation and life satisfaction.

8.3.2.5 How to adjust the substantive data using information from reluctant respondents

A common way of adjusting for nonresponse bias is the computation of weights based on response propensity scores. This weighting technique aims to correct for differences caused by the varying inclination of individuals to participate in a survey. In order to obtain propensity scores, a source is needed that provides unbiased estimates. This source is normally a probability-based reference survey with much higher response rates that is believed to produce unbiased estimates (Duffy et al., 2005; Bethlehem and Stoop, 2007). This is the so-called ‘gold standard’ that is used to improve the new survey. Through logistic regression, the probability of each respondent participating in the target survey that is to be adjusted is estimated from a set of relevant variables (Schonlau et al., 2006). If the target survey is a web-based survey, for example, a face-to-face survey with much better response rates is used as the reference survey (Lee, 2006; Loosveldt and Sonck, 2008).

In theory, information collected from reluctant respondents could, together with the cooperative respondents, be used to construct such a reference survey. However, a number of serious problems have to be addressed before we can use this to adjust for nonresponse bias. First of all, refusing is only one kind of nonresponse; the failure to contact target persons may be caused by other factors. Information about refusals should therefore be combined with information about noncontacts when adjusting the final sample. Secondly, the information obtained from a sample of converted refusals is only useful for the computation of propensity scores for refusal bias when all initial refusals, or at least a random sample of them, are re-approached. Analysis of several rounds of call record data in the ESS shows that in practice neither of these is usually the case. Fieldwork supervisors and interviewers may have made systematic choices about which cases to reissue based on information about the refusals and the interviewers' subjective estimates of the likelihood of their future cooperation (Beullens, Billiet and Loosveldt, 2009a). On other occasions, it might have been the performance of a particular interviewer that led to a reissue. What is clear is that it is never all refusals, or a truly random subsample, that are reissued. A third problem is that reluctant respondents are initial refusals who were willing to cooperate following additional requests. A number of studies have shown that it cannot be assumed that they represent final refusals (for an overview, see Stoop, 2005, pp. 112–19). Given these problems, we do not apply propensity weight adjustment using only information from reluctant respondents. We will include reluctant respondents in propensity weighting in Section 8.3.4, together with core information on final refusals.
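A minimal sketch of the propensity-score weighting idea itself (illustrative only: it assumes a combined data set with an indicator for membership of the ‘gold standard’ reference survey, and it is not the adjustment ultimately applied to the ESS):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def propensity_weights(X, in_reference):
    """X: covariates observed in both surveys; in_reference: 1 if the case
    comes from the reference ('gold standard') survey, 0 if from the target
    survey. Target-survey cases are weighted up where they are scarce
    relative to the reference survey."""
    model = LogisticRegression(max_iter=1000).fit(X, in_reference)
    p_target = model.predict_proba(X)[:, 0]               # P(target survey | X)
    w = (1.0 - p_target) / np.clip(p_target, 0.01, None)  # odds reference vs target
    return w / w.mean()                                   # normalized weights
```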
8.3.2.6 The reluctant respondents approach: concluding remarks In previous rounds of the ESS, traces of bias were explored and models were tested in order to ascertain whether nonresponse could have an effect on relevant constructs (Billiet et al., 2007). The results obtained were mixed, and traces of bias were not always found in the same variables in all countries in which the analysis was possible (see also Lynn et al., 2002b). This suggests that the bias differs according to the country samples. Furthermore, several problems emerge when an attempt is made to detect bias by comparing reluctant respondents with cooperative respondents, especially from a comparative perspective. The basic requirement for cross-national comparison (i.e. equivalent samples) is not met, since the procedures for identifying, selecting and re-contacting reluctant respondents differ. As a result, even if an improved method with a higher proportion of converted refusals based on all nonrespondents or a random sample of nonrespondents was possible within one or more countries, there is still a lack of comparable data and adjustments for all countries. A further problem is that numbers of converted refusals differ across countries in the ESS and across survey rounds, thereby reflecting differences in fieldwork strategies, fieldwork success rates and budgets. It also seems that the decision to classify someone as a ‘converted refusal’ is too dependent on differences between interviewer decisions or even differences in the ‘fielding culture’ of the survey organization with regard to the treatment of initial refusals. Differences in privacy regulations may also affect the cross-country differences in the characteristics of reluctant respondents, since in some countries it is prohibited to re-approach or collect information about persons who refuse to cooperate in a survey. These factors make it impossible to rely on this kind of information when the aim is to adjust for nonresponse using auxiliary information that is assumed to apply to all sampled units. Does all this mean that the ‘reluctant respondents’ approach is of no value in estimating bias in a cross-national situation? Not necessarily: while the approach is not currently optimal, it could be improved – for example, by insisting that all countries randomly reissue a subsample of refusals. Even where completely comparable information for all country samples is not available, it can still be used to warn researchers that there is serious bias in some variables. The size of some country samples increased substantially because of refusal conversion, and this can be an important outcome in its own right. Furthermore, including refusal conversion in a survey design might help to improve the atmosphere within a survey organization, since it is communicated to the interviewers that investing in high response rates is important.
8.3.3 Using additional observable data collected for all target persons
In the ESS, call record data are collected for every issued sample unit and recorded on 'contact forms' (see Appendix 3.1). At the first visit, the interviewer is also expected to observe and record characteristics of the neighbourhood (see the Neighbourhood Characteristics Form on p. 74). The observations about housing and neighbourhood
must be classified in a limited number of pre-coded categories. Whenever possible, the interviewer also records the sex and age (category) of the target person. This information is collected to facilitate the assessment of nonresponse bias. The quality of these data was rather weak in some countries in ESS 1, but improved in later rounds (Cincinatto, Beullens and Billiet, 2008). ESS 2 data are used here for illustration.

The fact that we have information for almost all kinds of nonrespondents (refusals, noncontacts and most other nonresponse categories) is an advantage over the reluctant respondent approach discussed in the previous section. The main weakness in the ESS is that information is available on only a very limited number of neighbourhood variables, while there may also be some issues concerning the reliability of the measurements, because the interviewer observations are made without clear coding schemes or training. Information about estimated sex and age is, of course, only available for target persons who are contacted, unless this information is part of the sampling frame. The utility of these auxiliary data will be reviewed after a summary of the main findings of an analysis of observable data in ESS 2 by Cincinatto, Beullens and Billiet (2008).

8.3.3.1 Measurements and analysis method

Since the collection of observable information about all sample cases is a demanding task, the amount of missing data is larger than elsewhere in the contact forms. In household and address samples, no information is available about the sex and age of target respondents when no contact was made at all, or when participation was refused before the respondent selection took place. Countries with large amounts of missing data were dropped from the analysis. Only five country samples had less than 10% missing data on the combination of age and sex, so this information is not used here. The quality of the neighbourhood information is higher: 14 country samples had less than 10% missing data and were therefore included in the analysis.

As in the previous sections, outcomes are presented for a multiple-indicator construct rather than for single questions. Whilst the item concerning the type of dwelling in which the respondent lives has to be treated as a discrete variable, the other three questions, about the physical state of the buildings in the area, litter and levels of vandalism, are clearly related. Factor analysis shows that an equivalent configuration of two latent variables applies in the 14 countries: the questions about litter and vandalism measure one latent variable, while the third question, about the physical state of the buildings, does not belong to this variable but, as expected, measures a different concept. These two factors, 'litter/vandalism' and 'physical state', are used in the further analysis. The correlation between these two variables ranges from 0.22 to 0.60; neighbourhoods in which litter, rubbish or vandalism are common are also more likely to have buildings and dwellings that are in poor condition.23

23 We cannot guarantee that the observations are independent. The correlations may be high because they reflect the impressions of a single interviewer.
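As an illustration of the two-factor structure just described, the following is a hypothetical sketch using scikit-learn on synthetic stand-in data; the column names and rating scale are invented, and the real ESS 2 contact-form coding is not reproduced here.

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    # Synthetic stand-ins for the three interviewer observations (ordinal ratings)
    df = pd.DataFrame(rng.integers(1, 5, size=(500, 3)),
                      columns=["physical_state", "litter", "vandalism"])

    fa = FactorAnalysis(n_components=2, rotation="varimax")
    scores = fa.fit_transform(df)        # factor scores, one row per sample unit
    loadings = pd.DataFrame(fa.components_.T, index=df.columns,
                            columns=["factor_1", "factor_2"])
    # With the real data, 'litter' and 'vandalism' should load on one factor
    # and 'physical_state' on the other; random data will not show this pattern.
    print(loadings.round(2))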
A strange outlier here is Austria, where the correlation between the two variables is negative (−0.19). When problems of multicollinearity are encountered during the analysis, the two variables are combined into one 'neighbourhood condition' variable; this is the case in the samples for Portugal and the Czech Republic (Cincinatto, Beullens and Billiet, 2008, p. 15).

In order to study the effects of the housing and neighbourhood variables, multinomial logistic regression (baseline-category logit) modelling is used, with the type of target person in terms of response/nonresponse as the dependent variable. Possible outcomes are 'initial refusal' versus 'cooperative' or 'final noncontact'24 versus 'cooperative'. Taking the initial refusals instead of the final refusals may produce a better estimate of the factors that affect the decision to refuse, regardless of what happens later, following refusal conversion attempts. As stated earlier, the explanatory neighbourhood variables refer to the type of dwelling in which sample persons live, the physical state of the buildings and the presence of litter/vandalism in the neighbourhood. The two latter variables are (quasi-)metric; higher values correspond to a neighbourhood that is in a relatively poor condition and more prone to litter and/or vandalism. The type of dwelling is a categorical variable; the optimal categorization is into two classes, namely 'apartments' and 'other dwelling types'. Age and sex are also 'observable variables', but these two are only included in three countries, where the amount of missing data is not too high.

24 Final noncontacts are only included when their absolute number exceeds 100. To be more specific, in Estonia, Finland, Hungary, the Netherlands, Poland and Portugal the noncontacts will not be included, since their respective absolute numbers are 85, 59, 0, 78, 20 and 78. In Hungary, the number of noncontacts was 128, but due to inconsistencies in the dataset this country is excluded here too (Billiet and Pleysier, 2007, p. 51).

The interaction effects between housing type and the other neighbourhood variables are always tested. In order to ascertain which interactions must be included in the final model, a stepwise regression was performed. The basic idea behind the analysis was to find a parsimonious multinomial logistic regression model for each country to explain the outcome variable under consideration. Variables that did not make a significant contribution to the model were eliminated one by one, respecting the hierarchical structure of the model: nonsignificant interaction terms were dropped first, then the additive terms, until the model no longer deteriorated significantly. After the parsimonious model had been determined for each country sample, the analysis-of-variance statistics (degrees of freedom, Wald chi-square and probability level) for each variable were examined in order to obtain an idea of the explanatory power of each explanatory variable (Cincinatto, Beullens and Billiet, 2008). A code sketch of this modelling step is given below.
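The following is a minimal sketch of such a baseline-category logit in Python with statsmodels, on synthetic stand-in data; the variable names, coding and synthetic outcome are invented for illustration and do not come from the ESS contact files.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 1000
    # Synthetic stand-ins for the contact-form variables
    df = pd.DataFrame({
        "apartment": rng.integers(0, 2, n).astype(float),  # 1 = multi-unit dwelling
        "physical_state": rng.normal(0, 1, n),    # higher = worse condition
        "litter_vandalism": rng.normal(0, 1, n),  # higher = more litter/vandalism
        # 0 = cooperative (baseline), 1 = initial refusal, 2 = final noncontact
        "outcome": rng.integers(0, 3, n),
    })

    X = df[["apartment", "physical_state", "litter_vandalism"]].copy()
    X["apt_x_state"] = X["apartment"] * X["physical_state"]    # interaction terms
    X["apt_x_litter"] = X["apartment"] * X["litter_vandalism"]
    X = sm.add_constant(X)

    fit = sm.MNLogit(df["outcome"], X).fit(disp=False)
    # exp(coefficients) gives the multiplicative odds ratios of each outcome
    # relative to the baseline category 'cooperative'
    print(np.exp(fit.params))

In the actual analysis, nonsignificant interaction terms and then nonsignificant additive terms would be dropped stepwise from such a model until the parsimonious specification is reached.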
8.3.3.2 Effects of neighbourhood characteristics on initial nonresponse and final noncontact

Table 8.11 contains only the odds ratios for the variables that had a significant effect (p < 0.05) on the probability ratios 'initial refusal/cooperative' and 'noncontact/cooperative'.

Table 8.11 Multiplicative logistic regression parameters (odds ratios) of neighbourhood variables on the contact outcomes in ESS 2a

[Rows: the 14 country samples (AT, BE, CH, EE, ES, FI, GR, HU, IT, NL, PL, SK, CZ, PT), with the Czech Republic and Portugal at the bottom. Columns: odds ratios on 'initial refusal/cooperative' (R/C) and 'final noncontact/cooperative' (N/C) for dwelling type (apartment; ref.: other dwelling types), physical state of the buildings (highest score = very bad), litter and/or vandalism (highest score = very common), and the interactions of dwelling type with each of the two neighbourhood variables. The dwelling-type R/C column reads, in country order: 0.835, 1.083, 1.158, 1.274, 1.276, 1.205, 1.350, 1.151, 1.020, 1.122, 1.484, 1.035, 1.028, 1.151; the remaining cells are discussed in the text.]

a R/C, initial refusals/cooperative; N/C, final noncontacts/cooperative.
b Noncontacts are excluded from the analysis because the number of observations in this category is too small to obtain a stable estimation.
***p < 0.001; **p < 0.01; *p < 0.05. Empty cells: variable not in parsimonious model.
cooperative’. Nonsignificant main effects of variables are also reported when these variables are included in a significant interaction term at a later stage, since the models are hierarchical. In some countries, the parameters ‘noncontact/cooperative’ were not tested where there were too few noncontacts to provide stable results. The table reports the parameter estimates of multinomial logistic regression models for 14 countries, two of which (the Czech Republic and Portugal) are presented at the bottom of the table because different models had to be used due to multicollinearity. In eight out of 14 country samples, the effect of living in an apartment (multi-unit dwelling, flat; see Box 8.1) as compared with other types of dwellings on the probability ratio ‘initial refusal/cooperative’ was positive. This means that in most countries (CH, EE, ES, FI, GR, NL, PL and PT) it is more likely that target persons who initially refused to cooperate were living in an apartment rather than in other types of dwellings. The effects are also positive in five other countries, but not significantly different from zero at the 0.05 level. Living in an apartment can mean several things. Sometimes it might be mainly related to social class, but it might also co-vary with urbanicity in countries where apartments are more common in urban regions. More in-depth investigation is needed in order to improve the understanding of the relationships between living in apartments and nonresponse outcomes. Austria is an interesting exception. Target persons living in apartments in Austria seem less likely to refuse initially. It is also less likely that target persons in Austria are not contacted when they live in apartments. This might be a consequence of the high number of telephone contact attempts in that country (see Tables 6.1 and 6.2). This is different from the findings in five other countries where significant effects on noncontacts were found. In these countries (BE, CH, GR, IT and CZ), it is more likely that the contact attempt will fail when the target person lives in an apartment. A fairly large effect of dwelling type on initial nonresponse is observed in Poland (odds ratio 1.484). The largest effect in terms of failing to make contact is found in Greece, where the ratio ‘noncontact/cooperative’ is 2.358 times higher where the target person lives in an apartment compared to another dwelling type. In sum, it can be concluded that the dwelling type clearly plays a role in explaining differences in response rates.Tables 6.1 and 6.2 The physical state of buildings in the area where target persons live is more often related to survey participation than the presence of litter or vandalism. The effect of a bad physical state is positive on initial refusal in eight countries, including Portugal, where the two neighbourhood variables are combined. This indicates that target persons who live in neighbourhoods in a relatively bad condition are more likely to refuse initially to cooperate in a survey. The situation in Austria is again surprising, since the relationship operates in the opposite direction. Due to data-quality issues, the ratio ‘noncontact/cooperative’ could only be tested for eight countries with regard to the physical state of dwellings in the neighbourhood. In five of these countries, the effect is significantly positive, indicating that failing to establish contact with the target person is more likely in neighbourhoods characterized by litter and vandalism. 
There are only two countries where litter/vandalism has a
significant main effect on the likelihood of initial refusal (Austria and Italy), but the effects operate in opposite directions. In Austria, the value of 1.461 indicates that target persons were more likely to refuse initially when they lived in a neighbourhood characterized by litter and vandalism. The weak significant effect in the Italian sample does not operate in the expected direction, suggesting that refusals were less common there in areas characterized by litter and vandalism.

There are significant interaction effects in four countries. Beginning with noncontact: in Austria we saw that persons living in apartments were less likely to remain uncontacted (odds ratio 0.848). Austria was the only country where the presence of litter and vandalism had an impact on the odds ratio 'noncontact/cooperative' (1.462). The interaction effects in Austria suggest that for target persons living in apartments, noncontactability increases when the physical state of the neighbourhood is bad (1.261), but decreases when there is litter/vandalism (0.798). The effects for 'initial refusal/cooperative' in Austria are similar to those for 'noncontact/cooperative', with one exception: the odds ratio of the interaction between dwelling type and the physical state of the neighbourhood shows that target persons who live in apartments in poorly maintained areas are even less likely to refuse (relative to cooperating) than those living in other dwelling types (0.680).

The results for the other countries are rather easier to interpret. In Spain, a bad physical state of the neighbourhood had a negative main effect on contactability (or, more precisely, a bad physical state had a positive effect on 'noncontact/cooperative'). For target persons living in apartments, the likelihood of 'noncontact/cooperative' is even higher when the neighbourhood is in a bad physical state, but lower where the neighbourhood is characterized by litter and/or vandalism. These interaction effects are similar to those in Austria. The effects for 'initial refusal/cooperative' in Spain are similar to those for noncontact, but in this case the main effect of living in an apartment is also significant. In the Slovak Republic, the odds ratios for 'noncontact/cooperative' are not significant. The main effect of a bad physical state of the neighbourhood on 'initial refusal/cooperative' is highly significant; the interaction odds ratio is higher for target persons who live in an apartment in a neighbourhood in poor condition, but lower (again) for those who live in an apartment with litter and/or vandalism in the neighbourhood. In the Czech Republic, finally, a main effect on 'noncontact/cooperative' is found for living in an apartment; if the apartment is located in a poorly maintained neighbourhood, the likelihood of noncontact (relative to cooperation) decreases again. One explanation for the mitigating effect of litter and vandalism for apartment-dwellers is that 'apartment' is too broad a category, and could cover different kinds of dwellings housing people with different activity patterns and different socio-economic status. A worked illustration of how such multiplicative parameters combine follows.
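As a minimal illustration of how multiplicative parameters combine: in such a model the per-unit effect of a covariate for apartment dwellers is the main-effect odds ratio times the interaction odds ratio. The main-effect value below (or_state) is an invented placeholder; only the 1.261 interaction is taken from the Austrian results quoted above.

    # Hypothetical illustration; or_state is a made-up placeholder value
    or_state = 1.10          # placeholder: per-unit effect of a worse physical state
    or_apt_x_state = 1.261   # apartment x physical-state interaction (Austria, N/C)

    print(or_state)                    # effect for other dwelling types: 1.10
    print(or_state * or_apt_x_state)   # effect for apartment dwellers: ~1.39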
8.3.3.3 Effects of neighbourhood variables on final nonresponse

The second stage of the analysis focused on subcategories of the initial refusals: those who are not re-approached for refusal conversion, those who are re-approached but not contacted, those who are re-approached but not interviewed because of a second refusal, and finally those who cooperate after an initial refusal (the reluctant respondents). In the analysis presented next, the reluctant respondents are the reference category, while the neighbourhood variables are the same as in the previous model. The advantage of this approach is that it makes it possible to study the contrast between reluctant respondents and all other categories of refusals. Significant effects of the predictors on these ratios indicate that reluctant respondents differ from final nonrespondents on these variables, and on unobserved variables that co-vary with them. The stronger the effects, the more problematic the refusal conversion approach is with respect to bias, since strong effects suggest that refusal conversion only manages to bring in a particular group of refusals and will therefore fail to minimize bias.

Unfortunately, because of the small numbers of reluctant respondents in most countries, and because of high item nonresponse on the other variables, only four countries could be included in the analysis: Estonia, Switzerland, the Netherlands and the Slovak Republic. Dwelling type has no effect at all on the probability ratios (the dependent variable). Moreover, none of the parameters is significant in the Netherlands, and therefore only information from the three remaining countries is reported in Table 8.12.

The poor physical state of buildings in the area where the target person lives has a large effect on the ratio between re-approached (second) refusals and reluctant respondents. In all three countries, re-approached target persons living in neighbourhoods in a very bad physical state are more likely to refuse a second time than those living in better areas. The effect of this variable is greatest in the Slovak Republic (1.759). In Estonia, those who live in neighbourhoods where litter and/or vandalism are common are more likely to refuse a second time when they are re-approached than those living in neighbourhoods with less litter and/or vandalism.

There is another striking finding in Switzerland and Estonia, although it is not confirmed in the Slovak Republic, where the relationship operates in the opposite direction. In Switzerland and Estonia, the ratio 'not re-approached/reluctant' is significantly higher when the buildings in the area are in a bad physical state. This implies that the interviewers or survey organizations in these countries selected target persons living in better neighbourhoods to re-approach, which may mean that final refusals differ from converted refusals. The effect of the physical state of housing in the neighbourhood on the ratio 'not re-approached/reluctant respondent' operates in the opposite direction in the Slovak Republic.
Table 8.12 Logistic regression parameters (odds ratios) of the neighbourhood variables on the different categories of initial refusals in three countries in ESS 2a

                   Physical state                                Litter and/or vandalism
                   (highest score = very bad)                    (highest score = more common)
                   Not re-      Re-approached:  Re-approached:   Not re-      Re-approached:  Re-approached:
Country            approached   noncontact      refusal          approached   noncontact      refusal
Switzerland        1.399        0.989           1.193
Estonia            1.585        b               1.029            1.241        b               1.393
Slovak Republic    0.511        b               1.759

a Reference category: reluctant respondent.
b Effect not tested since the number of not contacted re-approached refusals is too small.
***p < 0.001; **p < 0.01; *p < 0.05. Empty cells: variable not in parsimonious model.
8.3.3.4 Effects of neighbourhood variables on response outcomes: concluding remarks

The first stage of the comparison of the odds ratios between initial refusals and cooperative respondents, and between noncontacts and cooperative respondents, showed that in most of the 14 countries those living in apartments were more likely to refuse to participate in the ESS at first contact than those living in other types of dwelling. Similarly, the analysis shows a greater likelihood in most countries of not being able to contact a target person who lives in an apartment as compared to those living in other dwelling types. Where the other neighbourhood variables, such as the state of the buildings and the presence of vandalism and/or litter, have an effect, it is usually the case that poorer conditions lead to lower response rates: refusals and noncontacts are more likely in areas characterized by a poor physical state of the buildings and the presence of litter and/or vandalism. We cannot, however, determine the real reason for this: is it because of characteristics of the persons living in these neighbourhoods, or do prior expectations and the resultant behaviour of the interviewers play a role?

In the second stage of the analysis, it was possible to discover more about possible differences between reluctant respondents and final refusals using observable data. It is unfortunate that only four countries could be included in the analysis. Nonetheless, the direction of the findings is clear: reluctant respondents and final refusals differ at least with respect to the physical state of the neighbourhood in three of the four countries. Re-approached target persons living in areas that are in a very bad physical state are more likely to refuse a second time than those living in better areas. In the Netherlands, none of the effects was significant.

The main advantage of the approach based on the observable data in the contact forms is that, in principle, it is possible to obtain auxiliary information about all target persons, both respondents and nonrespondents. For the estimation of age and sex, the target person must have been successfully contacted at least once, and this information was therefore rather patchy.25 Hence we focused on the neighbourhood information. Indirect traces of bias can be found insofar as the neighbourhood variables are related to other substantive variables in the survey: education, for instance, correlates both with attitudes and with the type of dwelling and neighbourhood characteristics (Cincinatto, Beullens and Billiet, 2008).

A major disadvantage is that it is difficult for interviewers to observe and record neighbourhood information accurately and in a standardized way, which sometimes results in a large amount of missing data. It is possible that the unexpected findings in some countries are due to a degree of negligence in recording the information, or even to a misunderstanding of the instructions; in principle, however, there should be no missing data for the dwelling and litter/vandalism items. It is also possible that the meaning of living in an apartment, or even the definition of what constitutes an apartment, is not equivalent across countries. Efforts to improve the coding of these variables (see Chapters 2 and 6) are foreseen from ESS 5 onwards.

In principle, the complete sample, with the added information about the observable variables among respondents and nonrespondents, can be used as a kind of 'gold standard' in order to correct substantive variables in the dataset. As stated in the previous section, this would involve generating weights for each respondent using propensity scores.
25 The procedure whereby the interviewer must estimate the age and sex of the selected target person was adapted in ESS 4. Interviewers are required to record the age and sex of all persons who refuse, whether or not they are the selected target persons; that is, including proxy refusals.
Given the differences in the quality of the data recorded in the contact forms, it is currently not possible formally to correct the observed samples for bias in a standardized way, but we now have some indications as to which variables related to dwelling type and neighbourhood characteristics are likely to produce nonresponse bias. In nearly all countries, for example, there is a significant, moderately negative correlation between level of education and the physical state of the buildings in the neighbourhood. In a number of countries there are also correlations between certain dwelling types (detached houses, farms) and the education level of the residents. The next section gives an example of how neighbourhood information is combined with answers to key questions to try to improve the estimation of propensity scores.
8.3.4 The study of bias using core information on nonrespondents
8.3.4.1 Design of the study

The fourth method that has been used in the ESS to estimate bias is to collect core information on nonrespondents. In ESS 3, a doorstep questionnaire survey (DQS) was implemented in Belgium, and a nonresponse survey (NRS) in Norway, Poland and Switzerland (for the difference between these approaches, see Section 8.2). In both surveys, a short questionnaire was used comprising key questions from the original ESS questionnaire (referred to henceforth as the 'key questionnaire').26 The aim of the nonresponse surveys was to explore the possibility of using the data collected to measure nonresponse bias and to ascertain whether they could subsequently be used to make adjustments.

Three versions of the key questionnaire were used: a short version with seven questions and a long version with an additional eight questions (15 in total). In a third version, a single additional question asking for the reasons for refusing to participate in the main survey was presented to a subsample of the respondents only, because we were not sure about its effect on response behaviour. To select the questions to include in the key questionnaire, information from studies of nonresponse in ESS 1 and 2 was used. The selected questions showed the largest differences in distributions between cooperative and reluctant respondents (Billiet et al., 2007), or were most sensitive to variance inflation because of post-stratification weighting (Vehovar, 2007; Vehovar and Zupanic, 2007). They included questions on social participation, feeling safe walking in the neighbourhood after dark, interest in politics, trust in politicians, trust in others and attitudes towards immigrants. Each version of the questionnaire included a final question that aimed to measure the survey climate. The short and long versions of the questionnaire are shown in Appendix 8.2.

A major challenge when trying to collect information from nonrespondents is, of course, obtaining a high response rate, both from nonrespondents and from respondents to the initial survey.
26 The study was part of a Joint Research Activity of the ESS infrastructure programme, which is co-financed by the EC under its FP6 programme.
The major question is whether the refusals who provide some information differ from the respondents in the main survey, and if so, whether those differences are larger than the differences between the cooperative and the reluctant (converted) respondents in the main survey. It is also important to estimate the additional costs of a nonresponse survey in order to determine whether it is realistic to propose that such a survey be conducted by all participating countries. Existing rules on privacy protection are another important consideration, since it is sometimes a legal requirement to delete addresses following a formal refusal.

We will first discuss the designs of the nonresponse surveys in the four countries before outlining the results from two of them. Two cases were selected because two different methods were used: the doorstep questionnaire survey (DQS) in one country (Belgium), and a follow-up survey among respondents and nonrespondents in three countries (here called a nonresponse survey, NRS). The target population and the general design differed between Belgium and the three other countries. In Belgium (BE), the doorstep questionnaire survey targeted both initial and final refusals, and the key questionnaire was offered to refusals as soon as the request to participate in the main survey had been declined. In the other three countries, Poland (PL), Switzerland (CH) and Norway (NO), a nonresponse survey was conducted several months after the end of the main data collection exercise in ESS 3. In these three countries, the target population comprised respondents to the main survey, refusals and final noncontacts. Respondents from the main survey were included in the NRS because it was not certain in advance that information from the follow-up study could be linked to the target persons who did not participate in the survey (see below).27 In addition, the main modes of the follow-up study (mail survey or telephone survey) differed from the mode of the main ESS 3 survey (face-to-face); the inclusion of respondents in the NRS makes it possible to avoid mode effects. Noncontacts were included because it is to be expected that the new modes used in the nonresponse phase might bring in previously noncontacted target persons. The main features of the designs used in all countries are listed in Box 8.1.

The aim was to learn from both types of study, and we present analysis results from each type. One of these is necessarily the Belgian case; of the other three countries, Norway was chosen because the differences between original respondents and re-approached nonrespondents were largest there.

Belgium28
In Belgium, the short DQS was conducted on the doorstep, during contact with target persons who refused when asked to participate in the survey. These refusals cover most of the nonresponse in the Belgian sample, since the number of noncontacts was less than 3%. Contrary to the NRS approach in the other countries, the DQS took place before refusal conversion, at the same time as data collection for the main survey.
27 It was not possible to link the main ESS data file with the NRS file in Poland. This was possible in Norway and Switzerland.
28 The authors wish to thank Geert Loosveldt, Hideko Matsuo, Leen Vandecasteele and Katrien Symons for the data collection in the DQS in Belgium and for providing the data.
Box 8.1 A Short Description of the Main Characteristics of the Nonresponse Survey in Four Countries

BE
  Samples: Initial refusals from main survey (N = 694); all respondents main survey (N = 1798)
  Period: At the same time as data collection for the main survey (23 October 2006 to 19 February 2007)
  Mode: PAPI on the doorstep as part of main survey contact procedure
  Type of questionnaire: Short version (7 questions)
  Response rate among sampled respondents: Response rate of main survey 61%
  Response rate among nonrespondents: 44.7% (n = 303 of 694 refusals)

PL
  Samples: All sampled units from main survey (N = 2547); 25 not delivered excluded
  Period: Dispatch of mailed questionnaires in three waves, first batch around one month after end of main survey: (1) 16 January 2007; (2) 8 February 2007; (3) 16 February 2007
  Mode: Questionnaires sent by mail, with introduction letter; reminder after ten days
  Type of questionnaire: Short version (7 questions) to 27.9% of respondents; long version (15 questions) to 72.1% of respondents; reason-for-refusal question to all nonrespondents
  Response rate among sampled respondents: 59.0% (1018 of 1721)
  Response rate among nonrespondents: 23.2% (n = 192 interviews)

CH
  Samples: Sample of 50% of the noncontacts and nonrespondents (1488) from main survey and sample of respondents (300)
  Period: Several months after main survey: 16 May to 13 June 2007
  Mode: Questionnaire sent by mail with introduction letter (€6 incentive), followed by a CATI or web recall after 15 days
  Type of questionnaire: Short version (7 questions) 50%; long version (15 questions) 50%
  Response rate among sampled respondents: 84% (252 of 300)
  Response rate among nonrespondents: 51.8% (n = 771 interviews)

NO
  Samples: Sample of 403 main survey respondents and 800 nonrespondents from main survey (123 extremely hard refusals excluded)
  Period: Several months after main survey: 1 March to 1 August 2007
  Mode: Questionnaires sent by mail with introduction letter; reminder after two weeks; offer to respond by web; final nonrespondents contacted by telephone (CATI if cooperative)
  Type of questionnaire: Long version (15 questions)
  Response rate among sampled respondents: 60.8% (245 of 403)
  Response rate among nonrespondents: 30.3% (n = 242 interviews)
Initial refusals who had already answered the key questionnaire and who were later converted into respondents (reluctant respondents) were not included in the data for the nonresponse survey, as they had finally become respondents; 44.7% of the refusals cooperated.

Poland29
The study was a little more complex in Poland. The NRS was conducted several months after the main survey, and all respondents and nonrespondents were sent a mail questionnaire. Because the dates of the winter holidays differ for youngsters from different regions, the despatch of the mail questionnaire was divided into three waves. A reminder or thank-you card was sent to all target persons after 10 days.30 The short and long versions of the questionnaire were randomly divided among the cooperative respondents, reluctant respondents (converted refusals) and nonrespondents. The question about the reasons for refusal was also included in the questionnaire for the nonrespondents.

The response (completion) rate in the NRS was 59.5% among respondents to the main survey and 23.2% among the nonrespondents. There are differences between the short and long versions of the questionnaire: the response rate was 57.3% among the respondents who received the long questionnaire (n = 488) and 62.7% among those who received the short version (n = 855). Among the nonrespondents, the difference in NRS response rates between the long (n = 404) and short (n = 230) versions was not significant. The response rate was somewhat higher (25.5%) for main survey nonrespondents who had been classified as 'not able to cooperate' in the main phase; the latter received only the long version.

In the Polish NRS, it was also possible to distinguish a category of respondents who participated after an initial refusal in the main survey (reluctant respondents).31 The subsamples that received the long (n = 28) and the short (n = 25) versions were very small. The response rates differed somewhat, being 32% for the short version and 28.6% for the long version. The response rate among the reluctant respondents was somewhat higher than among the nonrespondents from the main survey, but was still low.

Switzerland32
The questionnaire for the NRS in Switzerland, including an incentive of CHF 10 (€6), was sent to a random sample of respondents (n = 300) from the main ESS survey and to all refusals and noncontacts who had not moved since the end of the main survey (n = 1488). Target persons were first asked to respond to a mail survey, but were given the opportunity to answer via the Internet instead if they preferred.
29 The authors wish to thank Pavel Sztabinski, Franciszek Sztabinski, Theresa Zmijewska-Jedrezejczyk and Anna Dyjas-Pokorska for the data collection in the NRS and for providing the data.
30 Additional in-depth interviews (n = 24) were conducted after the implementation of the mail NRS survey in order to obtain a better understanding of nonresponse. However, only the quantitative data are used in this chapter.
31 It was possible to distinguish these categories at group level, but owing to privacy regulations it was not possible to link the records from the nonresponse survey to the main survey records.
32 The authors wish to thank Dominique Joye and Alexandre Pollien for the data collection for the NRS and for providing the data.
A telephone recall was then implemented after a period of two weeks. The planned sample of nonrespondents comprised 47 noncontacts (40% response rate), 670 personal refusals (58.2%), 71 proxy refusals (53.5%), 661 household refusals prior to target respondent selection (49.3%) and 39 other nonresponse cases (46.2%). One half of the sample received the short version, the other half the long one. The response rates for the short and long versions were about the same.

Norway33
The long version of the self-completion questionnaire was sent to most nonrespondents (n = 800), excluding target persons who had refused very explicitly34 and people with language difficulties. It was also sent to a random sample of respondents (n = 403) from the main survey. After two weeks, a reminder was sent to the sampled cases who had not responded to the follow-up survey; they were then invited to complete the questionnaire on the Internet. After a further two weeks, target persons who had not yet responded were contacted by interviewers who offered them the opportunity to participate in an interview by telephone (CATI). The response rate was 30.3% among the nonrespondents (n = 242) and 60.8% among the previous respondents (n = 245).

In summary, the response rates among the initial nonrespondents to ESS 3 are relatively high in Switzerland (51.8%) and Belgium (44.7%) and quite low in Norway (30.3%) and Poland (23.2%). The response rates in the latter two countries are similar to the success rates of the refusal conversion attempts in the main survey (30% in PL and 27% in NO), but the response rates in the nonresponse surveys in Belgium and Switzerland are both much higher than the refusal conversion rates (45% versus 24% in BE and 52% versus 15% in CH). On the one hand, it seems strange that response rates in the NRS were highest in the country with the lowest response rate in the main survey (CH); on the other, it may be assumed that there are proportionally more 'soft' refusals in countries where there are large numbers of refusals. These 'soft' refusals might be easier to persuade to participate in a survey that imposes less of a burden because of the short key questionnaire. In addition, some respondents in Switzerland may have been positively influenced by the possibility of completing the survey in a different mode. Finally, the higher response rate may be an effect of the unconditional incentive in Switzerland, whereas no incentives were used in the other countries.

8.3.4.2 Using information from doorstep questions and follow-up surveys

Can the answers given by nonrespondents to the key questionnaire be used to adjust for nonresponse bias?
33 The authors wish to thank Kristen Ringdal, Øyven Kleven, Einar Bjørshol and Frode Berglund for the data collection for the NRS and for providing the data.
34 These are refusals with reasons classified under the categories 'do not trust surveys', 'never do surveys', 'interferes with my privacy' or other negative statements about surveys.
respondents’) differ from those of the ESS nonrespondents who only participated in the nonresponse surveys (the ‘DQS/NRS only respondents’). Sometimes it is also possible to analyse how the reluctant respondents differ. Variables from the key questionnaire that show significant differences are retained for further analysis. Using the information from the respondents and the nonrespondents, the response propensities are estimated using logistic regression models with the retained variables as predictors. Finally, the samples of cooperative respondents are adjusted using propensity weights.35 The DQS in Belgium We begin with a comparison of the cooperative and reluctant respondents from the main survey and the DQS respondents. To avoid confusion, we call the latter ‘refusals’. We did not expect go find large differences between the reluctant respondents (converted refusals) and the refusals, since the latter group also initially refused to cooperate in the main survey. In fact, however, Table 8.13 shows that the differences are quite pronounced. This suggests that refusal conversion probably taps into different kinds of refusals from those in the DQS, since in Belgium the DQS took place prior to any of the usual refusal conversion efforts. We will now focus on the differences between respondents and refusals, as the differences between cooperative and reluctant respondents were discussed in Chapter 7. The questions in the key questionnaire submitted to the nonrespondents in Belgium were asked in a different context from the original respondents, who answered the same questions as part of a very long survey. The differences between cooperative respondents and nonrespondents are therefore probably not only indications of nonresponse bias, but also reflections of these different contexts. It is widely acknowledged that the context generally has a larger effect on attitudinal than on factual questions, but the possibility cannot be ruled out that there is also an impact on factual questions (Schwarz and Sudman, 1992). On the majority of the key questions, the differences between cooperative respondents and nonrespondents operate in the same direction as those between cooperative respondents and reluctant respondents (see Table 8.13). This is an indication that in the Belgian case the effects cannot be completely ascribed to the context. The nonresponse surveys in the three other countries do not have the same potential problems caused by different contexts, since the key questions were offered to samples of both cooperative respondents and nonrespondents. On the basis of the overview of the literature in Section 6.3, we do not expect a relationship between survey cooperation and age. The more highly educated and the employed are expected to cooperate more, as are people who live in larger households (especially when there are children present). In the Belgian DQS, there are more people in the highest age category (60 þ ) than among the cooperative respondents, and in general refusals are older than cooperative respondents. The refusals include more less-educated persons and fewer highly educated people than the cooperative respondents. There is no difference in employment status between cooperative respondents and refusals, but the reluctant respondents are more often unemployed. 35
35 The authors wish to thank Hideko Matsuo from CeSO (K.U. Leuven) for the analyses of the nonresponse data in collaboration with Jaak Billiet.
Table 8.13 Responses to key questions in the small refusal survey (SRS) in Belgium according to kind of respondent in ESS 3a

Explanatory variables (key questions,          Kind of respondent
see Appendix 8.2)                              Cooperative   Reluctant          Refusals (DQS)
N                                              1,658         140                303
Age
  15–29                                        23.4          17.9               8.3
  30–39                                        16.0          9.3                21.5
  40–49                                        20.7          16.4               16.8
  50–59                                        16.2          15.0               18.8
  60+                                          23.6          41.4               34.7
  χ² (df = 4); Prob. H0                                      23.11; p < 0.001   47.54; p < 0.001
Education level
  Lower (basic) and lower vocational           25.41         28.57              33.92
  Lower secondary (humanities)                 10.56         6.43               9.19
  Higher secondary (prof./humanities)          35.67         43.57              33.22
  Higher education                             28.36         21.43              23.67
  χ² (df = 3); Prob. H0                                      7.1121; p = 0.068  9.349; p = 0.025
Employment status
  Employed                                     49.8          29.5               47.8
  Unemployed                                   50.2          70.5               52.2
  χ² (df = 1); Prob. H0                                      21.09; p < 0.001   ns
Household composition
  One person                                   11.6          20.0               16.4
  Several persons                              88.4          80.0               83.6
  χ² (df = 1); Prob. H0                                      8.50; p < 0.005    ns
Social participation
  Much less than most                          15.3          16.6               27.7
  Less than most                               29.2          32.4               27.7
  About the same                               35.3          33.1               33.7
  More than most                               15.9          15.1               5.3
  Much more than most                          4.4           2.9                5.7
  χ² (df = 4); Prob. H0                                      ns                 36.29; p < 0.001
Political interest
  Very interested                              8.8           2.1                4.4
  Quite interested                             37.2          30.0               19.5
  Hardly interested                            33.2          38.6               32.8
  Not at all interested                        20.9          29.3               43.3
  χ² (df = 3); Prob. H0                                      13.94; p < 0.005   84.07; p < 0.001

a Only variables whose distributions differ significantly (p < 0.05) are included in the table. The distributions of sex and 'feeling safe' do not differ significantly. χ² is computed for partial cross-tables: cooperative versus reluctant, and cooperative versus DQS.
Differences between reluctant respondents (who cooperated in the main survey) and refusals (who answered the DQS questions) show that refusal conversion and doorstep questions pick up different kinds of refusals. These differences allow us to compare the varying impact on the survey estimates of correcting the data for nonresponse bias based on the inclusion of these two different groups. There appears to be no relationship between feeling safe in the neighbourhood and survey participation, and this variable is therefore omitted from the table. The distributions of the two other attitudinal variables, 'social participation' and 'political interest', are in line with findings from previous studies (Billiet et al., 2007, p. 148). Refusals are (or feel they are) less likely to engage in social activities than most other people compared with cooperative and reluctant respondents, and they are also the group most likely to say they have no interest in politics. This confirms the results of Groves et al. (2004) and Voogt (2004).

The next issue to be explored is whether the information from the DQS can be used to adjust the sample of cooperative respondents in order to reduce nonresponse bias. The assumption in this approach is that the refusals who participate in the DQS are informative about the final refusals, and that the sample of cooperative respondents (or cooperative plus reluctant respondents) can be corrected on the basis of information about the response propensities (Rosenbaum and Rubin, 1984; Rubin, 1986; Schonlau et al., 2006; Lee and Valliant, 2008, 2009).

The adjustment method consists of three steps. In the first step, the odds ratios for survey cooperation are estimated using logistic regression models. The method used simultaneously includes the probability ratios 'refusal/cooperative' and 'reluctant/cooperative' in a multinomial model. In this approach, the refusals in the main survey who participated in the DQS are used to estimate the effects of the predictor variables on the nonresponse propensity or its inverse, the response propensity. The effects are also estimated for the reluctant respondents versus the cooperative respondents. We focus here mainly on the parameters
for the refusals who cooperated in the DQS versus the cooperative respondents, whilst briefly noting the major differences compared with the parameters for reluctant respondents versus cooperative respondents.

In order to be able to adjust the data, the propensity scores are estimated (Siegel, Chromy and Copello, 2008; Lee and Valliant, 2009). These scores express the net probabilities36 of a survey response for the sample units based on their values on each of the predictor variables in an explanatory model. The most commonly used method for modelling propensity scores is logistic regression, which models the propensity score e(x) as

log[e(x) / (1 − e(x))] = α + β′f(x),

where f(x) represents a function of a set of covariates x. Where covariates are used for the estimation, variable selection becomes an issue, because the predictive power of the covariates in the model matters. It is often recommended that all covariates be included, even if they are not statistically significant, unless they are clearly unrelated to the treatment outcomes or are inappropriate for the model (Lee and Valliant, 2008, p. 178). As the sample sizes are rather small, the variables that are related to the likelihood of responding are used as the initial input for the model, and are all retained in the propensity score estimation, even when they are no longer significant in the model.37 The dependent variable has three categories in a multinomial regression model (cooperative respondents, reluctant respondents, refusals), and the probability ratios 'refusal/cooperative' and 'reluctant/cooperative' are estimated in the models. The odds ratios38 in Table 8.14 express the net effects of the predictors on the probability ratios. The response propensity scores assigned to the individual cases are computed using the inverse of the logistic regression parameters.

36 'Net' means controlled for the other variables in the model. Note that the parameters are inverted so that they express cooperative respondents versus nonrespondents who cooperated in the DQS.
37 Forty-seven DQS cases with missing values on one or more of the variables were deleted from the Belgian sample; there is too little information for any kind of meaningful imputation. Including missing values as extra classes led to impossible parameter estimates.
38 For the meaning of the odds ratios, see the explanation of Table 8.7.

In the second step, weights are computed. The technique used here for computing weights based on propensity scores is the response propensity stratification proposed by Little (1986). The weights are not derived directly from the response propensities of the individual cases estimated in the logistic regression model, but are based on strata of cases that are close in terms of their scores. Post-strata are formed by sorting the sample units (or cases) on the propensity scores and dividing the sample into groups containing the same number of cases. We divided the sample into 10 deciles, since the response propensities are relatively constant within each post-stratum. The weights are obtained by dividing the expected probability of belonging to substratum i (i.e. 0.10) by the observed probability of the cooperative respondents or nonrespondents in the corresponding substratum i of the subsamples of respondents or nonrespondents, respectively. The weights are equal within classes among respondents and nonrespondents (Lee and Valliant, 2008, p. 180). This procedure is similar to that described by Lee and Valliant (2009, pp. 324–5),39 where a web-based survey is adjusted to a reference survey. In our case, the obtained sample of cooperative respondents is adjusted to the augmented sample of cooperative plus DQS-only respondents.

Table 8.14 presents the parameters of two different models. The models contain all variables from Table 8.13. As already stated, we focus here on the DQS (left-hand) part of the model. It should be noted, however, that the differences in parameters between initial and final refusals suggest that different factors play a role in refusal conversion from those at work in the DQS survey in Belgium. The refusals (who answered the DQS questions) are more likely to belong to the 30–39 and 60+ age categories than to the 15–29 category; these are some of the strongest effects in the model. The ratio 'refusal/cooperative respondent' is 1.451 times higher in the 30–39 age category and 1.793 times higher in the 60+ category than in the youngest age group (15–29 years). Those who participate more than others in social organizations are less likely to refuse (odds ratio 0.722, i.e. 27.8% lower). As noted in Section 8.3.1, (lack of) interest in politics is a strong predictor of refusal: refusals are 57% more likely than cooperative respondents to say they are not interested in politics. The effect of education operates in the expected direction but is not significant. The effect of employment status is also not significant, but the employed are somewhat less likely to belong to the reluctant respondents; we have already seen this in Table 8.13. Refusal is more likely among one-person households, but this effect is also not significant. These covariates (education, employment situation and household composition) are included in the estimation of the propensity scores, as suggested by Lee and Valliant (2008, p. 178), even though they are not significant.

As expected on the basis of Table 8.13, the reluctant respondents in ESS 3 appear to belong to a different category of refusals. The effect of age on the probability ratio 'reluctant/cooperative' is not significant, but seems to operate in a different direction for the 30–39 age category compared with the refusals who cooperated in the DQS. Contrary to the latter, we find a significant effect for those with a higher-secondary educational background: these target persons are more likely to be reluctant respondents and less likely to cooperate immediately compared with the less-educated (reference). Employment status was discussed in the previous section. Social participation has no significant effect, while the effect of political interest operates in the same direction as for the refusals but is smaller. All in all, the effects on 'reluctance/cooperation' are weaker than the effects on 'refusal/cooperation' (see also the R² and the Hosmer and Lemeshow statistics). For this reason, and also because this section is focused on core information from nonrespondents, the further steps in the propensity score weighting approach are only applied to the DQS part of Table 8.14.

The third and final step is to evaluate the results of the data after adjustment with the weights generated by the model.
39 For example, assume that the 10% of sampling units with the lowest propensity scores make up 3.62% of the units in the respondent sample and 15.86% of the units in the nonrespondent sample. The weights for these units are then 0.10/0.0362 = 2.76 and 0.10/0.1586 = 0.631, respectively.
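The following is a minimal sketch of this stratification step in Python, mirroring the numerical example in footnote 39. The function name and inputs are our own illustrative choices; the propensity scores are assumed to have been estimated beforehand from the multinomial model.

    import numpy as np
    import pandas as pd

    def stratification_weights(propensity: pd.Series,
                               is_respondent: pd.Series,
                               n_strata: int = 10) -> pd.Series:
        """Response propensity stratification in the spirit of Little (1986)."""
        stratum = pd.qcut(propensity, q=n_strata, labels=False)
        expected = 1.0 / n_strata                    # e.g. 0.10 for deciles
        weights = pd.Series(np.nan, index=propensity.index, dtype=float)
        for g in range(n_strata):
            for resp in (True, False):
                cell = (stratum == g) & (is_respondent == resp)
                # observed share of this stratum within the (non)respondent subsample
                share = cell.sum() / (is_respondent == resp).sum()
                weights[cell] = expected / share     # e.g. 0.10 / 0.0362 = 2.76
        return weights

Within each decile, all respondents receive one weight and all nonrespondents another, which is the property described above.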
Table 8.14 Logistic regression models explaining the odds ratios 'refusals/cooperative' (left-hand side of the table) and 'reluctant/cooperative' (right-hand side of the table), Belgium, in ESS 3a

                                           Refusals (DQS)/cooperative   Reluctant/cooperative
Explanatory variables (key questions)      Odds ratio  Standard error   Odds ratio  Standard error
Age (years)
  30–39                                    1.451       0.152            0.800       0.226
  40–49                                    0.836       0.161            0.997       0.213
  50–59                                    1.199       0.152            1.116       0.205
  60+                                      1.793       0.162            1.395       0.185
  Reference: 15–29
Education level
  Lower secondary (humanities)             0.801       0.194            0.689       0.269
  Higher secondary (prof./humanities)      1.002       0.119            1.380       0.149
  Higher tertiary                          0.917       0.129            0.865       0.176
  Reference: Lower (basic and vocational)
Employment status
  Employed                                 1.205       0.100            0.751       0.123
  Reference: Unemployed
Household composition
  One-person household                     1.113       0.098            1.246       0.119
  Reference: More than one person
Social participation
  About the same                           1.223       0.106            1.018       0.134
  More                                     0.722       0.143            0.980       0.157
  Reference: Less
Political interest
  Not interested                           1.574       0.080            1.313       0.098
  Reference: Interested
R²                                         0.055                        0.026
Hosmer and Lemeshow statistic              10.980                       5.914

a Refusals (N = 248), reluctant respondents (N = 138), cooperative respondents (N = 1652). ***p < 0.001; **p < 0.01; *p < 0.05.
Two approaches can be used to assess the effectiveness of the weights in reducing bias. The first approach involves examining whether there are significant differences in the responses of the cooperative respondents before and after weighting. Significant differences of the kind that could be expected from previous nonresponse studies indicate that the weights are effective. The response
distributions of the unweighted sample (observed frequencies) are tested against those of the weighted sample (expected frequencies). Unweighted and weighted results can be compared for all questions in the main survey, not just for the key questions. In the second approach, the complete response distributions according to kind of respondent (cooperative versus refusal) are compared between the weighted and the unweighted samples. In this approach, the weighting procedure can be regarded as very effective if the differences between cooperative respondents and refusals that existed before weighting disappear after weighting. This would mean that the weighting procedure has rendered the nonresponse missing at random (MAR), since there are no longer any systematic differences between cooperative respondents and refusals. The response distributions for nonrespondents (refusals) and cooperative respondents are compared using test statistics (χ² or differences between means).

Table 8.15 presents the results of the first approach. It shows the unweighted sample of cooperative respondents and the weighted samples based on the propensity scores as estimated in a model explaining the 'refusal/cooperation' probability ratio. It can be seen that there are only small changes in the distributions of the variables used in the models. None of the differences between the unweighted and the weighted sample is statistically significant. This means that, from the point of view of reducing nonresponse bias, the data quality is not improved at all by using information from the refusals who answered the DQS questions. It could still be that there are differences on other variables in the main survey after applying nonresponse propensity score weights. Unweighted and propensity score-weighted samples were therefore compared for several attitudinal questions that are the most likely candidates for bias (television watching, interest in politics, satisfaction with democracy, trust in people and an item on immigrants). None of the differences in the distributions (or means) was statistically significant.

The second approach looks at whether the differences between the kinds of respondent on the key questions in the unweighted sample disappear after weighting. Table 8.13 presented four significant differences between cooperative respondents and refusals, namely in the age distribution, education, social participation and interest in politics. A test of the differences before and after weighting is presented in Table 8.16. The differences between cooperative respondents and refusals are much smaller after weighting on all four key questions in the Belgian study. For two questions (age and education), they are no longer larger than would be expected by chance; on the other two questions, there is still a significant difference. We can therefore conclude that information provided on the doorstep by target persons who refused to participate does not completely correct for the difference between respondents and the refusals who participated in the DQS. This could be due to residual nonresponse bias or to contextual effects, as the DQS questions are not embedded in the context of the main questionnaire. Most probably, it is a combination of both.
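A sketch in code of the first evaluation approach, testing the unweighted (observed) distribution of a question against its propensity-weighted (expected) counterpart, as in Table 8.15; the helper function is illustrative, not part of any survey package.

    import numpy as np
    from scipy.stats import chisquare

    def weighted_vs_unweighted(values: np.ndarray, weights: np.ndarray):
        """Chi-square test of the unweighted against the weighted distribution."""
        cats = np.unique(values)
        observed = np.array([(values == c).sum() for c in cats], dtype=float)
        expected = np.array([weights[values == c].sum() for c in cats])
        expected *= observed.sum() / expected.sum()   # rescale to the same total N
        return chisquare(f_obs=observed, f_exp=expected)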
Table 8.15  Unweighted and weighted samples of cooperative respondents, Belgium, in ESS 3 (a)

  Explanatory variables                 Unweighted      Weighted
  (key questions, see Appendix 8.2)     N = 1652        N = 1652

  Age (years)
    15–29                               23.4            21.5
    30–39                               16.0            17.1
    40–49                               20.7            19.9
    50–59                               16.2            16.3
    60+                                 23.6            25.1
    χ² (df = 4) = 5.579; p = 0.233

  Education level
    Lower                               25.41           26.9
    Lower secondary                     10.56           10.1
    Higher secondary                    35.67           35.4
    Higher education                    28.36           27.6
    χ² (df = 3) = 2.099; p = 0.552

  Employment status
    Employed                            49.8            50.1
    Unemployed                          50.2            49.9
    χ² (df = 1) = 0.057; p = 0.811

  Household composition
    One person                          11.6            12.2
    More persons                        88.4            87.8
    χ² (df = 1) = 0.622; p = 0.430

  Social participation
    Much less than most                 15.3            16.0
    Less than most                      29.2            29.4
    About the same                      35.3            35.4
    More than most                      15.9            15.1
    Much more than most                  4.4             4.1
    χ² (df = 4) = 1.554; p = 0.817

  Political interest
    Very interested                      8.8             8.2
    Quite interested                    37.2            34.9
    Hardly interested                   33.2            34.7
    Not at all interested               20.9            22.1
    χ² (df = 3) = 5.173; p = 0.160

  (a) Weights based on propensity scores and derived from the 'nonrespondent/cooperative' (DQS) model of Table 8.14.
The dependent variable is now the probability ratio 'all initial refusals/cooperative respondents'. As might perhaps be expected from Table 8.14, where smaller effects were presented for the reluctant respondents, the combination of all initial refusals does not lead to stronger effects. The odds ratios are in fact even smaller than for the refusals alone. Consequently, the range of the propensity weights is also smaller and will thus lead to even smaller differences between unweighted and weighted samples.

The second supplementary analysis included auxiliary information about all refusals. This auxiliary information was available for the target persons from the observational data (age, sex, dwelling type and neighbourhood characteristics). In a first step, the refusals are weighted according to the joint distribution of those who did (n = 303) and those who did not participate in the DQS (n = 307) [40], using the two variables that showed significant differences in distributions: age and dwelling type.

[40] A number of cases had to be excluded because of incomplete data.
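A pandas sketch of this first calibration step is given below. The column names (age_group, dwelling_type) are hypothetical; the function simply assigns each DQS participant the ratio of the cell share among all refusals to the cell share among DQS participants.

```python
# Sketch of the calibration step, assuming hypothetical column names: DQS
# participants are weighted so that their joint age x dwelling-type distribution
# matches that of all refusals (participants and non-participants combined).
import pandas as pd

def calibration_weights(dqs: pd.DataFrame, all_refusals: pd.DataFrame,
                        cells=("age_group", "dwelling_type")) -> pd.Series:
    cells = list(cells)
    target = (all_refusals.groupby(cells).size() / len(all_refusals)).rename("t")
    sample = (dqs.groupby(cells).size() / len(dqs)).rename("s")
    ratio = (target / sample).rename("w").reset_index()
    # a left merge keeps the original row order of the DQS file
    return dqs.merge(ratio, on=cells, how="left")["w"]
```

These calibration weights would then enter the logistic regression as case weights when the propensity model is re-estimated in the second step.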
Table 8.16  Statistical test information about the differences in distributions of key questions between cooperative respondents and refusals (DQS respondents) in the weighted and unweighted samples, Belgium, in ESS 3 (a)

  Explanatory variables                 Unweighted              Weighted
  (key questions, see Appendix 8.2)     χ²         Prob.        χ²         Prob.

  Age (df = 4)                          47.544     <0.0001       2.008     0.734
  Education level (df = 3)               9.349     0.025         3.887     0.274
  Social participation (df = 4)         36.295     <0.0001      17.912     0.001
  Political interest (df = 3)           84.040     <0.0001      25.938     <0.0001

  (a) Only key questions with significant differences in distributions (p < 0.05) in unweighted samples are shown.
In a second step, the logistic regression parameters are re-estimated using these 'calibrated' data, and propensity score weights are computed as before. The unweighted sample of cooperative respondents is again compared with the weighted sample. The differences are once again very small, and no significant change is observed in the distributions or means of the background and attitudinal variables.

The NRS in Norway

This section looks in detail at the findings in Norway, the country where the differences between respondents in the main survey and the nonrespondents who later completed the NRS were most pronounced. Three different modes were used in the NRS in Norway, namely mail, CATI and the Internet. Most of the responses were sent by mail (85%). Only the long version of the questionnaire was used, making it possible to compare the distributions of 15 questions. The distributions did not differ depending on the mode of administration, and even the response variable (kind of respondent) was found to be independent of the mode.

The questionnaire was sent to a sample of respondents in the main ESS survey and to all nonrespondents (refusals, noncontacts and those not able to participate). This makes it possible to compare the nonrespondents who participated in the NRS with two groups in Norway: all respondents who cooperated in the main survey, and the sample of cooperative respondents who were selected for the NRS and answered the key questions. An advantage of the first comparison is that the sample of cooperative respondents in the main survey is large (n = 1646). A disadvantage is that there may be mode effects and questionnaire effects (long main questionnaire versus short key questionnaire), and that there is a time gap between the two data collection exercises. It is known that responses to opinion questions are fairly unstable (see Billiet, Swyngedouw and Waege, 2004). An advantage of the second comparison is, of course, the absence of mode, questionnaire and time-gap effects. A serious disadvantage is not only the small sample size (n = 230) [41], but also the fact that these are 'double cooperators' (main survey and NRS) who may not represent all cooperative respondents (61% of the cooperative respondents cooperated again). In view of the pros and cons, both comparisons are presented here.

Table 8.17 compares the response distributions of three groups: all cooperative respondents in the main survey ('ESS main') [42], the cooperative respondents who also participated in the NRS ('NRS + main'; the double cooperators), and the ESS 3 nonrespondents who completed the NRS ('NRS only'; also called nonrespondents). No results are given for survey mode, because mode had no significant effect on the responses (see the bottom of the table).
[41] N = 230; 15 of the 245 double cooperators in Norway were initially reluctant in the main survey and are not used here.
[42] The 103 reluctant respondents from ESS 3 in Norway are excluded since they initially refused and thus cannot be called cooperative.
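The tests reported in Table 8.17 are standard two-group comparisons. The sketch below illustrates the machinery with made-up data: a chi-square test of homogeneity for a categorical key question, and a pooled-variance t-test for a 0–10 attitude item (the degrees of freedom in the table, close to n1 + n2 − 2, point to pooled rather than Welch tests).

```python
# Illustrative two-group comparisons of the kind reported in Table 8.17;
# the counts and scores below are made up, not the Norwegian data.
import numpy as np
from scipy import stats

# categorical key question: counts per category for the two groups
nrs_only = np.array([83, 84, 75])      # e.g. education: lower / middle / higher
ess_main = np.array([293, 586, 767])
chi2, p, df, _ = stats.chi2_contingency(np.vstack([nrs_only, ess_main]))
print(f"chi2(df={df}) = {chi2:.3f}, p = {p:.4f}")

# 0-10 attitude item: pooled-variance t-test (df = n1 + n2 - 2)
rng = np.random.default_rng(0)
scores_nrs = rng.normal(5.9, 2.1, size=242)    # placeholder scores
scores_main = rng.normal(6.6, 1.9, size=1646)
t, p = stats.ttest_ind(scores_nrs, scores_main, equal_var=True)
print(f"t(df={242 + 1646 - 2}) = {t:.2f}, p = {p:.4f}")
```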
Table 8.17  A comparison of the significant (p < 0.05) response distributions (%) and test statistics by kind of respondent, Norway, ESS 3

  Explanatory variables                 NRS only           NRS + main                ESS main cooperative
  (key questions, see Appendix 8.2)     (nonrespondents)   (double cooperators) (a)  respondents (b)

  N                                     242                230                       1646

  Sex
    Male                                46.3               53.9                      51.6
    Female                              53.7               46.1                      48.4
    χ² (df = 1); p                                         5.673; p = 0.017          2.783; p = 0.095

  Education level
    Lower basic & lower secondary       34.3               24.6                      17.8
    Higher secondary                    34.7               30.7                      35.6
    Higher education                    31.0               44.7                      46.6
    χ² (df = 2); p                                         20.826; p < 0.0001        49.920; p < 0.0001

  Employment status
    Employed                            56.4               66.7                      67.6
    Unemployed                          43.6               33.3                      32.4
    χ² (df = 1); p                                         11.361; p = 0.0008        13.728; p = 0.0002

  Household composition
    One-person household                19.5               13.9                      19.1
    More-persons household              80.5               86.1                      80.9
    χ² (df = 1); p                                         6.285; p = 0.0121         0.0252; p = 0.874

  Feeling safe
    Very safe                           45.0               57.9                      50.9
    Safe                                44.2               36.0                      40.8
    Unsafe                               8.3                5.3                       6.0
    Very unsafe                          2.5                0.9                       2.3
    χ² (df = 3); p                                         22.885; p < 0.0001        4.517; p = 0.2108

  Social participation
    Much less than most                 12.5                5.3                       3.3
    Less than most                      17.4               20.2                      17.7
    About the same                      60.2               59.7                      61.2
    More than most                       8.3               11.0                      16.1
    Much more than most                  1.7                4.0                       1.8
    χ² (df = 4); p                                         29.309; p < 0.0001        70.776; p < 0.0001

  Political interest
    Very interested                      3.3                7.9                       9.7
    Quite interested                    31.1               34.2                      39.0
    Hardly interested                   50.2               51.8                      44.3
    Not at all interested               15.4                6.1                       7.0
    χ² (df = 3); p                                         40.483; p < 0.0001        40.047; p < 0.0001

  Surveys are valuable for the whole of society
    Completely agree                    20.6               29.4                      n/a
    Agree                               40.3               45.2
    Neither agree nor disagree          28.6               21.9
    Disagree                             8.8                1.8
    Completely disagree                  1.7                1.8
    χ² (df = 4); p                                         79.131; p < 0.0001

  How satisfied with the way democracy works (0 = dissatisfied, 10 = satisfied)
    mean (SD)                           5.907 (2.119)      6.044 (2.066)             6.632 (1.911)
    t; df; p                                               0.70; 464; 0.482          5.31; 1864; <0.0001

  Trust in politicians (0 = no trust, 10 = complete trust)
    mean (SD)                           4.261 (2.226)      4.770 (2.210)             4.457 (1.997)
    t; df; p                                               2.48; 466; 0.013          1.40; 1864; 0.162

  Immigrants make country worse/better place to live (0 = worse, 10 = better)
    mean (SD)                           4.356 (2.558)      4.770 (2.210)             5.117 (2.040)
    t; df; p                                               2.48; 466; 0.013          5.17; 1876; <0.0001

  Time watching TV per day (0 = no time, 7 = > 3 hours)
    mean (SD)                           4.248 (1.750)      3.556 (1.610)             3.709 (1.764)
    t; df; p                                               4.36; 466; <0.0001        4.38; 1876; <0.0001

  Voluntary work (1 = at least once a week, 6 = never)
    mean (SD)                           4.531 (1.625)      3.859 (1.860)             4.211 (1.735)
    t; df; p                                               4.17; 466; <0.0001        2.69; 1884; 0.007

  Most people trusted/can't be too careful (0 = careful, 10 = trust)
    mean (SD)                           6.600 (2.250)      6.886 (2.057)             6.844 (1.805)
    t; df; p                                               1.44; 464; 0.3151         1.89; 1884; 0.058

  Mode
    Paper (mail)                        84.7               85.2                      n/a
    WEB                                  2.9                3.5
    CATI                                12.4               11.3
    χ² (df = 2); p                                         0.508; p = 0.776

  (a) χ², t-test and p-values refer to NRS + main respondents versus NRS only (expected distribution).
  (b) χ², t-test and p-values refer to ESS main cooperative respondents versus NRS only (expected distribution).
Significant differences (p < 0.05) between NRS + main respondents and NRS only respondents (the nonrespondents) were found on no fewer than 11 of the 15 key questions. Between ESS main (cooperative) respondents and NRS only (nonrespondents), eight questions showed a significant difference at the 0.05 level, and two others at the 0.10 level (right-hand column). The variables that show significant differences between NRS only and NRS + main are as follows: education level, employment status, social participation, political interest, attitude to surveys (not applicable for ESS main respondents), trust in politicians, time spent watching TV per day, the question about immigration, and involvement in voluntary work.

We will now look in more detail at the other comparison, namely the differences between NRS only (nonrespondents) and ESS main (cooperative) respondents, because of the small sample size of the NRS + main group and the fact that they are 'double' cooperators. As expected, there are fewer less-educated and more highly educated respondents among the ESS main respondents compared to the nonrespondents (NRS only). Furthermore, over two-thirds of ESS main respondents are employed, compared to just over half of the nonrespondents. With regard to attitudinal variables, the nonrespondents feel less safe walking in their neighbourhood after dark than the ESS main respondents, but the difference is not significant at the 0.05 level. The nonrespondents participate much less than most in social activities (according to their own estimates), are not interested at all in politics, are less satisfied with democracy and are more inclined to have negative views on immigration. They also watch more television than ESS main respondents. By contrast, the cooperative respondents from the ESS main survey work more for voluntary or charitable organizations than nonrespondents, and are also slightly more trusting of other people.

As in the Belgian DQS analysis, all variables that had significantly different distributions between ESS main respondents and nonrespondents were included in the logistic regression analysis to compute the propensity scores, even where they are no longer significant in the logistic regression (Lee and Valliant, 2009, pp. 324–5). Due to very high correlations between education and employment status, resulting in multicollinearity, it was decided to retain only education in the model.

Table 8.18 shows that five of the 10 predictors included in the logistic regression model contribute significantly (p < 0.10) to the variation in the 'nonresponse/response' ratio: education (included in the model as an ordinal variable), social participation, trust in politicians, satisfaction with democracy and attitude towards immigration. Propensity weights were computed on the basis of all variables. As in the Belgian DQS, the weighted and unweighted distributions exhibited only minor differences. This means that, according to the first approach, weighting by incorporating information from the nonrespondents who completed the NRS did not reduce nonresponse bias.

The second approach to evaluating the weighting procedure gives more promising results. This approach explores whether differences between the different kinds of respondents (ESS main cooperative respondents and nonrespondents who completed the NRS) on key questions disappear after weighting. If so, the weighting is effective and the weighting procedure has changed the missing mechanism into MAR, since there are no longer any systematic differences between cooperative respondents and refusals.
Table 8.18  A logistic regression model explaining the probability of becoming a nonrespondent: odds ratios 'nonrespondent/ESS respondent', Norway, ESS 3

  Explanatory variables                                           NRS only/ESS main (242/1646)
  (key questions, see Appendix 8.2)                               Odds ratio    Standard error

  Education level (ordinal 1–8)                                   0.849         0.045
  Feeling safe (reference: very safe)
    Safe                                                          1.050         0.115
    Unsafe and very unsafe                                        0.993         0.167
  Social participation (reference: much less than most)
    Less than most                                                0.844         0.158
    About the same as most                                        0.891         0.119
    (Much) more than most                                         0.506         0.196
  TV watching time per day (0 = no time, 7 = more than 3 hours)   1.075         0.043
  Political interest (reference: very and quite interested)
    Hardly interested                                             0.911         0.109
    Not at all interested                                         1.419         0.517
  Trust in politicians (0 = no trust, 10 = complete trust)        1.152         0.044
  How satisfied with the way democracy works
    (0 = extremely dissatisfied, 10 = extremely satisfied)        0.855         0.042
  Immigrants make country worse/better place to live
    (0 = worse, 10 = better)                                      0.891         0.038
  Voluntary work (1 = at least once a week, 6 = never)            1.003         0.047
  Most people trusted/can't be too careful
    (0 = can't be too careful, 10 = most trusted)                 1.032         0.042

  R² = 0.057; Hosmer and Lemeshow statistic = 8.825
  p < 0.01; p < 0.05; p < 0.10.
Table 8.19 shows that respondents and nonrespondents no longer differ according to social background and attitudinal variables in the response propensity-weighted sample, with the exception of political interest. This means that it is now also more acceptable to apply the MAR assumption for other variables (questions) in the weighted sample.
Table 8.19  Statistical test information about the differences in unweighted and weighted distributions on key questions between cooperative respondents (ESS main respondents) and nonrespondents (NRS only), Norway, in ESS 3 (a)

  Explanatory variables                                   Unweighted             Weighted
  (key questions, see Appendix 8.2)                       χ²        Prob.        χ²        Prob.

  Education level (df = 2)                                40.552    <0.0001      5.036     0.081
  Employment status (df = 2)                              11.594    0.0007       0.768     0.381
  Feeling safe (df = 3)                                    3.869    0.276        3.379     0.337
  Social participation (df = 4)                           48.105    <0.0001      3.490     0.480
  Political interest (df = 3)                             33.014    <0.0001     16.946     0.002

                                                          t-value   Prob.        t-value   Prob.

  Time watching TV per day (df = 1878)                    4.38      <0.0001      1.15      0.248
  Trust in politicians (df = 1876)                        1.40      0.162        0.92      0.356
  How satisfied with way democracy works (df = 1864)      5.31      <0.0001      0.68      0.494
  Immigration makes country worse/better place
    (df = 1871)                                           5.17      <0.0001      0.63      0.531
  Voluntary work (df = 1885)                              2.69      0.007        0.85      0.397

  (a) Only key questions with significant differences in distributions (p < 0.05) in unweighted samples are shown.
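The weighted tests in Table 8.19 can be reproduced in outline by re-running the respondent-type by key-question cross-tabulation on weighted counts, as sketched below. This naive weighted chi-square ignores the variance inflation induced by the weights, so it should be read as indicative only; column names are hypothetical.

```python
# Sketch of the second evaluation approach: chi-square on propensity-weighted
# counts of respondent type by key question (hypothetical column names).
import pandas as pd
from scipy.stats import chi2_contingency

def weighted_chi2(df: pd.DataFrame, group: str, item: str, weight: str):
    # sum the weights in each (group, item) cell to form a weighted crosstab
    table = pd.pivot_table(df, values=weight, index=group, columns=item,
                           aggfunc="sum", fill_value=0.0).to_numpy()
    chi2, p, dof, _ = chi2_contingency(table)
    return chi2, p, dof
```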
The NRS in Poland and Switzerland

For the sake of completeness, a short description and summary of the NRS process and results in Poland and Switzerland is given below. The survey of nonrespondents in Poland was a mail survey, with both a long and a short version of the questionnaire being used. Owing to privacy regulations in Poland, it was not possible to link the individual records of all main survey respondents in ESS 3 to the records from the NRS. Consequently, the complete gross ESS main sample in Poland, including both respondents and nonrespondents, received the NRS. At the aggregate level, it is known how many of the original cooperative respondents, reluctant respondents and nonrespondents completed the NRS (see Box 8.1). As the net sample of respondents to the main survey who also completed the NRS is large in Poland (n = 1000) and the response rate was relatively high (57% for the long version and 62% for the short version), the main comparison in Poland was between the NRS + main respondents and the NRS only (nonrespondents). This excludes mode, questionnaire and time-gap effects.
To optimize the use of the NRS sample, the short questionnaire (seven key questions) was used for weighting. Significant differences were found only for education level (χ² = 10.313; df = 3; p = 0.016) and the value of surveys (χ² = 69.074; df = 4; p < 0.001). The number of variables showing significant differences between nonrespondents and cooperative respondents is thus much smaller in Poland than in Belgium, where the same set of questions was used. Contrary to earlier results, nonrespondents in Poland were more highly educated than the cooperative respondents who completed the NRS. In the main survey, however, the distribution of education is similar to that from population statistics. In the logistic regression model explaining the propensity of becoming a nonrespondent (or respondent), significant odds ratios were found for the same two variables, namely education and attitudes towards surveys. Those with a higher secondary education (odds ratio 1.362) or a higher tertiary education (odds ratio 1.439) were more likely to refuse cooperation compared with the less educated. An even larger effect is found for attitudes towards surveys: those who disagree that surveys are valuable more often failed to respond to the main survey (odds ratio 1.995). After response propensity score weighting of the cooperative respondents using the information on the response probabilities, no significant differences were found between the unweighted and weighted samples of cooperative respondents. This approach suggests that the weighting procedure is effective. The original significant differences in relation to education and attitudes to surveys between cooperative respondents and nonrespondents in the unweighted sample disappear completely in the weighted sample (second approach).

The fact that we did not detect nonresponse bias in Poland could be a consequence of the high response rate in the main survey; Poland achieved the target of 70% in ESS 3. This would be in line with the findings of Voogt (2004, p. 160), who concluded that 'a response level of around 75% is enough to eliminate possible response bias in relations between variables, and to arrive at corrected estimates of the distributions of the variables of interest that are very close to the true value in the sample, especially when central questions are used in the correction methods'. On the other hand, the results of Groves (2006) indicate that even with a response rate of 80%, substantial nonresponse bias is possible. An alternative explanation could be that the 23% of the nonrespondents in Poland who completed the NRS do not represent all nonrespondents and are in fact rather similar to the cooperative respondents.

The net NRS sample in Switzerland contains 252 respondents from the main survey, of whom 231 were cooperative (ESS main), and 771 nonrespondents (NRS only). Again, 21 initial refusals were excluded from the analysis. Owing to the small number of ESS main respondents who completed the NRS compared to the number of cooperative respondents in the main ESS survey (n = 1627), the latter were chosen for the weighting procedure, as in Norway. In order to ensure the largest possible number of cases, the analysis was restricted to the short version of the questionnaire. Except for education level (χ² = 7.159; p = 0.067), the five other questions show significantly different distributions between respondents and nonrespondents at the 0.05 significance level.
The attitude towards surveys was not included in the analysis because this question was not asked in the main ESS survey. Nonrespondents from different modes in the NRS are grouped together, because this variable is not related to the dependent variable (response or nonresponse). Five predictors are also significant in the logistic regression. Nonresponse in the main survey is less likely when the sampled person is employed, belongs to a household with more than one person, participates more than others in social activities, feels very safe after dark and is politically interested. These findings are in line with the Norwegian findings. As was the case in the two other countries in which an NRS was organized, all the differences between respondents and nonrespondents disappear after weighting the combined sample of respondents and nonrespondents.

8.3.4.3 Nonresponse surveys: some reflections

In this fourth approach to bias detection and adjustment, information was collected on core questions from nonrespondents. Two methods were discussed, both involving the use of a key questionnaire of either seven or 15 questions from the ESS survey: a doorstep questionnaire survey (DQS) in Belgium, where refusals were asked to answer a few questions even though they had refused to participate in the main survey; and a nonresponse survey (NRS) in Norway, Poland and Switzerland, in which both respondents and nonrespondents from the main survey were invited to participate some weeks or months after the regular data collection.

A disadvantage of the DQS approach is that information is only obtained from refusals to the main survey, and not from noncontacts. In addition, the context of the small number of questions asked on the doorstep will differ from the long main questionnaire, possibly contaminating the findings. On the other hand, the DQS approach makes it easy to link the information from refusals to response behaviour at an individual level. This allows a distinction to be drawn between cooperative respondents, reluctant respondents and refusals using contact form data. In the NRS surveys, the information from nonrespondents who completed the NRS can be compared with either the main survey respondents who also participated in the NRS, or with all cooperative respondents in the main survey.

Both the DQS and NRS approaches facilitate the computation of propensity scores and, in turn, propensity weights for the variables in the key questionnaire. In both approaches, many different comparisons could be made concerning the response variable to be used in the models. Are respondents those who were cooperative in the ESS main survey, or the subset who also completed the NRS (the double cooperators)? Or are respondents both cooperative and reluctant respondents? Or should reluctant respondents be added to the final refusals for an optimal comparison? A possible solution for the DQS approach would be to compare simultaneously the probability ratios 'reluctant/cooperative' and 'refusal/cooperative' in a multinomial model. This is also possible in the NRS when there is a substantial number of reluctant refusals in the main survey, and when it is permitted to link the records of the NRS to the records from the main survey at an individual level.
Our analyses comprised three steps. In the first step, logistic regression was used to estimate the odds ratios in a model with a set of covariates that explain the variation in the 'nonresponse/response' ratio (or its inverse, 'response/nonresponse'). On this basis, response propensity scores were computed. In the second step, the samples of respondents and nonrespondents were weighted using response propensity stratification, as proposed by Little (1986). Finally, in the third step, the effectiveness of the propensity weights was assessed in two ways: firstly, by comparing the weighted and unweighted samples of respondents; and secondly, by testing whether differences between respondents and nonrespondents on key questions disappear after weighting. The first method can be applied to all questions in the main survey; the second cannot, because we do not have information about these variables for the nonrespondents to the main survey. This is not a problem if the aim is only to adjust the estimates in the main survey.

Different covariates were effective in explaining the response propensities in the four countries. Norway has quite a large number of effective predictors. There is, however, a small set of predictors that play a role in predicting nonresponse to the main survey in all countries: education, social participation and the overall attitude towards surveys. The latter question, on attitudes to surveys, seems very promising where the aim is to adjust for nonresponse. It was not asked in the main survey, and it could therefore only be used in the NRS in Poland, where a large sample of main respondents and nonrespondents answered this question.

In most cases, the weights generated from the propensity scores did not result in very different distributions for the respondents to the main survey. A reason for this might be that the bias was generally quite small. In almost all countries, the differences on key questions observed between respondents and nonrespondents in the unweighted data disappeared or were much smaller in the weighted samples. This is an indication that, after weighting, respondents and nonrespondents belong to the same population and the missing mechanism is now Missing At Random, since there are no longer any systematic differences between respondents and nonrespondents.

Core information on nonrespondents was collected in a quasi-experimental setting. We used two different methods (DQS and NRS), each of which has its own pros and cons. Even between the three countries where an NRS was carried out, however, there were differences in the sampling decisions, mode and final research. In order to be able to compare nonresponse bias properly across countries and to adjust for nonresponse in a standardized way, one of these methods would need to be applied in all countries. The DQS undoubtedly has some advantages compared with the NRS: it is less expensive and less burdensome for respondents and survey organizations, and the answers to the DQS can immediately be linked at an individual level to sample frame information, interviewer observations and call records. There are no timing effects, but question context effects cannot be excluded and, most importantly, information is obtained only on the refusals, not on noncontacts. In this sense, the NRS is a more promising option because all nonresponse is included, provided that the obtained samples are large enough.
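As a concrete summary of the three analysis steps described above, the sketch below fits the response model, stratifies on the estimated propensities and derives the weights. The column names are hypothetical, and Little's (1986) stratification is rendered here in one common form: quintiles of the estimated propensity, with respondents weighted by the inverse of the observed response rate within their stratum.

```python
# A minimal sketch of the three analysis steps, assuming a data frame with a 0/1
# response indicator and the key questions as numeric covariates (names hypothetical).
import numpy as np
import pandas as pd
import statsmodels.api as sm

def propensity_stratification(df, covariates, response="responded", n_strata=5):
    # Step 1: logistic regression of response on the key questions;
    # np.exp(model.params) gives odds ratios of the kind reported in the tables
    X = sm.add_constant(df[covariates].astype(float))
    model = sm.Logit(df[response], X).fit(disp=0)
    out = df.assign(p_hat=model.predict(X))

    # Step 2: response propensity stratification (after Little, 1986):
    # propensity quintiles; respondent weight = inverse response rate in stratum
    out["stratum"] = pd.qcut(out["p_hat"], q=n_strata, labels=False, duplicates="drop")
    rate = out.groupby("stratum")[response].mean()
    out["weight"] = np.where(out[response] == 1, 1.0 / out["stratum"].map(rate), np.nan)
    return model, out

# Step 3 (not shown): compare weighted and unweighted respondent distributions,
# and test whether respondent/nonrespondent differences vanish after weighting.
```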
The longer version (16 questions), with more attitudinal questions, is preferable to the short version and is useful in an NRS, but is probably too long for the DQS approach.
Both approaches are most effective when the response rate is high; otherwise, the possibility cannot be ruled out that the core information pertains to a selective group of nonrespondents only. Nonresponse surveys, and even the doorstep questionnaire survey, cannot replace extended interviewer efforts such as more contact attempts and longer fieldwork periods to reach the hard to reach and convert the initially reluctant. Ideally, these efforts should be combined with other approaches, such as interviewer observations of the dwelling and the neighbourhood, and refusal conversion attempts directed at 'hard' and 'soft' refusals. It should, however, be borne in mind that this combination entails a good deal more work for survey organizations and makes surveys more expensive.
8.4 Conclusions

Previous approaches to uncovering bias on key constructs of the ESS (Billiet et al., 2007) ultimately proved inconclusive. The four approaches outlined in this chapter to collecting auxiliary data, and the demonstrations of how these data can be used in nonresponse adjustment, represent a step forward, but there is still no hard and fast means of definitively measuring and correcting for nonresponse bias in a cross-national survey such as the ESS.

The post-stratification (PS) approach has the advantage of estimating nonresponse bias for all relevant components (response, refusal, noncontact, not able) based on demographic variables documented in population statistics. The nonresponse adjustment is effective for target variables in the obtained sample if the stratification variables co-vary with the target variables. The method of correcting for bias using PS weighting is in principle straightforward, although it is likely to overestimate the bias ascribed to nonresponse, since it includes sampling deficiencies in its estimates, while at the same time it may overlook the nonresponse bias in the target variable. A major weakness of the approach is that the effect of post-stratification weighting is usually very limited, because the degree of co-variance between the target variables and the available post-stratification variables is weak. Moreover, the sources that are used as a 'gold standard' to produce the post-stratification weights are sometimes problematic, which in turn makes corrective weighting more complex. Other nagging problems also remain: Do official population statistics accurately represent the distributions in all countries? How problematic is it if joint distributions of the stratification variables (e.g. age by sex by education) are not available for all countries? How well do the classifications of PS variables in population statistics match those in the ESS? The key challenge in improving the PS approach is to find additional weighting variables that are more strongly related to the target variables and for which the estimated population distributions are available and reliable.
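In its simplest cell-weighting form, the PS approach assigns each respondent the ratio of the population share to the sample share of their stratification cell, as in the sketch below. The cell variables are illustrative, and in practice one would also guard against empty or sparse cells.

```python
# Sketch of post-stratification cell weighting, assuming the joint population
# distribution of the stratification variables is available as a pandas Series
# indexed by the same cell variables; names are illustrative.
import pandas as pd

def post_stratification_weights(sample: pd.DataFrame, population_shares: pd.Series,
                                cells=("sex", "age_group")) -> pd.Series:
    cells = list(cells)
    sample_shares = (sample.groupby(cells).size() / len(sample)).rename("s")
    # weight per cell = population share / sample share
    ratio = (population_shares / sample_shares).rename("w").reset_index()
    return sample.merge(ratio, on=cells, how="left")["w"]
```

As the text notes, such weights only help to the extent that the cell variables co-vary with the target variables.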
The problems are even greater when trying to detect bias by comparing reluctant respondents with cooperative respondents in a cross-national study. This approach is based on the assumption that converted refusals are similar to, or can be used as a proxy for, the final nonrespondents. A first problem is that the numbers of converted respondents are too small in most ESS countries to be of any analytical use. Secondly, bias is not always found in the same variables across countries and over time (see also Lynn et al., 2002b). This may stem in part from measurement problems. Thirdly, it is not clear how decisions are made as to how many or which type of refusals should be re-approached in the different survey organizations in the different countries (Beullens, Billiet and Loosveldt, 2009a). It seems that the classification 'converted refusal' depends too much on differences in interviewer decisions or even differences in organizational traditions on how to treat initial refusals. Privacy regulations in some countries may prevent refusal conversion entirely or restrict the use of contact form data. And, finally, this approach ignores noncontacts: it actually (partially) measures refusal bias, and it cannot be assumed that nonrespondents and noncontacted sample units are similar or comparable. Different regulations and practical approaches, as well as systematic selection bias, make reliance on refusal conversion information problematic when the aim is to assess the amount of nonresponse bias or to adjust for nonresponse using comparable information about the target variables in all countries. The most important shortcoming of this approach is that there is little evidence that converted refusals are similar to final nonrespondents. Given these problems, it makes no real sense to adjust for nonresponse using only information about reluctant respondents. What could be done is to use this information in combination with other approaches. One possible option for the future would be to encourage countries to reissue a substantial and randomly selected group of cases for refusal conversion and to try very hard to secure their cooperation. Currently, the approach can do little more than provide a warning to researchers that bias appears to be present in some variables. Furthermore, refusal conversion itself has other important effects, such as substantially increasing response rates. It may also provide an important signal to fieldwork organizations that good response rates are important, something that can also improve the quality of the survey in other respects.

Information about the neighbourhood as observed by interviewers has the advantage that it provides additional information about all sample units (cooperative, reluctant, refusals and noncontacts). A prerequisite is, of course, that all sample units are visited personally on location. The recording of neighbourhood characteristics is a serious problem in the ESS at present, with deficiencies that are similar to or perhaps even greater than those of the post-stratification approach. The coding of neighbourhood characteristics in the early rounds of the ESS was based on subjective appraisal by the interviewer, with little or no guidance. As a consequence, the lack of harmonized coding training for interviewers means that these variables are not useful for generating correction techniques. Specific training in this area is strongly recommended, as well as improvement of the questions and categories for household and neighbourhood characteristics, to ensure that they are adequate to accommodate country-specific characteristics and also lead to equivalent measurement across countries.

Collecting core information on nonrespondents is useful for detecting and adjusting for bias, as long as the numbers of initial refusals who cooperate are adequate for nonresponse modelling.
In a small ESS study in four countries, the response rate of refusals in a follow-up survey ranged from 23% in Poland to 53% in Switzerland. It therefore seems possible to convince some nonrespondents to answer a small set of key questions. Core information on nonrespondents was collected in two ways in this quasi-experimental study. In Norway, Poland and Switzerland, a follow-up survey was conducted that targeted both respondents (or a sample of them) and nonrespondents: the 'nonresponse survey' (NRS) approach. In Belgium, refusals were asked to answer a small set of questions on the doorstep: the 'doorstep questionnaire survey' (DQS) approach.

It is a rather expensive endeavour to mount a follow-up survey several weeks after the end of the main survey. The key advantage of the NRS approach is that the propensity weights can be based on refusals and noncontacts. Those who participate in the NRS can be compared with respondents in the main survey, and with respondents who also cooperated in the follow-up survey (the 'double cooperators'). The first comparison benefits from the large sample size of the original respondents; the second makes it possible to rule out possible mode effects, question context effects and timing effects.

One advantage of the DQS approach used in Belgium is that it is less expensive than conducting another round of fieldwork later on. In addition, the answers given on the doorstep to the core questions can be linked to auxiliary information from the contact forms for all sampled units, both respondents and nonrespondents. This enables the DQS approach to be combined with both reluctant respondent data and neighbourhood data as observed by the interviewers. A disadvantage is that there may be mode and context effects that prevent a clear identification of nonresponse bias. Another disadvantage is that the DQS only gives information about refusals, not about other nonrespondents.

Of course, the value of auxiliary information on nonrespondents depends on the content of the core questions; the more closely these are related to the likelihood of response, the better. Several of the basic questions that were used in the long and short versions of the supplementary questionnaire suggested bias. Results differed across countries, but the questions on education, social participation, interest in politics, voluntary work, watching television and attitudes towards surveys were the most informative.

In this section, we used different post-survey adjustment methods to provide different sets of alternatively corrected estimates, all of which aim to measure the same population parameters. We compared the different methods, where possible based on the same core variables, and compared the results with the unadjusted sample. This approach, suggested by Groves (2006, p. 656; see also Section 8.2), appears to be optimal in a cross-national study. Each individual approach to measuring and correcting for nonresponse bias has its flaws, sometimes serious flaws, but the approaches can be combined and at the very least can alert the analyst to the presence of bias.

Clearly, further work is required to investigate the possibility of identifying a better and more reliable set of post-weighting measures. If refusal conversion is to be used to provide a clear indication of bias and possible corrections, a more uniform approach to its implementation is required. If observational data are to prove useful, interviewers need consistent training across countries.
One promising approach is to conduct more surveys of nonrespondents. The next step is then more research and reflection on the optimal designs and the extent to which these should be harmonized across countries. A second promising approach is to combine different types of auxiliary data in one model: for instance, information from the contact forms, interviewer observations of the neighbourhood, and core variables from refusals.
Appendix 8.1 Overview core variables and constructs

The numbers (F6, C13, etc.) refer to the numbers of the questions in the ESS 2 questionnaires; see the fieldwork documents at http://ess.nsd.uib.no/. Most of the multiple-item latent variables are tested by multi-group structural equation modelling. The measurement quality is reported in the papers in the references.

Level of education (European Social Survey, 2005)
ESS Education Standard (slightly modified ISCED-97) coding frame, highest level of education:

  0  Not completed primary education
  1  Primary or first stage of basic
  2  Lower secondary or second stage of basic
  3  Upper secondary
  4  Post-secondary, nontertiary
  5  First stage of tertiary
  6  Second stage of tertiary

Religious involvement (construct: additive scale)

  C13  Regardless of whether you belong to a particular religion, how religious would you say you are?
       Scale: 11-point scale (0 = Not at all religious, 10 = Very religious)
  C14  Apart from special occasions such as weddings and funerals, about how often do you attend religious services nowadays?
       Scale: 7-point scale (1 = Every day; 2 = More than once a week; 3 = Once a week; 4 = At least once a month; 5 = Only on special holy days; 6 = Less often; 7 = Never)
  C15  Apart from when you are at religious services, how often, if at all, do you pray?
       Scale: 7-point scale (1 = Every day; 2 = More than once a week; 3 = Once a week; 4 = At least once a month; 5 = Only on special holy days; 6 = Less often; 7 = Never)

The scale is scalar and metric equivalent for the countries used in this analysis (Billiet and Meuleman, 2008).
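As an illustration of how such an additive scale might be assembled, the sketch below combines C13–C15. Note that C14 and C15 run from 1 = 'every day' to 7 = 'never' and are therefore reverse-coded, and that standardizing before averaging is one way to reconcile the 0–10 and 1–7 metrics; the original scaling may have been done differently.

```python
# Hedged sketch of building the religious involvement additive scale from C13-C15.
import pandas as pd

def religious_involvement(df: pd.DataFrame) -> pd.Series:
    items = pd.DataFrame({
        "c13": df["C13"],       # 0-10, higher = more religious
        "c14": 8 - df["C14"],   # reverse 1-7 so higher = more frequent attendance
        "c15": 8 - df["C15"],   # reverse 1-7 so higher = more frequent prayer
    })
    z = (items - items.mean()) / items.std()  # put items on a common metric
    return z.mean(axis=1)                     # additive (mean) scale score
```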
Admit immigrants in country (construct: additive scale)

  B35  Now, to what extent do you think [country] should allow people of the same race or ethnic group as most [country's] people to come and live here?
  B36  How about people of a different race or ethnic group from most [country] people?
  B37  How about people from the poorer countries outside Europe?
       Scale: 4-point scale (1 = Allow many to come and live here; 2 = Allow some; 3 = Allow a few; 4 = Allow none)

The scale is scalar and metric equivalent for the countries used in this analysis (Billiet and Meuleman, 2008; Davidov et al., 2008).
Perceived ethnic threat (construct: additive scale)

  B38  Would you say it is generally bad or good for [country]'s economy that people come to live here from other countries?
       Scale: 11-point scale (0 = Bad for the economy, 10 = Good for the economy)
  B39  And, would you say that [country]'s cultural life is generally undermined or enriched by people coming to live here from other countries?
       Scale: 11-point scale (0 = Cultural life undermined, 10 = Cultural life enriched)
  B40  Is [country] made a worse or a better place to live by people coming to live here from other countries?
       Scale: 11-point scale (0 = Worse place to live, 10 = Better place to live)

The scale is scalar and metric equivalent for the countries used in this analysis (Billiet and Meuleman, 2008; Davidov et al., 2008).
Trust in political institutions (construct: additive scale)

Question text: Please tell me on a score of 0–10 how much you personally trust each of the institutions I read out. 0 means you do not trust an institution at all, and 10 means you have complete trust.

  B4  ... [country]'s parliament?
  B5  ... the legal system?
  B6  ... the police?
  B7  ... politicians?
  B8  ... political parties?
      Scale: 11-point scale (0 = No trust at all, 10 = Complete trust)

The scale is scalar and metric equivalent for the countries used in this analysis (Billiet and Meuleman, 2008).
Political competence (construct: additive scale)

  B1  How interested would you say you are in politics? Are you...
      Scale: 4-point scale (1 = very interested; 2 = quite interested; 3 = hardly interested; 4 = not at all interested)
  B2  How often does politics seem so complicated that you can't really understand what is going on?
      Scale: 5-point scale (1 = Never; 2 = Seldom; 3 = Occasionally; 4 = Regularly; 5 = Frequently)
  B3  How difficult or easy do you find it to make your mind up about political issues?
      Scale: 5-point scale (1 = Very difficult; 2 = Difficult; 3 = Neither difficult nor easy; 4 = Easy; 5 = Very easy)

The scale is scalar and metric equivalent for the countries used in this analysis (Billiet and Meuleman, 2008).
Political participation (construct: additive scale)

Question text: There are different ways of trying to improve things in [country] or help prevent things from going wrong. During the last 12 months, have you done any of the following? Have you...

  B13  ...contacted a politician, government or local government official?
  B14  ...worked in a political party or action group?
  B15  ...worked in another organisation or association?
  B16  ...worn or displayed a campaign badge/sticker?
  B17  ...signed a petition?
  B18  ...taken part in a lawful public demonstration?
  B19  ...boycotted certain products?
  B20  Is there a particular political party you feel closer to than all the other parties?
       Scale: 2-point scale (1 = Yes; 2 = No). The scale is the number of times 'yes'.
Civil obedience (construct: additive scale)

Question text: How much do you agree or disagree with these statements about how people see rules and laws?

  E18  You should always strictly obey the law even if it means missing good opportunities.
  E19  Occasionally, it is alright to ignore the law and do what you want to.
       Scale: 5-point scale (1 = Agree strongly; 2 = Agree; 3 = Neither agree nor disagree; 4 = Disagree; 5 = Disagree strongly)

Pearson's correlations between the two items: 0.43 (CH, DE), 0.39 (EE, NL), 0.37 (SK).
Social trust (construct: additive scale)

  A8   Generally speaking, would you say that most people can be trusted, or that you can't be too careful in dealing with people? Please tell me on a score of 0 to 10, where 0 means you can't be too careful and 10 means that most people can be trusted.
       Scale: 11-point scale (0 = You can't be too careful, 10 = Most people can be trusted)
  A9   Do you think that most people would try to take advantage of you if they got the chance, or would they try to be fair?
       Scale: 11-point scale (0 = Most people would try to take advantage of me, 10 = Most people would try to be fair)
  A10  Would you say that most of the time people try to be helpful or that they are mostly looking out for themselves?
       Scale: 11-point scale (0 = Most people look out for themselves, 10 = Most people try to be helpful)

The scale is scalar and metric equivalent for the countries used in this analysis (Billiet and Meuleman, 2008).
Social isolation (construct: additive scale)

  C2  How often do you meet socially with friends, relatives or work colleagues?
      Scale: 7-point scale (1 = Never; 2 = Less than once a month; 3 = Once a month; 4 = Several times a month; 5 = Once a week; 6 = Several times a week; 7 = Every day)
  C4  Compared to other people of your age, how often would you say you take part in social activities?
      Scale: 5-point scale (1 = Much less than most; 2 = Less than most; 3 = About the same; 4 = More than most; 5 = Much more than most)

Pearson's correlations between the two items: 0.31 (CH, NL), 0.43 (DE), 0.33 (EE), 0.39 (SK).
Feeling safe (one item, percentage unsafe or very unsafe)

  C6  How safe do you – or would you – feel walking alone in this area [43] after dark? Do – or would – you feel...
      1 = very safe; 2 = safe; 3 = unsafe; 4 = very unsafe; 8 = (Don't know)

[43] Respondent's local area or neighbourhood.
Appendix 8.2 Questionnaires nonresponse modules

Questionnaire nonresponse module (short)
Split ballot: survey climate – short version (1 page)

1. Which of these descriptions best describes your situation? Please select only one.
   □ In paid work
   □ In education
   □ Unemployed
   □ Doing housework, looking after children or other persons
   □ Retired
   □ Other
2. What is the highest level of education you have achieved? [Please use the country-specific question and codes for coding into the ESS coding frame]
   □ No qualifications
   □ CSE grade 2–5/GCSE grades D–G or equivalent
   □ CSE grade 1/O-level/GCSE grades A–C or equivalent
   □ A-level, AS-level or equivalent
   □ Degree/postgraduate qualification or equivalent
   □ Other

3. Including yourself, how many people – including children – live regularly as members of your household?
   _____ people

4. Compared to other people of your age, how often would you say you take part in social activities, i.e. you participate in the meetings with your friends or family members?
   □ Much less than most
   □ Less than most
   □ About the same
   □ More than most
   □ Much more than most

5. How safe do – or would – you feel walking alone in your local area after dark?
   □ Very safe
   □ Safe
   □ Unsafe
   □ Very unsafe

6. How interested would you say you are in politics?
   □ Very interested
   □ Quite interested
   □ Hardly interested
   □ Not at all interested

7. Do you agree or disagree with the following statement: surveys are valuable for the whole society, as we all want to know what the [inhabitants of Country] think and what opinions they have on various important matters.
   □ Completely agree
   □ Agree
   □ Not agree, not disagree
   □ Disagree
   □ Completely disagree
Questionnaire nonresponse module (long)
Split ballot: survey climate – long version (2 pages)

1. What is your gender?
   □ Male
   □ Female

2. In what year were you born?
   _______

3. What is the highest level of education you have achieved? [Please use the country-specific question and codes for coding into the ESS coding frame]
   □ No qualifications
   □ CSE grade 2–5/GCSE grades D–G or equivalent
   □ CSE grade 1/O-level/GCSE grades A–C or equivalent
   □ A-level, AS-level or equivalent
   □ Degree/postgraduate qualification or equivalent
   □ Other

4. Which of these descriptions best describes your situation? Please select only one.
   □ In paid work
   □ In education
   □ Unemployed
   □ Doing housework, looking after children or other persons
   □ Retired
   □ Other

5. Including yourself, how many people – including children – live regularly as members of your household?
   _____ people

6. On an average weekday, how much time, in total, do you spend watching television?
   □ No time at all
   □ Less than 1/2 hour
   □ 1/2 hour to 1 hour
   □ More than 1 hour, up to 1 1/2 hours
   □ More than 1 1/2 hours, up to 2 hours
   □ More than 2 hours, up to 2 1/2 hours
   □ More than 2 1/2 hours, up to 3 hours
   □ More than 3 hours
7. In the past 12 months, how often did you get involved in work for voluntary or charitable organizations?
   □ At least once a week
   □ At least once a month
   □ At least once every three months
   □ At least once every six months
   □ Less often
   □ Never

8. Compared to other people of your age, how often would you say you take part in social activities, i.e. you participate in the meetings with your friends or family members?
   □ Much less than most
   □ Less than most
   □ About the same
   □ More than most
   □ Much more than most

9. How safe do – or would – you feel walking alone in your local area after dark?
   □ Very safe
   □ Safe
   □ Unsafe
   □ Very unsafe

10. Generally speaking, would you say that most people can be trusted, or that you can't be too careful in dealing with people? Please indicate your answer on a score of 0 to 10, where 0 means you can't be too careful and 10 means that most people can be trusted.
    You can't be too careful  0 1 2 3 4 5 6 7 8 9 10  Most people can be trusted
                              □ □ □ □ □ □ □ □ □ □ □

11. How interested would you say you are in politics?
    □ Very interested
    □ Quite interested
    □ Hardly interested
    □ Not at all interested

12. On the whole, how satisfied are you with the way democracy works in [country]? Please indicate your answer on a score of 0 to 10, where 0 means that you are extremely dissatisfied, and 10 means that you are extremely satisfied with the way democracy works in [country].
    Extremely dissatisfied  0 1 2 3 4 5 6 7 8 9 10  Extremely satisfied
                            □ □ □ □ □ □ □ □ □ □ □

13. Please indicate on a score of 0–10 how much you personally trust politicians. 0 means you do not trust politicians, and 10 means you have complete trust in politicians.
    No trust at all  0 1 2 3 4 5 6 7 8 9 10  Complete trust
                     □ □ □ □ □ □ □ □ □ □ □

14. Is [country] made a worse or a better place to live by people coming to live here from other countries? 0 means that [country] is made a worse place to live and 10 means that [country] is made a better place to live by people coming to live here from other countries.
    Worse place to live  0 1 2 3 4 5 6 7 8 9 10  Better place to live
                         □ □ □ □ □ □ □ □ □ □ □

15. Do you agree or disagree with the following statement: surveys are valuable for the whole society, as we all want to know what the [inhabitants of Country] think and what opinions they have on various important matters.
    □ Completely agree
    □ Agree
    □ Not agree, not disagree
    □ Disagree
    □ Completely disagree

Questionnaire basic nonresponse module
Additional question on survey behaviour

16. Between [month] and [month], one of our interviewers approached you to participate in the European Social Survey study. However, we were not successful in conducting an interview with you. Could you give the reason(s) why you did not participate in the survey? (More than one answer is possible.)
    □ I refused to participate because I am very busy
    □ I refused to participate because the interviewer came at a wrong time; I had to take care of other things at that time
    □ I refused to participate because I think that surveys are a waste of time and money
    □ I refused to participate because I am afraid to let strangers in
    □ I refused to participate because surveys are an intrusion into my privacy; I do not provide information about myself
    □ I refused to participate because I had participated in surveys too many times
    □ I refused to participate because I have had bad experience from previous participation in similar studies
    □ I refused to participate because I was afraid I would not cope with providing answers to the survey questions
    □ I refused to participate because I am not interested in the subject of the survey
    □ I refused to participate because my family members opposed my participation in the survey
    □ I was absent throughout the survey period
    □ I was ill at that time
    □ I was often away from home and the interviewer could never find me at home
    □ I haven't seen an interviewer
    □ Other reason: please specify...

Tables 8.5, 8.6, 8.8 and 8.9 reproduced by permission of the authors of the research reports of CeSO, K.U. Leuven.
9 Lessons Learned

9.1 Introduction

About 10 years ago, de Heer (1999a) presented an overview of international response trends in two European official statistical surveys, namely the Labour Force Survey (LFS) and the Expenditure Survey. As mentioned in Section 1.4, he identified a number of factors that could explain differences in response rates across countries and survey organizations and over time, and concluded that response levels are not a given but can be influenced: 'Differences with respect to response trends, levels and types of nonresponse seem to be affected to a great extent by sample and survey characteristics, fieldwork strategy and aspects of the survey organization. To this extent, response and nonresponse levels are also partly under the control of the survey organization' (de Heer, 1999a, p. 104).

De Heer (1999a) also showed that it was possible to gather data on response, types of nonresponse and survey characteristics in different countries, and to build a database for the investigation of response trends, especially given that in an age of increased internationalization of policy-making (by organizations such as the United Nations, the European Commission and the Organisation for Economic Co-operation and Development) there was a growing demand for high-quality cross-national data. Not only would this aid policy-making, but it could also serve as a benchmark for both survey organizations and those commissioning surveys. Such data could then be used to develop and improve best-practice systems for national and international surveys, and would serve as an incentive for other organizations to monitor fieldwork continuously and record fieldwork outcomes, resulting in a rich dataset to analyse trends and investigate potential determinants of nonresponse. Finally, these data might stimulate survey organizations working in different disciplines (e.g. government, market research) to invest in long-term policies to safeguard response rates, such as policies to prevent the survey climate from being made worse by these organizations themselves.
Two later studies (de Leeuw and de Heer, 2002; Couper and de Leeuw, 2003) compared response rates across countries and related these to survey design and fieldwork characteristics. In their 2002 study, de Leeuw and de Heer observed that average household size and the presence of young children correlated with contactability, that cross-sections had a higher noncontact rate than panels, that countries that applied more lenient rules for sampling and respondent selection (allowing proxies or substitution) had lower noncontact rates than countries that used stricter rules, and that stricter supervision and monitoring of interviewers was also associated with lower noncontact rates. Differences in refusal rates were associated with economic indicators (a higher unemployment rate meant a lower refusal rate; higher inflation meant a higher refusal rate); close monitoring and supervision of interviewers resulted in lower refusal rates; and, as expected, refusal rates in mandatory surveys were lower than in voluntary surveys. The authors recommended setting up an international, longitudinal, academic nonresponse study, as well as further investigation of survey design differences and their impact on nonresponse.

In their 2003 study, Couper and de Leeuw pinpoint a specific problem, namely the very limited availability of response information, which severely hampered the comparative study of nonresponse bias. One of their recommendations to combat nonresponse was to deploy a whole battery of measures: 'Such considerable efforts may well increase survey costs. However, in serious cross-cultural and cross-national research reduction in variation in fieldwork and survey design between countries is of the utmost importance for valid comparisons. For this reason it is important that budget constraints do not severely limit the use of an array of response-inducing measures (de Heer 1999b)' (Couper and de Leeuw, 2003, p. 173). They also recommended reporting the individual components of nonresponse and fieldwork details, reducing variation in response rates and the type of nonresponse between countries, planning nonresponse reduction and post-survey adjustment in advance, and developing and implementing a theoretical framework.

Several years later, based largely on the evidence from the European Social Survey, we are able to report remarkable progress in the comparative study of nonresponse. More information on the response process and fieldwork is available than in most national studies, and this information is freely available and directly accessible to all. Not surprisingly, many researchers from countries all over the world have now begun to study and analyse ESS nonresponse. This includes analyses to determine how to enhance response rates and distinguish between different components of nonresponse, but also analyses of nonresponse bias. Somewhat reassuringly, there are as yet virtually no indications that nonresponse bias is extensive in the ESS data, or that survey data are hard to compare because of differential nonresponse bias. This finding is, of course, based on the operational definition of bias outlined in Chapter 8, and is limited by the number of countries where analysis was possible and sometimes by the incompleteness or deficient quality of auxiliary data.
this is an important finding, which suggests that a high-quality approach to survey implementation can pay dividends.

Those who have read all of this book will realize that our observations of nonresponse in the ESS are not uniformly positive. Paradoxically, this partly reflects the detailed information that the ESS now makes available. In this chapter, we will highlight the main outcomes of the studies of ESS nonresponse, discuss what we still do not know, highlight what could be improved and try to translate this into practical lessons.

The chapter reflects the descriptive approach of this book, which can be compared to two well-known approaches in nonresponse research: meta-analysis and experiments. Contrary to earlier studies on cross-national nonresponse, we used data on the fieldwork process from the contact forms. This is an advantage compared to the meta-analyses mentioned in the introduction to this chapter, where generally only published aggregate information is used. A drawback of our approach is that, unlike in a meta-analysis study, we focused on one survey only, the European Social Survey. This is precisely because comparable, detailed fieldwork data are not available for other cross-national studies. Descriptive studies may also not have the same generalizability as experiments. However, given the number of practical factors that play a role in nonresponse (quite a number of which are outside the control of the researcher), the close relationship between countries, survey organizations and sampling frames, the number of countries in the ESS and the tight budgets, we saw no opportunities for large-scale experiments. We hope that the overview of what happened in the ESS is informative in itself.
9.2 Standardization, Tailoring and Control

It is important to remind ourselves of the main aim of the European Social Survey: namely, to produce rigorous data on trends over time in underlying attitudes, values and behaviour within and between European nations. High overall quality and optimum comparability are pursued in the ESS. This means that many techniques for enhancing response rates are not permitted. The main questionnaire cannot be left behind at the request of the respondent to complete at their leisure, for example, and interviews by telephone are not permitted because this could result in uncontrolled mode effects. No proxy interviews can be conducted, and live translations by the interviewer or a family member are not permitted where the target person does not speak the survey language. All of these techniques would be likely to increase measurement error. The aims of minimizing total survey error and achieving optimum comparability may thus have resulted in lower response rates than might otherwise have been achieved.

Since the ESS is a cross-national survey, the majority of its users will compare the different countries. In order to facilitate optimum comparison of countries, methodological differences need to be minimized, so that, as far as possible, differences between countries are due to differences in their populations, not to the phrasing of questions, the interview mode, the response rate or nonresponse bias. For this reason, the ESS is based on input harmonization. From the beginning, however, wholesale
adoption of precisely the same methods or procedures across countries was deemed to be neither practical nor optimal. For instance, the type of survey organization differed across countries: statistical organizations, academic institutes and commercial survey agencies are all acceptable as long as they have a good track record of working with random probability samples without substitution and using face-to-face interviews.

Details of the fieldwork strategies also differed. For instance, the content (or even presence) of advance letters varied. Although a template for an advance letter was provided by the Central Coordinating Team, it was also stressed that national teams should mention those topics in the advance letter that would be most appropriate in a particular country. There was also variation in whether or not incentives were used and, where they were used, in their timing (unconditional, after the interview) and their type and value (money, small token). The use of respondent incentives exhibited a negative relationship with response rates across countries, since incentives were used more widely in countries with low response rates. Some countries have no tradition of using respondent incentives, while in others there was patently little need to use them. Another difference between countries that is probably relevant was the status of interviewers, both in terms of their employment conditions and the remuneration they received (incentives and bonuses, and the payment regime – either per hour or per completed interview). These differences resulted from the fact that a fieldwork organization was selected in each country that would be able to perform the task in accordance with the central specifications. In this case it is very difficult, if not impossible, to change existing employment and remuneration strategies.

Some fieldwork details were thus allowed to differ to enable the central approach to be tailored to make it effective in the individual national situations. Other differences were unavoidable, such as the sampling frames; the best available sampling frame was selected in each country, making sure that these were effectively equivalent even if they were not actually the same. Yet other implementation details differed because fieldwork funding became available at a late stage or was barely sufficient. Late funding caused the fieldwork period to vary across countries; in some cases it also created a dilemma, forcing a choice between finishing the fieldwork on time and accepting a lower response rate than would have been possible had the fieldwork been able to run its course. Insufficient funding also resulted in some countries stopping fieldwork early and not making the required number of calls to all target persons.

Finally, as shown in Chapter 6, there were differences in fieldwork efforts across countries. Few calls were made to noncontacts in some countries, whereas they were pursued relentlessly in others. There was no refusal conversion in some countries, while substantial numbers of initial refusals ultimately cooperated in others. Part of the explanation for this was that fieldwork conditions were much more advantageous in some countries than in others. If 90% of the target persons are found at home at the first call (the Slovak Republic), there is less need for a large number of calls to the noncontacted sample units than when only 45% are found at home (Portugal). Similarly, when 95% of the final respondents cooperate at the first contact and the
STANDARDIZATION, TAILORING AND CONTROL
297
final response rate is high (Greece), there is less need for a costly refusal conversion programme than in a country where only 20% of the final respondents cooperate at the first contact and the final response rate is well below the target (the Netherlands). However, the difficulty of making contact and obtaining cooperation is not the only explanation for extended efforts; in some countries the survey conditions seemed comparatively good, yet great efforts were still made, whereas in other countries the results did not look promising early in the fieldwork period, yet little was done to improve the situation.
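The arithmetic behind such contactability differences is easy to make explicit. Under the simplifying (and admittedly optimistic) assumption that call attempts are independent, the probability of establishing contact within k calls is

\[ P(\text{contact within } k \text{ calls}) = 1 - (1 - p)^k, \]

where \(p\) is the probability of finding someone at home on a single call. With \(p = 0.90\), as in the Slovak Republic, a single call already yields a 90% contact rate; with \(p = 0.45\), as in Portugal, roughly four calls are needed to reach a comparable level, since \(1 - 0.55^4 \approx 0.91\). In reality, at-home patterns are far from independent across calls, which is precisely why varying the timing of calls matters.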
These examples show that input harmonization will always have to include coping with national restrictions and adapting general guidelines to national best practices. This has a number of consequences. When comparing response rates (as in Chapter 5), it should be borne in mind that these will partly be the consequence of different sampling and fieldwork strategies. When adjusting for nonresponse bias (as in Chapter 8), it is important to remember that in many cases different types of auxiliary data will be available across countries.

A second lesson to be drawn from the shaky balance between the desire for standardization and the need for tailoring is that some of the guidelines and requirements of the ESS may have to be reconsidered. A more tailored, country-specific approach might be the optimum; for example, in some countries a larger number of weekend calls might be specified. What could be improved is the refusal conversion strategy, both as recommended and as implemented. Refusal conversion aimed mainly at 'soft' refusals does not appear to improve the representativeness of the sample in a particular country (see Section 8.3.2). The ideal would be to re-approach all refusals, as far as ethically possible (see Section 7.9) and financially feasible. A second-best solution would be to re-approach a random subsample of all refusals, or random subsamples of different types of refusals; for example, according to type of neighbourhood or reason for refusal.

A third lesson that can be drawn is that the requirements have not always been met. As Chapter 6 showed, noncontacts have not always been followed up as diligently as they should have been, and in some countries no visible efforts were made to convert even soft refusals. This could be the result of a lack of funds, local expertise or willingness to adhere to rules that run counter to national or organizational survey practice. It could also be that the decentralized fieldwork implementation of the ESS, where the funding is national and fieldwork organizations are selected and controlled nationally, makes it very difficult to convey the meaning and purpose of central survey specifications. The recently introduced ESS field directors meetings (Zabal and Wohn, 2008) are a first step towards bringing national practitioners and central designers together, improving communication and learning from each other.

From a cross-national perspective, an important consideration with regard to the nonresponse literature is that in many large-scale, cross-national collaborative studies, much less is under the control of 'the researchers' than the conceptual model of survey cooperation by Groves and Couper (1998) suggests, and probably even less is under the control of a coordinating authority such as the Central Coordinating Team of the ESS. Even in a national study, when fieldwork has to be outsourced, the characteristics of the interviewers – including their experience and remuneration – usually cannot be manipulated. In a cross-national study, it has to be accepted that in some countries only very few organizations will be able to conduct a high-quality face-to-face survey, and that organizational practices differ. The situation might be more malleable in a study like SHARE, where the fieldwork is financed centrally, CAPI software is implemented centrally, and the format and content of interviewer training is prescribed and monitored by central staff (Börsch-Supan and Jürges, 2005). Even then, however, some differences will undoubtedly remain. In a situation with limited central and national funding of fieldwork, as in the present ESS, the scope for control is more limited. Hopefully, this situation will improve in the near future when new governance arrangements and funding systems for the ESS are developed (see www.europeansocialsurvey.org).
9.3 Achieving High Response Rates

The target response rate in the ESS is 70%; the target maximum noncontact rate is 3%. High response rates are an important methodological aim of the ESS. One reason for this is that while high response rates do not necessarily guarantee low nonresponse bias, they do limit the maximum nonresponse bias. A second reason is that it was assumed that similar nonresponse rates would promote comparability. It was also hoped that setting high response rate targets would encourage improvements through the successive rounds, especially in the countries with the lowest response rates.

Has the ESS achieved the envisaged high response rates? Chapter 5 shows that the answer to this question is 'no', as many countries did not achieve the 70% target. Response rates were higher than usual in some countries (possibly because of great efforts) and lower in others (possibly due to stricter fieldwork monitoring and definitions of response rates). Has the ESS achieved higher response rates than other cross-national surveys? This question is difficult to answer because, despite the pleas by the analysts referred to earlier, response rates are rarely calculated in a standard way across countries and across surveys (Couper and de Leeuw, 2003; EQLS, 2007; Eurostat, 2008).

Can we compare response rates in the ESS with response rates from other surveys? The Eurobarometer (http://ec.europa.eu/public_opinion/standard_en.htm) does not publish any information on response rates; the Survey on Health, Ageing and Retirement in Europe (Börsch-Supan and Jürges, 2005) is a panel survey and covers a specific population; and participation in the Labour Force Survey is obligatory in a number of European countries. Response rates in other cross-national attitudinal surveys, such as the International Social Survey Programme (ISSP), the World Values Survey (WVS) and the European Values Study (EVS), should in principle be more readily comparable. However, as Couper and de Leeuw (2003) have shown, the response rates in the ISSP are difficult to compare across countries because the interview modes differ and because nonrespondents and ineligibles cannot always be distinguished. In addition,
the ISSP questionnaire is a relatively short survey module that in a number of cases is administered as a drop-off from another survey. In the WVS, at least in some countries (GfK Marktforschung, 2006), quota sampling is used, which makes it impossible to calculate nonresponse. Halman (2001) gives an overview of response rates in EVS countries, which indicates that in a substantial number of countries quota controls were used, and that in a majority of cases some kind of substitution was permitted. In none of these three surveys were call records kept that allow for a standardized response calculation. This illustrates that unless random sampling is used, strict rules are applied to sampling and respondent selection, and response rates are calculated in a standardized way, it is very difficult to compare response rates across countries and across surveys.

We do not believe we can reliably compare ESS response rates with those from telephone surveys, surveys in which modes differ across countries, surveys on completely different topics, considerably shorter surveys, surveys that use quota sampling or surveys where proxy interviewing and substitution are allowed. We do, however, feel that the ESS response rates in a number of countries are close to what is realistically achievable, taking into account the efforts made, the improvements across rounds and the feedback from National Coordinators and Field Directors on how hard they had tried. In some countries, however, there is clearly room for improvement. Continual innovation is important in all countries.

We learned from our analyses in Chapters 5 and 6 that response rates could have been somewhat higher in some countries, and noncontact rates somewhat lower, had the prescribed rules been implemented in all cases. A number of final noncontacts could probably have been reached, and some of them would most likely have cooperated. Evidence from some countries shows that refusal conversion attempts will produce additional respondents, and this would most likely have been the case in those countries where little or no attempt was made at refusal conversion.

A matter of some concern is that it may be difficult to maintain high response rates. We saw in Chapter 5 that some of the best-performing countries did a little worse in ESS 3 than in previous rounds. This suggests that aiming for high response rates involves more than developing a strategy that works and continuing with it round after round. Strategies will lose their effectiveness, incentives may lose their value, and interviewers may lose the initial motivation that they had when they were working on something new and exciting. A responsive design effort between rounds, which utilizes the lessons learned from contact form analysis, is therefore an important means of maintaining high quality.
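Because call records with final disposition codes are kept for every sample unit, a response rate with a common definition can be computed mechanically across countries. The following is a minimal sketch of such a calculation; the disposition codes and counts are invented for illustration and do not reproduce the actual ESS coding scheme.

```python
from collections import Counter

# Illustrative final disposition codes; not the actual ESS coding scheme.
INTERVIEW, REFUSAL, NONCONTACT, OTHER_NONRESPONSE, INELIGIBLE = range(5)

def eligible_base(counts):
    """All issued sample units minus those found to be ineligible."""
    return sum(counts.values()) - counts[INELIGIBLE]

def response_rate(dispositions):
    counts = Counter(dispositions)
    return counts[INTERVIEW] / eligible_base(counts)

def noncontact_rate(dispositions):
    counts = Counter(dispositions)
    return counts[NONCONTACT] / eligible_base(counts)

sample = ([INTERVIEW] * 1400 + [REFUSAL] * 450 + [NONCONTACT] * 80
          + [OTHER_NONRESPONSE] * 70 + [INELIGIBLE] * 100)
print(f"response rate: {response_rate(sample):.1%}")      # 70.0%
print(f"noncontact rate: {noncontact_rate(sample):.1%}")  # 4.0%
```

The essential design choice is the denominator: ineligible units (for instance, vacant addresses) are excluded, whereas every eligible nonrespondent is counted, whether refused, not contacted or otherwise lost.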
9.4 Refusal Conversion

It was expected that a fruitful strategy for enhancing response rates and minimizing bias would be to set up refusal conversion programmes. Persuading initial refusals to cooperate would mean smaller nonresponse bias if these initial refusals were more similar to final refusals than those who cooperated without first refusing. Information
on the initial refusals, also called reluctant respondents, could then also be used to adjust for nonresponse bias.

In reality, results from Chapter 7 show that refusal conversion as currently implemented in the ESS is a less promising strategy than initially assumed. In some countries, virtually no efforts are made to try to convert refusals; and where refusal conversion efforts are made, their success varies across countries. Even in countries where refusal conversion seems to be successful, the strategy seems to have varied, since in some countries most of the converted refusals appeared to be 'soft' refusals, whereas in others a large number of 'hard' refusals also ultimately participated. This severely limits the possibility of using the converted refusal data for the correction of nonresponse bias. That said, refusal conversion in the ESS accounts for a growing proportion of final survey participants. This means that in a number of countries refusal conversion substantially enhances response rates. In addition, in ESS 3 a larger number of countries than before achieved over 100 converted refusals, a number that is sufficient for analysis purposes.

Data from the ESS (see Section 7.3.3) provide evidence that it is better to wait about two or three weeks before re-approaching an initial refusal. This adds further weight to the idea that in many instances refusal to take part in a survey is a time-dependent phenomenon. In addition, analysis of ESS contact form data confirms that it is better to use a new and more experienced interviewer, since they will have a higher likelihood of success. In terms of data quality, we found that converted refusals in the ESS do not in general provide poorer-quality data. There is no evidence of satisficing, although we do have some indication that hard-to-convert refusals have rather more problems with the questions. This area needs further research.

Most importantly, data from the ESS suggest that refusal conversion makes little significant difference to the composition of the final achieved sample, and that where it does, there is variability in terms of whether it makes the samples more or less representative. These results are based on data from no more than five countries, and often just two, and are therefore limited in their generalizability. What the lack of effects perhaps does reflect is the greater likelihood of fieldwork organizations and interviewers focusing their conversion efforts on the soft refusals. This makes sense if the goal is to increase the overall response rate, since the conversion probability is significantly higher for soft than for hard refusals. It could also mean, however, that conversion mainly yields cases that are similar to already cooperating respondents, and may therefore be of little use in reducing nonresponse bias. Further research into the possibility of re-approaching all refusals or a random subsample of all refusals, or of focusing only on hard refusals, would be very beneficial in terms of making a decision about the usefulness of this approach. The ethics of re-approaching hard refusals were discussed in Section 7.9.
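As an illustration only, the two operational findings above – wait about two to three weeks, and reissue the case to a different, more experienced interviewer – could be turned into a simple case-selection rule. The record layout and experience measure below are hypothetical, not the ESS contact-form format.

```python
from datetime import date

# Hypothetical records of initial refusals; not the ESS contact-form layout.
refusals = [
    {"case_id": 101, "refused_on": date(2006, 10, 2),
     "interviewer": "A", "hard_refusal": False},
    {"case_id": 102, "refused_on": date(2006, 10, 20),
     "interviewer": "B", "hard_refusal": True},
]
experience_years = {"A": 1, "B": 7, "C": 12}

def select_for_reissue(refusals, today, wait_days=14):
    """Reissue soft refusals once a waiting period has elapsed, assigning
    each case to a different and more experienced interviewer."""
    reissues = []
    for case in refusals:
        if case["hard_refusal"]:
            # One possible policy only; re-approaching hard refusals raises
            # the ethical questions discussed in Section 7.9.
            continue
        if (today - case["refused_on"]).days < wait_days:
            continue  # wait about two to three weeks before the conversion attempt
        candidates = [i for i, yrs in experience_years.items()
                      if i != case["interviewer"]
                      and yrs > experience_years[case["interviewer"]]]
        if candidates:
            reissues.append((case["case_id"], max(candidates, key=experience_years.get)))
    return reissues

print(select_for_reissue(refusals, today=date(2006, 10, 25)))  # [(101, 'C')]
```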
9.5 Nonresponse Bias

High response rates are never a goal in themselves but, rather, a means of enhancing survey data quality. While we know that the maximum bias will be smaller as response rates go up, in absolute terms the actual bias can be larger (for some variables) in a high-response survey than in a low-response survey. In the ESS, even in the higher-response countries, there is ample room for bias. Furthermore, given the differences in fieldwork practices, nonresponse rates and response composition, the possibility cannot be ruled out that there are differences in bias across countries.
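The reasoning can be made explicit with the familiar deterministic expression for nonresponse bias; the notation here is generic rather than taken from Chapter 8. For a sample of \(n\) units of which \(m\) do not respond,

\[ \mathrm{bias}(\bar{y}_r) = \bar{y}_r - \bar{y} = \frac{m}{n}\,(\bar{y}_r - \bar{y}_m), \]

where \(\bar{y}_r\) is the respondent mean, \(\bar{y}\) the full-sample mean and \(\bar{y}_m\) the (unobserved) nonrespondent mean. The nonresponse rate \(m/n\) caps the absolute bias for any bounded variable, which is why a higher response rate limits the maximum bias; the realized bias, however, depends on the unknown difference between respondents and nonrespondents, and this difference may well be larger in a high-response survey than in a low-response one.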
Chapter 8 of this book looks in some depth at ways of assessing and correcting for bias, based on different types of auxiliary data as well as a range of different techniques and models; namely, post-stratification, using interviewer observations, collecting doorstep information and conducting follow-up surveys among refusals.

Post-stratification (PS) is frequently used to improve the precision of survey estimators when categorical auxiliary information is available from sources outside the survey, in this case population statistics. The underlying assumption is that, due to sampling errors, noncoverage and different types of nonresponse, the distributions of the PS variables in the survey will differ from those in the population, and that PS weighting will make these distributions identical. In principle, therefore, post-stratification can be effective in reducing nonresponse bias (Bethlehem, 2009, p. 250). In practice, however, this approach to adjusting for nonresponse will have a limited effect when the target variables co-vary weakly with the PS variables, as they often do. In addition, weighting may reduce the precision of survey outcomes. Other problems are that the PS variables in the survey or in population statistics may contain errors. So although post-stratification is in principle easy to apply, its value as a general method to correct for nonresponse in the ESS is limited.
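The mechanics are simple enough to show in a few lines. The sketch below post-stratifies on sex by age group; the cells, population shares and data are invented, and actual ESS weighting is documented separately.

```python
import pandas as pd

# Invented example: post-stratification cells defined by sex x age group.
respondents = pd.DataFrame({
    "sex": ["f", "f", "m", "m", "m", "f"],
    "age": ["<40", "40+", "<40", "40+", "<40", "<40"],
})

# Known population distribution of the same cells (e.g. from population statistics).
population_share = {("f", "<40"): 0.20, ("f", "40+"): 0.30,
                    ("m", "<40"): 0.22, ("m", "40+"): 0.28}

# PS weight = population share of the cell / sample share of the cell,
# so that the weighted sample reproduces the population distribution.
sample_share = respondents.groupby(["sex", "age"]).size() / len(respondents)
respondents["ps_weight"] = [
    population_share[cell] / sample_share[cell]
    for cell in zip(respondents["sex"], respondents["age"])
]
print(respondents)
```

Each respondent in a cell receives the same weight, namely the population share of the cell divided by its sample share; whether this removes nonresponse bias depends entirely on how strongly the target variables co-vary with the cell variables.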
The problems are even greater when trying to detect bias by comparing converted refusals (also called reluctant respondents) with cooperative respondents. One practical problem is that refusal conversion seems to be a different process in different countries (see Chapter 7), as summarized in the previous section. We cannot assume that converted refusals reflect the final nonrespondents. As stated in the conclusion to Chapter 8, refusal conversion can substantially enhance response rates, but would be more useful if refusal conversion decisions were better documented and if some efforts were directed towards hard refusals.

Chapter 8 also showed that the data quality on the contact forms varies from variable to variable and from country to country. In theory, the neighbourhood information observed by interviewers could provide useful auxiliary data on all sample units that could be used as a proxy for household characteristics and could indicate practical difficulties in establishing contact and obtaining cooperation. In practice, however, the quality of these data prevents them from being reliably used in all countries for detecting and correcting nonresponse bias. More interviewer training in collecting neighbourhood data is needed here, as well as more cross-national harmonization.

Collecting information on core questions among nonrespondents is useful for bias detection and adjustment, as long as the number of refusals who cooperate is adequate for nonresponse modelling. The response rate in the follow-up survey of ESS nonrespondents ranged between 23% in Poland and 53% in Switzerland. This partly reflects the high response rate in Poland for the main fieldwork and the relatively low initial response rate in Switzerland. The costs of follow-up surveys are substantial, although as a percentage of the total survey costs they are limited. Two approaches were used in ESS 3: a doorstep questionnaire survey among refusals and a follow-up survey among nonrespondents and (a sample of) respondents. Both approaches have provided useful auxiliary data that have been used in propensity weighting. Chapter 8 showed that core question topics that are prone to bias, and that should therefore be included in a follow-up survey or doorstep questionnaire, are education, social participation, interest in politics, voluntary work, watching television and the attitude towards surveys.

None of the methods used to assess and correct for nonresponse bias is a panacea, and all have their theoretical and practical problems. The best approach appears to be to use multiple types of auxiliary data and multiple types of weighting models, even when we cannot do this in all countries. We found that weighting with auxiliary data has small effects on statistics such as means and proportions, and even smaller effects on estimates from multivariate models. This is an important finding, as social researchers are usually less interested in descriptive statistics and more interested in the comparison of explanatory models. The small effects could mean that we are failing to identify and measure those auxiliary data that determine both response behaviour and survey outcomes. What we can conclude at this juncture is that – using all the information that is available and based on different approaches – we have no evidence of serious nonresponse bias in the ESS.
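For readers unfamiliar with the propensity weighting referred to above, the following sketch shows the basic mechanics on simulated data; the auxiliary variables, model and response mechanism are invented, and the actual ESS analyses in Chapter 8 are considerably more elaborate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Invented auxiliary variables known for respondents and nonrespondents alike
# (in the ESS these might come from interviewer neighbourhood observations).
n = 1000
urban = rng.integers(0, 2, n)
multi_unit = rng.integers(0, 2, n)
X = np.column_stack([urban, multi_unit])

# Simulated response indicator: response less likely in urban, multi-unit settings.
p_true = 1 / (1 + np.exp(-(1.2 - 0.8 * urban - 0.6 * multi_unit)))
responded = rng.binomial(1, p_true)

# Estimate each unit's response propensity and weight respondents by its inverse,
# so that under-represented groups count for more in survey estimates.
propensity = LogisticRegression().fit(X, responded).predict_proba(X)[:, 1]
weights = np.where(responded == 1, 1.0 / propensity, 0.0)
print("mean weight among respondents:", round(weights[responded == 1].mean(), 2))
```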
9.6 Contact Forms and Fieldwork Monitoring

With the contact forms (see Chapter 3, including Appendix 3.1), the ESS has developed and implemented a tool that can be used to follow each step of the fieldwork, to calculate standard response rates across countries (acknowledging sampling frame differences), to assess whether guidelines have been adhered to, and to estimate bias and suggest improvements for future rounds. Never before has so much information on the response process been available in a cross-national survey, and never has such information been so publicly and freely accessible. Every interviewer in every country, in every round of the ESS, records information in a uniform way on the result of each visit and each contact at each address. At least, this is the ideal. In practice, paradata – the process data available from the contact forms – contain errors and can provide only an approximation of the fieldwork process. Ideally, collecting paradata requires the same quality standards as collecting interview data. This means that collecting and editing paradata should be seen as extra work that requires adequate funding.

One incentive to collect better ESS paradata is that the information gathered in one round of the survey can be used to improve the fieldwork in the next round, thus facilitating a kind of responsive fieldwork design between rounds (Koch et al., 2009). This can be done in simple ways (for instance, by ensuring that every interviewer makes the required number of calls) or more general ways (by
ascertaining the best time to call on sample units in a particular country, or by training interviewers to react adequately to the most frequent reasons for refusal). We could also try to learn more from the contact sequences in call record data, as has been done by Kreuter and Kohler (2009). Ideally, the feedback loops between the analysis of contact form data and the fieldwork strategies employed in the next round need to be strengthened.

In the first round of the ESS, the fieldwork in some countries seemed to be going very well until it ultimately turned out that response rates in some regions of these countries were very low and that the final response rates were much lower than expected. The fieldwork monitoring programme in the ESS has been intensified since then, requiring detailed response forecasts prior to the start of fieldwork and placing greater emphasis on fortnightly reports on final disposition codes and the fieldwork status of all sample units from each country. These efforts have reduced unpleasant surprises, although they cannot guarantee high response rates.

There is an additional incentive to collect ESS paradata, train interviewers in how to collect those data and allocate additional funds for this part of the data collection process when the contact form data can be used for monitoring and control during fieldwork. This is only possible when call record data are collected or recorded electronically and processed by the fieldwork organization on a regular, preferably daily, basis. This is now rarely the case in the ESS.
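To illustrate what such regular electronic processing would make possible, the sketch below derives two simple monitoring statistics from call records: unresolved cases that have not yet received a required minimum number of calls, and the contact yield of different calling slots. The record layout and threshold are hypothetical, not the ESS contact-form format.

```python
import pandas as pd

# Hypothetical electronic call records: one row per call attempt.
calls = pd.DataFrame({
    "case_id": [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "weekday": ["Mon", "Wed", "Sat", "Tue", "Sat", "Mon", "Tue", "Wed", "Sun"],
    "evening": [False, True, False, False, True, False, False, True, True],
    "outcome": ["noncontact", "noncontact", "interview", "noncontact", "refusal",
                "noncontact", "noncontact", "noncontact", "interview"],
})

MIN_CALLS = 4  # illustrative minimum number of calls before a case may be abandoned

per_case = calls.groupby("case_id").agg(
    n_calls=("outcome", "size"),
    resolved=("outcome", lambda s: s.isin(["interview", "refusal"]).any()),
)
under_worked = per_case[~per_case["resolved"] & (per_case["n_calls"] < MIN_CALLS)]
print("cases needing further calls:", list(under_worked.index))  # [3]

# Contact yield by calling slot, to inform 'best time to call' guidance.
slot_yield = (calls.assign(contact=calls["outcome"] != "noncontact")
                   .groupby(["weekday", "evening"])["contact"].mean())
print(slot_yield)
```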
9.7 Into the Future

In 2008, the ESS Review Panel (Bethlehem et al., 2008) recommended that the ESS should step up its studies of nonresponse bias, given the increasing risks posed by falling response rates. We feel that with this book we have given an overview both of how to keep response rates as high as possible and of the potential effect of bias. This book is intended to be relevant for the future of the ESS, but also for survey researchers in Europe and on other continents who are engaged in cross-national and national surveys. We have learned that we must never stop trying to obtain the best response rates possible, that we must keep adapting and improving our response-enhancing strategies, that it is of the utmost importance to monitor and record the response process, and that minimizing nonresponse bias should be our ultimate goal.
References

Abraham, K.G., Helms, S. and Presser, S. (2009) How social processes distort measurement: the impact of survey nonresponse on estimates of volunteer work in the United States. American Journal of Sociology, 114(4), 1129–65.
Abraham, K.G., Maitland, A. and Bianchi, S.M. (2006) Nonresponse in the American Time Use Survey: who is missing from the data and how much does it matter? Public Opinion Quarterly, 70, 676–703.
Allison, P.D. (1999) Logistic Regression Using the SAS® System, SAS Institute, Inc., Cary, NC.
American Association for Public Opinion Research (AAPOR) (2005) AAPOR Code of Professional Ethics & Practice. Available at: http://www.aapor.org/aaporcodeofethics
American Association for Public Opinion Research (AAPOR) (2008) Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys, 5th edn, AAPOR, Lenexa, Kansas.
Atrostic, B.K., Bates, N., Burt, G. and Silberstein, A. (2001) Nonresponse in U.S. government household surveys: consistent measures, recent trends and new insights. Journal of Official Statistics, 17(2), 209–26.
Bates, N. and Creighton, K. (2000) The last five percent: what can we learn from difficult interviews? In Proceedings of the Annual Meetings of the American Statistical Association, 13–17 August.
Bates, N., Dahlhamer, J. and Singer, E. (2008) Privacy concerns, too busy, or just not interested: using doorstep concerns to predict survey nonresponse. Journal of Official Statistics, 24(4), 591–612.
Baumgartner, H. and Steenkamp, J.B.E.M. (2001) Response styles in marketing research: a cross-national investigation. Journal of Marketing Research, 38, 143–56.
Bethlehem, J.G. and Kersten, H.M.P. (1986) Werken met non-respons. Statistische Onderzoekingen M30, CBS Publikaties, Staatsuitgeverij, 's-Gravenhage.
Bethlehem, J.G. (2002) Weighting nonresponse adjustments based on auxiliary information, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 265–87.
Bethlehem, J. (2009) Applied Survey Methods: A Statistical Perspective, John Wiley & Sons, Inc., Hoboken, NJ.
Bethlehem, J.G. and Keller, W.J. (1987) Linear weighting of sample survey data. Journal of Official Statistics, 3(2), 141–53.
Bethlehem, J.G. and Kersten, H.M.P. (1985) On the treatment of nonresponse in sample surveys. Journal of Official Statistics, 1(3), 287–300.
Bethlehem, J., Medrano, J., Groves, R. et al. (2008) Report of the Review Panel for the European Social Survey, European Science Foundation.
Bethlehem, J. and Schouten, B. (2003) Nonresponse analysis of the integrated survey on living conditions (POLS). Paper presented at the 14th International Workshop on Household Survey Nonresponse, Leuven.
Bethlehem, J.G. and Stoop, I. (2007) Online panels, a paradigm theft? In The Challenges of a Changing World (eds M. Trotman et al.), Association for Survey Computing, Southampton, UK, pp. 113–31.
Beullens, K., Billiet, J. and Loosveldt, G. (2009a) Selection Strategies for Refusal Conversion of Four Countries in the European Social Survey, 3rd Round, Research Report of CeSO, SM/2009-10, Leuven.
Beullens, K., Billiet, J. and Loosveldt, G. (2009b) The effect of the elapsed time between initial refusal and conversion contact on conversion success: evidence from the 2nd round of the European Social Survey. Quality & Quantity, DOI 10.1007/s11135-009-9257-4.
Beullens, K., Symons, K. and Loosveldt, G. (2009c) Can we rely on the interviewers' opinion about the respondent to assess survey quality? An application of a three-level random coefficient model. Working paper of the Survey Methodology section, CeSO, K.U. Leuven.
Beullens, K., Vandecasteele, L. and Billiet, J. (2007) Refusal conversion in the Second Round of the European Social Survey. Working Paper of CeSO, SM/2007-5.
Biemer, P., Groves, R., Lyberg, L. et al. (1988) Telephone Survey Methodology, John Wiley & Sons, Inc., Hoboken, NJ.
Biemer, P.P. and Lyberg, L.E. (2003) Introduction to Survey Quality, John Wiley & Sons, Inc., Hoboken, NJ.
Billiet, J. (2007a) Het belang van regelmatig onderzoek naar opinies en houdingen in de bevolking, in Vlaanderen gepeild! (ed. J. Pickery), SVR-Studie 2007/2 (7-36), Studiedienst van de Vlaamse Regering.
Billiet, J. (2007b) Reflections on the quality of cross-national surveys: lessons of the European Social Survey. Concepts & Methods, 3(2), 3–8.
Billiet, J. and Davidov, E. (2008) Testing the stability of an acquiescence style factor behind two interrelated substantive variables in a panel design. Sociological Methods & Research, 36(4), 542–62.
Billiet, J., Koch, A. and Philippens, M. (2007) Understanding and improving response rates, in Measuring Attitudes Cross-nationally. Lessons from the European Social Survey (eds R. Jowell, C. Roberts, R. Fitzgerald and G. Eva), Sage Publications, London, pp. 113–37.
Billiet, J. and Loosveldt, G. (1988) Improvement of the quality of responses to factual survey questions by interviewer training. Public Opinion Quarterly, 52, 190–211.
Billiet, J. and McClendon, J.M. (2000) Modeling acquiescence in measurement models for two balanced sets of items. Structural Equation Modeling: An Interdisciplinary Journal, 7(4), 608–29.
Billiet, J., Matsuo, H., Beullens, K. and Vehovar, V. (2009) Non-response bias in cross-national surveys: designs for detection and adjustment in the ESS. ASK. Society. Research. Methods, 18, 3–43.
Billiet, J. and Meuleman, B. (2008) Measuring attitudes and feelings towards discrimination in cross-nation research: lessons learned from the European Social Survey, in Proceedings of the 33rd CEIES Seminar Ethnic and Racial Discrimination on the Labour Market, Malta, 6–7 June 2007.
Billiet, J. and Philippens, M. (2004) Data quality assessment in ESS Round 1. Between wishes and reality. Paper presented at the Sixth International Conference on Social Science Methodology, Amsterdam.
Billiet, J., Philippens, M., Fitzgerald, R. and Stoop, I. (2007) Estimation of nonresponse bias in the European Social Survey: using information from reluctant respondents. Journal of Official Statistics, 23(2), 135–62.
Billiet, J. and Pleysier, S. (2007) Response Based Quality Assessment in the ESS – Round 2. An Update for 26 Countries, Version of 5 May 2007, Center of Sociological Research (CeSO), K.U. Leuven.
Billiet, J., Swyngedouw, M. and Waege, H. (2004) Attitude strength and response stability of a quasi-balanced political alienation scale in a panel study, in Studies in Public Opinion: Attitudes, Nonattitudes, Measurement Error, and Change (eds W.E. Saris and P.M. Sniderman), Princeton University Press, Princeton, NJ, pp. 268–92.
Blair, J. and Chun, Y.I. (1992) Quality of data from converted refusers in telephone surveys. Paper presented at the conference of the Association for Public Opinion Research, St Petersburg, FL.
Blohm, M. and Diehl, C. (2001) Wenn Migranten Migranten befragen. Zum Teilnahmeverhalten von Einwanderern bei Bevölkerungsbefragungen. Zeitschrift für Soziologie, 30(3), 223–42.
Blohm, M., Hox, J. and Koch, A. (2007) The influence of interviewers' contact behavior on the contact and cooperation rate in face-to-face household surveys. International Journal of Public Opinion Research, 19, 97–111.
Blom, A., Lynn, P. and Jäckle, A. (2008) Understanding cross-national differences in unit nonresponse: the role of contact data. ISER Working Paper Series No. 2008-01, University of Essex.
Bogen, K. (1996) The effect of questionnaire length on response rates – a review of the literature, in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 1020–5.
Borg, I. (2000) Früh- versus Spätantworter. ZUMA-Nachrichten, 47, Jg. 24, 7–19.
Börsch-Supan, A. and Jürges, H. (eds) (2005) The Survey of Health, Ageing, and Retirement in Europe – Methodology, Mannheim Research Institute for the Economics of Ageing (MEA), Mannheim.
Brackstone, G.J. and Rao, J.N.K. (1979) An investigation of raking ratio estimators. Sankhya, Series C, 41, 97–114.
Bradburn, N.M. (1992) Presidential address: a response to the nonresponse problem. Public Opinion Quarterly, 56, 391–97.
Bradburn, N.M. (1984) Discussion: telephone survey methodology, in Health Survey Research Methods: Conference Proceedings (eds C.F. Cannell and R.M. Groves).
Brehm, J. (1993) The Phantom Respondents: Opinion Surveys and Political Representation, University of Michigan Press, Ann Arbor.
Bronner, A.E. (1988) Surveying ethnic minorities, in Sociometric Research, vol. 1 (eds W.E. Saris and I.N. Gallhofer), Macmillan, London, pp. 36–47.
Burton, J., Laurie, H. and Lynn, P. (2006) The long term effectiveness of refusal conversion procedures on longitudinal surveys. Journal of the Royal Statistical Society, Series A, 169, 459–78.
Campanelli, P. and O'Muircheartaigh, C. (1999) Interviewers, interviewer continuity, and panel survey nonresponse. Quality & Quantity, 33, 59–76.
Campanelli, P., Sturgis, P. and Purdon, S. (1997) Can You Hear Me Knocking: An Investigation into the Impact of Interviewers on Survey Response Rates, The Survey Methods Centre at SCPR, London.
Cannell, C.F. and Fowler, F.J. (1963) Comparison of a self-enumerative procedure and a personal interview: a validity study. Public Opinion Quarterly, 27(2), 250–64.
Centraal Bureau voor de Statistiek (CBS) (1987) De leefsituatie van Turken en Marokkanen in Nederland 1984, Staatsuitgeverij, 's-Gravenhage.
Centraal Bureau voor de Statistiek (CBS) (1991) De gezondheidsenquête Turkse ingezetenen in Nederland 1989/1990, CBS, Voorburg/Heerlen.
Cincinatto, S., Beullens, K. and Billiet, J. (2008) Analysis of Observable Data in Call Records ESS – R2, Deliverable no. 6 of ESSi-JRA2, CeSO, K.U. Leuven.
Cordero, C., Groves, R., Kreuter, F. et al. (2007) Using paradata to improve nonresponse adjustment. Paper presented at the ESRA conference, Prague, 2007.
Council of American Survey Research Organizations (CASRO) (2009) CASRO Code of Standards and Ethics for Survey Research. Available at: http://www.casro.org/codeofstandards.cfm
Couper, M.P. (1997) Survey introductions and data quality. Public Opinion Quarterly, 61, 317–38.
Couper, M.P. and Groves, R.M. (1996) Social environmental impacts on survey cooperation. Quality & Quantity, 30, 173–88.
Couper, M.P. and de Leeuw, E.D. (2003) Nonresponse in cross-cultural and cross-national surveys, in Cross-cultural Survey Methods (eds J.A. Harkness, F.J.R. van de Vijver and P.Ph. Mohler), John Wiley & Sons, Inc., Hoboken, NJ, pp. 157–77.
Curtin, R., Presser, S. and Singer, E. (2000) The effects of response rate changes on the index of consumer sentiment. Public Opinion Quarterly, 64, 413–28.
Darcovich, N. and Murray, T.S. (1997) Data collection and processing, in Adult Literacy in OECD Countries: Technical Report on the First International Adult Literacy Survey (eds T.S. Murray, I.S. Kirsch and L.N. Jenkins), National Center for Education Statistics, Office of Educational Research and Improvement, NCES 98-053, pp. 75–91.
Däubler, T. (2002) Nonresponseanalysen der Stichprobe F, Materialen 15, Deutsches Institut für Wirtschaftsforschung, Berlin.
Davidov, E., Meuleman, B., Billiet, J. and Schmidt, P. (2008) Values and support for immigration: a cross-country comparison. European Sociological Review, 24(5), 583–99.
Deding, M., Fridberg, T. and Jakobsen, V. (2008) Non-response in a survey among immigrants in Denmark. Survey Research Methods, 2(3), 107–21.
de Heer, W. (1999a) International response trends: results of an international survey. Journal of Official Statistics, 15, 129–42.
de Heer, W. (1999b) Survey Practices in European Countries. Statistics Netherlands, Report to the Eurolit Expertgroup.
de Heer, W. (2000) Survey practices in European countries, in Measuring Adult Literacy. The International Adult Literacy Survey (IALS) in the European Context (ed. S. Carey), Office for National Statistics, London, pp. 43–67.
de Kruijk, M. and Hermans, E. (1998) Effect van non-response op de representativiteit, in Recente ontwikkelingen in het marktonderzoek (eds A.E. Bronner, P. Ester, P.S.H. Leeflang et al.), Jaarboek van de Nederlandse Vereniging voor Marktonderzoek en Informatiemanagement, De Vrieseborch, Haarlem, pp. 55–69.
de Leeuw, E. (2001) I am not selling anything: experiments in telephone introductions. Kwantitatieve Methoden, 22(68), 41–8.
de Leeuw, E.D. (2005) To mix or not to mix data collection modes in surveys. Journal of Official Statistics, 21, 233–55.
de Leeuw, E., Callegaro, M., Hox, J. et al. (2007) The influence of advance letters on response in telephone surveys: a meta-analysis. Public Opinion Quarterly, 71(3), 413–43.
de Leeuw, E. and de Heer, W. (2002) Trends in household survey nonresponse: a longitudinal and international comparison, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 41–54.
de Leeuw, E.D. and Hox, J.J. (1998) Nonrespons in surveys: een overzicht. Kwantitatieve Methoden, 19(57), 31–53.
De Luca, G. and Peracchi, F. (2005) Survey participation in the first wave of SHARE, in The Survey of Health, Aging, and Retirement in Europe – Methodology (eds A. Börsch-Supan and H. Jürges), Mannheim Research Institute for the Economics of Aging (MEA), Mannheim, pp. 88–104.
Deville, J.C. and Särndal, C.-E. (1992) Calibration estimators in survey sampling. Journal of the American Statistical Association, 87, 376–82.
Dijkstra, W. and Smit, J.H. (2002) Persuading reluctant recipients in telephone surveys, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 121–34.
Dillman, D. (1978) Mail and Telephone Surveys. The Total Design Method, John Wiley & Sons, Inc., New York.
Dillman, D. (2000) Mail and Internet Surveys. The Tailored Design Method, John Wiley & Sons, Inc., New York.
Dillman, D.A., Eltinge, J.L., Groves, R.M. and Little, R.J.A. (2002) Survey nonresponse in design, data collection, and analysis, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 3–26.
Dong, X. and Fuse, K. (2004) Refusal and refusal conversion in telephone survey research: successful conversion. Paper presented at the 59th Annual Meeting of the American Association for Public Opinion Research, Phoenix, Arizona.
Duffy, B., Smith, K., Terhanian, G. et al. (2005) Comparing data from online and face-to-face surveys. International Journal of Marketing Research, 47(6), 615–39.
Duhart, D., Bates, N., Williams, B. et al. (2001) Are late/difficult cases in demographic survey interviews worth the effort? A review of several federal surveys, in Proceedings of the Federal Committee on Statistical Methodology Research Conference, November 2001.
Edwards, S., Martin, D., DiSogra, C. and Grant, D. (2004) Altering the hold period for refusal conversion cases in an RDD survey, in Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA, pp. 3440–4.
Elliot, D. (1991) Weighting for Non-response: a Survey Researcher's Guide, Office of Population Censuses and Surveys, London.
EQLS (2007) First European Quality of Life Survey. Methodological Review, European Foundation for the Improvement of Living and Working Conditions, Dublin. Available at: http://www.eurofound.europa.eu/docs/areas/qualityoflife/eqls2methreview.pdf
ESF (1999) The European Social Survey (ESS) – a Research Instrument for the Social Sciences in Europe: Summary, European Science Foundation Standing Committee of the Social Sciences, Strasbourg.
ESOMAR (2008a) Global Market Research 2008, ESOMAR Industry Report.
ESOMAR (2008b) ICC/ESOMAR International Code on Market and Survey Research. Available at: http://www.esomar.org/index.php/codes-guidelines.html
European Communities (2004) How Europeans Spend Their Time. Everyday Life of Women and Men, Office for Official Publications of the European Communities, Luxembourg. Available at: http://epp.eurostat.cec.eu.int/cache/ITY_OFFPUB/KS-58-04-998/FR/KS-58-04-998-FR.PDF
European Social Survey (2003) ESS1-2002 Documentation Report, Edition 6.0, European Social Survey Data Archive, Norwegian Social Science Data Services, Bergen.
European Social Survey (2005) ESS2-2004 Documentation Report, Edition 3.1, European Social Survey Data Archive, Norwegian Social Science Data Services, Bergen.
European Social Survey (2007a) ESS3-2006 Documentation Report, Edition 2.0, European Social Survey Data Archive, Norwegian Social Science Data Services, Bergen.
European Social Survey (2007b) Round 4 Specification for Participating Countries, Centre for Comparative Social Surveys, City University, London. Available at: www.europeansocialsurvey.org
Eurostat (2008) Quality report on the European Union Labour Force Survey 2006. Methodologies and Working Papers, Office for Official Publications of the European Communities, Luxembourg.
Fellegi, I.P. (2001) Comment [on 'Can a statistician deliver?' – same issue]. Journal of Official Statistics, 17(1), 43–50.
Feskens, R., Hox, J., Lensvelt-Mulders, G. and Schmeets, H. (2007) Nonresponse among ethnic minorities: a multivariate analysis. Journal of Official Statistics, 23(3), 387–408.
Feskens, R., Hox, J., Schmeets, H. and Wetzels, H. (2008) Incentives and ethnic minorities: results of a controlled randomized experiment in the Netherlands. Survey Research Methods, 2(3), 159–65.
Fitzgerald, R. and Jowell, R. (2008) Measurement equivalence in comparative surveys: the European Social Survey (ESS) – from design to implementation and beyond. Paper presented at the 3MC conference, Berlin, 2008.
Fuse, K. and Dong, X. (2005) A successful conversion or double refusal: a study of the conversions in telephone survey research. Paper presented at the 60th Annual Meeting of the American Association for Public Opinion Research, Miami Beach, Florida.
Gelman, A. and Carlin, J.B. (2002) Post-stratification and weighting adjustments, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 289–302.
GfK Marktforschung (2006) Methodological report. World Values Survey 2005/2006. France, Great Britain, Italy, Netherlands, Russia, USA. Available at: www.worldvaluessurvey.org
Goyder, J. (1986) Surveys on surveys: limitations and potentialities. Public Opinion Quarterly, 50, 27–41.
Goyder, J. (1987) The Silent Minority. Nonrespondents on Sample Surveys. Polity Press, Cambridge.
Goyder, J., Lock, J. and McNair, T. (1992) Urbanization effects on survey nonresponse: a test within and across cities. Quality & Quantity, 26, 39–48.
Goyder, J., Warriner, K. and Miller, S. (2002) Evaluating socio-economic status (SES) bias in survey nonresponse. Journal of Official Statistics, 18(1), 1–11.
Greenleaf, E.A. (1992a) Improving rating scale parameters by detecting and correcting bias components in some response styles. Journal of Marketing Research, 29, 176–88.
Greenleaf, E.A. (1992b) Measuring extreme response style. Public Opinion Quarterly, 56, 328–50.
Groves, R.M. (1989) Survey Errors and Survey Costs, John Wiley & Sons, Inc., New York.
Groves, R.M. (2006) Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly, 70, 646–75.
Groves, R.M. and Benki, J. (2006) 250 Hello's: acoustic properties of initial respondent greetings and response propensities in telephone surveys. Paper presented at the 17th International Workshop on Household Survey Nonresponse, Omaha, Nebraska.
Groves, R.M., Cialdini, R.B. and Couper, M.P. (1992) Understanding the decision to participate in a survey. Public Opinion Quarterly, 56, 475–95.
Groves, R.M. and Couper, M.P. (1998) Nonresponse in Household Interview Surveys, John Wiley & Sons, Inc., New York.
Groves, R.M., Couper, M.P., Presser, S. et al. (2006) Experiments in producing nonresponse bias. Public Opinion Quarterly, 70(5), 720–36.
Groves, R.M., Dillman, D.A., Eltinge, J.L. and Little, R.J.A. (eds) (2002) Survey Nonresponse, John Wiley & Sons, Inc., New York.
Groves, R.M., Fowler, F.J., Couper, M.P. et al. (2004) Survey Methodology, John Wiley & Sons, Inc., Hoboken, NJ.
Groves, R.M. and Heeringa, S.G. (2006) Responsive design for household surveys: tools for actively controlling survey errors and costs. Journal of the Royal Statistical Society, Series A: Statistics in Society, 169(3), 439–57.
Groves, R.M. and McGonagle, K.A. (2001) A theory-guided interview training protocol regarding survey participation. Journal of Official Statistics, 17(2), 249–66.
Groves, R.M., Presser, S. and Dipko, S. (2004) The role of topic interest in survey participation decisions. Public Opinion Quarterly, 68, 2–31.
Groves, R.M., Singer, E. and Corning, A. (2000) Leverage-saliency theory of survey participation: description and an illustration. Public Opinion Quarterly, 64, 299–308.
Groves, R.M., Singer, E., Corning, A.D. and Bowers, A. (1999) A laboratory approach to measuring the effects on survey participation of interview length, incentives, differential incentives, and refusal conversion. Journal of Official Statistics, 15(2), 251–68.
Gupta, R. (2008) Coding categorical variables in regression models: dummy and effect coding. StatNews, no. 72, May. Cornell University, Cornell Statistical Consultancy Unit.
Häder, S. and Lynn, P. (2007) How representative can a multi-nation survey be? In Measuring Attitudes Cross-nationally. Lessons from the European Social Survey (eds R. Jowell, C. Roberts, R. Fitzgerald and G. Eva), Sage Publications, London, pp. 33–52.
Halman, L. (2001) The European Values Study: a Third Wave. Source book of the 1999/2000 European Values Study surveys. EVS, WORC, Tilburg University.
Hansen, M. and Hurwitz, W. (1946) The problem of nonresponse in sample surveys. Journal of the American Statistical Association, 41, 517–29.
Harkness, J.A. (2003) Questionnaire translation, in Cross-cultural Survey Methods (eds J.A. Harkness, F. van de Vijver and P.Ph. Mohler), John Wiley & Sons, Inc., Hoboken, NJ, pp. 35–56.
Harkness, J.A. (2007) Improving the comparability of translations, in Measuring Attitudes Cross-nationally. Lessons from the European Social Survey (eds R. Jowell, C. Roberts, R. Fitzgerald and G. Eva), Sage Publications, London, pp. 79–93.
Heerwegh, D. (2003) Explaining response latencies and changing answers using client-side paradata. Social Science Computer Review, 21, 360–73.
Heerwegh, D., Abts, K. and Loosveldt, G. (2007) Minimizing survey refusal and noncontact rates: do our efforts pay off? Survey Research Methods, 1(1), 3–10.
Hidiroglou, M.A. and Patak, Z. (2006) Raking ratio estimation: an application to the Canadian Retail Trade Survey. Journal of Official Statistics, 22(1), 71–80.
Hippler, H.-J. and Hippler, G. (1986) Reducing refusal rates in the case of threatening questions: the 'door-in-the-face' technique. Journal of Official Statistics, 2(1), 25–33.
Hoffmeyer-Zlotnik, J.H.P. (2005) How to measure education in cross-national comparison: Hoffmeyer-Zlotnik/Warner matrix of education as a new instrument, in Methodological Aspects in Cross-national Research (eds J.H.P. Hoffmeyer-Zlotnik and J.A. Harkness), ZUMA-Nachrichten Spezial, 11, ZUMA, Mannheim, pp. 223–40.
Holbrook, A.L., Green, M. and Krosnick, J.A. (2003) Telephone vs. face-to-face interviewing of national probability samples with long questionnaires: comparisons of respondent satisficing and social desirability response bias. Public Opinion Quarterly, 67, 79–125.
Hox, J. and de Leeuw, E. (2002) The influence of interviewers' attitude and behavior on household survey nonresponse: an international comparison, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 103–20.
Hox, J., de Leeuw, E. and Vorst, H. (1995) Survey participation as reasoned action; a behavioral paradigm for survey nonresponse? Bulletin de Méthodologie Sociologique, 48, 52–67.
International Statistical Institute (1985) Declaration on Professional Ethics. ISI, Voorburg. Available at: http://isi.cbs.nl/ethics.htm
Jansma, F., van Goor, H. and Veenstra, R. (2003) Verschillen in stadswijken in onderdekking en nonrespons? Een verkennend onderzoek naar selectieve uitval in een telefonische enquête, in Ontwikkelingen in het marktonderzoek (eds A.E. Bronner et al.), Jaarboek 2003, Marktonderzoek Associatie, De Vrieseborch, Haarlem, pp. 41–58.
Japec, L. and Lundqvist, P. (1999) Interviewer strategies and attitudes. Paper presented at the International Conference on Survey Nonresponse, Portland, Oregon, October 1999.
Johnson, T.P. (1998) Approaches to equivalence in cross-cultural and cross-national survey research, in Cross-cultural Survey Equivalence (ed. J. Harkness), ZUMA-Nachrichten Spezial, 3, ZUMA, Mannheim, pp. 1–40.
Johnson, T.P., Cho, Y.I., Campbell, R.T. and Holbrook, A.L. (2006) Using community-level correlates to evaluate nonresponse effects in a telephone survey. Public Opinion Quarterly, 70, 704–19.
Johnson, T.P., O'Rourke, D., Burris, J. and Owens, L. (2002) Culture and survey nonresponse, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 55–70.
Jowell, R. (1998) How comparative is comparative research? American Behavioral Scientist, 42(2), 168–77.
Jowell, R. and Eva, G. (2009) Happiness is not enough: cognitive judgements as indicators of national wellbeing. Social Indicators Research, 91(3), 317–28.
Jowell, R., Kaase, M., Fitzgerald, R. and Eva, G. (2007) The European Social Survey as a measurement model, in Measuring Attitudes Cross-nationally. Lessons from the European Social Survey (eds R. Jowell, C. Roberts, R. Fitzgerald and G. Eva), Sage Publications, London, pp. 1–31.
Jowell, R., Roberts, C., Fitzgerald, R. and Eva, G. (eds) (2007) Measuring Attitudes Cross-nationally. Lessons from the European Social Survey, Sage Publications, London.
Kalton, G. and Kasprzyk, D. (1986) The treatment of missing survey data. Survey Methodology, 12(1), 1–16.
Kaminska, O. (2009) Satisficing among reluctant respondents in a cross-national setting. Unpublished PhD thesis, Faculty of the Graduate College at the University of Nebraska, Lincoln, NE.
Kaminska, O. and Billiet, J. (2007a) Satisficing among reluctant respondents in a cross-national context. Presentation at the European Survey Research Association (ESRA) Conference, Prague, Czech Republic.
Kaminska, O. and Billiet, J. (2007b) Measuring satisficing in a face-to-face mode. Presentation at the Midwest AAPOR Conference (MAPOR), November 2007.
Kaminska, O., Goeminne, B. and Swyngedouw, M. (2006) Satisficing in Early versus Late Responses to a Mail Survey, FSW, Sociologisch Onderzoeksinstituut, K.U. Leuven, Leuven.
Keeter, S., Miller, C., Kohut, A. et al. (2000) Consequences of reducing nonresponse in a national telephone survey. Public Opinion Quarterly, 64, 125–48.
Kish, L. (1965) Survey Sampling, John Wiley & Sons, Inc., New York.
Kish, L. (1994) Multipopulation survey designs: five types with seven shared aspects. International Statistical Review, 62(2), 167–86.
Kott, P.S. (2006) Using calibration weighting to adjust for nonresponse and coverage errors. Survey Methodology, 32(2), 133–42.
Koch, A. (1993) Sozialer Wandel als Artefakt unterschiedlicher Ausschöpfung? Zum Einfluß von Veränderungen der Ausschöpfungsquote auf die Zeitreihen des ALLBUS. ZUMA Nachrichten, 33, 83–113.
Koch, A. (1997) Teilnahmeverhalten beim ALLBUS 1994. Soziodemographische Determinanten von Erreichbarkeit, Befragungsfähigkeit und Kooperationsbereitschaft. Kölner Zeitschrift für Soziologie und Sozialpsychologie, 49, 99–122.
Koch, A. and Blohm, M. (2006) Fieldwork details in the European Social Survey 2002/2003, in Conducting Cross-national and Cross-cultural Surveys, Papers from the 2005 Meeting of the International Workshop on Comparative Survey Design and Implementation (CSDI) (ed. J.A. Harkness), ZUMA-Nachrichten Spezial, 12, ZUMA, Mannheim, pp. 21–51.
Koch, A., Blom, A., Fitzgerald, R. and Bryson, C. (2008a) Field Procedures in the European Social Survey Round 4: Enhancing Response Rates, European Social Survey, GESIS, Mannheim.
Koch, A., Blom, A., Fitzgerald, R. and Bryson, C. (2008b) Round 4 Progress Reports from Survey Organisations, European Social Survey, GESIS, Mannheim.
Koch, A., Blom, A., Stoop, I. and Kappelhof, J. (2009) Data collection quality assurance in cross-national surveys: the example of the ESS. MDA – Methoden, Daten, Analysen, Jg. 3, Heft 2, pp. 219–47.
Kolsrud, K., Kalgraff Skjak, K. and Henrichsen, B. (2007) Free and immediate access to data, in Measuring Attitudes Cross-nationally. Lessons from the European Social Survey (eds R. Jowell, C. Roberts, R. Fitzgerald and G. Eva), Sage Publications, London, pp. 139–56.
Körner, T. and Meyer, I. (2005) Harmonising socio-demographic information in household surveys of official statistics: experiences from the Federal Statistical Office, Germany, in Methodological Aspects in Cross-national Research (eds J.H.P. Hoffmeyer-Zlotnik and J.A. Harkness), ZUMA-Nachrichten Spezial, 11, ZUMA, Mannheim, pp. 149–62.
Kreuter, F. and Kohler, U. (2009) Analyzing contact sequences in call record data: potential and limitations of sequence indicators for nonresponse adjustment in the European Social Survey. Journal of Official Statistics, 25(2), 203–26.
(2007) Using proxy measures of survey outcomes in post-survey adjustments: examples from the European Social Survey (ESS), in Proceedings of the Survey Research Methods Section, American Statistical Association.
Kropf, M.E., Blair, J. and Scheib, J. (1999) The effect of alternative incentives on cooperation and refusal conversion in a telephone survey, in Proceedings of the 1999 American Association for Public Opinion Research Meeting.
Krosnick, J.A. (1991) Response strategies for coping with the cognitive demands of attitude measurement in surveys. Applied Cognitive Psychology, 5, 213–36.
Krosnick, J.A. (1999) Survey research. Annual Review of Psychology, 50, 537–67.
Krosnick, J.A. and Alwin, D.F. (1987) An evaluation of a cognitive theory of response order effects in survey measurement. Public Opinion Quarterly, 51, 201–19.
Krosnick, J.A., Miller, J.M. and Wedeking, J. (2003) Data quality of refusal conversions and call-backs. Paper presented at the annual meeting of the American Association for Public Opinion Research, Nashville, TN.
Krosnick, J.A., Narayan, S.S. and Smith, W.R. (1996) Satisficing in surveys: initial evidence, in Advances in Survey Research (eds M.T. Braverman and J.K. Slater), Jossey-Bass, San Francisco, pp. 29–44.
Laaksonen, S. and Chambers, R. (2006) Survey estimation under informative nonresponse with follow-up. Journal of Official Statistics, 22(1), 81–95.
Laurie, H., Smith, R. and Scott, L. (1999) Strategies for reducing nonresponse in a longitudinal survey. Journal of Official Statistics, 15(2), 269–82.
Lavrakas, P., Bauman, S. and Merkle, D. (1992) Refusal report forms, refusal conversions and non-response bias. Paper presented at the American Association for Public Opinion Research, St Petersburg, Florida.
Lee, S. (2006) Propensity score adjustment as a weighting scheme for volunteer panel web surveys. Journal of Official Statistics, 22(2), 329–49.
Lee, S. and Valliant, R. (2008) Weighting telephone samples using propensity scores, in Advances in Telephone Survey Methodology (eds M. Lepkowski, C. Tucker, M. Brick et al.), John Wiley & Sons, Inc., Hoboken, NJ, pp. 170–83.
Lee, S. and Valliant, R. (2009) Estimation of volunteer panel web surveys using propensity score adjustment and calibration adjustment. Sociological Methods & Research, 37(3), 319–43.
Lievesley, D. (1983) Reducing unit non-response in interview surveys, in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 295–99.
Lin, I.-F. and Schaeffer, N.C. (1995) Using survey participants to estimate the impact of nonparticipation. Public Opinion Quarterly, 59, 236–58.
Lind, K., Johnson, T. and Parker, V. (1998) Telephone non-response: a factorial experiment of techniques to improve telephone response rates, in 1998 Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA, pp. 848–50.
Little, R.J.A. (1986) Survey nonresponse adjustments for estimates of means. International Statistical Review, 54, 139–57.
Little, R.J.A. and Rubin, D.B. (1987) Statistical Analysis with Missing Data, John Wiley & Sons, Inc., New York.
Little, R.J.A. and Vartivarian, S. (2005) Does weighting for nonresponse increase the variance of survey means? Survey Methodology, 31(2), 161–68.
Loosveldt, G. (1999) The interviewer as an informant about the interview process, in Leading Survey and Statistical Computing into the New Millennium (eds R. Banks, C. Christie, J. Cirrall et al.), Association for Survey Computing, Edinburgh, UK, pp. 113–24.
Loosveldt, G. and Carton, A. (2002) Utilitarian individualism and panel nonresponse. International Journal of Public Opinion Research, 14(4), 428–38.
Loosveldt, G., Carton, A. and Billiet, J. (2004) Assessment of survey data quality: a pragmatic approach focused on interviewer tasks. International Journal of Market Research, 46(1), 65–82.
Loosveldt, G. and Philippens, M. (2004) Modelling interviewer effects in the European Social Survey. Paper presented at the International Conference on Social Science Methodology (RC33), Amsterdam.
Loosveldt, G. and Sonck, N. (2008) An evaluation of the weighting procedures for an online access panel survey. Survey Research Methods, 2, 93–105.
Loosveldt, G. and Storms, V. (2001) The relevance of respondents’ attitudes towards surveys. Paper presented at the 12th International Workshop on Household Survey Nonresponse, Oslo.
Loosveldt, G. and Storms, V. (2003) Peilen in Vlaanderen. De houding van de Vlaming t.a.v. surveyonderzoek, in Vlaanderen gepeild! Studie gehouden te Brussel op 6 mei 2003, Ministerie van de Vlaamse Gemeenschap, Brussels, pp. 347–70.
Loosveldt, G. and Storms, V. (2008) Measuring public opinions about surveys. International Journal of Public Opinion Research, 20(1), 74–89.
Lyberg, L. (ed.) (2001) ‘Can a statistician deliver’ and comments. Journal of Official Statistics, 17(1), 1–127.
Lyberg, L. et al. (2001) Summary Report from the Leadership Group (LEG) on Quality, SPC.
Lyness, K.S. and Brumit Kropf, M. (2007) Cultural values and potential nonresponse bias: a multilevel examination of cross-national differences in mail survey response rates. Organizational Research Methods, 10(2), 210–24.
Lynn, P. (2003a) PEDAKSI: methodology for collecting data about survey non-respondents. Quality & Quantity, 37, 239–61.
Lynn, P. (2003b) Developing quality standards for cross-national survey research: five approaches. International Journal of Social Research Methodology, 6(4), 323–36.
Lynn, P., Beerten, R., Laiho, J. and Martin, J. (2002a) Towards standardisation of survey outcome categories and response rate calculations. Research in Official Statistics, no. 1, 61–84.
Lynn, P. and Clarke, P. (2001) Separating refusal bias and non-contact bias: evidence from UK national surveys. Working Papers of the Institute for Social and Economic Research, paper 2001-24, University of Essex, Colchester.
Lynn, P. and Clarke, P. (2002) Separating refusal bias and non-contact bias: evidence from UK national surveys. Journal of the Royal Statistical Society: Series D (The Statistician), 51, 319–33.
Lynn, P., Clarke, P., Martin, J. and Sturgis, P. (2002b) The effects of extended interviewer efforts on nonresponse bias, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 135–48.
Lynn, P., Japec, L. and Lyberg, L. (2006) What’s so special about cross-national surveys? in Conducting Cross-national and Cross-cultural Surveys (ed. J.A. Harkness), ZUMA-Nachrichten Spezial, 12, ZUMA, Mannheim, pp. 7–20.
Maitland, A., Casas-Cordero, C. and Kreuter, F. (2008) An exploration into the use of paradata for nonresponse adjustment in a health survey, in Proceedings of the 2008 AAPOR Conference, Section on Survey Research Methods, New Orleans, 15–18 May 2008, pp. 2250–5.
Martin, E.A., Traugott, M.W. and Kennedy, C. (2005) A review and proposal for a new measure of poll accuracy. Public Opinion Quarterly, 69(3), 342–69.
Mathiowetz, N.A., Couper, M.P. and Butler, D. (2000) The impact of nonresponse in the American Travel Survey. Paper presented at the 11th International Workshop on Household Survey Nonresponse, Budapest, 2000.
Matsuo, H., Billiet, J. and Loosveldt, G. (2009) Measurement and Correction of Non-response Bias Based on Non-response Surveys in Belgium, Norway, Switzerland and Poland. European Social Survey – Round 3, Working Paper of the Centre for Sociological Research, CeSO/SM/2009-17, Leuven.
McCrum, M. (2007) Going Dutch in Beijing. The International Guide to Doing the Right Thing, Profile Books, London.
Merkle, D.A. and Edelman, M. (2002) Nonresponse in exit polls: a comprehensive analysis, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 243–58.
Meuleman, B. and Billiet, J. (2005) Corrections for Nonresponse in the ESS Round 1: Weighting for Background Variables. A Simulation, Research Report CeSO, DA/2005-49.
Miller, J.M. and Wedeking, J. (2003) The harder we try the worse it gets? Examining the impact of refusal conversions and high callback attempts on the quality of survey data. Paper presented at the 58th Annual Meeting of the American Association for Public Opinion Research, Nashville.
Miller, J.M. and Wedeking, J. (2004) Measuring public opinion: examining the impact of refusal conversions and callbacks on data quality. Paper presented at the annual meeting of the Midwest Political Science Association, Chicago, IL.
Miller, J.M. and Wedeking, J. (2006) Examining the impact of refusal conversions and high callback attempts on measurement error in surveys. Unpublished manuscript.
Morton-Williams, J. (1993) Interviewer Approaches, Dartmouth Publishing, Aldershot.
Mowen, J.C. and Cialdini, R.B. (1980) On implementing the door-in-the-face compliance technique in a business context. Journal of Marketing Research, 17, 253–58.
Neller, K. (2005) Kooperation und Verweigerung: Eine Non-Response-Studie. ZUMA-Nachrichten, 57, 9–36.
Olson, K. (2006) Survey participation, nonresponse bias, measurement error bias, and total bias. Public Opinion Quarterly, 70, 737–58.
Olson, K. (2007) An investigation of the nonresponse–measurement error nexus. Unpublished PhD dissertation, University of Michigan.
Olson, K., Feng, C. and Witt, L. (2008) When do nonresponse follow-ups improve or reduce data quality? A meta-analysis and review of the existing literature. Paper presented at the International Total Survey Error Workshop, 1–4 June 2008, Research Triangle Park, NC.
O’Muircheartaigh, C. and Eckman, S. (2007) Efficiency and bias in a two-phase field model for nonresponse. Paper presented at the 2007 ESRA Conference, Prague.
O’Shea, R., Bryson, C. and Jowell, R. (2003) Comparative Attitudinal Research in Europe, European Social Survey Deliverable 1.
Pääkkönen, H. (1999) Are busy people under- or over-represented in national time budget surveys? Loisirs & Société, 21(2), 573–82.
Park, A. and Jowell, R. (1997) Consistencies and Differences in a Cross-national Survey, SCPR, London.
Phelps, A. (2008) UK incentive experiment. Paper presented at the first ESS Field Directors Meeting, Mannheim, Germany, January 2008.
Pickery, J. and Loosveldt, G. (2002) A multinomial analysis of interviewer effects on various components of unit nonresponse. Quality & Quantity, 36, 427–37.
Platek, R. and Särndal, C.-E. (2001) Can a statistician deliver? Journal of Official Statistics, 17(1), 1–20.
Potthoff, R., Manton, K. and Woodbury, M. (1993) Correcting for nonavailability bias in surveys by weighting based on number of callbacks. Journal of the American Statistical Association, 88, 1192–207.
Purdon, S., Campanelli, P. and Sturgis, P. (1999) Interviewers’ calling strategies on face-to-face interview surveys. Journal of Official Statistics, 15(2), 199–216.
Rässler, S., Rubin, D.B. and Schenker, N. (2008) Incomplete data: diagnosis, imputation, and estimation, in International Handbook of Survey Methodology (eds E.D. de Leeuw, J. Hox and D. Dillman), Lawrence Erlbaum Associates, New York, pp. 370–86.
Retzer, K.F. and Schipani, D. (2005) Refusal conversion: monitoring the trends. Survey Research (Newsletter of the Survey Research Laboratory), 36(3), 1–3.
Retzer, K.F., Schipani, D. and Cho, Y.I. (2005) Refusal conversion: monitoring the trends, in 2004 Proceedings of the Survey Research Methods Section (CD-ROM), American Statistical Association, Alexandria, VA, pp. 4984–90.
Rizzo, L., Kalton, G. and Brick, M. (1996) A comparison of some weighting adjustment methods for panel nonresponse. Survey Methodology, 22(1), 43–53.
Roberts, C., Eva, G. and Widdop, S. (2008) Assessing the Demand and Capacity for Mixing Modes of Data Collection on the European Social Survey: Final Report on the Mapping Exercise, City University, London.
Rogelberg, S.G., Conway, J.M., Sederburg, M.E. et al. (2003) Profiling active and passive nonrespondents to an organizational survey. Journal of Applied Psychology, 88(6), 1104–14.
Rogelberg, S.G., Fisher, G.G., Maynard, D.C. et al. (2001) Attitudes toward surveys: development of a measure and its relationship to respondent behavior. Organizational Research Methods, 4(1), 3–25.
Romans, F. and Kotecka, M. (2007) European Union Labour Force Survey: annual results 2006. Eurostat Data in Focus, 10/2007.
Rosenbaum, P.R. and Rubin, D.B. (1983) The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55.
Rosenbaum, P.R. and Rubin, D.B. (1984) Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association, 79, 516–24.
Rubin, D.B. (1986) Multiple Imputation for Survey Nonresponse, John Wiley & Sons, Inc., New York.
Safir, A., Steinbach, R., Triplett, T. and Wang, K. (2002) Effects on survey estimates from reducing nonresponse, in Proceedings of the Conference: Strengthening Our Community, AAPOR 2002, pp. 3024–29.
Saris, W.E. and Gallhofer, I.N. (2007a) Design, Evaluation, and Analysis of Questionnaires for Survey Research, Wiley Series in Survey Methodology, John Wiley & Sons, Inc., New York.
Saris, W.E. and Gallhofer, I.N. (2007b) Can questions travel successfully? in Measuring Attitudes Cross-nationally. Lessons from the European Social Survey (eds R. Jowell, C. Roberts, R. Fitzgerald and G. Eva), Sage Publications, London, pp. 53–77.
Schmeets, H. and Janssen, J.P.G. (2002) Using national registrations to correct for selective non-response: political preference of ethnic groups, in Proceedings of Statistics Canada Symposium 2001. Achieving Data Quality in a Statistical Agency: A Methodological Perspective.
Schmeets, H. and Michiels, J. (2003) Het effect van non-respons onder allochtonen. Bevolkingstrends, 4e kwartaal 2003, Centraal Bureau voor de Statistiek.
Schnell, R. (1997) Nonresponse in Bevölkerungsumfragen: Ausmaß, Entwicklung und Ursachen, Leske und Budrich, Opladen.
Schonlau, M., van Soest, A.H.O., Kapteyn, A. and Couper, M. (2006) Selection Bias in Web Surveys and the Use of Propensity Scores, RAND Working Paper No. WR-279. Available at SSRN: http://ssrn.com/abstract=999809
Schouten, B. and Cobben, F. (2006) R-indices for the comparison of different fieldwork strategies and data collection modes. Paper presented at the International Household Survey Nonresponse Workshop, Omaha.
Schwarz, N. and Sudman, S. (1992) Context Effects in Social and Psychological Research, Springer-Verlag, New York.
Siegel, P., Chromy, J. and Copello, E. (2008) Propensity models versus weighting cell approaches to nonresponse adjustment: a methodological comparison. Paper presented at the annual meeting of the American Association for Public Opinion Research, Fontainebleau Resort, Miami Beach, FL.
Singer, E. (2002) The use of incentives to reduce nonresponse in household surveys, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 163–77.
Singer, E., Mathiowetz, N.A. and Couper, M.P. (1993) The impact of privacy and confidentiality concerns on survey participation: the case of the 1990 Census. Public Opinion Quarterly, 57, 465–82.
Singer, E., Van Hoewyk, J. and Maher, M.P. (1998) Does the payment of incentives create expectation effects? Public Opinion Quarterly, 62, 152–64.
Singer, E., Van Hoewyk, J., Gebler, N. et al. (1999) The effects of incentives on response rates in interviewer-mediated surveys. Journal of Official Statistics, 15(2), 217–30.
Singer, E., Van Hoewyk, J. and Neugebauer, R.J. (2003) Attitudes and behavior: the impact of privacy and confidentiality concerns on participation in the 2000 Census. Public Opinion Quarterly, 67, 368–84.
Skjak, K. Kalgraff and Harkness, J. (2002) Data collection methods, in Cross-cultural Survey Methods (eds J. Harkness, F. van de Vijver and P. Mohler), John Wiley & Sons, Inc., Hoboken, NJ, pp. 179–93.
Smeets, I. (1995) Facing another gap: an exploration of the discrepancies between voting turnout in survey research and official statistics. Acta Politica, 30, 307–34.
Smith, T.W. (1983) The hidden 25 percent: an analysis of nonresponse on the 1980 General Social Survey. Public Opinion Quarterly, 47, 386–404.
Smith, T.W. (1984) Estimating nonresponse bias with temporary refusals. Sociological Perspectives, 27(4), 473–89.
Smith, T.W. (1995) Trends in non-response rates. International Journal of Public Opinion Research, Research Notes, 7(2), 157–71.
Smith, T.W. (2002) Developing nonresponse standards, in Survey Nonresponse (eds R.M. Groves, D.A. Dillman, J.L. Eltinge and R.J.A. Little), John Wiley & Sons, Inc., New York, pp. 27–40.
Smith, T.W. (2003) A review of methods to estimate the status of cases with unknown eligibility. Report prepared for the AAPOR Standard Definitions Committee, Version 1.1. Presented to AAPOR, Phoenix, 2004.
Smith, T.W. (2007) Survey non-response procedures in cross-national perspective: the 2005 ISSP Non-Response Survey. Survey Research Methods, 1, 45–54.
Smith, T.W. (2009) The multi-level integrated database approach for detecting and adjusting for nonresponse bias. Abstract, Third Conference of the European Survey Research Association, Warsaw, 29 June – 3 July.
Stinchcombe, A.L., Jones, C. and Sheatsley, P. (1981) Nonresponse bias for attitude questions. Public Opinion Quarterly, 45, 359–75.
Stocké, V. and Langfeldt, B. (2004) Effects of survey experience on respondents’ attitude towards surveys. Bulletin de Méthodologie Sociologique, 81, 5–32.
Stoop, I. (2004) Surveying nonrespondents. Field Methods, 16(1), 23–54.
Stoop, I. (2005) The Hunt for the Last Respondent. Nonresponse in Sample Surveys, Social and Cultural Planning Office of the Netherlands, The Hague.
Stoop, I. (2007) No time, too busy: time strain and survey cooperation, in Measuring Meaningful Data in Social Research (eds G. Loosveldt, M. Swyngedouw and B. Cambré), Acco, Leuven, pp. 301–14.
Stoop, I. (2008) Nonrespons bij bevolkingsonderzoek: een weerbarstige materie, in Vroeger was het beter (ed. P. Schnabel), Nieuwjaarsuitgave 2008, SCP, Den Haag, pp. 52–54.
Stoop, I. (2009) Waarom mannen meer meedoen aan websurveys, in M/V (ed. P. Schnabel), Nieuwjaarsuitgave 2009, SCP, Den Haag, pp. 96–100.
Stoop, I., Devacht, S., Billiet, J. et al. (2003) The development of a uniform contact description form in the ESS. Paper presented at the 14th International Workshop on Household Survey Nonresponse, Leuven.
Sturgis, P. (2004) Analysing complex survey data: clustering, stratification and weights. Social Research Update, Issue 43, Autumn.
Sturgis, P. and Campanelli, P. (1998) The scope for reducing refusals in household surveys: an investigation based on transcripts of tape-recorded doorstep interactions. Journal of the Market Research Society, 40(2), 121–39.
Symons, K., Matsuo, H., Beullens, K. and Billiet, J. (2008) Response Based Quality Assessment in the ESS – Round 3: An Update for 19 Countries, CeSO, K.U. Leuven.
Teitler, J.O., Reichman, N.E. and Sprachman, S. (2003) Costs and benefits of improving response rates for a hard-to-reach population. Public Opinion Quarterly, 67, 126–38.
Tourangeau, R. and Smith, T.W. (1996) Asking sensitive questions: the impact of data collection mode, question format, and question context. Public Opinion Quarterly, 60, 275–304.
Triplett, T. (2002) What is Gained from Additional Call Attempts & Refusal Conversion and What are the Cost Implications? Research Report, The Urban Institute, Washington, DC. Available at: http://mywebpages.comcast.net/triplett13/tncpap.pdf
Triplett, T. (2006) 2002 NSAF Nonresponse Analysis, Methodology Reports, Report No. 7, The Urban Institute, Washington, DC.
Triplett, T. and Abi-Habib, N. (2005) Determining the probability of selection for a telephone household in a random digit dial sample design is becoming increasingly more difficult, in 2002 NSAF Collection of Papers, The Urban Institute, Washington, DC, pp. 1-11–1-17.
Triplett, T., Blair, J., Hamilton, T. and Kang, Y. (1996) Initial cooperators vs. converted refusals: are there response behaviour differences? in Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA, pp. 1038–41.
Triplett, T., Safir, A., Wang, K. et al. (2002) Using a short follow-up survey to compare respondents and nonrespondents, in Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA.
Triplett, T., Scheib, J. and Blair, T. (2001) How long should you wait before attempting to convert a telephone refusal? in Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA.
Väisänen, P. (2002) Diary nonresponse in the Finnish Time Use Survey. Paper presented at the 13th International Workshop on Household Survey Nonresponse, Copenhagen.
Van de Vijver, F.J.R. (2003) Bias and substantive analysis, in Cross-cultural Survey Methods (eds J.A. Harkness, F.J.R. van de Vijver and P.Ph. Mohler), John Wiley & Sons, Inc., Hoboken, NJ, pp. 207–33.
Van Ingen, E., Stoop, I. and Breedveld, K. (2009) Nonresponse in the Dutch Time Use Survey: strategies for response enhancement and bias reduction. Field Methods, 21(1), 69–90.
Vehovar, V. (2006) Weighting in ESS Round 1, University of Ljubljana, Faculty of Social Sciences. Available at: http://vasja.ris.org
Vehovar, V. (2007) Non-response bias in the European Social Survey, in Measuring Meaningful Data in Social Research (eds G. Loosveldt, M. Swyngedouw and B. Cambré), Acco, Leuven, pp. 335–56.
Vehovar, V. (2008) Problems with weighting in ESS Round 1, Round 2, Round 3. Research Note, Faculty of Social Sciences, University of Ljubljana.
Vehovar, V. and Zupanic, T. (2007) Weighting in the ESS Round 2, University of Ljubljana, Faculty of Social Sciences. Available at: http://vasja.ris.org
Verhagen, J. (2008) De ongrijpbare respondent, Netherlands Institute for Social Research/SCP, The Hague.
Voogt, R. (2004) ‘I’m not interested’. Nonresponse bias, response bias and stimulus effects in election research. Academisch proefschrift, Universiteit van Amsterdam, Amsterdam.
Voogt, R.J.J., Saris, W.E. and Niemöller, B. (1998) Non-response, and the gulf between the public and the politicians. Acta Politica, 33, 250–80.
Warriner, K., Goyder, J., Gjertsen, H. et al. (1996) Charities, no; lotteries, no; cash, yes: main effects and interactions in a Canadian incentives experiment. Public Opinion Quarterly, 60, 542–62.
Weijters, B. (2006) Response Styles in Consumer Research, Vlerick Leuven-Gent Management School.
Zabal, A. and Wohn, K. (2008) Feedback from the 1st ESS Field Directors Meeting. ESSi Quality Feedback Paper 1, European Social Survey, GESIS, Mannheim.
Zuzanek, J. (1999) Non-response in time use surveys: do the two ends meet? Loisirs & Société, 21(2), 547–9.
Glossary

address sample  The sample units are addresses of dwellings: either they are taken from an existing list, or lists are constructed by the interviewer in selected streets.
advance letter (synonym: survey letter)  A letter sent to a target respondent announcing that an interviewer will call, and providing information on the topic and procedures of the survey (see Chapters 3–5).
auxiliary information  A set of variables that have been measured in the survey and for which information on the population (or the complete sample) is available (see Bethlehem, 2009).
background variables  The socio-demographic and socio-economic characteristics of households and individuals.
Central Coordinating Team (CCT)  The management team responsible for the design and coordination of the European Social Survey.
central specifications  The central Specifications for Participating Countries, the fieldwork guidelines for the ESS (see Chapter 3).
Computer Assisted Personal Interviewing (CAPI)  A method of data collection in which an interviewer uses a computer to display questions and accept responses during a face-to-face interview (OECD).
Computer Assisted Telephone Interviewing (CATI)  A method of data collection by telephone, with questions displayed on a computer and responses entered directly into a computer (OECD).
contact form  A form to be completed by an interviewer, comprising information on the timing and outcomes of each call (call records) and neighbourhood and dwelling information (see Appendix 3.1).
contactability  The ease with which target respondents can be reached, usually measured by the number of calls to first contact.
Note: Entries marked ‘OECD’ are derived from the OECD glossary of statistical terms, which is available at http://stats.oecd.org/glossary/
cooperation  Participation, willingness to be interviewed.
cooperative respondent  A respondent who is willing to cooperate at the first request; sometimes compared with the initially reluctant respondent (a person who refuses at first) (see Chapters 7 and 8).
core variables (synonym: target variables)  The central questions in a survey, the main outcome variables and the main dependent variables.
disposition codes  The outcome of an attempt to interview a target respondent (noncontact, refusal, interview). The final disposition code reflects the outcome for each sample unit.
doorstep questionnaire (DSQ)  A short questionnaire comprising target questions; refusals can be asked to answer these questions on the doorstep, at the time of the refusal (see Chapter 8).
eligible  Meets the screening criteria of the sample.
European Social Survey (ESS)  See www.europeansocialsurvey.org (see also Chapter 3).
face-to-face (F2F)  In-person interviewing, usually at the home of the respondent.
follow-up survey  A survey among nonrespondents and, ideally, respondents or a sample from the latter, to collect information on core variables (see Chapter 8).
functional equivalence  A measurement method that is not identical in different countries but should provide the same results.
household sample  The sample units are households, selected from lists of households.
incentives  Small gifts for respondents; incentives can be unconditional (given to all target respondents) or conditional (handed out on completion), and they can be monetary or nonmonetary (see Chapters 4 and 5).
individual sample  The sample units are named individuals, selected from lists of persons.
ineligible  Does not meet the screening criteria of the sample (for instance, business units in a list of residential addresses, or noncitizens in a survey among citizens).
initial refusal (synonym: reluctant respondent)  Refusal to cooperate, followed by participation at a later request.
input harmonization  Strict comparability of questionnaires and the survey design.
interview mode (synonym: survey mode)  The way in which the interview is conducted (F2F, telephone).
Missing at Random (MAR)  Missingness (or nonresponse) at random given covariates: groups with unequal response rates are homogeneous with respect to the target variables.
Missing Completely at Random (MCAR)  Missingness (or nonresponse) not following any pattern; that is, not related to background or target variables.
National Coordinator (NC)  The intermediary between the CCT and the national survey agency: responsible for conducting the national survey according to the central specifications, and advises on questionnaire design.
neighbourhood data  Information on neighbourhoods and dwellings, collected by the interviewer for all target respondents (see Chapter 8).
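The MAR and MCAR patterns defined above are often written formally. The following is a standard textbook rendering, with notation introduced here rather than taken from this glossary: let $Y$ be a target variable, $X$ the available covariates and $R$ the indicator that a sample unit responds ($R = 1$) or not ($R = 0$). Then

$$\text{MCAR:}\quad \Pr(R = 1 \mid Y, X) = \Pr(R = 1), \qquad \text{MAR:}\quad \Pr(R = 1 \mid Y, X) = \Pr(R = 1 \mid X).$$

Under MCAR the respondents behave like a random subsample of the full sample; under MAR, adjustment on the covariates $X$ (for instance by weighting) can in principle remove the nonresponse bias.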
noncontact  Failure to establish contact with the target respondent or a household member.
nonresponse bias  The bias that arises when those who do not respond have different survey outcomes than those who do respond; the size depends on the size of the difference and the nonresponse rate.
not able  The target respondent is not able to participate because of language problems, physical or mental problems, or for other reasons.
Not Missing at Random (NMAR) (synonym: nonignorable nonresponse)  Missingness (or nonresponse) when the core variable is related to response behaviour in a way not mediated by known covariates.
observational data (synonym: neighbourhood data)  Information on neighbourhoods and dwellings, collected by the interviewer for all target respondents.
odds ratios  The ratio of the odds of an event occurring in one group to the odds of it occurring in another group, or a sample-based estimate of that ratio; it is the basis of logistic regression.
optimal comparability (synonym: functional equivalence)  The operational aim of a cross-national survey; that is, to minimize methodological differences that could be confounded with substantive differences between countries.
output harmonization  The opposite of input harmonization: to provide a common, cross-nationally agreed definition for a variable and then leave it to each participating country to decide on how this variable is collected.
paper and pencil interviewing (PAPI)  A method of data collection in which an interviewer fills in responses on a paper questionnaire during a face-to-face interview (OECD).
paradata  Data about specific sample units on a case-by-case basis, such as the timing and outcomes of contact attempts, the outcomes of contacts with target respondents, interviewer characteristics and interview duration.
participating countries (synonym: ESS countries)  European countries that have participated in one or more rounds of the ESS (see Table 3.1).
post-stratification (PS)  Stratification after data collection using qualitative auxiliary data (e.g. age classes, sex and educational level), used to calculate nonresponse weights.
propensity scores  Propensity score weighting is a quasi-experimental correction strategy in which it is assumed that units in different groups (e.g. respondents and nonrespondents) have the same distribution on a number of auxiliary variables.
proxy refusal  Refusal by someone else in the household, other than the target respondent.
recruitment mode  Recruitment is the first phase of a F2F survey, in which the cooperation of target respondents is sought; it may differ from the interview mode.
refusal  Unwillingness to cooperate in a survey; reasons for refusal may be situational, topic related or sponsor related.
refusal conversion  An attempt to persuade an initial refusal to cooperate after all, usually made by a new interviewer.
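The statement that the size of nonresponse bias depends on both the respondent–nonrespondent difference and the nonresponse rate can be made explicit with the standard deterministic approximation (again, the notation is the usual textbook one, not part of the original glossary):

$$\operatorname{bias}(\bar{y}_r) \approx \frac{m}{n}\,(\bar{y}_r - \bar{y}_m),$$

where $\bar{y}_r$ is the respondent mean, $\bar{y}_m$ the (unobserved) nonrespondent mean, $m$ the number of nonrespondents and $n$ the total eligible sample size, so that $m/n$ is the nonresponse rate. The bias vanishes only when the nonresponse rate is zero or when respondents and nonrespondents do not differ on the survey outcome.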
reluctant respondent (synonym: soft refusal)  A respondent who refused initially, but was converted; sometimes compared with the cooperative respondent.
response rate  The number of interviews divided by the number of eligible target respondents.
sample unit (synonym: target respondent)  Any of the units constituting a specified sample.
sampling frame  A list of all members of a population, used as a basis for sampling.
target respondent (synonym: sample unit)  An individual in the sample selected to participate in the survey (OECD).
target variables (synonym: core variables)  The central questions in a survey, the main outcome variables and the main dependent variables.
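Several of the entries above (disposition codes, eligible/ineligible, noncontact, refusal, response rate) combine into simple outcome-rate arithmetic. The sketch below is purely illustrative: the one-letter disposition codes and the helper function are invented for the example and do not reproduce the ESS contact-form coding or any official response-rate standard.

```python
from collections import Counter

# Illustrative final disposition codes (hypothetical, not the ESS scheme):
# 'I' = interview, 'R' = refusal, 'NC' = noncontact,
# 'NA' = not able, 'INEL' = ineligible sample unit.

def outcome_rates(dispositions):
    """Response, refusal and noncontact rates over eligible sample units."""
    counts = Counter(dispositions)
    # Ineligible units are excluded from the denominator, as in the
    # glossary's definition of the response rate.
    eligible = sum(n for code, n in counts.items() if code != "INEL")
    return {
        "response_rate": counts["I"] / eligible,
        "refusal_rate": counts["R"] / eligible,
        "noncontact_rate": counts["NC"] / eligible,
    }

# Ten sample units: six interviews, two refusals, one noncontact and
# one ineligible address. The ineligible unit is excluded, so the
# response rate is 6/9, roughly 0.67.
print(outcome_rates(["I"] * 6 + ["R"] * 2 + ["NC"] + ["INEL"]))
```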
Index

Ability to cooperate, 16–17, 91, 95, 98–103, 158–9, 185
Accessibility, see also Contactability, 13, 15, 116, 124
Accuracy, 29–30
Adjusting for nonresponse, see Weighting
Advance letters, 21, 42, 63, 85–6, 104, 107, 296
Age of target person, 13–4, 18–20, 123–4, 154–6, 189–91, 214–7, 231–2, 258–60, 262–5
Apartment building, living in, see also Dwelling type, 13–5, 244–9, 249–52
At-home behaviour, 13–5, 119–21
Auxiliary data, 32–34, 212–13, 251, 266–7, 280–1, 301–2
Background variables, see Socio-demographic characteristics
Basic questions, see Key questionnaire
Bias, nonresponse, 29–32, 103–4
  assessing, 207–13
  correcting for, 33–7, 214–78
  standardized, 220–3
Brochure, 21, 86, 88, 104, 107
Burden,
  cognitive, 24, 25, 122, 124
  fieldwork, 18, 56, 68, 70, 117, 277
  response, 167, 209, 257, 277
  survey, see fieldwork burden
Call, 13–16
  records, see Contact form
  scheduling, see Timing of calls
Calls,
  number of, 13, 15, 18, 60, 135–8, 296–7
  timing of, 13–16, 60–1, 120–1, 138–142
Central Coordinating Team (CCT), 48–69, 133
Cohesion, social, 20, 125
Competence, political, 28, 223–6, 283–4
Computer Assisted Personal Interviewing (CAPI), 44, 76–7
Computer Assisted Telephone Interviewing (CATI), 255, 257, 267, 271
Confidentiality, see also Privacy, 24, 26–7, 35, 38, 117
Contact attempt, see Call
Contact form (ESS), 67–74, 91, 116–17, 129–30, 157, 176–7, 301–3
Contact, ease of, see Contactability
Contactability, 13–6, 117–22, 135–42
Contracting, see also House effects, 57–8
Cooperation, estimation of future, 72, 176–85, 238–9, 242
Cooperation rate, see also Willingness to cooperate, 17–29, 59–60, 115–16, 122–9, 131, 144–50, 185–8, 202–3
Cooperators, double, 267–72, 276, 280
Core information on nonrespondents, 214, 252–78
Core variables, see also Target variables, 37, 209
Data collection mode, 18, 52, 76–7, 253–5, 267, 271, 295
Doorstep interaction, see Interviewer respondent interaction
Doorstep Questionnaire Survey (DQS), 252–3
Dwelling type, 13, 74, 243–52
Education of target person, 20, 154–6, 189–91, 214–29, 257–76, 280–1, 287–8
Eligibility, 12, 14–15, 63–9, 90–3, 96–7
Environment, social, 13–15, 17–18, 125
Error, 4
  measurement, 191–9
  nonresponse, see Nonresponse bias
  sampling, 220
  total survey, 295
Ethnic minority groups, see Survey participation of immigrants
Ethnic threat, perceived, see also Attitudes towards immigrants, 223–7, 282–3
European Research Area, 48
European Science Foundation (ESF), 40, 48
European Social Survey (ESS), 39–70
Family composition of target person, 13–4, 18–20, 119, 123–4, 189–91, 231–2, 257–69
Fieldwork agency, see Fieldwork organization
Fieldwork efforts, see Response-enhancing strategies (ESS)
Fieldwork organization, see also House effects, 18, 57, 238
Follow-up survey, see also Nonresponse Survey (NRS), 37, 210, 253
Functional equivalence, see Optimal comparability
Gold standard, 209, 213, 215, 228, 242, 251, 278
Harmonization,
  input, 11, 20, 67, 297
  models, 10–11
House effects, 22, 57
Household composition, see Family composition
Immigrants,
  attitudes towards, 155–6, 220–5, 237, 264, 271, 273, 282
  survey participation of, 20, 125
Impediments, physical, 13–15, 118, 120
Incentives, 18, 21–2, 63, 85–7, 102–4, 107–8, 166–7, 255–7
Ineligible, see Eligibility
Interviewer,
  assignment, 58, 62
  briefing, see Interviewer training
  experience, 13, 18, 23, 62–3, 83–4, 104, 107–8, 202–3
  numbers, 78–9
  pay, see Interviewer remuneration
  remuneration, 83–5, 104, 107, 110
  respondent interaction, 24–9, 38
  training, 13, 16, 18, 22–3, 61–2, 83–5, 107, 110, 298
Isolation, social, 20, 125–6, 233, 236, 240–2, 286
Key questionnaire, 37, 252–78
Key survey items, see Core variables
Language problems, 16–7, 20, 41, 65, 91–2, 185, 257
Leaflet, see Brochure
Lifestyle, 122, 127, 139
Litter/vandalism, 74, 203, 243–52
Missing at Random (MAR), 32–3, 207
Missing Completely at Random (MCAR), 31, 33
Monitoring fieldwork (ESS), 64
National Coordinator (NC), 48–50
National Technical Summary (NTS), 51, 66, 76
Neighbourhood characteristics, see Observable data, 243–52
Noncontact rate, 59–60, 64, 94–5, 97–8, 103, 133–5, 137–8, 299
Non-ignorable nonresponse, see Not Missing at Random
Nonresponse survey (NRS), 252–3
Not Missing at Random (NMAR), 32, 156, 209, 228
Observable data, 243–52
Odds ratio, 234
Optimal comparability, 5–8, 11, 20, 51–4, 59
Opt-out system, see also Population register, 65, 101, 109, 131, 144
Paper and Pencil Interviewing (PAPI), 12, 76–7, 254
Paradata, 35, 116–17, 201–11, 302–3
Participating countries (ESS), 45–7
Participation, political, 231, 233, 236–7, 284
Participation, social, see also Social isolation, 126, 259–60, 262–6, 269, 272–4, 280
Penalty clauses, 22–3
Politics, interest in, 126, 155–6, 220–2, 260–6, 269, 272–4
Population register, see also Opt-out system, 34, 56, 120, 163, 207
Precision, 29–33, 57, 118, 212, 217–9, 301
Privacy, see also Confidentiality, 18, 20, 150–2, 163, 243, 274, 290
Quality approaches, 10–13
Quota sampling, 16, 23, 40, 44, 55, 299
Raking, 208, 215
Recruitment mode, 13, 18, 21, 130–3
Reducing nonresponse, see Response-enhancing strategies
Refusal,
  conversion, 161–203
    new interviewer, 35–6, 167, 178–86, 188
    success factors, 164–7
    timing of attempts, 182–8
  final, 152, 181, 182, 230, 237–8, 241, 242, 248–51, 262
  hard, 176–7, 181, 182, 184, 196–7, 241, 255, 300–1
  initial, 26–7, 35–6, 151–2, 161–203, 229–43, 245–51, 257–63, 276–9, 299–300
  proxy, 27, 177–81, 183, 187, 239, 257
  rate, 64, 95–8, 103, 152, 158–9
  reasons for, 24–29, 150–2
  soft, 36, 171, 178, 196, 201, 238, 240, 257, 297, 300
Reissue, 35–6, 62–3, 171, 173, 178–82, 242–3, 279
Representativeness, 32, 59, 154–7, 168, 189–91, 201, 297, 300
Respondent,
  cooperative, 169–70, 181–2, 191–9, 249, 257–76, 301
  immediate, see Cooperative respondent
  reluctant, see Refusal conversion
Response code, final, 91
Response easiness, 191–7
Response rate,
  calculation, 12, 66, 144
  target, 60, 93, 173, 198
  trend, 2–3, 106–13
Response-enhancing strategies (ESS), 20–3, 60–3, 85–7, 102–13
Safe, feeling, 14, 233–8, 260, 269, 272–4, 286
Sample size, 30–2
  effective, 57, 78–80
Sample unit, see Target person
Sampling design and procedures (ESS), 55–7, 77, 90, 129–31
Satisficing, 4, 168–70, 191–9, 201
Sex of target person, 14, 19–20, 154–6, 215–29, 231, 240–2, 268
Situational factors, 29, 34, 127–9, 144, 150, 200
Socio-demographic characteristics, see also Post-stratification weighting, 13–5, 19–20, 122–6, 153–6, 168–9, 189–91
Socio-economic status, 18, 20, 124–5, 189–91
Specification for participating countries (ESS), 49–51
Standardization, 10–12, 39–42, 52, 55, 87, 295–7
Subgroup approach, 207, 209
Substitution, 55, 60, 296
Survey,
  attitudes, see also Trust in surveys, 18, 26–29, 150–2, 183–6, 275, 277, 280, 302
  climate, 24, 44, 60, 102, 252, 290
  costs, 81–2, 302
  design, 20–3
  ethics, 37–8, 199–200
  implementation (ESS), 75–88
  organization, see Fieldwork organization
  scarcity, 6, 18, 25, 116, 157
  sponsor, 9, 17, 18
Tailoring interviewer approach, 14, 21, 131, 202
Target person, 10, 13–17, 56
Target population, 16, 31, 55, 63, 65, 253
Target variables, see also Core variables, 212–13
Time concerns, 126, 150–2, 182–6
Topic, survey, 17–18, 24–6, 29–32, 42–3, 86, 122–3, 126–9, 206, 302
  European Social Survey, 43
  nonresponse modules (ESS), 286–91
  rotating modules (ESS), 42–3, 219
Translation, 48–50, 52–3, 67–8, 295
Trust,
  in institutions, 29, 128–9, 155–6, 234, 237, 241–2
  in politicians, 252, 271, 272–4, 290
  in surveys, see also Survey attitudes, 34, 151, 179, 257, 287
  of strangers, 19–21, 25, 122, 125
  social, 43, 155–6, 220, 234, 264, 271–3, 285, 289
Urbanicity, 18–20, 119–20, 125–6, 154–5, 189–191
Variance inflation factor (VIF), 217–9, 226
Visit, see Call
Voluntary work, 126, 128, 271–4, 280, 289, 302
Weight,
  design, 3, 55–6, 214–8
  final, 215–18, 224–7
Weighting, 212
  post-stratification, 214–29, 301
  propensity score, 260–73
Willingness to cooperate, see also Cooperation rate, 17–29, 122–9, 142–152