VIEWPOINTS AND CONTROVERSIES IN SENSORY SCIENCE AND CONSUMER PRODUCT TESTING
PUBLICATIONS IN FOOD SCIENCE AND NUTRITION
Books

VIEWPOINTS AND CONTROVERSIES IN SENSORY SCIENCE, H.R. Moskowitz et al.
PIONEERS IN FOOD SCIENCE, VOL. 2, J.J. Powers
DRY-CURED MEAT PRODUCTS, F. Toldrá
VEROCYTOTOXIGENIC E. coli, G. Duffy, P. Garvey and D.A. McDowell
OPEN DATING OF FOODS, T.P. Labuza and L.M. Szybist
NITRITE CURING OF MEAT: N-NITROSAMINE PROBLEM, R.B. Pegg and F. Shahidi
DICTIONARY OF FLAVORS, D.A. DeRovira
FOOD SAFETY: THE IMPLICATIONS OF CHANGE, J.J. Sheridan et al.
FOOD FOR HEALTH IN THE PACIFIC RIM, J.R. Whitaker et al.
DAIRY FOODS SAFETY: 1995-1996, A COMPENDIUM, E.H. Marth
OLIVE OIL, SECOND EDITION, A.K. Kiritsakis
MULTIVARIATE DATA ANALYSIS, G.B. Dijksterhuis
NUTRACEUTICALS: DESIGNER FOODS III, P.A. Lachance
DESCRIPTIVE SENSORY ANALYSIS IN PRACTICE, M.C. Gacula, Jr.
APPETITE FOR LIFE: AN AUTOBIOGRAPHY, S.A. Goldblith
HACCP: MICROBIOLOGICAL SAFETY OF MEAT, J.J. Sheridan et al.
OF MICROBES AND MOLECULES: FOOD TECHNOLOGY AT M.I.T., S.A. Goldblith
MEAT PRESERVATION, R.G. Cassens
PIONEERS IN FOOD SCIENCE, VOL. 1, S.A. Goldblith
FOOD CONCEPTS AND PRODUCTS: JUST-IN-TIME DEVELOPMENT, H.R. Moskowitz
MICROWAVE FOODS: NEW PRODUCT DEVELOPMENT, R.V. Decareau
DESIGN AND ANALYSIS OF SENSORY OPTIMIZATION, M.C. Gacula, Jr.
NUTRIENT ADDITIONS TO FOOD, J.C. Bauernfeind and P.A. Lachance
NITRITE-CURED MEAT, R.G. Cassens
CONTROLLED/MODIFIED ATMOSPHERE/VACUUM PACKAGING, A.L. Brody
NUTRITIONAL STATUS ASSESSMENT OF THE INDIVIDUAL, G.E. Livingston
QUALITY ASSURANCE OF FOODS, J.E. Stauffer
SCIENCE OF MEAT & MEAT PRODUCTS, 3RD ED., J.F. Price and B.S. Schweigert
NEW DIRECTIONS FOR PRODUCT TESTING OF FOODS, H.R. Moskowitz
PRODUCT DEVELOPMENT & DIETARY GUIDELINES, G.E. Livingston et al.
SHELF-LIFE DATING OF FOODS, T.P. Labuza

Journals

JOURNAL OF FOOD LIPIDS, F. Shahidi
JOURNAL OF RAPID METHODS AND AUTOMATION IN MICROBIOLOGY, D.Y.C. Fung, M.C. Goldschmidt and D.H. Kang
JOURNAL OF MUSCLE FOODS, M.S. Brewer
JOURNAL OF SENSORY STUDIES, M.C. Gacula, Jr.
FOODSERVICE RESEARCH INTERNATIONAL, P.L. Bordi
JOURNAL OF FOOD BIOCHEMISTRY, N.F. Haard and B.K. Simpson
JOURNAL OF FOOD PROCESS ENGINEERING, D.R. Heldman and R.P. Singh
JOURNAL OF FOOD PROCESSING AND PRESERVATION, B.G. Swanson
JOURNAL OF FOOD QUALITY, J.J. Powers
JOURNAL OF FOOD SAFETY, T.J. Montville and K.R. Matthews
JOURNAL OF TEXTURE STUDIES, M.C. Bourne, T. van Vliet and V.N.M. Rao

Newsletter

FOOD, NUTRACEUTICALS AND NUTRITION, P.A. Lachance and M.C. Fisher
VIEWPOINTS AND CONTROVERSIES IN SENSORY SCIENCE AND CONSUMER PRODUCT TESTING

Howard R. Moskowitz, Ph.D.
Moskowitz Jacobs Inc.
White Plains, New York

Alejandra M. Muñoz, M.S.
IRIS: International Resources for Insights and Solutions, LLC
Mountainside, New Jersey

Maximo C. Gacula, Jr., Ph.D.
Department of Psychology
Arizona State University
Tempe, Arizona

FOOD & NUTRITION PRESS, INC.
TRUMBULL, CONNECTICUT 06611 USA
Copyright © 2003 by
FOOD & NUTRITION PRESS, INC.
4527 Main Street
Trumbull, Connecticut 06611 USA
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or otherwise, without permission in writing from the publisher.
Library of Congress Control Number: 2003109437
ISBN: 0-917678-57-5
Printed in the United States of America
DEDICATIONS

To my mother, Leah Moskowitz. You have encouraged me over the years to develop my ideas, to write them down, and to share them with others through publishing. To your constant support and guidance, I owe so much. Thank you.

HOWARD R. MOSKOWITZ
To my son Ryan. Your being, smiles and love are my encouragement, inspiration and drive as a mother and a professional. My experiences and accomplishments are enriched by your presence and love.

ALEJANDRA M. MUÑOZ
To my parents, Maximo Sr. and Elena Calo. Thank you so much for planting in me the values of education, courage, respect, and love while I was growing up, which became the foundation of my daily life.

MAXIMO C. GACULA, JR.
PREFACE

We, the authors, thank you for spending some time with us by reading this book. You may have noticed from the title that the book is not a simple presentation of a field with a unified focus. Rather, we deal with controversies. Our field of sensory science has grown mightily in the past decades. To a great extent the growth has come from the resolution of different points of view regarding what is appropriate in sensory science, what are reasonable truths, and what are good practices hallowed by the experience of practitioners. In no case do we present points of view as ultimate truths. As the construction of the book reveals, we instead present different approaches to the same problem, and even different ways to look at the same type of data. In our discussions among ourselves, and in airing our disagreements with each other, we sincerely hope that we provoke you to think more deeply about the issues involved in product assessment, the design of studies, and the analyses of data. If we cause you to think more critically about the problems, and even take issue with our points of view (joint and several), we will have succeeded in our task.

HOWARD R. MOSKOWITZ
ALEJANDRA M. MUÑOZ
MAXIMO C. GACULA, JR.
ACKNOWLEDGMENTS

It is an honor to have the following individuals participating in this endeavor.

ANDRÉ ARBOGAST
Biosystèmes
9, rue des Mardors
F-21560 Couternon
France
Email: [email protected]
DANIEL ENNIS, Ph.D.
The Institute for Perception
7629 Hull Street Road, Suite 200
Richmond, VA 23235 USA
Email: [email protected]
CHRIS FINDLAY, Ph.D.
Compusense Inc.
111 Farquhar Street
Guelph, Ontario
Canada N1H 3N4
Email: [email protected]
PAUL LICHTMAN
Sensory Computer Systems
16 South Street
Morristown, NJ 07960 USA
Email: [email protected]
CONTENTS

1. The Role of Sensory Science in the Coming Decade
2. International Sensory Science
3. Sensory Mythology
4. Contrasting R&D, Sensory Science, and Marketing Research Approaches
5. Validity and Reliability in Sensory Science
6. The Interface Between Psychophysics and Sensory Science: Methods Versus Real Knowledge
7. Descriptive Panels/Experts Versus Consumers
8. Sample Issues in Consumer Testing
9. Hedonics, Just-About-Right, Purchase and Other Scales in Consumer Tests
10. Asking Consumers to Rate Product Attributes
11. Questionnaire Design
12. Choice of Population in Consumer Studies
13. Biases Due to Changing Market Conditions
14. Sample Size N, or Number of Respondents
15. The Use and Caveats of Qualitative Research in the Decision-Making Process
16. The Four D's of Sensory Science: Difference, Discrimination, Dissimilarity, Distance
17. Replication in Sensory and Consumer Testing
18. Language Development in Descriptive Analysis and the Formation of Sensory Concepts
19. Use of References in Descriptive Analysis
20. Training Time in Descriptive Analysis
21. Consumer-Descriptive Data Relationships in Sensory Science
22. Product and Panelist Variability in Sensory Testing
23. Foundations of Sensory Science, by Daniel M. Ennis
24. Applications of SAS® Programming Language in Sensory Science, by Maximo C. Gacula, Jr.
25. Advances and the Future of Data Collection Systems in Sensory Science, by André Arbogast, Chris Findlay and Paul Lichtman
Index
CHAPTER 1

THE ROLE OF SENSORY SCIENCE IN THE COMING DECADE

HOWARD R. MOSKOWITZ

Sensory Science is enjoying a period of strong growth, both at the intellectual and at the practical levels. At the intellectual level the influx of statisticians and psychologists, in addition to the usual complement of food scientists, continues to increase the knowledge base and skill set. At the practical level Sensory Science has graduated to first-class membership from its former role of a second-class citizen in both academia and corporations. Product developers and product marketers seek out sensory scientists for advice in designing studies, for assistance in collecting data from experts and consumers, and for guidance in interpreting results. Yet, for many years Sensory Science lacked solid intellectual foundations in many of its aspects, perhaps because Sensory Science grew organically, in an undisciplined fashion. The growth was dictated by the early use of Sensory Science as a practical, albeit "kitchen," tool, and only later by the concerted efforts of researchers in science and business that would establish the field on a more rigorous foundation. The beginnings of Sensory Science involved practitioners, and only later would involve scientists. This history runs counter to the usual order of events, whereby a field begins with science and evolves to practice. As a consequence, sensory scientists are only beginning to have available to them a coherent corpus of knowledge, embodied in textbooks and refereed journals. The first major text in the field (Principles of Sensory Evaluation of Food, Amerine et al. 1965) comprised short abstracts and mini-discussions of much of the work known to the researcher 35 years ago. The book reflects the bias and approach of Rose Marie Pangborn, one of the first in the field, and certainly the most prominent for the three decades during which she published and flourished. However, the literature in this seminal book was abstracted from contributions in many diverse fields, since Sensory Science itself had not been established. The current state of affairs is far different. Beginning in the early 1980s, authors in the field have contributed numerous volumes, both authored and edited. These volumes provide the reader with overviews as to how the senses work, how to describe products, how to measure sensory and hedonic responses, and how to combine sensory data with analytic data. Journals in the field include the Journal of Sensory Studies (published by Food & Nutrition Press, Inc.) and Food Quality and Preference (published by Elsevier Ltd.), introduced in the late 1980s and middle 1990s, respectively. These journals reflect the demand for a
scientific corpus of knowledge beyond books and published proceedings of symposia.

With the growth of any field there emerge currents and counter-currents that inevitably affect the way the field matures, and the form that it takes. Sensory Science is no exception. We see today a variety of different trends that we believe will interact to shape the field. Some of them and their likely effects follow. These observations, written in the early years of the decade (2000-2010), may represent just a local perturbation, or may in fact represent a simple extension of trends that began two and three decades ago. One can only understand the full ramifications of a trend looking backward, not peering forward.

Trend 1 - More Education, More Universities as Centers of Excellence, Better Education

Sensory scientists are becoming more educated in basic science and in the applications of sensory methods. The education obtained by the sensory scientist may be formal (from universities, with defined curricula) as well as practical (from research projects, undertaken in companies or at universities). More education means that the sensory scientist in the future will have a better understanding of where the field has been, what types of problems previous researchers and practitioners have faced, what is held to be true, and what methods have been accepted as standard. Some management consultants would call this an expansion of the "skill set" of the sensory researcher. Armed with this knowledge the sensory researcher may expect to garner more respect in the scientific and business fields because Sensory Science itself will be recognized as a formal discipline. One can contrast this happy state of affairs with the situation faced by Pangborn and her professional colleagues some 30 to 40 years ago, when sensory researchers were few and far between, and there were no available courses or texts on Sensory Science. The education of a sensory scientist 40 years ago was crafted after discovering and then reading a literature widely dispersed in different content journals. It is also worth noting that there are many more universities offering education in Sensory Science, so that no one single viewpoint or world-view dominates. Twenty years ago it was the University of California at Davis, under Pangborn's inspired guidance. The Davis way represented, at that time, the way, the ne plus ultra, the revealed doctrine. Today there are a dozen universities or more offering a full series of courses on Sensory Science. Internationally, many more universities are recognizing the need for Sensory Science, and whereas before they had shrugged off this American phenomenon, now they embrace it and offer specialized majors. One need only attend conferences to see the growth of the universities and the internationalization of the academic field.
Trend 2 - Advanced Statistics Widely Available

Today's sensory scientists have access to numerous statistical packages available on personal computers. Some of these are powerful versions of traditional computational tools, such as analysis of variance, factor analysis, discriminant function analysis, and the like. Other statistical tools incorporate developments in mapping, where the objective is to locate the stimulus product or the panelist or the attribute (or even all three) in a geometrical space with the property that points "close together" in the space are similar along one or more characteristics. These tools go under the name of multidimensional scaling. A third set of tools involves the relation of two or more sets of variables. These methods include partial least-squares (PLS) and reverse engineering. A fourth set of tools involves modeling and optimization, with the objective of identifying processing or ingredient conditions leading to a high level of customer satisfaction, while simultaneously maintaining the independent variables and specified dependent variables within the range tested. Properly used, these advanced statistics enable the sensory researcher to better understand relations among stimuli or relations among variables. They lead to new insights that make the results more valuable. For example, maps or models provide significant value both to the scientist looking to understand how nature works, and to the developer looking to create new products that fulfill the consumer requirements.
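Much of this tool set now amounts to a few lines in a modern scripting language. As a minimal sketch of the first tool mentioned above, the Python fragment below runs a two-way analysis of variance on invented panel ratings; the product names, attribute and values are illustrative, not taken from any study in this book.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data: 3 products rated by 4 panelists, 2 replicates each.
data = pd.DataFrame({
    "product":  ["A", "A", "B", "B", "C", "C"] * 4,
    "panelist": [p for p in "1234" for _ in range(6)],
    "sweetness": [5.1, 5.3, 6.8, 6.5, 4.2, 4.4,
                  5.5, 5.0, 7.1, 6.9, 4.0, 4.6,
                  4.9, 5.2, 6.6, 6.7, 4.5, 4.1,
                  5.3, 5.4, 7.0, 6.8, 4.3, 4.2],
})

# Two-way ANOVA: does perceived sweetness differ by product,
# treating panelist as a blocking factor?
model = ols("sweetness ~ C(product) + C(panelist)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))
```

The same kind of data frame feeds naturally into the mapping and relational tools described above, which is one reason such scripting environments have become a practical companion to the dedicated statistical packages.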
Trend 3 - Increased Contact among Researchers Through Books, Journals and Specialized Conferences

If we look back forty years, to 1960 or thereabouts, we find virtually no consistent recognition of Sensory Science as an emerging specialty. The articles on Sensory Science were scattered across different journals, often as "reviews" of "how to do things," rather than as structured reports. There were no journals in the field devoted to Sensory Science per se. There might be featured articles from time to time about the importance of understanding the senses for a particular product category. However, there was no single source to which an interested person could turn to learn what was really known about the senses with regard to product development and consumer perception of products. Furthermore, there were no conferences dealing specifically with the sensory aspects of products. Conferences would feature one or another speaker talking about Sensory Science, but often such a speaker was relegated to a minor position on the program. Changes came in the 1960s. As noted above, the Amerine, Pangborn and Roessler book, which was the first to appear with extensive scientific literature
and approaches, was published in the middle of the decade. Indeed, it was only in the middle 1960s and afterward that any attention was paid to the chemical senses (taste, smell) and the ancillary use of these senses in Sensory Science. Among the first organized meetings were the Gordon Research Conference at Issaquah, Washington, devoted to the chemical senses, and an early meeting at the Swedish Institute for Food Preservation Research at Gothenburg, Sweden. These meetings promoted interaction among the sensory researchers who were just beginning to recognize that they were founding a distinct science. This author remembers the palpable excitement generated by the early meetings, where perhaps for the first time researchers in a variety of disciplines began to recognize that they were forming the nucleus of a new science. Researchers such as Emily Wick, of Arthur D. Little and then of MIT (Food Science), would later talk fondly about these early meetings and the agenda of research goals that they set. The trend today is for far more contact among researchers. The United States led the way in the 1970s by founding the Sensory Evaluation Division of the Institute of Food Technologists (IFT), and Committee E-18 on Sensory Evaluation of the American Society for Testing and Materials (ASTM). The 1990s witnessed an explosive growth in societies, meetings, journals, and all forms of contact among researchers. Annual meetings of the Sensory Evaluation Division of the IFT, long the mainstay of the sensory world, were complemented by professional meetings of Committee E-18 of the ASTM. Worldwide, sensory scientists contact each other more frequently through meetings such as the triennial and now biennial Pangborn Symposium, the biennial Sensometrics meetings, and other organizations. Sensory scientists are also beginning to attend meetings such as Food Choice, and participate heavily in specialized conferences dedicated to one or another theme involving food.

Trend 4 - Beyond Expert Panels Towards Consumer Research
Many sensory researchers now recognize that their specialization in product testing fits well with the emerging corporate needs in consumer research. They have changed their focus from experts to consumers. Twenty-five years ago it was rare for a sensory scientist to venture out into consumer or market research. Sensory scientists assumed a purist stance, demanding highly controlled test environments (the well-known “booths”), well-practiced panels, experts if possible, and analytical rigor (both in statistical terms and in conceptual terms). Consequently, sensory scientists ran their own in-house expert panels, and market researchers ran consumer studies, often on the same products. Sensory scientists asked relatively few questions of consumers, if they ever tested consumers, whereas market researchers asked much of consumers. The two
professions, Sensory Science and market research, never really met, or if they did then the encounter was simply one of the many chance encounters that inevitably occur at large meetings, perhaps in connection with a specific project. Both sensory scientists and consumer researchers attempted to gain corporate recognition for product guidance. Consumer researchers by and large were untrained in sensory methods, and more accustomed to what we might today consider primitive test procedures. There was the inevitable fight for "turf," and for corporate recognition and resources. Often the fight took the guise of a polite interaction and disagreement. Just as often the fight took the guise of an all-out confrontation for control, survival, and corporate respect. Today in many but not all corporations there is much more communication between sensory scientists and consumer researchers. Indeed, in some forward-looking companies, sensory scientists and market researchers report to the same vice president, eliminating what traditionally has been an area of contention. As a consequence of the enhanced communication between the two groups responsible for guidance, many companies now enjoy the fruits of their combined knowledge. For example, advanced methods for product development, such as response-surface methods, once an arcane statistical discipline, are now widely used in many companies to identify the effects of ingredients on sensory and liking reactions, and to optimize foods along a limited number of formula dimensions (a minimal sketch of the idea follows below). Consumer research procedures, such as category appraisal to assess a wide range of competitive products in regard to Drivers of Liking®, have been adopted by sensory scientists for routine use. Sensory scientists have even gotten into the concept development business, along with consumer researchers. Rather than handing over to marketing the role of creating product concepts, R&D directors have accepted the challenge of membership in a team that creates concepts, joining forces with the marketing group. The interaction of sensory scientists and consumer researchers is not always a pleasant one because of the aforementioned turf issues, and because the relations between the two disciplines grew out of a history of competition, not cooperation.
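To make the response-surface idea concrete, here is a minimal sketch in Python. The ingredient names and liking scores are hypothetical, chosen only to show fitting a quadratic surface and solving for its stationary point; a real application would use a designed experiment and model diagnostics.

```python
import numpy as np

# Hypothetical 3x3 design: mean liking at combinations of sugar and salt levels.
sugar  = np.array([2, 2, 2, 5, 5, 5, 8, 8, 8], dtype=float)
salt   = np.array([1, 2, 3, 1, 2, 3, 1, 2, 3], dtype=float)
liking = np.array([4.8, 5.6, 5.0, 6.2, 7.4, 6.5, 5.1, 6.0, 5.3])

# Quadratic response surface:
#   liking ~ b0 + b1*sugar + b2*salt + b3*sugar^2 + b4*salt^2 + b5*sugar*salt
X = np.column_stack([np.ones_like(sugar), sugar, salt,
                     sugar**2, salt**2, sugar * salt])
b, *_ = np.linalg.lstsq(X, liking, rcond=None)

# Stationary point: set both partial derivatives to zero and solve.
#   b1 + 2*b3*x + b5*y = 0   and   b2 + b5*x + 2*b4*y = 0
A = np.array([[2 * b[3], b[5]],
              [b[5], 2 * b[4]]])
optimum = np.linalg.solve(A, -b[1:3])
print(f"predicted optimum: sugar={optimum[0]:.2f}, salt={optimum[1]:.2f}")
```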
Hazards and Remedies. With every advance, however, there are always hazards to overcome. One of these hazards is the tendency of sensory scientists to think of themselves as the low-cost suppliers of data, a self-view reinforced by many short-term-focused corporate managers. In the face of increasing costs to obtain consumer data, many sensory scientists still maintain that they can produce data at lower cost than conventional market researchers. Market researchers are also held to limited budgets, but over the years they have had the benefit of budgets that they can allocate to outside suppliers and consultants. Inevitably, market researchers become smarter as they interact with outsiders who bring to the problem a different point of view. In contrast, the sensory scientist may miss this opportunity to grow through outward focus. Sensory
scientists make the mistake of doing more maintenance work and less innovative development work. It is, in fact, impossible for sensory scientists to mature and do more innovative work when the majority of the budget and the corporate kudos come from the performance of maintenance work at ever-lower costs. Simply stated, the wrong behavior is being reinforced, and the sensory researcher is turned into a drone, ever so slowly, but ever so relentlessly.

The Sensory Scientist as a Sensory Professional

Forty years ago there were no sensory professionals. A sensory professional at best was someone conversant with the methods of testing, with experience, capable of carrying out fairly straightforward tests. The concept of a "sensory professional" in and of itself would not have been thinkable. There was no corpus of knowledge that this "professional" would access, other than the disparate articles in the different subject fields referred to above. In the middle 1970s, with the acceptance of Sensory Science by companies, but without the widespread knowledge base, sensory scientists needed to define themselves. For want of a better name they called themselves sensory professionals. They could not call themselves sensory researchers, for that was a label more appropriate to biologists and experimental psychologists. In the middle 1970s, therefore, the title of sensory professional served to legitimize the developing field and the workers therein. Looking back one can see how the use of the term "sensory professional" did indeed produce a greater esprit de corps, and more self-pride. As one might imagine, however, the use of the term "sensory professional" also hindered development and growth, simply because it gave the researcher a title that signified "arrival," rather than "skill set." The situation has dramatically changed during the past 25 years. The change has been continual, not discrete, with few individual points that signal a change. There is a corpus of knowledge, there are standard methods, and the sensory scientist continues to grow. The meaning of sensory professional is now changing, taking on more "baggage," both positive and negative. The sensory professional has won some academic recognition, attends and presents at conferences, occasionally publishes, and interacts with other scientists in various subject fields. At the same time, the use of the term "sensory professional" appears to be diminishing as practitioners enter from other professions, e.g., psychology, food science, and sociology, equipped already with well-accepted academic and professional credentials. The term "sensory professional" is now often used for the technician rather than the higher-level management end of the field, perhaps as a legitimizing device for those individuals. Happily, however, the upper strata of sensory scientists do not need the title of "sensory professional" in order to gain legitimacy.
Can We Improve Sensory Education?

Sensory education has come a long way in the past forty years. The early days of sensory education consisted primarily of statistical analysis techniques, and detailed descriptions of how to run studies. Both of these went together, because the food world perceived Sensory Science as "taste testing." Indeed, the informal appellation of Sensory Science was "taste testing." Participants would be invited down for a "taste test" in the "taste-test kitchen." In the main a sensory scientist would use inferential statistics (e.g., tests of difference to determine whether two or more products were different from each other, whether or not storage had an effect on sensory quality, and the like). Rarely did the sensory scientist venture beyond comparison among products to higher levels such as modeling. Indeed, the notion of modeling was antithetical to taste testing, except perhaps to model the effects of time of storage on product acceptance, and even in that case the demand for modeling came from the food scientist, not from the "taste tester." With this type of history it is no wonder that for many years sensory researchers had little in the way of a formal, rich education that would prepare them to advance the field. Execution, not education, was called for. Those who succeeded did the test properly, not the proper test. What about the future of sensory education? Today's texts on Sensory Science amalgamate, with various degrees of success, such different disciplines as sensory psychophysics and sensory biology, cognitive psychology, statistical analysis, and discussions of the numerous findings published in the literature on a variety of problems of interest to the sensory scientist. Some of these topics are intensity scaling, mixtures, time-intensity, expert panels, correlation of sensory and instrumental measures, etc. Sensory education appears to be developing appropriately, on schedule, with increasing scope and sophistication. The trends for the next generation of sensory education will probably encompass more cognitive psychology (e.g., concept development of food concepts; applied concept development), as well as higher-level statistical techniques to discover patterns and relations in existing data (e.g., multidimensional scaling, partial least-squares, neural nets, etc.). Today's professors of Sensory Science favor advanced thinking, along with more integration of information and modern techniques from other fields. Today's sensory professors are well aware, as never before, of the profound inter-connectivity of different fields of research underlying Sensory Science. It should come as no surprise that the education is more rigorous, more thought-provoking, and more wide-ranging than ever before. We may expect to see that happy trend continue, and the field mature even more. It is worthwhile noting that the issue of proficiency and accreditation is surfacing both in Sensory Science (see Muñoz, following), and in related areas,
such as marketing research (William Neal, personal communication to HRM, 1999). As the disciplines of study design, data acquisition, and analysis become more accepted in industry, and as the financial stakes rise and competition sets in, many individuals in market research "hang out their shingle," and evolve rapidly into practitioners. These superficially trained individuals without in-depth experience in Sensory Science, like those individuals lacking any experience in Sensory Science, may ultimately harm the profession. It remains to be seen whether accreditation can be awarded on an equitable basis, without favoritism and nepotism. Yet, with the increasing financial importance of product successes, and with the intense competition among practitioners, it is vital that some form of validation of the researcher be instituted.

Fads and Beliefs Versus Enduring Truths

As Sensory Science matures into an accepted discipline we also see emerging a variety of fads, or to be less critical and more generous, a variety of different popular topics. Fads in Sensory Science comprise approaches to problems that endure for some time, but eventually fade away, either because they are founded on "me-too-ism," or because they are well-funded by corporations that want to keep abreast of the most au courant methods. Each era of research brings its own fads, partly because of the cross-pollination of disciplines, partly because researchers want to do new work, and not simply shamble along well-trodden paths. Nothing so gratifies one as participating in research deemed to be "new and ground-breaking" by the so-called cognoscenti, or at least by those currently in power, in favor, or in positions of professional authority. Here are five fads that have enjoyed favor over the past years:

(1) Scales: e.g., scaling procedures (different types of rating scales) and representation methods (e.g., different types of mapping). One has to be careful not to confuse fads with the honest attempt to bring methods/procedures from other subject fields into Sensory Science. What is one person's science may be another person's fad. Only time will tell whether scaling procedures are faddish or substantive, and contributory.
(2) Panelist Selection: e.g., specific screening procedures to identify "sensitive panelists." Extensive panelist screening may be a fad because all too often the sensitivity measured in the screening has nothing to do with the product being evaluated; the screening is adopted simply because someone in the corporation, or someone in the literature, recommended doing so.
(3) Replication: The reason for the replicates is not clearly stated, nor often even understood. Replication is faddish when the rationale is not provided, but simply followed blindly. On the other hand, replication is not faddish when it is used to estimate subject variability from evaluation to evaluation.

(4) Product Test Limits: Limiting the number of products tested by a panelist to very few because one assumes that the panelist cannot accurately evaluate more than a few products without becoming confused. This assumption of the panelist's lack of ability is belied by the observation that in everyday life panelists have no discrimination problems when eating. They rarely if ever report loss of sensitivity during a meal, and in fact hardly pay attention to the fact that they are consuming mouthful after mouthful of product, without seeming to lose any sensitivity at all.

(5) Expert Panelists as The Repository of Subjective Truth: The researcher uses expert panelists to describe the sensory magnitude of a stimulus, under the assumption that the consumer is absolutely incapable of validly scaling the sensory magnitude of a stimulus on a variety of different sensory dimensions. This belief is a passing fad, belied by the literature, and simply spread through "me-too-ism" (viz., "I too have an expert panel to acquire valid sensory data"). The consumer, in turn, is relegated to ratings of liking. This is a particularly pernicious fad that, in this author's opinion, has retarded progress in the field, and has been often motivated by less than noble reasons.
Enduring truths, in contrast, are usually of a much more prosaic nature, and seem almost trivial. Enduring truths may be the obvious: (1) You Can’t Describe What You Can’t Verbalize. For example, if the researcher wants to use novel terms for descriptive analysis, then the researcher should orient or train the panelist on these terms, or at least define them. Orientation does not necessarily mean weeks of intensive training. Orientation may simply entail a clear, usable explanation.
(2) Too Few Scale Points Hinder Discrimination. Another enduring truth is that by confining the panelist to a short scale (e.g., three scale points), the researcher loses a great deal of information about the differences among products (a small simulation after this list illustrates the point).

(3) People Perform Pretty Well. They are biologically adapted to do so. A third enduring truth is that panelists can accurately track the changes in physical characteristics of the product, especially if they understand the
meaning of the scale and the meaning of the attribute. Psychophysicists (Stevens 1975) and sensory scientists alike (Moskowitz 1983) have demonstrated this capability again and again, in numerous, well-controlled studies. It is unfortunate that many uneducated practitioners aver, quite vehemently, that panelists cannot do much. In contrast, the human being is a marvelously constructed organism that has no problem performing most evaluation tasks.
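The second truth above lends itself to a quick demonstration. The following Python sketch rests on entirely hypothetical assumptions (two products whose latent percepts differ by 0.4 standard deviations); it collapses the same latent ratings onto a 9-point and a 3-point scale, and in most runs the coarser scale yields a noticeably weaker t-statistic, i.e., poorer discrimination.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical latent percepts for two products differing by 0.4 SD.
a = rng.normal(5.0, 1.0, size=200)
b = rng.normal(5.4, 1.0, size=200)

for points in (9, 3):
    # Collapse the latent percepts onto an equally spaced category scale.
    edges = np.linspace(2, 8, points + 1)
    ra = np.clip(np.digitize(a, edges), 1, points)
    rb = np.clip(np.digitize(b, edges), 1, points)
    t, p = stats.ttest_ind(ra, rb)
    print(f"{points}-point scale: t = {t:.2f}, p = {p:.4g}")
```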
REFERENCES

AMERINE, M.A., PANGBORN, R.M. and ROESSLER, E. 1965. Principles of Sensory Evaluation of Food. Academic Press, San Diego.
MOSKOWITZ, H.R. 1983. Product Testing and Sensory Evaluation of Food: Marketing and R&D Approaches. Food and Nutrition Press, Trumbull, Conn.
STEVENS, S.S. 1975. Psychophysics: Introduction to Its Perceptual, Neural and Social Prospects. John Wiley & Sons, New York.
ALEJANDRA M. MUÑOZ

Sensory Science has evolved from supporting "expert taste testings" in the beer, wine and other similar industries (Hinreiner 1956; Amerine et al. 1959) to developing formal and sound methods, to incorporating elements of other disciplines, such as psychology, statistics and psychophysics, into its methods and developments, and it has established itself as a respected science. In the last five decades, Sensory Science has been refined, has matured and been applied in other disciplines, and its professionals have learned and applied the existing methods in many industrial, academic and research applications (Pangborn 1964). In addition, sensory professionals have participated in and supported activities that have contributed to their continued education and growth, including workshops, symposia and publications. Entering the new millennium, we professionals in this field stand ready to support and contribute to our "next steps" and new developments. Sensory Science will continue to grow. How can we grow as individuals, and make our field grow, in the next decade and new millennium? The field will continue on the path of establishing itself as a well-known, well-accepted applied discipline and will expand into new areas never before pursued. Some of the areas Sensory Science will concentrate on in the next decade include the following, as viewed by this author.
Continued Application of Existing Sound Sensory/Consumer Methods

Experienced sensory professionals have used the sound sensory and consumer testing methods in a wide range of applications, modified and adapted them to new situations/applications, and confirmed their value through many successes over the years. As discussed below, we need to develop and apply new sensory techniques to contribute to the growth of our field. However, while we develop, test and validate new methods, we will continue using these well-known, established and validated methods to accomplish our daily functions in industry, research and academia. In addition, we will teach these methods to our young professionals to provide them with a good foundation to operate in the field. These methods, which have solid principles and will continue to be used by sensory professionals, include standard discrimination tests (simple difference, triangle, duo-trio tests, etc.) (Stone and Sidel 1993; Meilgaard et al. 1999), qualitative and quantitative consumer tests to collect consumer opinions, acceptance and/or preference responses as well as intensity/diagnostic product information (Peryam and Pilgrim 1957; Moskowitz 1983; Chambers and Smith 1991; Shepherd et al. 1991; Casey and Krueger 1994; Krueger 1994; Resurreccion 1998), as well as the fundamental descriptive methods, such as the Flavor and Texture Profile and the Quantitative Descriptive Analysis (QDA) methods (Caul 1957; Brandt et al. 1963; Stone et al. 1974; ASTM 1992), and their derivatives, such as Modified Profile Methods (Muñoz and Civille 1992; Stampanoni 1994; Muñoz and Bleibaum 2001). In addition, to better understand and apply existing methods, sensory professionals should continue to learn from and review the wealth of published information on sensory and consumer studies and their applications, which appears in journals such as the Journal of Sensory Studies, Food Quality and Preference, Chemical Senses, Journal of Texture Studies, etc. Hopefully, sensory professionals can make the effort to set aside the time to keep up-to-date with this published information. As scientists, we should also encourage, support and commend the use of, and the research on, new methods and techniques, while continuing to apply existing methods. As these new methods and approaches appear in the scientific literature, we should expend the effort to learn these new developments, as well as have the courage to apply them.
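The standard discrimination tests mentioned above reduce to simple binomial arithmetic. As a minimal sketch, the Python fragment below analyzes a hypothetical triangle test (the counts are invented for illustration): under the null hypothesis of no perceptible difference, each panelist picks the odd sample with probability 1/3.

```python
from scipy.stats import binomtest

# Hypothetical triangle test: 18 of 42 panelists identified the odd sample.
n_panelists, n_correct = 42, 18

# One-sided exact binomial test against the chance level of 1/3.
result = binomtest(n_correct, n_panelists, p=1/3, alternative="greater")
print(f"proportion correct = {n_correct / n_panelists:.2f}")
print(f"exact p-value      = {result.pvalue:.4f}")
```

A duo-trio or simple difference test is analyzed the same way, with the chance level set to 1/2.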
The Development and/or Use of New Methods and Concepts

As with any other science, Sensory Science is evolving and growing. New theories, methods and concepts are being developed and applied. Sensory professionals are encouraged to contribute to these developments and apply several of the new techniques.
Challenging areas and new developments that have been addressed in the past few years, and that sensory professionals need to track, learn and apply, include:

- non-traditional and new discrimination test theories and methods, especially those based on signal detection theory, Thurstonian scaling and the R-index (O'Mahony 1979; Ennis 1993, 1998; Ennis and Bi 1998; Cliff et al. 2000; Rousseau 2001; Rousseau and O'Mahony 2001) (see the sketch after this list);

- deeper consumer understanding and exploration of consumers' articulated and unarticulated wants and needs, using new consumer testing approaches such as ethnography and projective and elicitation techniques (Swan et al. 1996; Bech et al. 1997; Perry 1998; Baxter et al. 1998; Rousset and Martin 2001; Muñoz and Everitt 2003; Urbick 2003; Woodland 2003);

- application of traditional and creative qualitative techniques (Krueger 1994; Dudek and Childrens 1999);

- new statistical applications and advanced analysis for sensory data and applications, including analysis of categorical and replicated data, application of experimental design, new statistical approaches for data relationships, etc. (Pouplard et al. 1997; Wilkinson and Yuksel 1997; Brockhoff and Schlich 1998; Kunert 1998; Bi et al. 2000; Tang et al. 2000; Malundo et al. 2001; Best and Rayner 2001);

- data relationships techniques (consumer, descriptive, instrumental) to integrate product information and to understand products and populations (e.g., consumers) (McEwan 1996; ASTM 1997; Wilkinson and Yuksel 1997; Elmore et al. 1999; Malundo et al. 2001);

- monitoring techniques needed to calibrate attribute/descriptive panels (sensory and statistical methods) (Schlich 1994; McEwan 1999; Qannari and Meyners 2001; King et al. 2001).
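As a minimal illustration of the signal-detection strand in the first item above, the Python sketch below computes the classic sensitivity index d' from hypothetical A-not-A data (the counts are invented). Note that triangle and duo-trio data require different Thurstonian models (see Ennis 1993); this is only the simple equal-variance yes-no case.

```python
from scipy.stats import norm

# Hypothetical A-not-A results (counts are illustrative only).
hits, signal_trials = 34, 50   # "A" responses to the true A sample
fas, noise_trials = 12, 50     # "A" responses to the not-A sample

H = hits / signal_trials       # hit rate
F = fas / noise_trials         # false-alarm rate

# Equal-variance Gaussian model: d' = z(H) - z(F)
d_prime = norm.ppf(H) - norm.ppf(F)
print(f"H = {H:.2f}, F = {F:.2f}, d' = {d_prime:.2f}")
```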
Participation in, and Support of, All Research Involving the Development of New Concepts and Methods

Currently, most of the publicly available sensory research is conducted by universities. Sensory professionals working in industry are either too busy to participate in research studies, or are not allowed to publish, since this information sharing may not be supported by management. Our field can only grow to the extent that research and new developments are completed. Therefore, it is our obligation to encourage and participate in this activity. In
this upcoming decade we should participate in and/or support much of the research in our field.
Financial support is critical. Sensory programs in universities depend on the support received from grants, industry and other organizations to be able to get involved in sensory research. Hopefully more support will be given to aid universities in completing more and better research. Even if some sensory professionals are unable to personally engage in research, they can still be part of the growth by reading and learning about the new developments and applying them. A number of sources of support have recently become available, giving practitioners and academics alike the prospect of even greater developments in the field. Recently the ASTM committee E18 on Sensory Evaluation initiated discussions on the need to support additional research in the field of Sensory Science, and thus formed a standing committee on sensory research. Its goal will be to select relevant topics needing research, obtain funds, and manage the research efforts within and outside its membership.

Delineation of Sensory Science's Name and Role

When Sensory Science established itself as a science, it was named Sensory Evaluation and encompassed analytical/discriminative and consumer methods (Amerine et al. 1959; Pangborn 1964). In the past few years, two trends have been observed: (1) many professionals refer to our field as Sensory Science vs. Sensory Analysis or Evaluation; and (2) many companies and professionals limit the role of Sensory Science to analytical/discriminative testing, thus excluding consumer testing. In addition, many of these companies make a distinction between the roles and professional involvement of sensory and consumer scientists. We sensory professionals should be aware that there are three different names used interchangeably for our field: Sensory Science, Sensory Analysis and Sensory Evaluation. Currently, there is a trend toward preferring the name Sensory Science to connote the nature of this discipline. However, we should be cognizant of the fact that many companies and individuals differentiate between Sensory Science and Consumer Science. Thus, nowadays the role of a Sensory Science group within an organization cannot be easily inferred. An explanation of the tasks and involvement of a sensory group is needed to understand its role. In this author's opinion the differentiation should not exist. A sensory professional should be recognized as one who is knowledgeable and uses BOTH sensory and consumer methods to provide answers and guidance. It is our responsibility to clarify the role of Sensory Science and ensure that individuals and companies understand the basis and role of this discipline. Until this
happens, it is important to recognize that this differentiation will continue to exist. In this publication, this author will use the term sensory professional or scientist to describe the individual who is knowledgeable and uses BOTH sensory and consumer methods to fulfill his or her job responsibilities. Occasionally, if necessary, the term sensory/consumer scientist may be used.

Global Sensory Science

Sensory Science, like other sciences, has taken a global view (Stone and Sidel 1995). This global perspective is needed, due to: (1) the extensive collaboration taking place among professionals around the world; and (2) the need to develop and apply product evaluation techniques globally in order to respond to the industry's needs to test products abroad. Therefore, in this author's view the globalization of Sensory Science is happening and will continue to expand in the next decade in five areas: (1) the execution of cross-cultural studies, as the companies we work for expand their business and interests into other countries and cultures; (2) trans-national collaboration for the development or improvement of companies' affiliate/foreign sensory programs; (3) the need for companies to harmonize approaches, methods and references across all their affiliates worldwide; (4) the recognition of a common goal to understand global consumer responses and the need to adapt sensory and consumer methods for world-wide use, recognizing different cultures; and (5) the interaction, exchange of ideas and collaboration among sensory scientists around the world. An in-depth discussion of this important topic is presented in Chap. 2.

Sensory Proficiency and Accreditation
Proficiency/accreditation is a very current, pressing issue being addressed by sensory professionals worldwide. The general goal of all proficiency and accreditation programs is to establish guidelines for the certification of sensory laboratories and professionals. Proficiency testing was recently initiated and sensory professionals all over the world are eager to learn about the progress and participate in this effort. Proficiency testing is the use of interlaboratory test comparisons to determine the performance of individual laboratories for specific tests, and is used to monitor the consistency and comparability of a laboratory's test data with those of other similar laboratories. Participation in proficiency testing schemes provides laboratories with an objective means of assessing and demonstrating the reliability of their data, and of assessing the ability of the laboratory to competently perform tests. Proficiency testing schemes are used by laboratory accreditation bodies as part of the process to assess the ability of laboratories to competently
perform tests for which accreditation is held. Proficiency testing schemes are well established in certain scientific disciplines. However, proficiency testing schemes that allow laboratories to assess their performance and competence in Sensory Science were not available. The first attempt to address sensory proficiency was initiated in 1999 by PROFISENS, an EU project financed by the European Commission Standards, Measurements and Testing (SMT) program. The PROFISENS project was internationally based and included leading laboratories in Sensory Science throughout Europe in the project consortium. Also represented were multinational industries with particular interests in ensuring that the guidelines developed met the needs of industry. Furthermore, the guidelines were evaluated by several external specialists from the USA and Canada, and their comments were incorporated into the final proficiency scheme guidelines that were published. The main output of the PROFISENS project was the publication of a guideline document for conducting proficiency testing in Sensory Science. Published as Guideline No. 35 by Campden & Chorleywood Food Research Association, UK (CCFRA) (the coordinator), the document covers the technical requirements for conducting sensory proficiency tests and the management system requirements of the proficiency scheme provider (Lyon 2001). Therefore, the document provides an overall summary of how a scheme would operate and the relative responsibilities of individuals or organizations in the planning and conduct of sensory proficiency tests. Other relevant publications of this project are the papers by McEwan et al. (2002a, b) that describe panel performance measurement issues for ranking and profile testing. While the PROFISENS project has come to an end, it has provided the basis for future work in this area. In response to this effort the ASTM E18 committee has formed a task group to address this issue through ASTM International (Chambers 2003). It is anticipated that other countries will respond to this challenging issue as well.
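The core computation in most proficiency schemes is a z-score comparing each laboratory's result to an assigned value. The Python sketch below is a minimal illustration with invented laboratory scores; the flagging limits (|z| > 2 warning, |z| > 3 action) are the convention in proficiency testing generally, not a requirement taken from the PROFISENS guidelines.

```python
import numpy as np

# Hypothetical proficiency round: mean sweetness scores for one sample,
# as reported by six laboratories (values invented for illustration).
lab_scores = np.array([6.1, 5.8, 6.4, 5.2, 6.0, 7.3])

assigned = np.median(lab_scores)  # robust assigned value
sigma_p = 0.5                     # SD for proficiency assessment, set by the scheme provider

z_scores = (lab_scores - assigned) / sigma_p
for score, z in zip(lab_scores, z_scores):
    flag = "action" if abs(z) > 3 else "warning" if abs(z) > 2 else "satisfactory"
    print(f"score {score:.1f}  z = {z:+.2f}  {flag}")
```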
Sensory Science and the Internet

In the new millennium all scientific work, business enterprises, and personal endeavors and undertakings will continue revolving around the Internet. It is anticipated that sensory professionals will use the Internet for information gathering and communication purposes, but also to conduct tests. The web helps researchers interact with consumers and panelists in ways barely imaginable over the past decade. Netmeetings continue to give rise to new advances in synchronous interactions with our subjects, coordinated and facilitated by intelligent machines. This gives rise to synchronous meetings that allow panels, focus groups, and product evaluators to share their discussions
over both space and time. Advances in privacy will help maintain the anonymity of respondents and the confidentiality of responses (Stucky 2001). Currently, there are companies that have used the Internet in the area of consumer testing (Kolkebeck 2002) and sensory and descriptive analysis (Curt et al. 2000; Findlay 2002). In the area of consumer testing, consumers are being recruited through the Internet, as well as asked to complete surveys and testing. In the area of descriptive analysis, panels are being trained and asked to evaluate products and complete and send their evaluation results through the Internet (Curt et al. 2000). Kolkebeck (2002) reports the following successful uses of the Internet in consumer testing:

(1) Recruiting Methods
a. Recruiting agencies contact prospective respondents in their database by email and request that they call the agency to complete the screening procedure; and
b. Recruiting agencies send an email notification and URL link to a screening questionnaire to potential respondents in their database.

(2) Interviewing Methods
a. Home Use Tests (HUTs): Respondents are either mailed product(s) or come to a central location to pick up product(s). Diaries/ballots are completed on-line.
b. Central Location Tests (CLTs): Respondents come to testing facilities to participate in a quantitative test to evaluate products that require controlled preparation, or to evaluate concepts or prototypes in a secure environment. On-line ballots are completed by the respondent.
c. Multi-Modal Data Collection: Respondents choose to complete questionnaires via telephone or Internet/Web.
Those companies developing these services capitalize on the advantages of communication via the Internet, allowing:

(1) communication between professionals within the organization;
(2) direct, day-to-day communication with individual consumers through chats, web forms, forums, noteboards, internet communities, and email;
(3) evaluation of products by consumers or trained panelists from their home or location, thus eliminating transportation time and logistics;
(4) evaluation of products at different locations (e.g., countries, cities, laboratories);
(5) immediate data collection;
(6) automated, intelligent, real-time reporting and quality checking;
(7) collaboration of multiple locations/countries on the same project, making changes and decisions in real-time; and
(8) the ability to quickly and easily incorporate sensory information with other information and data sources within the company, allowing the human reaction to the product(s) to be incorporated at all levels of the product lifecycle.

There is skepticism about conducting testing through the Internet, mainly because it reaches only a limited population (those using the Internet and being computer literate), and prevents the control of testing conditions. However, the use of the Internet will continue to grow in the coming decade in all areas. Therefore, it is expected that Sensory Science will have to participate in this boom and will use the power of the Internet in many application areas.
The Active Role of Sensory Professionals in Industry

The role of Sensory Science in industry has changed over the past years. Sensory professionals have taken a more active role in their companies. They now act as consultants within the organization to address business and technical issues, as well as establishing closer collaboration with many groups within the organization.
Interaction With and Creation of Partnerships With Marketing/Marketing Research. Sensory Science will grow internally at our companies if we establish stronger partnerships with market/marketing research professionals. These partnerships benefit primarily the company, as better projects are completed through the combination of skills and efforts of the different disciplines. In addition, in the course of completing their work, both sensory professionals and marketers learn from each other. The acceptance of our methods and data by market/marketing research professionals should result in more frequent use of sensory techniques/methods in the company's projects, as well as the involvement of sensory professionals in more important and visible projects within the company. We, as sensory professionals, can become partners with market/marketing research professionals by assisting them in key projects, demonstrating the value of sensory data, and assuring that these data contribute to the overall success of projects. The involvement of the sensory professional in other projects will be more frequently requested if these alliances are formed. The time is over for sensory professionals to complain about how they are regarded within a company, how the sensory groups are left out of important
planning meetings and projects, and how small sensory budgets are compared to those of market research. Sensory professionals should be assertive, create these partnerships that benefit the company and the individuals, and make themselves visible within the company. Sensory groups that have not accomplished these types of relationships with marketing/market research professionals will need to develop these relationships in the coming decade.
The Sensory Professional: The Internal Sensory Consultant and His/Her Interaction With Other Functions Within a Company. In the next decade, the sensory professional will continue establishing his/her role as a consultant within an organization, and increase his/her visibility and respect. A sensory professional should seek out "clients" within the organization and assure that the services and expertise of a sensory group are well-known and utilized. The visibility of sensory professionals within their organization is essential for ongoing involvement in key technical and business issues, and for management support or internal funding for programs, facilities, equipment and software, training, external support, etc. Visibility is gained by providing the best support to all research, marketing/market research, manufacturing and quality control functions. Therefore, it is in the sensory professional's best interest to gain this visibility. This is best achieved when the skills and services of the group are well-known by all groups, and sensory tests and research are being effectively used within the organization.

Continued Application of Advanced Statistical Methods

Statistics have always been an essential part of sensory testing. All quantitative sensory data should be statistically analyzed to be properly interpreted. Sensory professionals have a good understanding of the basic, and some advanced, statistical techniques typically used to analyze sensory data. In the coming decade, sensory professionals will continue to learn and apply advanced statistical methods, new statistical techniques and philosophies, and experimental design concepts in sensory studies (Courcoux and Semeneou 1997; Pouplard et al. 1997; Wilkinson and Yuksel 1997; Brockhoff and Schlich 1998; Kunert 1998; Bi et al. 2000; Tang et al. 2000; Best and Rayner 2001a, b; Deppe et al. 2001; Malundo et al. 2001). Companies will continue to create links between the sensory and the statistics groups, or alternatively hire statisticians to support sensory groups. Companies with adequate statistical support will look into applying more advanced statistical and experimental design concepts in sensory studies. Companies with little or no statistical support will have to incorporate statistics into their programs. A minimal sketch of one such technique, partial least-squares regression, follows.
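The code below, in Python with scikit-learn (a library choice of ours, not one mentioned in this book), relates a hypothetical matrix of trained-panel descriptive attributes to consumer liking scores. All data are simulated for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)

# Simulated data: 12 products x 5 descriptive attributes (panel means),
# with consumer liking driven mainly by the first two attributes.
X = rng.normal(size=(12, 5))
y = 6 + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.2, size=12)

# Partial least-squares regression with two latent components.
pls = PLSRegression(n_components=2).fit(X, y)
print("R^2 on training data:", round(pls.score(X, y), 3))
print("coefficients:", np.round(pls.coef_.ravel(), 2))
```

In practice one would cross-validate the number of components; the point here is only how compactly such a descriptive-to-consumer relationship can be explored.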
New statistical applications in the area of Sensory Science continue to be published and presented at international symposia (e.g., Sensometrics). Statisticians, sensometricians and chemometricians are making great contributions to the field of Sensory Science by developing new techniques to analyze and link sensory data, or by applying statistical approaches previously unknown to sensory professionals (Brockhoff and Schlich 1998; Kunert 1998; Deppe et al. 2001; Best and Rayner 2001a, b; Vigneau et al. 2001; Wakeling et al. 2001). It must be noted, however, that statistics are necessary but only complement good sensory projects and research. The statistical analysis is only as good as the sensory data used. Emphasis should always be placed on the improvement of sensory methods and the generation of sound sensory data. The application of statistical techniques, simple or complex, comes only after one has made sure that the best possible sensory data have been generated.

Data Collection Systems and Software
In the past decade, many sensory programs have incorporated the use of data collection systems. These systems allow the direct and prompt collection of data, minimizing the manpower previously devoted to manual data entry. Over the past years, these systems have also incorporated other features such as experimental design, test design, sample coding and data analysis. As resources become less available within sensory groups, the need for data collection systems increases. Therefore, it is expected that the majority of sensory programs will acquire a data collection system in the near future. These systems continue to be developed and improved to meet the needs of sensory professionals. Data collection systems and new developments are discussed in Chap. 25. In addition, software for sensory applications continues to be developed for routine sensory data analysis, panel performance, and advanced multivariate and mapping techniques. Chapter 24 briefly discusses the best known software packages used in Sensory Evaluation, emphasizing SAS programming.

The Application of Modified Sensory Methods When Resources Are Limited

One of the challenges sensory professionals have to confront in the next decade is the development and application of alternate or modified sensory methods as resources (particularly funds and time) become less and less available. Companies will continue to cut back on resources and demand more work from their employees. Thus, sensory professionals will be required to get sensory testing completed with the limited available resources. Therefore, sensory professionals have to develop and incorporate new sensory techniques
and strategies into projects to be able to reduce the number of tests, the resources, and the time needed to complete them. Among the techniques to consider when confronted with such limitations are: more screening of products prior to testing; more informal, but sound and educated, testing; the use of fewer but better trained panelists; the use of outside consultants or contractors for routine project work; the fielding of tests and statistical analyses through outside agencies; the development of shorter, more effective reports; etc.
Learning about the Sensory Properties of Products

This is an area to which insufficient attention has been paid thus far. The recognition and development of this area will increase in the next decade. All sensory programs at universities, and most workshops offered to the sensory community, address sensory methodology. Therefore, we are or can become very knowledgeable in methods. This knowledge is critical for designing and executing sound sensory tests and interpreting the resulting data. However, there is another skill sensory professionals should develop - a deeper knowledge of the sensory properties of the products with which they work. With this knowledge, which involves the ability to adequately describe a product's perceived sensory properties, sensory professionals are able to conduct better research and project management. This skill also enables the practitioner to make better decisions regarding product presentation designs and attributes in consumer questionnaires; design more effective and complete descriptive training programs; monitor panel exercises and programs; better communicate with colleagues about the sensory attributes of products; and transfer that knowledge to "non-sensory" professionals within the organization. We learn sensory principles at the university and through short courses and publications, but not the sensory properties of products. We cannot "learn sensory properties" from books or lectures. We have to actively educate ourselves in this area. We gain this knowledge by participating in descriptive analysis training programs and in sessions with trained descriptive panelists, by working closely with the descriptive panel leader or some of the trained panelists, by collaborating with internal people experienced with the company's products, or by hiring consultants with knowledge of sensory properties.
Need to Educate Non-sensory Professionals in Sensory Methods and Sensory Properties

In order for our field to be accepted and used in its totality in industry, we need to educate non-sensory professionals and users of sensory data (product developers, formulators, managers, marketing/market research professionals) in
sensory methods and the sensory properties of the products they work with. Sensory professionals will focus on this area in the coming decade. On a routine basis, sensory professionals should design and conduct seminars within their organizations in order to teach sensory methods, their applications and limitations, and the sensory properties of the company's products.

(1) Seminars on sensory methods: These seminars should provide an
introduction to the basic principles of the field, as well as illustrate the applications, advantages and limitations of each of the techniques used within a company. Basic data interpretation should also be addressed. In addition, methodology seminars should address issues and questions that non-sensory professionals have within an organization. Some of the topics addressed should be: the variability of sensory data (and its effect on the outcome of tests), the design of consumer and other sensory tests, test replication, forced vs. non-forced choice tests (preference, discrimination tests), etc. These seminars are also helpful in that they facilitate the discussion of sensory methods and issues, and therefore improve the understanding and use of sensory tools by non-sensory professionals.

(2) Seminars on sensory properties: These seminars provide the non-sensory professional with a basic understanding of the sensory characteristics of the product categories they work with. These seminars are short and practical. Participants do not become trained panelists, but learn technical and precise descriptive language to describe the sensory properties of the products they work with. These seminars are highly effective, giving participants tools to better discuss and communicate the sensory properties of products with professionals both inside and outside the company.
Sensory Methods for the Evaluation of Non-Food Products

Sensory Science was developed for foods (Amerine et al. 1959; Pangborn 1964). Consequently, a significant proportion of today's sensory methods were developed and perfected in this area. In addition, much of the sensory knowledge and practice has been promoted through universities, for the most part through Food Science departments and related programs. Therefore, students learn sensory methods and their applications in foods. As a result, the food and beverage industries - in general - are the most advanced in our field: they have the most complete sensory programs, have used sensory techniques the longest, and apply the most advanced sensory and statistical methods.
In the coming decade, the sensory programs, research and methods used in other industries will continue to evolve. Traditional sensory methods developed and used for foods will be modified for non-food products (e.g., in the personal care, household care and pharmaceutical industries, as well as in other industries) (Giboreau et al. 2001; Braddon et al. 2002; Griffiths and Kulke 2002). For instance, the ASTM E18 (committee on sensory testing) subcommittee E18.07 on personal care and household evaluation has several task groups that work on non-food applications and publish sensory practices and methods for the evaluation of non-food products (e.g., hard surface cleaning products, shampoo, etc.). The published standards for the evaluation of non-food products include: Standard E1593-94 on assessing the efficacy of air freshener products in reducing sensorily perceived indoor air malodor intensity (ASTM 1994); Standard E1207-87 (2002) on sensory evaluation of axillary deodorancy (ASTM 2002); and Standard E1490-92 (2002) on descriptive skinfeel analysis of creams and lotions (ASTM 2002).
REFERENCES

AMERINE, M.A., ROESSLER, E.B. and FILIPELLO, F. 1959. Modern sensory methods of evaluating wine. Hilgardia 28, 477.
ASTM. 1992. Manual on Descriptive Analysis Testing. MNL 13, R. Hootman, ed. ASTM, West Conshohocken, Penn.
ASTM. 1994. Standard E1593-94. Assessing the Efficacy of Air Freshener Products in Reducing Sensorily Perceived Indoor Air Malodor Intensity. ASTM, West Conshohocken, Penn.
ASTM. 1997. Relating Consumer, Descriptive, and Laboratory Data to Better Understand Consumer Responses. Manual 30, A. Muñoz, ed. ASTM, West Conshohocken, Penn.
ASTM. 2002. Standard E1207-87 (2002). Sensory Evaluation of Axillary Deodorancy. ASTM, West Conshohocken, Penn.
ASTM. 2002. Standard E1490-92 (2002). Descriptive Skinfeel Analysis of Creams and Lotions. ASTM, West Conshohocken, Penn.
BAXTER, I.A., JACK, F.R. and SCHRODER, M.J.A. 1998. The use of repertory grid method to elicit perceptual data from primary school children. Food Quality and Preference 9(1-2), 73.
BECH, A.C., HANSEN, H. and WEINBERG, L. 1997. Application of House of Quality in translation of consumer needs into sensory attributes measurable by descriptive sensory science. Food Quality and Preference 8(5-6), 329-348.
BEST, D.J. and RAYNER, J.C.W. 2001a. Application of the Stuart test to sensory evaluation data. Food Quality and Preference 12(5-7), 353-357.
BEST, D.J. and RAYNER, J.C.W. 2001b. Nonparametric analysis for trinary sensory data. J. Sensory Studies 16, 249-261.
BI, J., TEMPLETON-JANIK, J.M., ENNIS, J.M. and ENNIS, D.M. 2000. Replicated difference and preference tests: how to account for inter-trial variation. Food Quality and Preference 11(4), 269-273.
BRADDON, S., JARRETT, G.S. and MUÑOZ, A.M. 2002. Consumer testing methods. In: Skin Moisturization, J. Leyden and A. Rawlings, eds. Marcel Dekker, New York.
BRANDT, M.A., SKINNER, E.Z. and COLEMAN, J.A. 1963. Texture profile method. J. Food Sci. 28, 404-409.
BROCKHOFF, P.B. and SCHLICH, P. 1998. Handling replications in discrimination tests. Food Quality and Preference 9(5), 303-312.
CASEY, M.A. and KRUEGER, R.A. 1994. Focus group interviewing. In: Measurement of Food Preferences, H.J.H. MacFie and D.M.H. Thomson, eds. Blackie Academic & Professional, London.
CAUL, J.F. 1957. The profile method of flavor analysis. Advances in Food Research 7(1), 1-40.
CHAMBERS, D. 2003. ASTM Task Group on Sensory Proficiency. Personal communication.
CHAMBERS, E. and SMITH, E.A. 1991. The use of qualitative research in product research and development. In: Sensory Science Theory and Applications in Foods, H.T. Lawless and B.P. Klein, eds. Marcel Dekker, New York.
CLIFF, M.A., O'MAHONY, M., FUKUMOTO, L. and KING, M.C. 2000. Development of a "bipolar" R index. J. Sensory Studies 15(2), 219-229.
COURCOUX, P. and SEMENOU, M. 1997. Preference data analysis using a paired comparison model. Food Quality and Preference 8(5-6), 353-358.
CURT, C., NOGUEIRA, H., TINET, C., HOSSENLOPP, J. and TRYSTRAM, G. 2000. Gustamat: sensory quality control of cured sausage. Innovations in Food Technology 11, 6-52.
DEPPE, C., CARPENTER, R. and JONES, B. 2001. Nested incomplete block designs in sensory testing: Construction strategies. Food Quality and Preference 12(5-7), 281-290.
DUDEK, L. and CHILDRENS, S. 1999. Using graphics facilitation tools to tap into the wisdom of consumers and customers. ASTM symposium: Translating Consumer Needs into a Strategic Business Plan. Deerfield Beach, Florida.
ELMORE, J.R., HEYMANN, H., JOHNSON, J. and HEWETT, J.E. 1999. Preference mapping: Relating acceptance of "creaminess" to a descriptive sensory map of a semi-solid. Food Quality and Preference 10(6), 465-476.
ENNIS, D.M. 1993. The power of sensory discrimination methods. J. Sensory Studies 8, 353-370.
ENNIS, D.M. 1998. Thurstonian scaling for difference tests. IFPress 1(3), 2-3.
ENNIS, D.M. and BI, J. 1998. The Beta-Binomial model: Accounting for intertrial variation in replicated difference and preference tests. J. Sensory Studies 13, 389-412.
FINDLAY, C. 2002. Computers and the internet in sensory quality control. Food Quality and Preference 13(6), 423-428.
GIBOREAU, A., NAVARRO, S., FAYE, P. and DUMORTIER, J. 2001. Sensory evaluation of automotive fabrics: The contribution of categorization tasks and non verbal information to set up a descriptive method of tactile properties. Food Quality and Preference 12(5-7), 311-322.
GRIFFITHS, P. and KULKE, T. 2002. Clothing movement - Visual sensory evaluation and its correlation to fabric properties. J. Sensory Studies 17, 229-255.
HINREINER, E.H. 1956. Organoleptic evaluation by industry panels - the cutting bee. Food Technol. 31(11), 62-67.
KING, C., HALL, J. and CLIFF, M.A. 2001. A comparison of methods for evaluating the performance of a trained sensory panel. J. Sensory Studies 16, 567-581.
KOLKEBECK, N.E. 2002. Internet approaches at J. Reckner Associates (JRA). JRA Press, Montgomery, Penn.
KRUEGER, R.A. 1994. Focus Groups: A Practical Guide for Applied Research, 2nd Ed. Sage Publications, Newbury Park, Cal.
KUNERT, J. 1998. Sensory experiments as crossover studies. Food Quality and Preference 9(4), 243-254.
LYON, D.H. 2001. International Guidelines for Proficiency Testing in Sensory Analysis, Guideline No. 35. CCFRA, Chipping Campden, United Kingdom.
MALUNDO, T.M.M., SHEWFELT, R.L., WARE, G.O. and BALDWIN, E.A. 2001. An alternative method for relating consumer and descriptive data used to identify critical flavor properties of mango (Mangifera indica L.). J. Sensory Studies 16, 199-214.
MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1999. Sensory Evaluation Techniques, 3rd Ed. CRC Press, Boca Raton, Florida.
McEWAN, J.A. 1996. Preference mapping for product optimization. In: Multivariate Analysis of Data in Sensory Science (T. Naes and E. Risvik, eds.) Elsevier Applied Science, New York.
McEWAN, J.A. 1999. Comparison of sensory panels: A ring trial. J. Sensory Studies 14, 161.
McEWAN, J.A., HUNTER, E.A., VAN GEMERT, L.J. and LEA, P. 2002a. Proficiency testing for sensory profile panels: measuring panel performance. Food Quality and Preference 13, 181-190.
McEWAN, J.A., HEINIO, R., HUNTER, E.A. and LEA, P. 2002b. Proficiency testing for sensory ranking panels: measuring panel performance. Food Quality and Preference 14(3), 247-256.
MOSKOWITZ, H.R. 1983. Product Testing and Sensory Evaluation of Food: Marketing and R&D Approaches. Food and Nutrition Press, Trumbull, Conn.
MUÑOZ, A.M. and BLEIBAUM, R.N. 2001. Fundamental descriptive analysis techniques: Exploration of their origins, differences and controversies. Workshop presented at "2001: A Sense Odyssey", 4th Pangborn Sensory Science Symposium, Dijon, France.
MUÑOZ, A.M. and CIVILLE, G.V. 1992. Spectrum Descriptive Analysis Method. In: ASTM Manual Series MNL 13, Manual on Descriptive Analysis Testing. ASTM, West Conshohocken, Penn.
MUÑOZ, A.M. and EVERITT, M. 2003. Non-traditional consumer research methods. Workshop presented at "2003: A Sensory Revolution", 5th Pangborn Sensory Science Symposium, Boston.
O'MAHONY, M. 1979. Short cut signal detection measures for sensory science. J. Food Science 44(1), 302-303.
PANGBORN, R.M. 1964. Sensory evaluation of foods: A look backward and forward. Food Technol. 18(9), 63-67.
PERRY, B. 1998. Seeing your customers in a whole new light. J. Quality and Participation 21(6), 38-43.
PERYAM, D.R. and PILGRIM, F.J. 1957. Hedonic scale method of measuring food preferences. Food Technol. 11, 9-14.
POUPLARD, N., QANNARI, E.M. and SIMON, S. 1997. Use of Ridits to analyze categorical data in preference studies. Food Quality and Preference 8(5-6), 419-422.
QANNARI, E.M. and MEYNERS, M. 2001. Identifying assessor differences in weighting the underlying sensory dimensions. J. Sensory Studies 16, 505-516.
RESURRECCION, A.V.A. 1998. Consumer Sensory Testing for Product Development. Aspen Publishers, Gaithersburg, MD.
ROUSSEAU, B. 2001. The beta-strategy: An alternative and powerful cognitive strategy when performing sensory discrimination tests. J. Sensory Studies 16, 301-319.
ROUSSEAU, B. and O'MAHONY, M. 2001. Investigation of the dual-pair method as a possible alternative to the triangle and same-different tests. J. Sensory Studies 16, 161-178.
ROUSSET, S. and MARTIN, J.F. 2001. An effective hedonic analysis tool: Weak/strong points. J. Sensory Studies 16(6), 643-661.
SCHLICH, P. 1994. GRAPES: A method and a SAS program for graphical representations of assessor performance. J. Sensory Studies 9, 157.
SHEPHERD, R., FARLEIGH, C.A. and WHARF, S.G. 1991. Effect of quantity consumed on measures of liking for salt concentrations in soup. J. Sensory Studies 6, 227-238.
STAMPANONI, C.R. 1994. The use of standardized flavor languages and quantitative flavor profiling technique for flavored dairy products. J. Sensory Studies 9, 383-400.
STONE, H., SIDEL, J., OLIVER, S., WOOLSEY, A. and SINGLETON, R.C. 1974. Sensory evaluation by quantitative descriptive analysis. Food Technol. 28(11), 24, 26, 29, 32, 34.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices. Academic Press, New York.
STONE, H. and SIDEL, J.L. 1995. Strategic applications for sensory evaluation in a global market. Food Technol. 49(2), 80-88.
STUCKY, G. 2001. The future of human-centric technology to help us do more by doing less. Workshop 9: Instrumentation for Sensory Data Collection for the New Millennium. Workshop presented at "2001: A Sense Odyssey", 4th Pangborn Sensory Science Symposium, Dijon, France.
SWAN, J., McINNIS, B.C. and TRAWICK, F. 1996. Ethnography as a method for broadening sales force research: Promise and potential. J. Personal Selling and Sales Management 16(2), 57-64.
TANG, C., HEYMANN, H. and HSIEH, F.H. 2000. Alternatives to data averaging of consumer preference data. Food Quality and Preference 11(1-2), 99-104.
URBICK, B. 2003. Empathography - Partnering with your consumer. In: Non-traditional consumer research methods. Workshop presented at "2003: A Sensory Revolution", 5th Pangborn Sensory Science Symposium, Boston.
VIGNEAU, E., QANNARI, E.M., PUNTER, P.H. and KNOOPS, P. 2001. Segmentation of a panel of consumers using clustering of variables around latent directions of preference. Food Quality and Preference 12(5-7), 359-363.
WAKELING, I.N., HASTED, A. and BUCK, D. 2001. Cyclic presentation order designs for consumer research. Food Quality and Preference 12(1), 39-46.
WILKINSON, C. and YUKSEL, D. 1997. Using artificial neural networks to develop prediction models for sensory-instrumental relationships: an overview. Food Quality and Preference 8(5-6), 439.
WOODLAND, C.L. 2003. Ethnographic and observational research. In: Non-traditional consumer research methods. Workshop presented at "2003: A Sensory Revolution", 5th Pangborn Sensory Science Symposium, Boston.
MAXIMO C. GACULA, JR.

Before we can provide a prediction of the role of Sensory Science or sensory evaluation in the next decade, we must first establish that Sensory Science is accepted by academia as a scientific discipline, supported by fundamental theories and used in practice with acceptable results. Is Sensory Science a mature science on which we can rely? What is meant by mature? One of Webster's definitions is "to become fully developed or ripe; having completed natural growth and development." Are the methods and principles we use today tested and verified through the so-called scientific method? If they are, then Sensory Science is mature. Fishken (1987) stated that our field has matured to the level where it is known to exist by many and is viewed with both warm acceptance and skepticism. It is hoped that 10 years hence Sensory Science will be fully matured and accepted without skepticism as a scientific discipline. However, maturity is not the end point of a science. I consider Sensory Science a continual process of learning and implementation. It is in the implementation process that Sensory Science gains respect, recognition and acceptance by academia, by consumer product industries and within corporate environments. Dr. Moskowitz and Ms. Muñoz have discussed the trends, needs and globalization of Sensory Science, among other topics, and most importantly the relation between sensory evaluation and marketing research, which has been a "traditional separation," although the nature of that separation has changed in the last five years or so. The viewpoints expounded by Moskowitz and Muñoz are relevant and substantive for sensory practice to be fruitful, and should be addressed without political influence from the department umbrella, as Moskowitz indicated. Sensory and marketing research grew out of a history of competition, not cooperation. From my perspective, this is changing, and will change more in the next decade. "Sensory professional," "sensory analyst," "sensory scientist" - the naming has not been fully defined and established. All these titles are appropriate, and the choice should be based on context. The title sensory scientist or sensory researcher appears appropriate for scientific meetings, resumes, references to specific areas of sensory work, etc. In research one is always dealing with the scientific method. Perhaps in social conversation, and when referring to sensory as a whole, sensory professional would be the logical title. Moskowitz's and Muñoz's viewpoints on sensory education are well-taken and will be seen in the next decade. It will be a giant step if we can use the consumer to rate sensory intensity, as stated by Muñoz and Moskowitz. We have to be assured, first, that the consumer has a uniform understanding of the attribute to be evaluated and, second, that the consumer is not substituting hedonic
judgment for intensity. However, research in the psychology of acceptability indicates that these concerns might not be warranted (Booth et al. 1983; Booth and Conner 1990). This is one situation where sensory evaluation and marketing research can collaborate to find a solution to the problem, or put the question to rest. Ms. Muñoz's viewpoints on sensory globalization are indeed timely; globalization is happening now and will continue. Most of the global sensory studies are in descriptive analysis (Rajalakshmi et al. 1987; Rohm et al. 1994; Byrne et al. 1999), which in fact use the descriptive methods developed in the USA. Examples of global consumer studies are those reported by Laing et al. (1994) and Dransfield et al. (2000). On accreditation, I feel that laboratory accreditation has a future and perhaps can be realized in the next two decades. However, based on my observations of the American Statistical Association, I cannot foresee proficiency accreditation happening for the sensory professional in the near term. Ms. Muñoz's viewpoint on the use of the Internet in sensory evaluation and product testing will undoubtedly be borne out in a few years, if not now, particularly within a company on an intranet basis, using company employees. There are companies that today use the Internet on a regular basis to recruit panelists. On the topic of advanced statistics, the use of methods that handle serious multicollinearity in the data will increase; it is known that sensory data are highly correlated, which often gives misleading results. A statistical method frequently used in Sensory Science is stepwise regression; in some results, the variables selected do not make sense in relation to the experimental variables because of multicollinearity. Because of correlation, variables that are not selected are not necessarily unimportant. Many books and published papers have discussed the use of principal component regression to solve the problem of multicollinearity (Draper and Smith 1981; Chatterjee and Price 1991; Jackson 1991; Popper et al. 1997). Others have approached the problem through ridge regression (Neter et al. 1996; Houshmand and Javaheri 1998). Sensory scientists in Scandinavian and other European countries have used partial least-squares regression extensively in modeling sensory responses. Another advanced method is preference mapping, which is gaining popularity in sensory testing work (Greenhoff and MacFie 1994; Moskowitz 2002; Santa Cruz et al. 2002). In the next decade, we will see increased use of these advanced regression techniques for sensory prediction work.
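A hedged illustration of two of the remedies named above - principal component regression (PCR) and partial least-squares (PLS) regression - is sketched below on synthetic, deliberately collinear data. All names and numbers are invented for illustration; this is not a procedure prescribed in the text.

```python
# Sketch: PCR and PLS as remedies for multicollinearity in sensory data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_products, n_attributes = 30, 15

# Simulate highly intercorrelated descriptive attributes driven by two
# latent sensory dimensions (e.g., "impact" and "texture").
latent = rng.normal(size=(n_products, 2))
loadings = rng.normal(size=(2, n_attributes))
X = latent @ loadings + 0.1 * rng.normal(size=(n_products, n_attributes))
y = 6 + latent @ np.array([1.0, -0.5]) + 0.3 * rng.normal(size=n_products)

# PCR: regress liking on the first few principal components of X.
pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, y)

# PLS: components chosen to covary with liking, rather than to explain X.
pls = PLSRegression(n_components=2).fit(X, y)

print("PCR R^2:", round(pcr.score(X, y), 3))
print("PLS R^2:", round(pls.score(X, y), 3))
```

Both approaches replace the 15 collinear predictors with a few stable latent dimensions, which is exactly why they avoid the nonsensical variable selections that stepwise regression can produce here.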
The use of multivariate methods for hypothesis testing has not been widely implemented in Sensory Science because of its complexity. However, for studying the relationships between independent variables (products, treatments, etc.) and dependent variables (sensory attributes, etc.), multivariate methods such as factor analysis and principal component analysis, among others, have been fully implemented in Sensory Science. Sensory professionals are aware of these applications, and the edited book by Naes and Risvik (1996) is recommended reading. Because of the multicollinearity of sensory data, technically speaking, the testing of treatment differences should be done using a multivariate test. Due to its complexity, in both the statistics and the software availability, the univariate approach is used instead, analyzing each sensory attribute separately. That is, if we have 15 attributes there will be 15 univariate analyses of variance. The univariate approach ignores the intercorrelations existing between sensory attributes. However, if the intercorrelations are negligible (not statistically significant), the univariate test is acceptable. If they are not negligible, but the variance-covariance matrix is uniform (correlations among the variables are similar), the univariate test is also acceptable; this is the common assumption used today in hypothesis testing of sensory and consumer data. To further advance the role of Sensory Science in the next decade, education is needed, along with extensive research in data analysis using actual sensory and consumer data to study the characteristics of the variance-covariance matrix in relation to hypothesis testing.
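To make the contrast concrete, the sketch below runs per-attribute univariate ANOVAs alongside a single multivariate test (MANOVA) that respects the intercorrelation of attributes. The data frame and its three attributes are invented, and the statsmodels routines shown are one possible tool, not one prescribed by the author.

```python
# Sketch: 3 univariate ANOVAs vs. one MANOVA on correlated attributes.
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "product": np.repeat(["A", "B", "C"], 10),
    "sweet": rng.normal(5, 1, 30),
    "sour": rng.normal(3, 1, 30),
    "thick": rng.normal(6, 1, 30),
})

# Common practice: one univariate ANOVA per attribute (here 3; with 15
# attributes there would be 15 such tests, ignoring intercorrelations).
for attr in ["sweet", "sour", "thick"]:
    fit = ols(f"{attr} ~ C(product)", data=df).fit()
    print(attr, anova_lm(fit).loc["C(product)", "PR(>F)"])

# Multivariate alternative: a single test across all attributes at once
# (Wilks' lambda, Pillai's trace, etc.), using the covariance structure.
mv = MANOVA.from_formula("sweet + sour + thick ~ C(product)", data=df)
print(mv.mv_test())
```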
REFERENCES

BOOTH, D. and CONNER, M. 1990. Characterisation and measurement of influences on food acceptability by analysis of choice differences: theory and practice. Food Quality and Preference 2, 275-285.
BOOTH, D., THOMPSON, A. and SHAHEDIAN, B. 1983. A robust, brief measure of an individual's most preferred level of salt in an ordinary foodstuff. Appetite 4, 301-312.
BYRNE, D.V., BAK, L.S., BREDIE, W.L.P., BERTELSEN, G. and MARTENS, M. 1999. Development of a sensory vocabulary for warmed-over-flavor: Part I: In porcine meat. J. Sensory Studies 14, 47-65.
CHATTERJEE, S. and PRICE, B. 1991. Regression Analysis by Example. John Wiley & Sons, New York.
DRANSFIELD, E. et al. 2000. Home placement testing of lamb conducted in six countries. J. Sensory Studies 15, 421-436.
DRAPER, N. and SMITH, H. 1981. Applied Regression Analysis. John Wiley & Sons, New York.
FISHKEN, D. 1987. Commercial sensory research - on achieving maturation. J. Sensory Studies 2, 69-73.
GREENHOFF, K. and MACFIE, H. 1994. Preference mapping in practice. In: Measurement of Food Preferences, Chapter 6, MacFie and Thomson, eds. Blackie Academic & Professional, London.
HOUSHMAND, A. and JAVAHERI, A. 1998. Multivariate ridge residual charts. Quality Engineering 10, 617-624.
JACKSON, E. 1991. A User's Guide to Principal Components. John Wiley & Sons, New York.
LAING, D.G., PRESCOTT, J., BELL, G.A., GILLMORE, R., ALLEN, S. and BEST, D.J. 1994. Responses of Japanese and Australians to sweetness in the context of different foods. J. Sensory Studies 9, 131-155.
MOSKOWITZ, H.R. 2002. Mapping in product testing and sensory science: A well lit path or a dark statistical labyrinth? J. Sensory Studies 17, 207-213.
NAES, T. and RISVIK, E., eds. 1996. Multivariate Analysis of Data in Sensory Science. Elsevier, Amsterdam.
NETER, J., KUTNER, M., NACHTSHEIM, C. and WASSERMAN, W. 1996. Applied Linear Statistical Models. Irwin, Chicago.
POPPER, R., HEYMANN, H. and ROSSI, F. 1997. Three multivariate approaches to relating consumer to descriptive data. In: Relating Consumer, Descriptive, and Laboratory Data, Chapter 5, A. Muñoz, ed. ASTM Manual 30, West Conshohocken, Penn.
RAJALAKSHMI, D., DHANARAJ, S., CHAND, N. and GOVINDARAJAN, V.S. 1987. Descriptive quality analysis of mutton. J. Sensory Studies 2, 93-118.
ROHM, H., KOVAC, A. and KNEIFEL, W. 1994. Effects of starter cultures on sensory properties of set-style yoghurt determined by quantitative descriptive analysis. J. Sensory Studies 9, 171-186.
SANTA CRUZ, M.J., MARTINEZ, M.C. and HOUGH, G. 2002. Descriptive analysis, consumer clusters and preference mapping of commercial mayonnaise in Argentina. J. Sensory Studies 17, 309-326.
CHAPTER 2

INTERNATIONAL SENSORY SCIENCE

HOWARD R. MOSKOWITZ

With the growing interest in globalization, sensory scientists have been forced to turn their attention from local issues to international ones. The operational issues involve test design and execution, data analysis, the reporting of results, and the implementation of results at the local level.
Lack of Sophistication - Its Impact on Test Design

Because Sensory Science began in earnest in the United States, many novice practitioners are not aware that around the world, and especially in developing countries, the level of knowledge about Sensory Science is fairly low. Happily, however, with increasing education in Sensory Science worldwide and with the development of sensory networks on the Internet, the level of knowledge is rising, especially among the younger practitioners and up-and-coming students. A great deal of the problem is due to the lack of an infrastructure for Sensory Science. There are few laboratories, tests must be done out in the field by untrained interviewers, and the quality of data is often lower than one desires. Test execution issues, such as rotation, re-screening of panelists, additional information about the panelists, etc., are often not up to the standards that have been accepted as de rigueur by the more sophisticated practitioners. Despite the lack of sophistication, it is not altogether clear that the data provided by untrained or less well-trained practitioners are as bad as a critic might think. Indeed, there appears to be little analysis in the literature of the degree to which sophistication in sensory methods generates higher quality data. One might speculate about the emotional and invidious causes of this criticism, but that would be an unproductive exercise and a fruitless conjecture. The cooperative, multi-laboratory studies run for different products by different research groups show a great deal of variability across laboratories (Profisens 1999), even in countries that have well-established sensory research groups. Such variability is accepted by practitioners as reflecting simply the "way things are" across panelists and laboratories. Yet, such laboratory-to-laboratory variability might be roundly criticized were the data to come from less trained practitioners in developing countries. The point of this argument is that there is no clear correlation between the training/sophistication of the researcher, the up-to-date aspects of the laboratory, and the quality of the data that emerge from the experiment. With the exception of clear flaws in design such as lack of rotation
of products, poor screening or poor product preparation, it appears that sophistication in laboratory procedures may not add much to the data themselves. We encounter echoes of this sophistication/quality debate when we deal with the physical layout of the sensory test facility.
Scale Differences and Issues in International Research

Beyond methodology, one of the key areas is whether consumers in different parts of the world react similarly to foods. At the very simplest level this concerns the use of scales - e.g., do panelists in different countries use the same scales in similar ways? Researchers working across cultures know that the same scale number may mean different things. This cultural difference can become problematic, especially if the research objective is to identify the degree to which panelists in different countries like or dislike the same food. Panelists living in a country or culture that values politeness and social calmness above all tend to assign high numbers to product acceptability (e.g., Mexico, the Philippines). The ratings may yield false positives - viz., that a product is acceptable when in reality it is not. Panelists in a country or culture that tends to assign low numbers to the liking of products (e.g., Sweden, Holland) may yield false negatives - viz., that a product is not acceptable when in reality it is. Practitioners attempt to eliminate this intercultural difference by a variety of stratagems of greater or lesser ingenuity. One approach eliminates scaling altogether and relies on paired comparison tests; this conveys little information, because both products may be unacceptable and one would never know. Another stratagem uses rules of thumb (e.g., a transformation of the ratings from panelists of various cultures into a common scale, with the nature of the transform a function of the particular culture; a sketch of one such transform appears below). Such post-hoc efforts to adjust the raw data are acceptable and, in the opinion of this author, better than the ill-fated and perhaps confusing attempts to use different scales and attributes in each country, depending upon one's perception of the evaluative abilities of panelists in that particular country. Parenthetically, none of these issues stands by itself in splendid isolation - we will encounter the same issues in dealing with adults versus children.
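As a hedged sketch of one such rule-of-thumb transform (an illustration, not a method prescribed in the text), the snippet below re-expresses each country's hedonic ratings as within-country z-scores, so that products are compared on relative rather than absolute liking. The countries, products and ratings are invented.

```python
# Sketch: per-country standardization of hedonic ratings.
import pandas as pd

ratings = pd.DataFrame({
    "country": ["MX", "MX", "MX", "SE", "SE", "SE"],
    "product": ["P1", "P2", "P3", "P1", "P2", "P3"],
    "liking":  [8.1, 7.9, 8.4, 5.2, 4.8, 6.0],   # 9-point hedonic scale
})

# Standardize within country: removes the country-level "politeness"
# offset and differences in how widely each culture uses the scale.
ratings["liking_z"] = ratings.groupby("country")["liking"].transform(
    lambda s: (s - s.mean()) / s.std()
)
print(ratings)
```

On the standardized scale, a product that merely rides a culture's generous baseline no longer looks artificially strong against a product that leads within a more restrained culture.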
Do Attributes and Scales Translate Across Languages, or Across People - And if Not, Then Is There Any Hope?
Consumers in different cultures have different languages. Beyond the language differences, however, may lurk more profound differences in the way that consumers look at the world. A half century ago the linguist B.L. Whorf (1956) described the ability of the Eskimo to differentiate attributes of a culturally relevant stimulus - snow. Whereas the typical American perceives only a few gradations in snow, the Eskimo perceives many more. Over the
Eskimo’s experience, the gradations in snow have meaning for the daily life, whereas for the typical American gradations in the quality of snow mean relatively little. Cross-cultural studies of descriptive analysis show differences among people in their ability to describe the sensory characteristics of food. A study by the European Sensory Network on coffees shows differences in the descriptive profiles assigned by panelists in the various countries to the same coffee (1996). Indeed, person-to-person differences in the descriptive profiles of the same product has promoted the use of statistical methods to capture these individual differences (e.g., procrustes analysis; Williams et al. 1988). Given this massive variation across people and across experience levels, how then can sensory scientists hope to deal with the cross-cultural differences? The problem is both practical and philosophical. In many ways the problem is similar to that faced by experimental psychologists early in the century who did not believe that the human judge could really scale sensory magnitude. These psychologists let the subject match the test stimulus to one of a set of reference stimuli, varying in physical concentration. All statements about sensory magnitude were then expressed in terms of matching physical concentrations. Sensory magnitudes were not averaged across judges - rather the concentrations that matched a given stimulus (obtained across the panel) were averaged to generate a single matching number reflecting the consensus. Psychologists have names for the rules developed in view of these differences - idiographic (rules developed for an individual) versus nomothetic (rules developed for a population, where all members are assumed to behave reasonably similarly). Does this averaging process work? Even across people there are those who do not (or at least did not) believe in averaging, but accepted the different sensory worlds in which we live. In the 1940s, developers of the Arthur D. Little Inc. Flavor Profile did not permit the averaging of sensory magnitudes and the development of a sensory quality profile representing the mean across people (Caul 1957). Rather, they instituted a discussion of the sensory profile among the different panelists, and accepted only a final consensus profile after the panelists resolved their differences. We see from the early approaches to flavor description an emerging recognition that we may live in different sensory worlds. The solution involves discussion and consensus, a more fuzzy approach than averaging differences, either across people or across cultures. Research today is far more quantitative than ever before. One of the inevitable results of quantification is the desire for summary statistics, especially averages. There is an increasing demand to develop a statistical summary of the attribute profile. This demand, when not properly channeled, can create averages of the same attribute across different cultures where the individuals in the different cultures mean entirely different things by the attribute. The solution to
the international problem is not easy. It may reside in one or more of the following three activities:

(1) Describe all stimuli in terms of reference standards, following the approach espoused by Schutz (1964) for odor description.

(2) Ignore the individual differences, and simply average the same terms across countries.

(3) Develop a system to translate sensory profiles from one country to a reference sensory profile, e.g., by the method of "reverse engineering," so that each country's sensory profile is mapped to a common set of terms.

This third solution enables panelists in different countries to use entirely different terms, but the profiles must then be mapped to a common profile, whose numbers can be averaged across panelists. This third solution is difficult, but may eventually be achieved given the advances in modeling, optimization, and reverse engineering (Moskowitz 1999).

Sensory Preference Segmentation and International Sensory Research

Individual differences abound in Sensory Science. These differences were pointed out during the first third of the twentieth century by early researchers in taste (Engel 1928), reaffirmed by psychophysicists in the middle part of the century (Ekman and Akesson 1964; Pangborn 1970), and continue to appear today. Up to 20 years ago these individual differences were treated as a second-order irritant, and perhaps as something that could be resolved by statistics behind the scenes. These statistics were averages to identify the mean, or central tendency, and hypothesis tests to determine whether or not the variation among products exceeded the random variation within a product, e.g., through analysis of variance. As sensory scientists became increasingly adept at multi-country research, it came as no surprise that these individual differences continued to recur. This time, however, the individual differences were more important, and not simply disturbing second-order effects representing noise in an otherwise noise-free picture of the world. The individual differences were signals from nature that something was going on - but what was that underlying "something" which manifested itself in the annoying variability across people? During the past 20 years or so, the concept of preference segmentation has emerged as a more fruitful way to understand individual differences and the tremendous variation among cultures. Preference segmentation assumes that there exist in the population fundamental groups of consumers showing similar sensory preferences. There are a limited number of these groups. These preference segments can be likened to fundamental colors (red, blue, yellow), from whose mixture all colors emerge. In a similar fashion, these preference
segments exist in all countries. Differences across countries in preference patterns are assumed to relate to the differential proportion of these segments. That is, in one country Segment A may predominate, whereas in another country Segment B may predominate, and so forth. The key differences today versus 30 years ago are the appreciation of these individual differences and the practical next steps taken in light of them. The current applications revolve around the creation of new products to appeal to these segments. For example, if the manufacturer knows that there exist two key segments for a juice product, then it makes more sense for the manufacturer to create products targeted to these two segments rather than attempting to create a single juice product that appeals only modestly to both segments simultaneously. Separate, targeted development will create two high scoring products that have a greater chance for market success. Whether these fundamental segments really exist, or whether they simply represent a convenient analytical tool by which to summarize individual differences, remains for further work. There are clear preference segments for coffee (Moskowitz 1985), for juice (Moskowitz and Krieger 1998), for meat (Moskowitz and Barash 2000), for fragrance (Moskowitz 1986), etc. The segments emerge from simple algorithms presented previously (Moskowitz et al. 1985). The segmentation method generates curves showing different dynamics of the relation between sensory attribute level and liking. Figure 2.1 shows how the amount of particulates drives liking in juice for various countries (left panel), and for two sensory segments that span these countries (right panel). The segmentation method has also been used for concept work, and shows that there exist basic preference segments for communication. Interestingly, the sensory preference segments based upon the sensory characteristics of products do not necessarily match the concept communication segments based upon responses to different concepts (Moskowitz 1996).
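The sketch below illustrates the general idea on synthetic data: cluster consumers on their liking patterns across products to recover a "high-impact" and a "low-impact" segment. It uses ordinary k-means as a stand-in and is not the algorithm of Moskowitz et al. (1985); all numbers are invented.

```python
# Sketch: recovering preference segments by clustering liking patterns.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
impact = np.linspace(0, 10, 8)            # sensory impact level of 8 juices

# Two hypothetical segments: liking rises with impact for one group
# and falls with impact for the other.
seg_hi = 5 + 0.4 * impact + rng.normal(0, 0.5, size=(50, 8))
seg_lo = 9 - 0.4 * impact + rng.normal(0, 0.5, size=(50, 8))
liking = np.vstack([seg_hi, seg_lo])      # rows = consumers, cols = products

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(liking)

# Mean liking curve per recovered segment, analogous to the two curves
# in the right panel of Fig. 2.1.
for k in range(2):
    print(f"Segment {k}: mean liking by impact level:",
          np.round(liking[labels == k].mean(axis=0), 1))
```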
Research And Business Opportunities Afforded By Global Sensory Segmentation

At first blush the existence of global sensory segments appears to be an interesting research topic, relevant for quite a number of specific product categories. What may be more interesting, however, is the possibility for knowledge and business building that the segmentation affords. From the viewpoint of knowledge development, there are studies of specific product categories showing the existence of such sensory segments in different products. The published scientific literature neither confirms nor denies the possibility that there may be a limited number of such sensory preference segments transcending product categories. That is, if there are high-impact and low-impact seekers in one product category (e.g., pasta sauce), then do these sensory preference segments
also emerge for another product category (e.g., carbonated beverage)? Furthermore, do individuals falling into the high-impact segment for pasta sauce also fall into the high-impact segment for carbonated beverages, or does a person's segment membership vary by product category? To the degree that the researcher can identify regularities in a single individual's membership in sensory segments across different product categories, the researcher will have uncovered a new way to classify individuals that has both scientific and business import.
FIG. 2.1. HOW THE AMOUNT OF PARTICULATES (PIECES) IN JUICE DRIVES OVERALL LIKING
The left panel shows the relation for three European countries. The right panel shows the relation for two sensory segments - high impact (H) and low impact (L). The x-axis in both panels is the amount of particulates. [Data from Moskowitz and Krieger 1998].
REFERENCES

CAUL, J.F. 1957. The profile method of flavor analysis. Advances in Food Research 7, 1-40.
EKMAN, G. and AKESSON, C.A. 1964. Saltiness, Sweetness and Preference: A Study of Quantitative Relations in Individual Subjects. Report 177, Psychology Laboratories, University of Stockholm.
ENGEL, R. 1928. Experimentelle Untersuchungen über die Abhängigkeit der Lust und Unlust von der Reizstärke beim Geschmackssinn. Pflügers Archiv für die gesamte Physiologie 64, 1-36.
EUROPEAN SENSORY NETWORK. 1996. A European Sensory and Consumer Study: A Case Study on Coffee. Published by the European Sensory Network. Available from Campden and Chorleywood Food Research Association, Chipping Campden, England.
MOSKOWITZ, H.R. 1985. New Directions in Product Testing and Sensory Evaluation of Foods. Food and Nutrition Press, Trumbull, Conn.
MOSKOWITZ, H.R. 1986. Sensory segmentation of fragrance preferences. J. The Society of Cosmetic Chemistry 37, 233-247.
MOSKOWITZ, H.R. 1996. Segmenting consumers world-wide: An application of multiple media conjoint methods. Proceedings of the 49th ESOMAR Congress, Istanbul, 535-552.
MOSKOWITZ, H.R. 1999. Inter-relating data sets for product development: The reverse engineering approach. Food Quality and Preference 11, 105-119.
MOSKOWITZ, H.R. and BARASH, J. 2000. Sensory segmentation - using all attributes vs. using specific sensory attributes: A methodological note. Unpublished manuscript.
MOSKOWITZ, H.R., JACOBS, B.E. and LAZAR, N. 1985. Product response segmentation and the analysis of individual differences in liking. J. Food Quality 8, 168-191.
MOSKOWITZ, H.R. and KRIEGER, B. 1998. International product optimization: A case history. Food Quality and Preference 9, 443-454.
PANGBORN, R.M. 1970. Individual variations in affective responses to taste stimuli. Psychonomic Science 21, 125-128.
PROFISENS. 1999. Profisens Newsletter #2. Campden and Chorleywood Food Research Association, Chipping Campden, England.
SCHUTZ, H.G. 1964. A matching standards method for characterizing odor quality. Annals of the New York Academy of Sciences 116, 517-526.
WHORF, B.L. 1956. Language, Thought and Reality, J.B. Carroll, ed. M.I.T. Press, Cambridge, Mass.
WILLIAMS, A.A., ROGERS, C.A. and COLLINS, A.J. 1988. Relating chemical/physical and sensory data in food acceptance studies. Food Quality and Preference 1, 25-31.
ALEJANDRA M. MUÑOZ

Sensory professionals, who in the past worked locally and developed themselves and their groups only in their own country, now have an opportunity to be involved in the international development of the field and of their companies. As a result, there has been greater involvement of Sensory Science in the international arena in the last decade (Stone and Sidel 1995; Karahadian 1995),
because of five trends: (1) transnational collaboration: the need to assist international affiliates in the development of sensory programs, the development and application of new methods, the completion of training programs, and the execution of local internal consumer projects; (2) the execution of cross cultural consumer studies: the need to develop and apply product evaluation techniques globally in order to respond to the industry's needs for testing products abroad (Wimberley 1991); (3) the need to understand global consumer responses, their differences and similarities (Rozin 1982); (4) the need to achieve the harmonization of methods and techniques globally; and (5) the extensive collaboration taking place among sensory professionals around the world. A brief review of these trends follows.

Transnational Collaboration

Transnational companies have always been involved in cross-country collaborations. However, this collaboration has increased in the past years because of industry's drive to develop existing categories in international markets, to introduce products into these different markets, etc. The interaction among professionals in transnational companies occurs at two levels: the development, testing and marketing of the company's products internationally, and the improvement and/or establishment of technology and methodology at international locations/affiliates. The sensory professional working for such companies becomes involved at both levels. The sensory professional participates in the development, testing and marketing of the company's products internationally through his/her involvement in the design and execution of cross cultural consumer tests. In these projects, the sensory professional works either with the local affiliate or with vendors, as described below. Sensory professionals also have an important role in the improvement/development of programs at the company's international locations/affiliates. Foreign sensory colleagues may not have courses or consultants in their country to help develop their technology and methodology. Sensory professionals from developed countries may be asked to assist in the development/implementation of a sensory program in other countries. They then have the opportunity to set up full sensory programs, train staff and establish components of a sensory program (descriptive, consumer, etc.) which will provide sensory evaluation support in that country. In addition, sensory professionals in developed countries are often regarded as consultants by company affiliates and asked to assist in specific training, testing and sensory consultation conducted abroad.
Cross Cultural Sensory Consumer Studies

As the companies for whom we work expand their business and interests into other countries (Wimberley 1991), sensory professionals have to be involved in the execution of cross cultural consumer studies (McEwan 1998; Janto et al. 1998). This activity requires close collaboration with colleagues and professionals from those countries, since sensory professionals are often unfamiliar with the countries and cultures in question. To complete these cross cultural consumer studies, sensory professionals may follow one of three strategies:
(1) Complete the study (or studies) themselves, working with the company's foreign affiliate on the project. The sensory professional usually takes care of the test design, whereas the local affiliate most often takes care of the execution of the test and its adaptation to local conditions. This is the best form of collaboration, since the project is conducted in-house, involves the company's resources, allows the exchange of ideas and learning, and promotes the effective execution of the test/project.

(2) Complete the study (or studies) themselves, working with a local vendor on the project. The sensory professional has to take care of most of the details of the test and work closely with the vendor to assure its proper execution. Unless there has been a previous collaboration, the sensory professional needs to be involved in all details of the test, including sample shipment and tracking, sample acquisition and screening, shipment/acquisition of materials, translation and pilot testing of the questionnaire, progress of consumer recruitment, the pilot test, data collection, review of data, data analysis, etc. It behooves the sensory professional (particularly if there has not been a previous collaboration with that vendor) to travel to the test site in order to oversee the test execution. When testing is conducted with a new vendor for the first time, the sensory professional has to invest a considerable amount of his/her time in the study. Once a relationship has been established with a vendor and there is confidence in the quality of the work, ensuing projects can be completed without the close involvement of the sensory professional.

(3) Hire an international consulting firm or vendor that can handle the details of the test execution without the close involvement of the sensory professional. This option works well when the sensory professional does not have extensive experience in cross cultural consumer studies, does not have the resources to invest in the administration or monitoring of the test, or when more than one country/culture is tested. The consulting firm or
international vendor should be experienced in cross cultural consumer research and be able to locate, or already collaborate with, local vendor(s) in the countries of interest. Six challenges sensory professionals face in cross cultural consumer studies are listed below:

(1) The design of a test for another culture unfamiliar to the researcher. This requires knowledge of cultural and etiquette differences, as well as customs and cultural nuances (Rohner 1984). Since each culture is unique, the administration of such tests may require the modification of methods and protocols used in our own countries to adapt them to the different cultures (Whiting 1968).
(2) The design of a questionnaire for another culture, which involves the use of appropriate attributes (translation or selection of the appropriate terms), scales and directions. This is the most critical aspect of the design of a cross cultural consumer test, since it determines the quality of the data obtained. It is imperative that the sensory professional work with someone from that country/culture in this effort. Attributes may not have a direct translation in the other language; different attributes/words need to be discussed and assessed to select the most appropriate ones. In addition, modified scales may have to be used in order to adapt them to the culture (e.g., shorter scales, fewer anchors, different end and midpoint anchors). It is imperative that the actual questionnaire be translated back into the original language and pilot tested in order to ensure that it properly addresses all issues of interest.
(3) The acquisition and screening of local products. The sensory professional needs to screen all products obtained in other countries. This screening can be conducted relatively easily when in the test country. If not in the country, the sensory professional has to request that samples be shipped for screening prior to testing.

(4) The production, acquisition and shipping of products to the test country. This activity needs to address formula differences, product uniformity, logistics of shipping, customs, local transportation and product storage.

(5) The recruitment of consumers, which may be complicated in some countries
due to infrastructure limitations and the vagaries of local customs.
(6) The execution of the test, which involves the use of the specially designed questionnaire, test protocol and logistics.

Currently the ASTM E18 task group E18.03.06 is working on a manual addressing methods and issues involved in cross cultural consumer research (ASTM 2003). Details of some of the issues discussed in this section are covered in the manual. In addition, for certain specific cultures, the manual documents general information on the country and culture, local consumer testing approaches, and special issues. Generally speaking, sensory professionals should welcome the opportunity to be involved in cross cultural consumer studies. Such studies provide visibility within the organization, the chance to interact with different functions within the organization (Marketing, Marketing Research, Product Development, Packaging, QC, Legal, etc.), the opportunity to be involved in technical and business decisions regarding the company's international activities, and the chance to interact with international colleagues in the design and execution of the projects.

Global Consumer Responses

With its increasingly global view, the field of Sensory Science is involved in the study of consumer responses worldwide. This involvement is occurring at two levels.

Industry. Transnational companies that develop and market products worldwide are interested in understanding consumer wants and needs across cultures (Vijver and Poortinga 1982; Guendelman and Abrams 1995; Moskowitz and Krieger 1998; Nielsen et al. 1998). The following questions are asked for a given product category:

Are there any common consumer wants and needs across cultures?
What are the common consumer likes and dislikes?
Do consumers from different cultures use the same terms to describe benefits, wants and needs? What are the terms that differ?
Which cultures like the same product/formula?
How does a product/formula have to be changed to adapt it to local conditions?

Transnational companies conduct worldwide studies in order to address these important issues and to create a database for the improvement and development of products worldwide. Sensory professionals have an opportunity
to participate in these studies by offering their expertise in test design and administration, and data analysis.
Research. Sensory professionals are interested not only in understanding cross-cultural consumer differences and similarities in products, but also in methodology. How should consumer methodology be modified across different cultures (Berry 1979; Whiting 1968; Ye et al. 1998), if at all? Questions in this area include: How should the standard hedonic scales be modified for different cultures? Why? Which diagnostic/intensity scales should be used for different cultures? Can diagnostic/intensity questions be asked in different cultures? How appropriate is the JAR (just about right) scale for other cultures? How should it be structured? What kind of information is obtained? What is the best way to present stimuli/samples? Are there any unique issues involved for some cultures? Research in this area is needed to fully understand these issues. Answers to these questions/issues are needed, and the dissemination of such information is occurring at an increasing rate with the publication of papers that address cross-cultural consumer research studies (Sobal 1998; Prescott 1998; Ye et al. 1998; Moskowitz and Krieger 1998; Nielsen et al. 1998). More interest in this area will be generated as the globalization of methods continues.
Harmonization of Methods

Frequently, the affiliates of an international company develop and practice different sensory and consumer testing approaches. The sensory/consumer science group in one country may be testing products following a very different approach, compared to other groups worldwide. Additionally, local groups tend to support the methods used in their country/research facility, and may be reluctant to consider the benefits of other techniques used by other international affiliates. When some of the affiliates support and use different methodology, several problems may arise, such as: the inability to compare test results across countries/cultures; added difficulty in the interpretation and use of research or study findings by all international affiliates; friction in the planning and execution of global studies because of the different beliefs and methods used across groups; and
friction among international groups in the efforts to harmonize test methods and techniques.

Therefore, it is recommended that a common philosophy be selected, and that methods across international affiliates be harmonized. In many cases, this may not be an easy task. Often, groups cannot reach agreement in this area if completely different approaches and philosophies are followed. Consultants or academicians are often involved in this task to facilitate the dialogue among groups and achieve: the unbiased assessment of the advantages and limitations of the different approaches; the selection of the most suitable methodology for that international company; and the harmonization of methods and techniques across international groups.
Many international corporations are organizing international sensory/consumer science meetings to review the company's resources, methods and approaches, and to discuss topics of interest. Oftentimes this is the ideal setting in which to review and harmonize approaches.
International Collaboration

The interaction, exchange of ideas and collaboration among sensory scientists around the world has been occurring for the past two decades, and will grow as the interest in international research and issues continues. This interaction and collaboration takes place in several areas: the design, execution and publication of sensory (analytical and consumer) research studies (Sobal 1998; Prescott 1998; Ye et al. 1998; Moskowitz and Krieger 1998; Nielsen et al. 1998); the design of workshops and seminars involving sensory professionals from different countries; and the planning of and attendance at international sensory (and/or food and ingredient) seminars, symposia and meetings (Pangborn Sensory Science Symposia, ISO and ASTM International meetings, the Sensometrics meeting, World Congresses of Food Science and Technology, the International Congress of Meat Science and Technology, the AOAC International Meeting and Exposition, the IDF World Dairy Congress). Even though there continue to be local sensory courses and meetings in each country, the main symposia, meetings and seminars in Sensory Science are truly international events. In these events, speakers and attendees come from all over the world, allowing the learning and exchange of ideas among sensory researchers worldwide. Examples of these international events include the Pangborn Sensory Science Symposium, the Sensometrics meeting, and the
Iberoamerican Sensory Symposium (SENSIBER, organized by RIEPSA - Red Iberoamericana de Evaluación de Propiedades Sensoriales de los Alimentos).

REFERENCES

ASTM. 2003. Cross Cultural Consumer Research Manual, A. Muñoz and S. King, eds. In preparation. ASTM, West Conshohocken, Penn.
BERRY, J.W. 1979. Research in multicultural societies: Implications for cross-cultural methods. J. Cross-Cultural Psychology 10(4), 415-434.
GUENDELMAN, S. and ABRAMS, B. 1995. Dietary intake among Mexican-American women: Generational differences and a comparison with White non-Hispanic women. American Journal of Public Health 83, 20-25.
JANTO, M., PIPATSATTAYANUWONG, S., KRUK, M.W., HOU, C. and MCDANIEL, M.R. 1998. Developing noodles from 135 wheat varieties for the Far East market: Sensory perspective. Food Quality and Preference 9(6), 403-412.
KARAHADIAN, C. 1995. Impact of global markets on sensory testing programs. Food Technology 49(2), 77-78.
McEWAN, J. 1998. Harmonizing sensory evaluation internationally. Food Technology 52(4), 52-56.
MOSKOWITZ, H. and KRIEGER, B. 1998. International product optimization: A case history. Food Quality and Preference 9(6), 443-454.
NIELSEN, N.A., BECH-LARSEN, T. and GRUNERT, K.G. 1998. Consumer purchase motives and product perceptions: A laddering study on vegetable oil in three countries. Food Quality and Preference 9(6), 455-466.
PRESCOTT, J. 1998. Comparison of taste perceptions and preferences of Japanese and Australian consumers: overview and implications for cross-cultural sensory research. Food Quality and Preference 9(6), 393-402.
ROHNER, R.P. 1984. Toward a conception of culture for cross-cultural psychology. J. Cross-Cultural Psychology 15(2), 111-138.
ROZIN, P. and CINES, B.M. 1982. Ethnic differences in coffee use and attitudes to coffee. Ecology of Food and Nutrition 12, 79-88.
SOBAL, J. 1998. Cultural comparison research designs in food, eating and nutrition. Food Quality and Preference 9(6), 385-392.
STONE, H. and SIDEL, J. 1995. Strategic applications for sensory evaluation in a global market. Food Technology 49(2), 80-88.
VIJVER, F. and POORTINGA, Y.H. 1982. Cross-cultural generalizations and universality. J. Cross-Cultural Psychology 13(4), 387-408.
WHITING, J.W.M. 1968. Methods and problems in cross-cultural research. In: Handbook of Social Psychology, G. Lindzey and E. Aronson, eds., Chapter 17. Addison-Wesley, Reading, Mass.
WIMBERLEY, D. 1991. Transnational corporate investment and food consumption in the third world: a cross national analysis. Rural Sociology 56(3), 406-431.
YE, L.L., KIM, K.O., CHOMPREEDA, P., RIMKEEREE, H., YAU, N.J.N. and LUNDAHL, D.S. 1998. Comparison in use of the 9-point hedonic scale between Americans, Chinese, Koreans, and Thai. Food Quality and Preference 9(6), 413-420.
MAXIMO C. GACULA, JR.

In Chap. 1, it was noted that Sensory Science will become global both in theory and applications. The lack of sophistication noted by Moskowitz in that chapter should disappear by the end of the decade. It is predicted that the level of sophistication will follow a linear curve and then plateau (Fig. 2.2). This prediction is based on the assumption that publication of sensory work in a scientific journal by research scientists from other countries is a good indicator of sensory globalization. The plot in Fig. 2.2 shows that about 50% of the articles published in 1999 were multi-national. From my perspective, this assumption is most likely true. We have witnessed in the past that published scientific information generally follows its industrial application. Foreign articles (non-USA) published in the Journal of Sensory Studies have come from England, Spain, Argentina, Canada, Italy, France, Australia, New Zealand, Taiwan, the Philippines, India, and others; indeed, evidence of sensory globalization. However, it should be noted, as Moskowitz indicates, that the problem of translating sensory theory and practices to other countries still remains.

From my viewpoint, sensory intensity measures can be directly translated to other cultures, but not hedonic measures; even within a population, there is segmentation in hedonic expression. Moskowitz stated the problem of scaling differences across cultures. Again, this should not be a problem for intensity measures, especially in the presence of a control/reference standard. The key problem in transnational research lies in the hedonic scale (like/dislike). Note that there is no wrong answer for hedonic responses. As such, the hedonic scale should be applicable only to a particular country or culture, and this should be a guiding principle in sensory globalization. Because of this, we cannot develop a robust product in the spirit of Total Quality that will perform effectively in other countries. Furthermore, we can no longer directly compare hedonic averages from country to country, although we may use a transformed hedonic response to compare results from different countries.
FIG. 2.2. PERCENTAGE OF FOREIGN ARTICLES PUBLISHED IN THE JOURNAL OF SENSORY STUDIES BETWEEN 1986 AND 1999
One transformation of hedonic responses uses the deviation of each panelist's score from the grand mean. The expression is

Di = Yi - M
where Di is the transformed score expressed as a deviation from the grand mean, Yi is the individual panelist score, and M is the grand mean. By this transformation, we remove the individual differences in the way the panelist used the scale. One can further transform the score by dividing the above deviation by the standard deviation of the respective population (country, locations within a country):
Dijk = (Yijk - M.jk)/Sjk,    i = 1, 2, ..., n panelists; j = 1, 2, ..., c countries; k = 1, 2, ..., l locations,
with mean of Dijk = 0, being a normalized value. The second formula is more powerful because it further standardizes the score by adjusting it for the variability within country and location. The statistical analysis then uses the Di or Dijk as the observations for comparing hedonic responses among products or stimuli. The disadvantage of transformation is that we lose the directness that comes from seeing the actual average in relation to the points on the rating scale. However, the scaling of differences among products with the transformed scale will be more reliable.
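To make the two transformations concrete, here is a minimal sketch in Python/pandas; the scores and the column names are hypothetical illustrations, not taken from the text:

```python
import pandas as pd

# Hypothetical hedonic scores; column names are illustrative only.
df = pd.DataFrame({
    "country":  ["US", "US", "US", "MX", "MX", "MX"],
    "location": [1, 1, 1, 1, 1, 1],
    "score":    [7, 8, 6, 5, 9, 4],
})

# First transformation: deviation from the grand mean, Di = Yi - M.
df["D"] = df["score"] - df["score"].mean()

# Second transformation: standardize within each country/location cell,
# Dijk = (Yijk - M.jk)/Sjk, so that every cell has mean 0.
cell = df.groupby(["country", "location"])["score"]
df["D_std"] = (df["score"] - cell.transform("mean")) / cell.transform("std")
```

The columns D and D_std would then serve as the observations in the product comparisons described above.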
If we want to retain the original scale, then there is a statistical procedure used extensively in statistical genetics, i.e., animal breeding: the so-called method of least-squares (Harvey 1960; Damon and Harvey 1987). Basically, the known effects are modeled and estimated. The estimated effects, called the least-squares constants, can be used to adjust the observations for country and location effects. Suppose we have the model

Yijk = M + Pi + Cj + Lk + Eijk
where
Yijk = the observed hedonic score of the ith product, jth country, and kth test location; M = overall mean across products, countries, and test locations; Pi = effect of the ith product; Cj = effect of the jth country; Lk = effect of the kth test location within countries; Eijk = random error.

The least-squares constants derived from the above model can be used to adjust the hedonic score Yijk for country and location effects. The adjustment formula is

Adjusted Yijk = Yijk - Cj - Lk = Yijk - (Cj + Lk)
Here, Cj = least-squares constant for the jth country and Lk = least-squares constant for the kth location. The adjusted Yijk is now in its original rating scale form but adjusted for country and test location effects. As before with Dijk, the adjusted Yijk is used as the observations in the statistical analysis for product comparisons.
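A minimal numerical sketch of this adjustment follows, under the simplifying assumption of balanced data (equal numbers in every subclass), where the least-squares constants reduce to simple deviations of cell means from the overall mean; for unequal subclass numbers, a full least-squares fit in the spirit of Harvey (1960) would be required. The data and names are hypothetical:

```python
import pandas as pd

# Hypothetical balanced data: both products rated in every country/location cell.
df = pd.DataFrame({
    "product":  ["A", "B"] * 4,
    "country":  ["US"] * 4 + ["MX"] * 4,
    "location": [1, 1, 2, 2] * 2,
    "score":    [7, 5, 8, 6, 5, 3, 6, 4],
})

M = df["score"].mean()                                   # overall mean
g_c = df.groupby("country")["score"].transform("mean")   # country means, row-aligned
g_cl = df.groupby(["country", "location"])["score"].transform("mean")

C_j = g_c - M      # constant for the jth country (balanced case)
L_k = g_cl - g_c   # constant for the kth location within country (balanced case)

# Adjusted Yijk = Yijk - (Cj + Lk): back on the original rating scale,
# with every country/location cell recentered on the overall mean M.
df["adjusted"] = df["score"] - (C_j + L_k)

# The adjusted scores then feed a one-way ANOVA comparing products only.
print(df.groupby("product")["adjusted"].mean())
```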
The statistical model is now reduced to

Yij = M + Pi + Eij,    i = 1, 2, ..., k products; j = 1, 2, ..., n panelists,
and results in a one-way analysis of variance for comparing products.

The above methods, particularly the adjusted Yijk, have not been fully explored for use in Sensory Science and product testing. At the present time, with readily available computing systems, the various adjustment procedures given above can be easily studied and implemented.

Ms. Muñoz's detailed presentation of methods, problems, and questions should provide information and guidance in the search for and development of world sensory principles. Only when these topics are adequately answered can acceptable principles be developed and used with confidence. It is suggested that companies contemplating international Sensory Science and product testing consult Muñoz's viewpoints as a guide in making appropriate choices.

REFERENCES

*DAMON, R., JR. and HARVEY, W. 1987. Experimental Design, ANOVA, and Regression. Harper and Row, New York.
HARVEY, W. 1960. Least-squares analysis of data with unequal subclass numbers. USDA, ARS 20-8.
* MCG acknowledges his learning of least-squares analysis from Dr. Richard Damon Jr., his mentor in his graduate school days.
CHAPTER 3

SENSORY MYTHOLOGY

HOWARD R. MOSKOWITZ

Sensory Science evolved into a discipline of "what to do" before it developed a corpus of information with intellectual content. As an unfortunate consequence, Sensory Science comes to its professionals and students alike replete with, or perhaps even burdened by, mythology: beliefs and arguments about what is the correct procedure, what is questionable, and of course what is absolutely wrong. A lot of this mythology has never really been recorded, but is vehemently reiterated at meetings, especially by the older generation of practitioners. Furthermore, much of what is believed has never actually appeared in the scientific literature, or has never been substantiated by data. At one level these are amusing myths. At another level they are dicta that have hindered the intellectual development of the field to the extent that it could have developed. Herewith follow four choice myths. They delight the critic, perplex the novice, reaffirm the dogmatist, and madden the iconoclast. They persist.
Myth #1 - The Consumer Cannot Evaluate More Than Two Products Without Losing Sensitivity

It is not clear where or how this sensory myth arose, because everyday experience and the science of psychophysics certainly show that the typical consumer can and does discriminate quite easily among a set of similar products (see Marks 1974). The scientific literature in psychophysics and Sensory Science certainly cannot have given rise to this myth, because panelists in both fields have been shown to validly rate the sensory magnitude of many different stimuli within relatively short time frames, and without extensive training. Studies involving compound taste stimuli, e.g., pairwise and higher-order mixtures (Moskowitz 1971; Moskowitz and Klarman 1975; Moskowitz et al. 1978), odorants (Berglund et al. 1971) and foods (Moskowitz 1977) show clearly that panelists can and do track the physical magnitude of the stimulus through their rating of intensity.

The myth possibly may have come from studies where the panelist task was to discriminate between pairs of stimuli, rather than to rate these stimuli on a scale. With N stimuli there are N(N-1)/2 pairs to assess. When N is small (e.g., 4), then this task of implicit comparison produces a manageable set of pairs. For example, with N=4 there are 6 combinations of 2 items each, or 12 items to be tasted. That is a lot, but still manageable. When N is large, then the number of
pairs of stimuli increases rapidly, making the task infeasible. When N is large (e.g., 10), this produces an unmanageable set of pairs (for N=10 there are 45 pairs, or 90 items to be tasted). Asking the panelist to rate the items one at a time, rather than to compare them, results in a far more manageable task.

Another source for this myth is the off-hand comments offered in corporate kitchens by the participants in "product cuttings," "tastings," "showings," or whatever the corporate term is for an R&D presentation. A cutting involves the informal evaluation of a number of products, usually prototypes but often competitors. The cutting does not demand scientific rigor. Rather, in the corporation the project team sits around, or more often stands around, the kitchen counter in the laboratory, opening the samples, inspecting them, and then tasting them. A cutting may appear to begin in an orderly fashion, with the sensory scientist laying out the different samples in a specified order, e.g., to show specific sensory gradations among the products. The sensory scientist then suggests an order of evaluation, and a set of criteria on which to evaluate. The cutting may involve simply the experience of the product, or may be somewhat more formal, with rating sheets. What starts out orderly quickly degenerates, especially when the cutting comprises technical participants along with marketers. The order devolves into chaos as different individuals in the cutting try a product, then another product, and finally return to the first product in order to better understand the differences among the samples. The chaos leads to comments by the participants that they don't remember the product that they just tasted, that they feel that they are losing their sensitivity, and that they don't remember much about the first product in the set. These complaints should come as no surprise, since the participants in the cutting do not or cannot wait between samples, have to be reminded to rinse between samples, and try vainly to keep all of the sensory information in memory. The cutting becomes a "blooming, buzzing confusion," in the apt words of William James. As a consequence the participants feel that they cannot cope with this sensory overload and complain that they cannot do the task.

The knowledgeable researcher attending a cutting, aware of these problems, often changes the procedure so that the panelist writes down the ratings. In this way the researcher more closely simulates what happens in an actual test. Occasionally the corporate participants in the cutting feel uncomfortable with this level of control, but they never appear to complain that under this regimen they have lost their sensitivity.
Myth #2 - Good Sensory Science is Best Done Under Rigidly Controlled Conditions, Where Panelists are Isolated From Each Other

Sixty years ago few facilities featured "state of the art" equipment. Then, at some point, it was decided by persons today unknown that the proper
evaluation of food could only be done in isolated booths, where panelists could not see each other. A lot of these booths, well-isolated, equipped with the proper lighting, with spittoons, etc., began to appear in the 1960s and 1970s. Researchers in Sensory Science took public pride in these physical facilities because the facilities offered the best in environmental control.

Now, it is true that studies should be controlled for the best results. It is also necessary that the panelists have a place to expectorate the food if they are not to swallow an unduly large amount of the test stimulus. For odor/flavor/fragrance work it is also important to evacuate the local environment, continually replacing the odorized air with fresh air. Finally, it is important to control the ambient light. This will help to simulate indoor or outdoor conditions, or to mask the product color without blindfolding the panelist. What is mythological, and patently untrue, however, is that the booths must be white, must be isolated, and that the best Sensory Science only emerges in this environment. Perhaps the isolation booth is acceptable for a 15- or 20-minute test, where the panelist evaluates a few products. In many of the author's studies, a panelist is recruited for three hours, during which time the panelist may evaluate 10 or 15 products. In such test conditions it is vital to provide an environment that is controlled, so that panelists do not speak to each other or see what others have written, yet "livable," so that panelists can enjoy the experience rather than dread the isolation. One way to accomplish this dual objective is to create a different testing space comprising tables and computers for data acquisition, arranged so that panelists can see each other. This arrangement makes it hard for them to talk to each other, and impossible for them to see what the other panelists have answered on paper or on the computer. This physical arrangement has already been developed, and works by spacing the panelist "work tables" at angles to each other, so that there is no feeling of isolation, yet there is control.
Myth #3 - Only Experts Can Validly Assess the Sensory Characteristics of Foods, and as a Corollary the Consumer's Sole Task Should Be To Rate Degree of Liking

This is a particularly pernicious and widely held myth, albeit one that has no substantiation in the scientific literature. We can trace this myth back to a couple of sources, some in the academic research community, some in the business practitioner community. Some of the origins are based on scientific data, others on the desire to perpetuate a specific practice for possibly less than noble reasons.

Academic research, especially in psychophysics, tracing back more than a century to Fechner, began with the assumption that people could not
validly act as measuring instruments for sensory intensity (Fechner 1860). This is the so-called "direct approach," which was rejected for more than three-quarters of a century in favor of indirect measurement. Indirect measurement presents the panelists with two stimuli and determines the degree to which these two stimuli are confused with each other, or the degree to which, in a population, one stimulus is preferred to another. The analysis then takes the confusion or preference data, processes the variability, and estimates scale values associated with the stimuli, with the property that the more two products are confused (or the more they are equally preferred), the closer these two stimuli lie on the underlying scale. Fechner himself, the founder of psychophysics, believed that panelists could not validly judge intensity, and instead relied upon adding together units of "discrimination," or just noticeable differences (JNDs), in order to erect the psychophysical scale of magnitude. L.L. Thurstone (1927) developed a set of mathematical transformations to deal with this type of confusion data, expanding the approach of indirect measurement even further.

Beliefs about the limitation of a panelist to attend to multiple attributes of the stimulus also helped to propagate the myth. The psychophysicist did not make any pronouncements about the ability of panelists to rate intensity; indeed, the entire foundation of direct scaling is based upon the assumption that panelists can and do judge intensity in a reproducible fashion that can be shown to exhibit properties of a valid scale. Typically, academic researchers involved in direct scaling instructed panelists to rate only one particular aspect of the stimulus, e.g., the loudness of the sound. Rarely in psychophysics laboratories of 40 to 50 years ago did the researcher instruct the panelist to rate multiple attributes of the same stimulus, in the same session, at the same time. S.S. Stevens, the founder of modern-day psychophysics, held a firm, publicly stated conviction that panelists could not easily shift from attribute to attribute in an evaluation task. Consequently the prudent researcher should have the panelist concentrate on only one particular attribute (Stevens 1968). In early scaling work on the sweetness and pleasantness of sugars and artificial sweeteners, Moskowitz (1971) and Moskowitz and Klarman (1975) set up the study so that panelists participated in two sessions. In one session the panelist would rate the perceived sweetness of the sugar, and in the other session the panelist would rate liking. Psychophysicists working with taste stimuli that provoke multiple qualities (sweet, salty, sour, bitter; Bartoshuk 1975) would break out of this stereotyped, limiting paradigm, and encourage the panelists to directly scale the degree of each quality in the taste stimulus. These panelists were relatively untrained, but appeared to experience no problem shifting their attention to the different qualities of the taste stimulus. The author himself eventually broke out of the shackles of uni-attribute scaling when the results from Bartoshuk's studies were shown to be reliable and useful.
The business end of Sensory Science and taste also contributed to this myth. The 1950s-1970s saw a resurgent interest in the descriptive analysis of sensory characteristics. It was with some of these systems that the myth began that consumers could not validly assess the sensory attributes of products, despite mounting evidence from the scientific literature. The Flavor Profile (Cairncross and Sjostrom 1950), the QDA (Stone and Sidel 1985) and the Spectrum (Meilgaard et al. 1987) methods all entailed extensive training. The training was critical for panelists to understand the meaning of specific, hard-to-understand attributes, and to agree upon the ostensive meaning of these attributes through the use of reference standards. All was well and good at that point, since indeed many attributes are unusual, and not clear without explanation. Problems arose when the science of descriptive analysis turned into the very profitable business of descriptive analysis. At this point a great deal of the scientific inquiry was squelched, and met with blanket statements about the inability of the consumer to validly scale the perceived intensity of sensory attributes.

Whether or not experts really perform better never seems to have been established in the literature. Nor, in fact, has there been any clear formulation of the criteria on which the experts are supposed to perform better. Are the experts more sensitive to the same stimulus differences? Do they have a better vocabulary? Are they more consistent? Comparisons of consumers and experts on many attributes suggest that when both groups are asked to scale a common set of stimuli, their ratings correlate with each other, as long as the consumers understand the meaning of the attribute. Examples were published twenty years ago in texture (Moskowitz et al. 1979) and more recently in flavor (Moskowitz 1996). Discussions about this topic now and again appear on the Internet, in the Sensory E-Groups hosted by Yahoo. The different points of view, proffered by academics and practitioners alike, can be seen by looking at the archives of the Sensory E-Groups for the years 2001 and 2002.
Myth #4 - Statistics Are the Essence of Good Sensory Research (or... "If You Don't Know the Latest Statistical Fad, You Really Should Not Call Yourself a Professional")

This is a potentially damaging myth, partly because the consequence of abandoning statistics would probably plunge Sensory Science right back into the dark ages. Yet the myth has to be addressed, if only because over the past 30 years statistical knowledge has often been equated with professional competence in Sensory Science. That is, a person who was not familiar with the latest, au courant statistical methods (whether from mapping, modeling, signal detection theory, or inferential statistics) was also thought not to be au courant with the latest sensory thinking.
Although this author is a big fan of statistics, especially the new statistics for data exploration and data representation (Moskowitz 1994), all too often practitioners in the field have used arcane statistics in order to amass power and prestige. Arcane statistical treatments of data, challenging the researcher to pull out as much variability as possible in analysis of variance using statistical modeling and multi-dimensional representation of data, often cloud the subject being analyzed. All too often good solid thinking is sacrificed in the name of the newest statistical procedure, as if the practitioner cannot think through the problem without the esoteric approaches. Yet, let the reader attend a meeting of sensory researchers and listen to papers, and the reader will come to realize that a lot of the statistical analysis hides perfectly simple, often prosaic thinking, or worse, no thinking at all. Having the latest multidimensional scaling tools at one's fingertips does not necessarily mean that the data are amenable to this treatment, or that nature will yield up her secrets if the techniques are applied to the particular data set. One unfortunate and probably unanticipated outcome is that the worship of statistical tools and new-era statistical methods may paralyze the process of critically thinking through the problem, rather than aiding the process of obtaining valuable insight.
The Comfort of Myth

Why these myths? Why do sensory scientists continue to accept mythology about consumer performance and the "right way of doing things"? One of the reasons may be that myths comfort; they give structure to an unstructured, frightening world. If the sensory scientist is not well-educated, as has been the case for so long, then myths ground the analyst. They give a comforting directive about what to do. They reduce uncertainty by providing a false set of requirements. The sensory scientist, caught up in accepting mythology, need not question, need not think, and need not worry. Myth prescribes behavior, and in that behavior the novice sensory scientist feels that he is doing the "right thing."

Furthermore, myth creates a community. It can become lonely when one questions and strikes out on one's own. A community of collegial professionals, working with the same mythology, enables the novice to enter the brotherhood (or sisterhood). Myth thus becomes the cost of entry and a key to membership in the "fraternity." By subscribing to the myths of Sensory Science anyone can become a full member, because the analyst is simply adopting the mores of the desired group.
REFERENCES BARTOSHUK, L.M. 1975. Taste mixtures: Is mixture suppression related to compression? Physiology and Behavior 24, 3 10-325. BERGLUND, B., BERGLUND, U.. EKMAN, G. and ENGEN, T. 1971. Individual psychophysical functions for 28 odorants. Perception and Psychophysics 9, 379-384. CAIRNCROSS, S.E. and SJOSTROM, L.B. 1950. Flavor Profiles - A New Approach to Flavor Problems. Food Technol. 4, 308-3 11. FECHNER, G.T. 1860. Elemente der Psychophysik. Breitkopf und Hartel, Leipzig. MARKS, L.E. 1974. Sensory Processes: The New Psychophysics. Academic Press, San Diego. MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1987. Sensory Evaluation Techniques. CRC Press, Boca Raton, Florida. MOSKOWITZ, H.R. 1971. The sweetness and pleasantness of sugars. Amer. J. Psychol. 84, 387-405. MOSKOWITZ, H.R. 1973. Sweetness Additivity. J. Experimental Psychol. 99, 89-98.
MOSKOWITZ, H.R. 1977. Sensory measurement: The rational assessment of private sensory experience. Master Brewer's Association of America, Technical Journal 24, 111-119.
MOSKOWITZ, H.R. 1994. Product testing 2: Modeling versus mapping and their integration. J. Sensory Studies 9, 323-336.
MOSKOWITZ, H.R. 1996. Experts versus consumers. J. Sensory Studies 12, 19-38.
MOSKOWITZ, H.R., KAPSALIS, J.G., CARDELLO, A.V., FISHKEN, D., MALLER, O. and SEGARS, R.A. 1979. Determining relationships among objective, expert, and consumer measures of texture. Food Technol. 33, Oct., 84-88.
MOSKOWITZ, H.R. and KLARMAN, L. 1975. The tastes of artificial sweeteners and their mixtures. Chemical Senses and Flavor 2, 411-421.
MOSKOWITZ, H.R., WOLFE, K. and BECK, C. 1978. Sweetness and acceptance optimization in cola flavored beverages using a combination of artificial sweeteners. J. Food Quality 2, 17-26.
STEVENS, S.S. 1968. Personal communication.
STONE, H. and SIDEL, J.L.H. 1985. Sensory Evaluation Practices. John Wiley & Sons, New York.
THURSTONE, L.L. 1927. A law of comparative judgment. Psychological Rev. 34, 273-286.
ALEJANDRA M. MUÑOZ
Sensory professionals learn the fundamentals of Sensory Science, the "do's" and "don'ts," through sensory courses/workshops and books. Others, trained "on the job," learn the sensory techniques practiced by their companies through senior colleagues and documented procedures. The general belief is that all sensory techniques and methods are robust, supported by research, and thus should be followed. We accept them at face value because they were taught to us either by professors, consultants, experienced sensory professionals, or books. Alternatively, we accept them because these techniques are documented, established and practiced by our companies. As young professionals we never challenge these practices. With experience, we sensory professionals start to examine the origin and basis of some of our practices, question certain procedures, and most importantly, separate the beliefs/myths/recommendations from the researched fundamentals of our field.

We, as experienced sensory professionals, should take a careful look at some of our practices and ask: To what degree are the techniques we practice fully researched? If there is limited research or documentation on these practices, then certain questions, discussions, modifications, and further research on these practices should be welcomed. Which techniques or practices are therefore "myths"? Only a few professionals have addressed this topic (Pangborn 1979; Chambers et al. 1993). We, the authors of this book, wanted to include this chapter to share our views on the myths we believe exist in the field. Hopefully, the sensory community can build on the questions and issues we raise or challenge in this chapter, reflect on or refute some of these practices, encourage discussion, broaden the application of these techniques, or motivate research. The results can only be positive for the growth of our field.

We have selected a few sensory topics, practices or rules to examine in this chapter. We call these beliefs or practices "myths." Some of these techniques are documented in books, but many only represent the opinions and recommendations of senior sensory professionals in the field (Moskowitz 1983, 1985; Stone and Sidel 1993; Lawless and Heymann 1998; Meilgaard et al. 1999). This author considers that it is legitimate to change or modify these practices, if they are myths. In addition, we have to face reality and understand that currently we rarely have the ideal circumstances and all needed resources to complete the "ideal" tests. Budget issues or political situations often force us to modify our practices. We should feel comfortable in modifying some of these practices, especially if they are myths. In addition, this author believes that sometimes providing a partial but sound answer or information is better than providing no answer at all. Conclusions will be drawn by the test requester or management, with or without our tests or recommendations. We should be open to changing our techniques, as long as we do not compromise our reputation and
do not break the "unbreakable" rules. Through our expertise we can provide and report information with the proper perspective, and point out possible disadvantages and, most importantly, the risks involved. Awareness and knowledge of the published literature, experience, and good judgment are required to separate the "unbreakable" rules and fundamentals of the field from the myths. Hopefully this chapter and the views of these three authors can help less experienced professionals make the distinction. In addition, we hope that this chapter can be useful to those experienced professionals who share our views.
Myth: Sensory/Consumer Scientists Should Not Conduct a Test That Is Not Formal, Quantitative and/or Published

This is a double-edged sword, yet an important issue to discuss. We sensory professionals must challenge our test requesters when they approach us with test requests that have flaws, e.g., the participation of panelists in discrimination and hedonic/consumer tests simultaneously; the completion of discrimination tests with few participants; the selection of Type I or II risk levels based on the desired outcome; the use of qualitative data from one focus group with few participants to provide recommendations for the launching of a new product or for other important decisions; etc. Some professionals spend a considerable amount of their time explaining to the test requesters the unreliability of those practices and the risks involved. Obviously, professionals who are constantly faced with this scenario do not wish to modify the "formal" and accepted sensory and consumer practices, and conduct less formal tests.

The recommendation against modifying formal quantitative tests is a myth. We can and should modify our practices and conduct less formal tests, when appropriate. Who determines if the modification of any of the "formal" and sound sensory techniques is acceptable? What is the risk of making a modification to one of our practices and fostering ensuing inadequate test requests? How "informal" and "qualitative" can a test be before it becomes a flawed test? Sensory professionals should ask themselves these questions:

(1) Does the requested "less formal" method or practice violate well-founded sensory/consumer methods, principles and "rules"?
(2) What are the risks involved in conducting the "less formal" test?
(3) How will the information be used?
(4) How will the test requester obtain his/her answer otherwise?
If the requested test or practice clearly violates the principles of a given method and/or the risks are high, then the sensory professional should not agree to the modified practice. The generated data will be invalid and only lead to risky decisions. The sensory professional will lose credibility and be blamed for unsound results and decisions. Conversely, the sensory professional should assist the test requester if riskier decisions would be made due to the biased opinions and incorrect practices of these researchers. For example, although not ideal and certainly not a "formal" test, the benchtop or qualitative evaluation of a set of products by a small group of trained panelists (e.g., 3-4) is better than the same evaluation completed by one or two of the researchers/product developers. These researchers are not only untrained, but have biases due to their knowledge of the formulation or processing of the products. Similarly, it is better to conduct a focus group session with naive consumers to collect qualitative and preliminary consumer reactions than to make decisions based on the preference responses of the same one or two biased researchers, or their family members. Experienced sensory professionals are able to modify methods and practices without compromising the data's integrity and thus help the test requester. This information will be more valuable and less risky than that obtained exclusively by the test requester. In addition, the experienced professional can shed insight and use his/her expertise to position, interpret and report the information and minimize misunderstandings and risky decisions.
Myth: Test Parameters or Methods Should Not Be Modified if They Are Commonly Used or Well-accepted in the Company

Arguments that endorse this belief include:
(1) "The company's methods in place are the best methods and do not need to be modified."
(2) "The methods were established by a senior sensory professional or consultant and they should not be changed."
(3) "Test requesters will be confused if the methods are changed."
(4) "Modifications may compromise the reputation of the group."
(5) "No time is available to research and modify methods, and familiarize users with the new techniques."

The belief that methods and test parameters should not be modified is a myth. Sensory professionals should always be investigating new or alternate methods, and assessing how test parameters should be modified for different projects/objectives.
Below are only a few examples of test parameters that are very seldom modified by some companies. This list is not exhaustive. The topics are only meant to illustrate the point raised by this author. Hopefully, these examples can elicit the assessment and discussion among sensory professionals of these and other internal practices that can and should be modified, when applicable.

1. Only One P Value Should Be Used in Hypothesis Testing. The "standard" probability (P) value used by most companies or laboratories in hypothesis testing is P ≤ 0.05. Many sensory professionals use it blindly in all hypothesis testing. Most sensory practitioners have been asked by test requesters to explain the meaning of this P value, or the reasons why other values are not used in statistical tests. When approached with this question, many sensory professionals support the "magic" 0.05 value, indicating that this significance level is commonly used in industry, that it has been traditionally used within the company, or that it represents a small enough number to minimize the risks incurred in the decision making process.

The use and support of only one probability value in hypothesis testing is a myth. Practitioners should understand the meaning of this parameter (P), and know the origin of commonly used or recommended values, to be able to fully explain its meaning and to feel comfortable in considering and using other probability levels for different projects/objectives. Sensory professionals should be familiar with the origin of the commonly used P = 0.05 level, Type I and Type II errors, and the meaning of this P value, to be able to explain its use to test requesters and decide on its value for different tests. First, everyone must know that there is nothing "magical" about the 0.05 value. Anecdotal narratives give an account of how the 0.05 level was chosen/used arbitrarily and became the norm; this happened because of the difficulty of calculating P values when Fisher and others published early books for "significance" (Chambers 2002). Secondly, practitioners must understand and use the concepts of Type I and II errors and the probabilities of committing those errors (alpha and beta). Thus, practitioners should know that it is often necessary to consider these two probability values (α and β) in designing tests and analyzing data. It is essential to know the test situations in which one probability value (e.g., β) needs to be set, and the other one (e.g., α) allowed to float. Finally, and most importantly, researchers should know that different projects may need different risk levels, and thus different P values, to address different objectives. For example, it may be necessary to set a very low risk level for Type I error (α), such as 1% or 0.1% (P = 0.01 or 0.001), in certain projects where the risk of committing a Type I error should be minimized. Alternatively, there are projects where the P value (α) may be set much higher, such as P = 0.20, or where it is secondary in importance (since Type II is the most important error to control).
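To make the trade-off between α and β concrete, here is a minimal sketch for a one-sided paired-preference test (H0: p = 0.5); the panel size, the assumed true preference level, and the function names are all hypothetical illustrations, not taken from the text:

```python
from math import comb

def critical_count(n: int, p0: float, alpha: float) -> int:
    """Smallest k with P(X >= k | p0) <= alpha, where X ~ Binomial(n, p0)."""
    for k in range(n + 1):
        tail = sum(comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(k, n + 1))
        if tail <= alpha:
            return k
    return n + 1  # this alpha is unattainable with n panelists

def beta_error(n: int, p0: float, p1: float, alpha: float) -> float:
    """Type II error: probability of missing a true preference p1 at level alpha."""
    k = critical_count(n, p0, alpha)
    return sum(comb(n, x) * p1**x * (1 - p1)**(n - x) for x in range(k))

# Tightening alpha (Type I risk) raises beta (Type II risk) for a fixed panel:
for a in (0.20, 0.05, 0.01):
    print(f"alpha = {a}: beta = {beta_error(50, 0.5, 0.65, a):.2f}")
```

Running the loop shows β growing as α shrinks, which is exactly why the two risks must be weighed against each other, project by project.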
Very few sensory/consumer insights groups use different P values for different projects. One of the reasons given by professionals is that test requesters may be confused when they obtain test results with different P values. This author believes that perhaps these professionals avoid explaining these statistical concepts to the test requesters because they feel uncomfortable with the concepts themselves. Sensory professionals who understand the meaning of these statistical concepts can adapt these parameters to different projects and test objectives, and comfortably explain the concepts and the reasons for the modifications to test requesters.
2. The Same Scale Should Be Used in All Attribute/Descriptive (or Consumer) Tests. Many sensory professionals believe that the same scale should be used in all attribute/descriptive or consumer tests. For example, it is believed that only the 9-point hedonic scale (if that is the scale of choice) should be used in all consumer tests, or that only a 100-point scale (if this is the scale of choice) should be used in all attribute or descriptive tests.

Sensory professionals should use different scales, if appropriate or more beneficial for different applications. For example, it is appropriate to use different hedonic scales for special consumer populations (e.g., children, lower income consumers, etc.). It is also appropriate to use a 50-point intensity scale for selected attribute tests, instead of, for example, a 100-point scale used in descriptive tests. It is legitimate to use different scales if they are more beneficial to the new test/application. However, one should be aware that it is best to use the same scale type within the same test to avoid panelist or consumer confusion. In addition, one should be sensitive to the importance of building a database, or of comparing results to or merging them with historical data. In this case, the use of a different scale is more detrimental, since it may be impossible to merge the data. Sometimes, even if the test objective justifies it, it may not be appropriate to use a different scale.

3. Only One Type of Discrimination Test Should Be Used. Often sensory/consumer insights groups use only one type of discrimination test. Some companies prefer to use triangle tests, whereas others exclusively use duo-trio, or another type of discrimination test. Sensory professionals should know the benefits of each of these tests and apply a different test if dictated by the new test parameters or needs, e.g., when testing a fatiguing product, or when new/different information is needed (e.g., degree of difference). There is a plethora of traditional discrimination tests (e.g., simple difference tests, duo-trio, triangle, A not-A, etc.) (Stone and Sidel 1993; Lawless and Heymann 1998), as well as non-traditional and new discrimination test theories and methods, especially those based on signal detection theory, Thurstonian scaling, and the R-index (O'Mahony 1979; Ennis 1993, 1998; Cliff et al. 2000; Rousseau 2001; Rousseau and O'Mahony 2001), that should be explored and applied, when appropriate.
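As a concrete instance of the traditional tests just listed, here is a minimal sketch of the exact binomial analysis of a triangle test, where the chance probability of a correct choice is 1/3; the panel size and correct count are hypothetical:

```python
from math import comb

def triangle_p_value(n: int, correct: int) -> float:
    """One-sided exact P value for a triangle test: P(X >= correct | p = 1/3)."""
    p0 = 1 / 3
    return sum(comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(correct, n + 1))

# Hypothetical result: 17 of 36 panelists identify the odd sample.
print(f"P = {triangle_p_value(36, 17):.3f}")  # compare against the chosen alpha
```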
Myth: Descriptive Panel Data Need To Be Quantitative To Be Valid

Most of the descriptive analysis tests currently completed by sensory professionals are quantitative (Keane 1992; Stone 1992; Muñoz and Civille 1992; Muñoz et al. 1992). Sound descriptive analysis places emphasis on both the qualitative and the quantitative component, as described in Chaps. 18 and 20. In addition, due to the quantitative nature of descriptive analysis tests, their data are used to make important business decisions, e.g., in product maintenance, product launching, optimization and improvement, quality control programs, etc. Therefore, some professionals consider that descriptive data are only valid if they are quantitative and supported by statistics. This is a myth.

Descriptive analysis should be regarded as a valuable method for providing both qualitative and quantitative information. We err in overlooking the fact that the two components are independent. Thus it is acceptable to focus on or use only one of the two components. The focus on the quantitative component is accepted by all sensory professionals. Time-intensity (T-I) measures focus on the quantitative descriptive element (Lee and Pangborn 1986). One or a few of the descriptive attributes are selected to concentrate on the quantitative component (the evaluation of intensity over time). The qualitative/attribute component in T-I measures is not the focal point. Similarly, less focus can be given to the quantitative descriptive component. In fact, the original Profile method (Caul 1957; Keane 1992), a well-respected descriptive method, does place paramount importance on the qualitative component.

Why shouldn't we sensory professionals place importance only on the descriptive qualitative component, when necessary? There are several scenarios in the R&D environment where this practice is desirable, and sometimes the only way to proceed in using a descriptive panel, such as:

(1) the screening of a large number of samples
(2) the need for a quick turnaround of descriptive information
(3) the exclusive need for qualitative information, since the quantitative data may not be reported, used or understood (e.g., in management presentations, use of the data by non-sensory professionals, etc.)

It is legitimate to work with the panel in these and other scenarios, focusing only on the qualitative descriptive component. In these examples, no individual quantitative data are obtained. Instead, the panel generates descriptors and may work as a group in order to reach a consensus. Therefore, either when the above
scenarios can be anticipated, or when the case arises, a quantitative panel needs to be oriented into generating qualitative information and reaching consensus judgments. Individual data may be collected on a regular basis, but the panel is then trained to only generate qualitative information, discuss it and reach consensus judgments.
Myth: Descriptive Evaluations Need To Be Replicated To Be Valid

Sound descriptive evaluations are replicated. Replication provides many advantages, such as increased statistical power, the ability to test more than one batch or formulation, the ability to check panelists' performance, etc. Whenever possible, tests should be replicated. However, are tests invalid without replication? Perhaps, depending on the type of panel and descriptive approach used. QDA® (Quantitative Descriptive Analysis) panels, which use unstructured line scales and no intensity references, generate highly variable data. QDA data should be considered invalid when only one evaluation is completed (i.e., no replication). Stone and Sidel (1993) report that, from an empirical point of view, about four trials appear optimal for QDA® tests. Therefore, the need to replicate QDA evaluations is not a myth; it is a necessity.

The scenario is different in Profile and modified Profile evaluations. These panels are highly technical, follow extensive training programs, and use intensity or quantitative references (refer to Chap. 19). These panels, when well-trained, generate data with considerably lower variability, especially when intensity references are used. Replications are also desirable for Profile and modified Profile panels, but not essential. In this case, the requirement for replication is a myth. If replications cannot be accommodated, the data are nonetheless still valid.
Myth: Sensory Professionals Should Not Try (e.g., Taste, Apply, Smell) Products Before Designing and Planning a Test

This recommendation is based on the belief that analysts become "biased" if they become familiar with the sensory properties of test products. This author believes that exposure to and discussion of test products does not introduce a bias, unless the analyst is a panelist. Sensory/consumer scientists should become familiar with the test products in order to gain information on the products and their sensory properties, as well as to decide on the best test design and product presentation scheme in both analytical/sensory and consumer tests. In analytical tests, the sensory practitioner should be familiar with the products' sensory characteristics to help make decisions on the products to test, their presentation, their special preparation and serving conditions, the needed references for a panel, the testing sequence, etc. In designing consumer tests, sensory/consumer scientists need to inspect the products to make decisions on
the test design, the products to test, the sample preparation and serving strategies, the questionnaire structure, etc. Therefore, analysts are encouraged to become as familiar with the products as possible before designing any sensory and consumer tests.
Myth: Sensory Scientists Should Include All Samples in the Test Sample Set

Some sensory professionals believe that all products given by the test requester should be tested. Some of these sensory professionals will include all products without inspecting them, if they believe that exposure to the samples may be biasing. This point was discussed above, and this author refuted this belief/myth. Other professionals include all samples because it is a "common" company practice, because no time is available for this screening, or simply because the project "requires" that all products be tested. Sometimes the latter is necessary for highly visible or political projects. In those cases, all products are tested regardless of their similarity, or despite recommendations against their testing. Except for the "political" projects described, hopefully professionals can make the time to accommodate the product screening, or they can challenge the "common" company practice against inspecting products, in light of the many advantages of reviewing the products before designing a test.

Who should inspect the products to make the decisions discussed above? This author recommends that a team of professionals complete these product reviews. The selection of the most appropriate team members depends on the company and the specific test objective. Most often, a project team includes the sensory professional, the product developer/chemist, the market researcher, the marketing professional, etc. Each participant's expertise and contribution is valuable in this review and in making decisions on products. In these product reviews, the sensory properties and other technical and business criteria are taken into consideration in the decision making process. Occasionally, the team may be formed only by sensory professionals and possibly trained panelists. This is particularly true if decisions are based exclusively on sensory properties.
Myth: Sensory Booths Are Required To Conduct Valid Sensory Tests

Booths are always desirable (Stone and Sidel 1993; Lawless and Heymann 1998). They give privacy and foster concentration for panelists. When a laboratory is being built and the resources are available, booths should be built. There are, however, some occasions when this is not possible, including when space is limited (e.g., in manufacturing facilities for QC/sensory measures) or resources are unavailable. Occasionally, evaluations are not conducted in booths because the test is being run by a contractor/agency in a facility lacking booths (e.g., a focus group room, a church or hall).
Are tests invalid if booths are not available? Should tests only be conducted if there are booths available? A test can certainly be conducted without booths and be valid. Some precautions should be considered, depending on the type of test being conducted. If a descriptive test is conducted and the panelists have been properly trained, then the evaluations can be conducted without booths. Very well-trained panelists learn to concentrate on the products and their evaluation, and can, to a certain extent, disregard distractions in the room. In addition, panelists prefer to evaluate products quietly, since concentration is necessary to conduct sound evaluations. Therefore, panelists themselves are not the cause of distractions. Semi-trained panelists and consumers are more likely to be affected by environmental distractions, thus their evaluations benefit from the use of booths. When booths are not available for tests involving semi-trained panelists or consumers, it is recommended to conduct an orientation prior to the test, explaining the benefits of completing the evaluation quietly in order to foster concentration. In sufficiently large rooms without booths, it is recommended that ample space be allowed between panelists to promote independent evaluations. Portable booths or separators can also be used in these circumstances.
Myth: “X” Number of Panelists or Consumers Need To Participate in the Test

There are common numbers of participants that are recommended for sensory tests. We have “learned” that we should use “100” consumers and “10 or 15” trained panelists in descriptive tests (ASTM 1992; Meilgaard et al. 1999). These numbers are accepted without questioning the rationale behind the recommendations. In addition, some professionals think that using fewer participants makes a test invalid. This is a myth. Sensory professionals have become more aware of the calculations that can be used to compute the number of panelists needed in a test, based on data parameters and statistical criteria (e.g., the magnitude of the difference to be detected, the required alpha and beta levels, the known data variability) (Gacula and Singh 1984; Kraemer and Thiemann 1987; Gacula 1993). Whenever possible, sensory professionals should complete these calculations to determine the sample size needed, instead of using the above “common” recommendations as guidelines. There are also other issues to consider in determining the required sample size, regardless of the outcome of the sample size calculations. For example, what happens if only six panelists are available for a descriptive test? Are all panelists used if available (e.g., 16)? How many consumers are recruited for a
test if the number/outcome of the calculation indicates that only 40 participants are needed? For descriptive and attribute tests, one should consider the level of training in making the final sample size determination. This parameter is in fact considered in the statistical calculation, since the standard deviation, a measure of panel variability, is used. In general, the better trained the panel, the smaller the number of panelists needed. In other words, panelists’ variability decreases as the training level increases. How small a sample size, then, is appropriate in descriptive tests? In general, a small number of panelists is adequate if they are extremely well-trained. Chambers et al. (1981) found that a panel of three highly trained, experienced individuals performed at least as reliably, and discriminated among products as well, as a group of eight less-trained individuals. Those results suggested that the use of well-trained individuals could reduce the number of panelists necessary for sensory testing. At the same time, not all panelists in the pool should be used in every test. A large pool of trained panelists is needed in order to choose panelists for a test and to allow for panelists’ unavailability, attrition, etc. However, the whole panel pool should not be used regularly. For consumer tests, this statistical calculation also provides the sensory/consumer scientist with the sample size needed. This number indicates the number of participants required to meet the set statistical criteria (α and β) given the known variability. Frequently, the final number of participants will differ from the sample size recommended by this calculation. For example, despite being supported by the calculation’s outcome, the validity of consumer data may be questioned if a small number of recruited consumers participated. Therefore, the consumer sample size may be larger than the one provided by the calculation. Conversely, on occasion, the number of recruited consumers will be lower than the recommended number because of costs. These are cases where the calculated required sample size is used as a guideline, but the final sample size is determined by other factors. It is recommended that the required number of participants be calculated, even if fewer consumers will be recruited. The sensory professional can then use this information in the data interpretation and reporting of results. For example, when fewer consumers than the minimum required are recruited, the sensory professional can warn the test requester of the risks involved, especially if no product differences were found. A Type II error may have been committed, i.e., the test may have failed to detect/declare product differences because of the smaller sample size.
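To make this calculation concrete, the following minimal Python sketch implements the standard normal-approximation sample-size formula for comparing two mean ratings. The function name and the example numbers are hypothetical; the cited texts (Gacula and Singh 1984; Kraemer and Thiemann 1987) give the exact procedures.

```python
import math
from scipy.stats import norm

def required_panelists(delta, sigma, alpha=0.05, beta=0.20):
    """Approximate number of panelists per product needed to detect a mean
    rating difference `delta`, given rating standard deviation `sigma`,
    two-sided Type I risk `alpha`, and Type II risk `beta`
    (normal-approximation formula for two independent means)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(1 - beta)          # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)

# The better trained the panel, the smaller sigma -- and the smaller n:
print(required_panelists(delta=1.0, sigma=2.0))   # e.g., a less-trained panel
print(required_panelists(delta=1.0, sigma=0.8))   # e.g., a well-trained panel
```

Note how the required n falls sharply as panel variability decreases, which is exactly the training effect described above.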
Myth: Consumers Should Never Evaluate Attributes, Since This Practice Affects Their Hedonic Measures

Some sensory/consumer scientists believe that consumers are biased when they are asked product attribute questions. Therefore, these professionals include only overall liking questions in their questionnaires and thus do not collect any product attribute information from consumers. These sensory/consumer scientists rely on experimental design or descriptive analysis to obtain product guidance. Not enough research has been completed to determine whether or not consumers are biased when they are asked product attribute questions (Husson et al. 2001). Thus the belief that consumer responses are affected by including attribute questions in a questionnaire is a myth. This is an important subject, and Chap. 10 presents the authors’ opinions on the topic.

Myth: Consumers Cannot Rate Products Unless Fully Anchored Scales Are Used

There is a belief among some sensory professionals that consumers have difficulties rating product attributes and working with scales. These professionals are thus convinced that only fully anchored scales should be used in consumer tests, since these are the scales consumers understand. Studies have reported the successful use of scales other than the fully anchored type among consumers (Rohm and Raaber 1991; Hough et al. 1992). Thus, believing that consumers can only understand and use fully structured scales is a myth. Consumers are capable of using any sound scale provided they understand it. This author recommends that, during a short orientation conducted prior to the consumer test, the scale(s) used in the questionnaire be shown to the consumers and their main characteristics explained. Examples should be given to the consumers to demonstrate the scale properties and their use. Additionally, consumers should be given the opportunity to ask questions if the use of the scales is unclear. In this author’s experience, consumers can easily use scales that are not fully structured. Consumers do not have problems using these scales, as long as the scales are sound and have properly chosen end anchors and possibly a midpoint (if applicable). This author considers that sensory professionals who are adamant about the use of fully anchored scales introduce a problem into the test, which is finding appropriate anchors for each scale category. It is very difficult to find words/anchors that ensure equidistance among all scale categories for all attributes. Even if a sensory professional successfully develops a sound fully structured scale for a given attribute, problems may arise when the scale is translated into other languages. It is this author’s experience that flawed scales
are invariably developed when professionals are forced to list all anchors. This is particularly true for intensity scales. Sensory professionals who use only fully structured scales wish to avoid consumer confusion with scaling. However, these professionals commit a greater error when structuring flawed scales with inappropriate anchors, since consumers may be even more confused by these scales. In addition, unsound data may be collected. This author encourages professionals with this philosophy to acknowledge that their belief is a myth, and that more serious problems may be encountered when forcing anchors onto scales.
Myth: The Best Scale To Use Is the 9-point Hedonic Scale. After All, a Considerable Amount of Research Was Conducted When It Was Developed

There are sensory/consumer scientists who exclusively use the 9-point scale and are reluctant to consider the use of other hedonic scales. It is well-known that the 9-point hedonic scale is one of the few scales that has been fully researched (Peryam and Pilgrim 1957). It is a sound scale and, if appropriate, sensory/consumer scientists should use it. However, sensory professionals should be open to using other hedonic scales that have been used in consumer tests (Lawless 1977; Rohm and Raaber 1991; Hough et al. 1992). Among them are unstructured line scales and category scales containing only three anchors: the two end anchors and a mid-point anchor. The advantage of these scales is that they can be used with any number of categories (e.g., five or seven), because only the end anchors (and possibly a mid-point) are listed.
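For reference, the snippet below lists the standard category labels of the 9-point hedonic scale; the dictionary name is arbitrary, but the labels and the 1-to-9 coding are the conventional ones.

```python
# The nine categories of the 9-point hedonic scale (Peryam and Pilgrim 1957),
# conventionally coded 1-9 for analysis.
HEDONIC_9 = {
    1: "dislike extremely",
    2: "dislike very much",
    3: "dislike moderately",
    4: "dislike slightly",
    5: "neither like nor dislike",
    6: "like slightly",
    7: "like moderately",
    8: "like very much",
    9: "like extremely",
}
```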
Myth: The Order of Sample Presentation Should Always Be Balanced or Randomized

Balancing or randomizing the order of sample presentation is a good sensory practice. It ensures that all samples are given equal opportunity to be presented in all positions throughout the design (Cochran and Cox 1957; Gacula and Singh 1984). In addition, this practice allows sensory professionals to study order effects (Gacula et al. 1986; Gacula 1993; Gacula et al. 1999). Furthermore, special presentation designs and schemes should be used for specific applications (Wakeling et al. 2001). This practice should be followed most of the time. The exception arises when there is a drastically fatiguing or unusual product in the set; in that case, it may be best not to randomize or balance the order of presentation. The above recommendation may surprise some readers, as it goes against the well-established and well-known practice of randomizing or balancing. To clarify, this author acknowledges the appropriateness of balancing and randomizing the order of presentation and endorses it in most cases, as explained above. However, there are occasions when randomizing or balancing
the order of presentation may introduce problems into the evaluation. What happens when there is a priori knowledge of a product’s fatiguing effects or unusually/extremely different characteristics, and of their potential effect on the evaluation? The detrimental effects or the introduction of psychological errors (e.g., convergence, contrast) in the evaluation can be anticipated (Stone and Sidel 1993). When this happens because of one of the samples in the test, a sample presentation strategy different from balancing or randomizing may be more appropriate. The best strategy may be to leave that unique/unusual sample out of the randomization or balancing scheme (i.e., all other samples are randomized or balanced), and to present the different/unique sample last in the set. It should be stressed that the above is recommended only when there is a single unique sample in the set. When all or the majority of the samples have unique characteristics (e.g., extreme grittiness) or are fatiguing (e.g., mints, chili pepper, hot sauces, etc.), all samples need to be randomized or balanced. The sensory/consumer scientist should then introduce other test control practices to ensure the integrity of the test (e.g., monadic evaluations, appropriate selection of the rinsing agent, sufficiently long rest periods, etc.).
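The scheme described above is straightforward to implement. Below is a minimal sketch assuming simple per-panelist randomization (where order effects must be estimated formally, a balanced design such as a Williams Latin square would be used instead); the function and sample names are hypothetical.

```python
import random

def presentation_orders(samples, n_panelists, unique_last=None, seed=42):
    """Randomize serving order per panelist; if `unique_last` is given,
    that sample is excluded from the randomization and always served last,
    as recommended above for a single fatiguing/unusual product."""
    rng = random.Random(seed)
    base = [s for s in samples if s != unique_last]
    orders = []
    for _ in range(n_panelists):
        order = base[:]
        rng.shuffle(order)                 # randomize the ordinary samples
        if unique_last is not None:
            order.append(unique_last)      # the unusual sample always comes last
        orders.append(order)
    return orders

# Hypothetical example: sample "D" is extremely spicy, so it is served last.
for order in presentation_orders(["A", "B", "C", "D"], n_panelists=3, unique_last="D"):
    print(order)
```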
Myth: Sensory Consumer Studies Should Only Address Sensory Properties

We sensory professionals have been adamant about including only liking and sensory-related attributes (e.g., attribute hedonics, diagnostics/intensities) in our consumer questionnaires. We teach young professionals to concentrate on sensory and product-related questions and to avoid image, branding, purchase intent, and other similar questions. It is important to explore the rationale behind this recommendation. Is this a legitimate recommendation or a myth? A few research studies have investigated the effect of these questions on consumer responses (Martin 1990; Shepherd et al. 1991; Bower and Turner 2001). More research is needed before conclusions can be drawn about the effect of such questions on consumer performance and on the integrity of the data. This author believes that this recommendation is founded on political issues and risks. Sensory/consumer insights groups are qualified to address these non-sensory questions, such as purchase intent, brand preference, etc. However, political issues and potential risks should be assessed, as explained below.
1. Delineation of Boundaries/Turf. Chapter 4 discusses the interaction and occasional conflicts that may exist between the Market Research and the Sensory/Consumer Insights groups. In that chapter this author indicates that the conflicts may be the result of overlapping roles and the lack of understanding or delineation of the different roles/responsibilities. In an attempt to avoid these conflicts, many groups delineate the research responsibilities/boundaries for each group (Carter and Riskey 1990). In some companies, the boundaries for a
sensory group are set around sensory analytical/laboratory tests; e.g., sensory groups do not conduct any consumer tests. In other companies, the boundaries are set by the research issues investigated, and thus by the questions addressed in consumer tests. In this case, sensory groups are restricted to investigating only product/design issues, whereas market research deals with non-sensory, business and marketing issues. The sensory group then addresses only product-related questions (e.g., attribute hedonics and diagnostics) to help product developers/formulators address sensory defects and issues. Conversely, the Market Research group builds on the information gathered by the sensory group and collects other information related to purchase intent, price, sales projection, etc. Consequently, these are the attributes included in Market Research consumer questionnaires. The above scenarios are not followed rigidly in every company. Market Research groups may include sensory/product property questions in their questionnaires, in the same way that some sensory/consumer science groups may address non-sensory issues. Technically, sensory/consumer insights groups are qualified to address these non-sensory questions, such as purchase intent, brand preference, etc. If an effect of such questions on the integrity of the sensory data is suspected, sensory professionals should ask these questions at the end of the test, once all sensory product information has been obtained. Regardless, practitioners should be cautious about the representativeness of the consumer population used and the risks in asking these non-sensory questions, as discussed below.

2. Lack of Representativeness and Risks. A more important issue from this author’s perspective is the risk assessment when non-sensory questions are addressed in consumer tests by a sensory/consumer insights group. As mentioned above, from the technical perspective sensory groups are capable of collecting non-sensory responses. However, the representativeness of the consumer population used should be assessed to determine the value and projectability of those data. Sensory/consumer insights groups should report the non-sensory data cautiously when the participants were not representative of the population of interest (e.g., if the test was only local and the project requires national representation), or when the sample size is too small. Non-sensory questions deal with business and marketing issues. Important business decisions are made based on this information, such as branding, price determination, sales projection, and the gathering of marketing/advertising ideas. Therefore, it may be risky to use the data collected by a sensory/consumer insights group if a small and local consumer population was used.
REFERENCES

ASTM. 1992. Manual on Descriptive Analysis Testing. MNL 13, R. Hootman, ed. ASTM, West Conshohocken, Penn.
BOWER, J.A. and TURNER, L. 2001. Effect of liking, brand name and price on purchase intention for branded, own label and economy line crisp snack foods. J. Sensory Studies 16, 95.
CARTER, K. and RISKEY, D. 1990. The roles of sensory research and marketing research in bringing a product to market. Food Technol. 44(11), 160, 162.
CAUL, J.F. 1957. The Profile Method of Flavor Analysis. Advances in Food Res. 7(1), 1-40.
COCHRAN, W.G. and COX, G.M. 1957. Experimental Designs. John Wiley & Sons, New York.
CHAMBERS, E. IV. 2002. Opinion on setting the critical P level.
[email protected]
CHAMBERS, E. IV., BOWERS, J.A. and DAYTON, A.D. 1981. Statistical designs and panel training/experience for sensory science. J. Food Science 46, 1902-1906.
CHAMBERS, E. IV. and SETSER, C.S. 1993. Myths and monsters of sensory methodology. Presented at “Advances in Sensory Food Science”, Rose Marie Pangborn Memorial Symposium, Jarvenpaa, Finland.
CLIFF, M.A., O’MAHONY, M., FUKUMOTO, L. and KING, M.C. 2000. Development of a “bipolar” R index. J. Sensory Studies 15(2), 219-229.
ENNIS, D.M. 1993. The power of sensory discrimination methods. J. Sensory Studies 8, 353-370.
ENNIS, D.M. 1998. Thurstonian scaling for difference tests. IFPress 1(3), 2-3.
GACULA, JR., M.C. 1993. Design and Analysis of Sensory Optimization. Food & Nutrition Press, Trumbull, Conn.
GACULA, JR., M.C. and SINGH, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, San Diego.
GACULA, JR., M.C., WASHAM II, W.W., BEARD, S.A. and HEINZE, J.E. 1986. Estimates of carry-over effects in two-product home usage consumer tests. J. Sensory Studies 1, 47-53.
GACULA, JR., M., DAVIS, I., HARDY, D. and LEIPHART, W. 1999. Carry-over effects in sensory evaluation: Case studies. In: Proceedings of the Fifteenth Annual Meeting of the International Society of Psychophysics, pp. 142-147, P.R. Killeen and W.R. Uttal, eds. Tempe, AZ.
HOUGH, G., BRATCHELL, N. and WAKELING, I. 1992. Consumer preference of dulce de leche among students in the United Kingdom. J. Sensory Studies 7, 119-132.
HUSSON, F., LE DIEN, S. and PAGES, J. 2001. Which value can be granted to sensory profiles given by consumers? Methodology and results. Food Quality and Preference 12(5-7), 291-296.
KEANE, P. 1992. The Flavor Profile. In: ASTM Manual Series MNL 13. Manual on Descriptive Analysis Testing. ASTM, West Conshohocken, Penn.
KRAEMER, H.C. and THIEMANN, S. 1987. How Many Subjects?: Statistical Power Analysis in Research. Sage Publications, Newbury Park, CA.
LAWLESS, H.T. 1977. The pleasantness of mixtures in taste and olfaction. Sensory Processes 1, 227-237.
LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food. Chapman & Hall, New York.
LEE, W.E. and PANGBORN, R.M. 1986. Time Intensity: The temporal aspects of sensory perception. Food Technol. 40(11), 71-82.
MARTIN, D. 1990. The impact of branding and marketing on perceptions of sensory qualities. Food Science and Technology Today: Proceedings 4(1), 44-49.
MEILGAARD, M., CIVILLE, G.C. and CARR, B.T. 1999. Sensory Evaluation Techniques, 3rd Ed. CRC Press, Boca Raton, Fla.
MOSKOWITZ, H.R. 1983. Product Testing and Sensory Evaluation of Food: Marketing and R&D Approaches. Food & Nutrition Press, Trumbull, Conn.
MOSKOWITZ, H.R. 1985. New Directions in Product Testing and Sensory Evaluation of Foods. Food & Nutrition Press, Trumbull, Conn.
MUÑOZ, A.M. and CIVILLE, G.V. 1992. Spectrum Descriptive Analysis Method. In: ASTM Manual Series MNL 13. Manual on Descriptive Analysis Testing. ASTM, West Conshohocken, Penn.
MUÑOZ, A.M., SZCZESNIAK, A.S., EINSTEIN, M.A. and SCHWARTZ, N.O. 1992. The Texture Profile. In: ASTM Manual Series MNL 13. Manual on Descriptive Analysis Testing. ASTM, West Conshohocken, Penn.
O'MAHONY, M. 1979. Short-cut signal detection measures for sensory science. J. Food Science 44(1), 302-303.
PANGBORN, R.M. 1979. Physiological and psychological misadventure in sensory measurement, or The crocodiles are coming. In: Sensory Evaluation Methods for Practicing Food Technologists, IFT Short Course, Institute of Food Technologists, Chicago, IL.
PERYAM, D.R. and PILGRIM, F.J. 1957. Hedonic scale method of measuring food preferences. Food Technol. 11, 9-14.
ROHM, H. and RAABER, S. 1991. Hedonic spreadability optima of selected edible fats. J. Sensory Studies 6, 81-88.
ROUSSEAU, B. 2001. The beta-strategy: An alternative and powerful cognitive strategy when performing sensory discrimination tests. J. Sensory Studies 16, 301-319.
ROUSSEAU, B. and O’MAHONY, M. 2001. Investigation of the dual-pair method as a possible alternative to the triangle and same-different tests. J. Sensory Studies 16, 161-178.
SHEPHERD, R., SPARKS, P., BELLIER, S. and RAATS, M.M. 1991. The effects of information on sensory ratings and preferences: The importance of attitudes. Food Quality and Preference 3, 147-155.
STONE, H. 1992. Quantitative Descriptive Analysis (QDA). In: ASTM Manual Series MNL 13. Manual on Descriptive Analysis Testing. ASTM, West Conshohocken, Penn.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices. Academic Press, New York.
WAKELING, I.N., HASTED, A. and BUCK, D. 2001. Cyclic presentation order designs for consumer research. Food Quality and Preference 12(1), 39-46.

MAXIMO C. GACULA, JR.

Sensory myth is a subject that has been in the sensory closets for the last two decades or so. Sensory scientists have often used both traditional practices and unsubstantiated rules of thumb when applying sensory methods to various situations. There is nothing wrong with this approach of mixing procedures if the combination results in new scientific standard operating procedures. If the uncritical mixing of methods and rules of thumb remains in the closet, however, it will be destructive to the advancement of Sensory Science, and the myths will continue to be silently propagated. The myths in Sensory Science were first brought into the open by Rose Marie Pangborn in 1979 and most recently by Chambers and Setser (1993). It is recommended that sensory professionals in both academia and industry revisit these publications and perhaps update those practices and beliefs with scientific confirmation. Then we do not have to ask ourselves - are they myths or reality? In this chapter, Dr. Moskowitz gave four myths, provided scenarios where they probably came from, and gave a scientific basis to disprove the myths. Consumers can evaluate more than two samples, as indicated by Moskowitz. One objective of giving only two samples per evaluation is to be able to estimate carry-over effects, not that the consumer cannot evaluate more than two samples. If more than two samples are evaluated, then it is advisable to provide a “control” or “reference” sample in the set to prevent the so-called “wandering scores.” When the number of samples is large, say 12, consumers can still
discriminate, but the samples are usually given in the form of a balanced incomplete block design presentation for two reasons: convenience to the panelists, and logistics in the preparation and administration of the consumer test. For highly flavored or spiced products, the number of samples per presentation would likely be two, again to provide estimates of carry-over effects and possibly to permit re-analysis of the data if carry-over effects are large. Nothing much can be said of myth #2, which deals with the “state of the art” laboratory. Although no studies can be cited, it is accepted practice that good Sensory Science requires good laboratory and environmental conditions. Experience and common sense come into play and can be considered a “scientific” confirmation. The third myth is basically covered in Chap. 7. From a statistical viewpoint, we use a consumer panel of appropriate sample size (see Chap. 14) to rate degree of liking in order to provide good representation of the population. Using expert panels, whether based on training or experience, would not provide a representative sample of the population. The fourth myth is surprising and deserves a second look. The role of statistics in Sensory Science is secondary to the competence of the sensory professional. Statistics provide information on whether the results of a study are due to experimental treatments and not due to chance variation. Statistical analysis provides fundamental information to be used in the correct interpretation of the results of the research study. Interpretation should consider both the experimental aspects and the data analysis findings. The simplest statistical method that can provide adequate results is always the best choice, and the result of this choice is generally easy for the statistical client to comprehend. Muñoz cited several myths in detail, which should be a good reference for sensory professionals when making decisions for particular research situations. In general, the writer’s views on some of these myths are as follows: Modifications of techniques, shortcuts, etc., should be based on experience, internal research data if available, and knowledge of the product. Of course, it is better still if scientific evidence exists. The choice of P-value (Type I error) is primarily based on two points: acceptable risk and repeatability of results. The tradition of using P ≤ 0.05 in the scientific world is based on these two points. How repeatable would the result of a study be if we set P = 0.20? Because the repeatability would be lower, the result of such a study is often referred to as “directional.” In drug clinical trials, the P-value is set at more stringent levels, P ≤ 0.01, because of human risk and the need for high repeatability of experimental results.
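The repeatability point can be illustrated with a small simulation sketch (hypothetical numbers, assuming numpy and scipy): when no true product difference exists, the fraction of studies that declare a “difference” is simply the chosen alpha level, so findings accepted at P ≤ 0.20 are far more likely to be chance results than those accepted at P ≤ 0.05.

```python
import numpy as np
from scipy.stats import ttest_ind

# Simulate many studies of two products that are in fact identical
# (both rated around 5 on the same scale, same variability).
rng = np.random.default_rng(1)
pvals = np.array([
    ttest_ind(rng.normal(5, 1, 20), rng.normal(5, 1, 20)).pvalue
    for _ in range(5000)
])
for alpha in (0.05, 0.20):
    # The share of spurious "differences" tracks the chosen alpha level.
    print(f"alpha = {alpha}: {np.mean(pvals <= alpha):.3f} spurious 'differences'")
```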
Another factor that is often not mentioned is the fulfillment of the statistical assumptions of the test statistics, i.e., normality, equal variance, independence. These assumptions affect the choice of P-value. As protection against unsatisfied assumptions, we aim at a more stringent (smaller) P-value, i.e., a higher level of significance. The purpose of replication is to estimate residual or pure error. The test of significance of treatment or product effects is based on residual error. In the strict sense, without an estimate of residual error a test of significance cannot be done; hence replication is important. Furthermore, without replication the interaction of panel and products cannot be estimated. Interaction is an important part of evaluating judges’ performance. Replication also permits the estimation of panel repeatability (a sketch of such an analysis appears at the end of this commentary). In the case of research guidance panels and consumer tests, where the sample size N is large and replication is impractical, the residual error is compounded with the panelist effect. Because of the large N, the compounded residual error is an acceptable estimate of residual error. It is worthwhile to mention the myths that Chambers and Setser (1993) reported; they provided a very good review of the literature, which sensory scientists should study to decide for themselves whether each of the following is a myth or a reality:
All descriptive panels require extensive training.
Consensus descriptive methods give different results than methods employing independent judgments.
One scale type is better than another type.
Children require simple measurement scales to understand the tasks.
At least 8 to 10 panelists are necessary to obtain valid descriptive data.
Discrimination among samples requires at least 7 to 9 scale categories.
Color codes bias consumer studies.
Sample quantity influences degree of liking.
It is hoped that in the next decade, real myths will be discarded and more sensory research activities will be based on validated standard operating procedures. This can only be achieved if myths and beliefs are openly discussed and in print. Sensory professionals should not take the issue for granted.
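As promised above, here is a hedged sketch of the replication point on synthetic data; the panel sizes, effect sizes, and the use of statsmodels are illustrative assumptions, not taken from the commentary.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical replicated descriptive data: 8 panelists x 4 products x 2 reps.
rng = np.random.default_rng(0)
rows = [(p, q, r, 5 + 0.5 * q + rng.normal(scale=0.8))
        for p in range(8) for q in range(4) for r in range(2)]
df = pd.DataFrame(rows, columns=["panelist", "product", "rep", "rating"])

# With replication, the panelist-by-product interaction can be separated
# from pure (residual) error; without replication the two are confounded
# and the interaction term cannot be estimated at all.
model = ols("rating ~ C(panelist) * C(product)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```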
REFERENCES

CHAMBERS, E. IV. and SETSER, C.S. 1993. Myths and monsters of sensory methodology. Presented at “Advances in Sensory Food Science,” Rose Marie Pangborn Memorial Symposium, Jarvenpaa, Finland.
PANGBORN, R.M. 1979. Physiological and psychological misadventure in sensory measurement, or the crocodiles are coming. In: Sensory Evaluation Methods for Practicing Food Technologists, IFT Short Course, Institute of Food Technologists, Chicago, IL.
CHAPTER 4 CONTRASTING R&D, SENSORY SCIENCE, AND MARKETING RESEARCH APPROACHES HOWARD R. MOSKOWITZ During the past decade Sensory Science has experienced a crisis of identity in its application. Both sensory scientists working in R&D and market researchers working in the marketing department, or in their own stand-alone department, deal with consumers. As a consequence, marketing and Sensory Science often compete internally to obtain consumer data, and to be seen as the primary source of such data for marketing decisions. Both groups fight for recognition, budget, responsibility, and ultimately the individuals compete against each other for professional growth and opportunities.
How Sensory Science Got into the Consumer Test Business

Forty years ago sensory scientists were content to run expert panels, and in some forward-looking companies the sensory scientist also ran the in-house consumer panel. Whereas there would be no conflict between sensory scientists and market researchers over the “expert panels,” all too often there were severe conflicts in the case of consumer data. The in-house consumer panelists were not exactly naive consumers chosen to represent the ultimate consumer response, as is the case for market researchers. Rather, the in-house consumer panel was designed as an early stage mirror of what an uninstructed population might say later on. Cost and convenience drove the use of this in-house population. Panelists were cheap to recruit (or at least the cost of recruiting and using them never appeared on the balance sheet), and they were immediately available at the R&D center. The in-house consumer panel thus achieved a level of acceptance based more upon practical considerations of budget and convenience than upon its actual contribution to the business. This happy state of affairs changed subtly over the years. It soon became evident under scrutiny that the in-house consumer panels actually cost a lot of money, that they did not represent the target consumer population, and that the panelists would test relatively few products without becoming bored. The system would break down as the demands on it grew. A sensory professional might be able to recruit panelists at the R&D center to do two tests a day, but clearly could not scale the “production” up to larger base sizes, with more products, in more geographical locations. In the meantime, R&D was reluctant to surrender to marketing research its newly found importance in the corporation attached to providing such early stage sensory data.
A compromise solution consisted of hiring external panelists, the so-called church groups or affinity groups. These external panelists were not rigorously screened as market researchers might do; rather, they belonged to a group such as a club. The sensory scientist in charge of the testing program would contract with the external organization to screen panelists for consumption of specific items, such as yogurts, snacks, meats, and the like. The researcher would then pay the organization a sizable lump sum (in relative terms, e.g., $1,000) for 50 panelists to show up and participate in a two-hour test session. Panelists would then show up, evaluate the product, and fill out the questionnaire and a classification questionnaire. The entire approach was simple and cost-effective, allowed the sensory scientist wider latitude in the panelists used, and got the sensory scientist into the consumer testing business. Typically the panelists would be local residents, living within driving distance of the technical center, but the actual evaluations would be done near the panelists' homes (e.g., at a local church, from which came the expression "church panels"). These church panelists represented the first skirmish between sensory scientists, now venturing outside the confines of the R&D walls, and consumer market researchers with the responsibility to test consumers. The final step in the evolution of R&D Sensory Science into consumer researchers occurred when Sensory Science was given a larger budget for tests. The budget typically did not match that given to market researchers to execute similar consumer product research. One possible explanation is that the sensory scientist typically positioned the service as an in-house service, which never appeared on the budget line. This strength would later become a weakness. This foray into consumer testing generated some unexpected consequences. One consequence was that the sensory scientist evolved into a research supplier. Typically, the sensory scientist, a well-educated, technically-oriented professional, designed and analyzed the study himself, using a field service to recruit the panelists and to execute the actual testing. The second consequence was that Sensory Science began to compete with consumer researchers who specialized in the business of consumer data acquisition. No longer content to do local tests, and fortified with an independent budget, the sensory scientist wanted to travel across the U.S. and other countries, testing consumers from different markets. Sensory Science finally found its foothold in the consumer testing business.
Turf Battles - Why Sensory Science and Market Research Fought Each Other

In corporations there are always battles to protect one's "turf." Early stage consumer research constitutes one of these turf issues. Whereas during the 1960s and 1970s sensory scientists had confined themselves to internal panels, in the
mind of the market researcher by the time 1980 rolled around Sensory Science was beginning to encroach on the market researcher’s territory. It was becoming clearer to the market researcher that the early stage “product tests” were in for some strong competition from a more quantitatively oriented, scientifically driven group of researchers - the sensory scientists. Up to then, market researchers had pretty well enjoyed a monopoly position. They were slowly losing it. Market researchers doing consumer evaluations, e.g., so-called product tests, would issue long but not very actionable reports, detailing the performance of products on a check list of attributes, and perhaps a small analysis trying to identify what the product developer should do. It should be noted that many of these “inactionable” reports were written by contract suppliers, many of whom simply did not understand anything about the product, and rather manipulated symbols, reporting the results in an automaton-like fashion. Enjoying a temporary monopoly, market researchers often failed to attend to the needs of their ultimate client, the product developer. Desiring a professional-appearing product, the market researcher issued long, pedantic tomes, with tables, statistical tests, and precious little profound insight. However, the lack of insight was balanced by marketing-oriented language, which said little, but with a lot of words. In more than one meeting, after hearing the results of a report with its presentation of verbal information without product knowledge, this author (HRM) was wont to exclaim a line from Shakespeare’s Romeo: “He jests at scars that never felt a wound.” Within a few years battle royals broke out in various companies as market research on the one hand and sensory scientists on the other secured opposing positions, and accused each other of not serving the ultimate client. That client was hard to identify. Market researchers claimed that they were doing their job successfully because their reports were “marketing-oriented” (read this as being filled with marketing jargon). Market researchers claimed that the marketers could not possibly understand the underlying statistics and the scientific aspects of the sensory scientist’s work, which they felt should be relegated to the inside of the technical laboratory and not at all made public. Sensory scientists countered by stating that the market research reports, often written by an external agency doing and analyzing the work, were filled with simplistic tabulations and superficial insights. Sensory scientists felt that the market research reports embodied a profound lack of understanding of the research problem, and were written by individuals who were merely manipulating verbal symbols. The fact that a market research analyst could take a table of information about several products and comment on the numbers was, to the sensory scientist, irrelevant, since the market research analyst often had no real idea about the specific topic being studied. For instance, if the test was to determine the sensory impacts of various treatments, then the market research analyst might blithely write about the effects of these treatments in a cold,
detached manner. The analyst might never have any idea what these treatments were, why they were done, or where these treatments fit into the process of product design, development, and distribution in the marketplace. Before the early 1990s, corporate department silos of marketing and R&D effectively divided sensory scientists and market researchers. The groups usually reported to different vice-presidents, had different budgets, were pitted against each other as part of corporate power struggles, and were never forced to work together. However, a house divided against itself cannot stand. The same applied to consumer research. During the past ten years marketing research and Sensory Science have come to respect each other more, and to work together, sometimes being forced to cooperate by disgruntled management who had no idea what the aforementioned battle royals were all about. Management recognized the need to get coherent, early stage data that would be valid, representative, etc. Management recognized in many cases that by giving sensory scientists larger budgets but forcing them to work with market research, the ultimate product (useful data) would better serve the corporate needs. The famous English writer and wit, Samuel Johnson, was undoubtedly correct when he pithily remarked that “nothing so focuses a man’s mind as the prospect of being hanged in a fortnight.” Such focus applies today to the sensory scientist and the marketing researcher alike. It is too early to tell whether these battles are finished. As we start this new decade we see some rapprochement. There is as yet a cold peace between market researchers and sensory scientists, rather than a warm relationship. Perhaps that will change.
Why Market Researchers Advance More Rapidly than Sensory Scientists

Let’s look at the sensory scientist, the reward structure, and how the sensory scientist progresses through the corporation. As noted above, the sensory scientist typically positions himself as the low cost supplier. When a sensory scientist begins a project it is usually with a constrained budget. The sensory scientist is all too often looked upon as a clerical-type individual who can get tests done relatively efficiently and inexpensively. The sensory scientist is rewarded for efficient use of time, and often required to do his or her own setup, recruiting, data analysis and reporting. Indeed, many sensory scientists pride themselves on this capability of being the low cost data supplier, if not necessarily at the end of their careers then certainly just after they begin their employment. The management reinforcement given to the sensory scientist is seductive. When an in-house sensory researcher does a major project “solo,” he is often praised. Such praise motivates, especially among young researchers, who then believe that they can do almost anything that outside researchers can do. To some extent they are right, since the in-house sensory researcher is educated,
familiar with statistical computer programs for data analysis, and can act fairly expeditiously as a general contractor for a research project. Over time, however, the sensory researcher continues to be publicly praised for coming in under budget. Praise for new ideas is less forthcoming ... perhaps because budgets are easy to measure, budget constraints are the first things to impose in a tight year, and a positive bottom line, heroically achieved through one’s own labor, is easy to see and to reward. In recent years some of the negative aspects of achievement and the relegation to a “maintenance” organization have been lifted from the sensory researcher by the availability of small budgets with which to do field work such as panelist recruitment, data acquisition, etc. Nonetheless, for the most part and despite corporate promises, the sensory scientist continues to be looked upon as the low cost data provider. To some degree the sensory researcher has brought this upon himself, because he has fallen into the trap of believing that saving money is better than coming up with critical answers. Many in-house sensory scientists, unable to compete on an intellectual or service/capabilities platform with outside research specialists, take the position that they should be used because they are less expensive. Furthermore, because many sensory scientists are at the bottom of the research ladder, they do not go to meetings, are not fertilized by new ideas, and have little or no budget to explore ideas. They are trapped by their own efficiency. Things are getting better, but it may take a new generation of researchers, with a grander vision, to overcome the “death spiral” of low cost as a raison d’être in the corporation. The market research industry differs from the sensory research business. Typically, the in-house researcher is assigned a budget that can be used to contract with outside researchers who actually do the work. Consequently, it pays for the market researcher to minimize the cost of data acquisition but also to maximize the intellectual power of the analysis. The astute, forward-thinking market researcher can use this budget to hire the best brains in the industry, or at least the best brains that can be afforded within the limited budget. In contrast, the sensory scientist is hired to be the brains. Any attempt to hire additional people to analyze the data is seen as the sensory scientist’s shortcoming, and is duly punished, either overtly or covertly. Given these corporate reward and punishment schemes, it should come as no wonder that the novice sensory scientist and the novice market researcher turn out so different, or that the market researcher enjoys a somewhat faster track to the higher echelons of the corporation. Sensory scientists are rewarded for doing more of what they are doing; market researchers are rewarded with new responsibilities, and often move out of the research field altogether as they obtain a broader business perspective from their successful projects. Furthermore, market researchers develop networks of smart outside consultants, because they interact with these consultants on a project basis.
Sensory scientists, with rare exception, do not contract out to full service research consultants who bring new ideas. Sensory scientists typically contract with field services for data collection. They do not get new analysis ideas from these interactions - other than valuable ideas about efficiencies and cost savings.
R&D Sensory Science Versus Market Research - How They Deal with Specifics in Actual Research Studies
(1) Attributes in the Questionnaire. Sensory scientists differ from marketing researchers in the attributes they select for tests, the rationale for the selection, and the interpretation of the attributes. Sensory scientists generally work with product developers. In the Sensory Science literature one rarely comes across a “laundry list” of attributes. Most of the attributes are straightforward, easy to understand, and assess the non-evaluative characteristics of products. For example, if the sensory scientist works with chocolate candy, then the sensory attributes will refer to the appearance, aroma, and taste/texture of the chocolate part of the candy, and perhaps, if the candy is filled, to the filling part. The attributes are usually quite well thought out. The sensory scientist will also ask questions about liking. Typically the questionnaire stops there. The attributes are manageable, limited in scope, typically can be referenced against known physical standards of identity for the attribute (even if those standards are not used in training), and can be easily understood by the product developer. One gets the impression from reading sensory scientists’ reports that the data are “cut and dried,” that the ambiguity in the meaning of the attributes has been reduced as far as possible, and that there is a minimal number of attributes, sufficient to answer the problem. Sensory researchers are clearly proud of their ability to tighten up their questionnaires, and are reluctant to put the panelist through an agonizing interview. To a great extent the look and feel of the sensory scientist’s questionnaire comes from the narrow focus that the researcher has - viz., to provide feedback to the product developer. Finally, a determining factor in the way that sensory scientists create questionnaires and use attributes comes from the reality that the sensory scientist sits in the R&D facility, knows the internal “client,” and is responsible for the validity and utility of the data. Having to see one’s clients day after day focuses the sensory scientist’s mind on the minimal amount of valid information needed to answer the R&D client’s questions. More data than needed could end up creating unnecessary problems. Less data than needed could end up creating an experiment that did not work. Sensory scientists quickly learn to optimize the nature of the questionnaire so that it does the job, but parsimoniously and efficiently. We can contrast this tendency to minimize with the conventional questionnaire used by marketing research. Marketing researchers are also beholden to
other corporate constituencies such as marketing and advertising. Although part of the questionnaire solicits panelist responses to the product (e.g., liking, amount of an attribute, etc.), a lot of what the marketer wants to discover involves the “image” of the product. These image attributes can vary from questions about when the product should be used, to questions about the “personality” of the product (e.g., brand personality), to other descriptive attributes, to fit to concept, etc. Market researchers are not scientists, unlike many sensory researchers. Market researchers often lack the discipline to know what they are looking for when they set up the study, and feel that it is always safer to include more questions rather than fewer. Furthermore, market researchers, responding to the different disciplines, are pulled in many different directions. It is easier for them to give in to the disparate requests than to fight their constituents, saying no to one or another request. Finally, marketing researchers often contract out the fielding to an external research agency, and don’t have to worry that the questionnaire is long, repetitive, and provides little relevant data. As long as the test is conducted “validly,” so that the right panelists are selected and the panelists properly complete the questionnaire, the market researcher often remains unconcerned and uninvolved. The end result of the questionnaire comprises a set of tabulations of products and an extensive report card that is handed over to another department.

(2) Criteria To Select Panelists. Sensory scientists often fuss about screening panelists to participate. Sensory screening may comprise threshold tests, the ability to scale intensities, or tests of the ability to name common food flavors in a solution or name a common texture when blindfolded. Marketing researchers also screen panelists, but their screening simply determines whether or not the panelist was actually contacted, and whether or not the particular panelist belonged in the sample. Many market researchers dealing with product testing for the first time are overwhelmed by the attention paid to panelist performance, including selection and monitoring, discrimination testing and the like. Whereas market researchers are less stringent than sensory scientists in their choice of panelists, it is not clear who is right. Indeed, discrimination testing of panelists does not necessarily lead to better panelist performance on a scaling task. In a paper written more than 20 years ago, Moskowitz et al. (1980) showed that panelists who passed a double-triangle discrimination screening performed as well as, but not better than, other panelists on a scaling task. The scaling task required the panelists to rate the perceived sweetness of a set of beverages varying in the amount of sweetener. That is, just because a panelist performs well on the discrimination task does not mean that the panelist will perform any better when the task is to rate the perceived magnitude of the stimulus on the same attribute.
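As an aside on discrimination screening, the sketch below shows how a triangle-test screening record might be checked for significance; the chance probability of 1/3 is the standard value for the triangle test, while the function name and example numbers are hypothetical.

```python
from scipy.stats import binomtest

def triangle_test_pvalue(n_correct, n_trials):
    """One-sided exact binomial p-value for a triangle test,
    where the probability of a correct match by chance is 1/3."""
    return binomtest(n_correct, n_trials, p=1/3, alternative="greater").pvalue

# Hypothetical screening record: 11 correct identifications in 20 triads.
# A small p-value indicates better-than-chance discrimination.
print(triangle_test_pvalue(11, 20))
```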
(3) Scales. Sensory scientists tend to be more particular about the scales that they use. For example, when it comes to the different types of liking scales, it is the sensory scientist who worries about whether the liking scale should be balanced, comprising an equal number of scale points for likes and dislikes, or unbalanced, comprising perhaps more scale points devoted to liking than to disliking. Market researchers are less concerned about scaling issues and the proper treatment of scale values. It is not that the market researcher is uninterested in scales so much as the market researcher is more interested in the actual interpretation of the end results - e.g., is the product acceptable or not? Furthermore, most market researchers compute the percentage of respondents who assign a specific scale value (e.g., 4 and 5 on an anchored five-point purchase intent scale). Thus, market researchers are accustomed to looking at percentage data, not at metric data. With percentage data many of the scaling issues become less relevant. This focus on the different properties of scales, metric versus percentage, becomes critically important when the research task is to identify aspects of the product that need improvement. Market researchers will report the percent of panelists who feel a product has too much or too little of a specific attribute. Sensory scientists, in contrast, who must work with the product developer, will report the magnitude of over-delivery or under-delivery of an attribute. Product developers need information regarding the magnitude of change they should make to the product, not information regarding the proportion of people who feel that the product needs a change in a specific direction. Marketing research information simply indicates a problem, but does not prescribe the solution. Part of the indifference of market researchers to the scaling issue comes from their intellectual heritage. Many of today’s market researchers come from educational backgrounds in the social sciences, rather than from the physical sciences or food science. The issue of scaling in the social sciences is not as critical as the issue of scaling in psychology. Psychologists seek the best possible scales in order to measure perception, and use external criteria of validity to calibrate their scales. For instance, a sweetness scale should have values proportional to the concentration of a sweetener. Sociologists and others in the softer social sciences do not have these measures of external validity, and so the “proper” scale or the “right” scale is not typically an issue. This indifference to the “right scale” transfers to their practice of market research. The bottom line is that as long as the scale discriminates and is amenable to statistical treatment, the scale is acceptable to the marketing researcher, especially if the scale possesses face validity. Face validity simply means that the scale appears to measure what it is claimed to measure. One of the interesting and personally relevant aspects of this difference between sensory scientists and marketing researchers in attitudes toward scaling comes from the author’s experience with magnitude estimation (Stevens 1975).
Magnitude estimation refers to a class of scaling procedures wherein panelists assign numbers so that the ratio of the numbers matches the ratio of perceptions. Magnitude estimation is widely used in experimental psychology, especially by psychophysicists interested in the relation between stimulus magnitude and perceived sensory intensity. When introducing magnitude estimation to the business community the author discovered two radically different attitudes. Sensory scientists fought long and hard against the scale, more on emotional grounds than on rational grounds. To the sensory scientist it appeared that by accepting magnitude estimation as a scaling method they were giving in to research methods that were imported from other fields. There was little in the way of analysis of the scaling data in order to determine whether, in fact, magnitude estimation performed well in an applied setting. In contrast, market researchers were also skeptical, but not on the basis of scaling issues. Marketing researchers more willingly accepted that the scale lived up to its publicity, and accepted the literature from psychophysics, which, in fact, the sensory scientists refused to accept. To marketing researchers the key issue was whether the procedure could be implemented in a field test, with untrained panelists, with interviews administered by unsophisticated interviewers. Another concerned the discrimination power. Would the discrimination power suffice for the business issue to be answered? Marketing researchers asked more pragmatic, less emotional-laden questions. (4) Base Size. Marketing researchers are schooled in large samples of panelists, with relatively little information from each panelist (especially in product testing). Marketing researchers often feel comfortable only when the panel size exceeds 100 or more, and feel uncomfortable making decisions with smaller panel sizes. Part of this discomfort with small sample sizes comes from the feeling that with a base size of 100 few critics can complain about sampling error. Also, with a base size of 100 consumers the market researcher can “cut” the data in different ways to look at many subgroups in the population (e.g., product users, product non-users, older vs younger panelists, etc.). With base sizes exceeding 300-400 panelists, the market researcher can look at many other subgroups of panelists in the population. To the typical market researcher it is well worth limiting the scope of the project to one-three samples, in order to afford this large base size. The actual data shows that researchers may not need the very large base sizes recommended by market researchers. In fact, the sensory scientists may have been right all along - beyond 30-50 panelists the data stabilize, and the decisions made with the base size of 50 often parallel those made with larger base sizes. The author published one article on base size (Moskowitz 1997), showing that with base sizes of 50 one can obtain results similar to the results obtained with hundreds of panelists. Indeed, in a conjoint measurement task for
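For background, the following is a minimal sketch of one common way (not necessarily the procedure the author used) of pooling ratio-scale magnitude estimates across panelists: rescaling each panelist’s numbers so that all panelists share a common geometric mean.

```python
import numpy as np

def rescale_magnitude_estimates(ratings):
    """Bring each panelist's magnitude estimates to a common modulus by
    equating geometric means -- one common way of pooling ratio-scale
    data across panelists. `ratings` is a panelists x stimuli array of
    positive numbers."""
    ratings = np.asarray(ratings, dtype=float)
    log_r = np.log(ratings)
    panelist_gm = np.exp(log_r.mean(axis=1, keepdims=True))  # per-panelist GM
    grand_gm = np.exp(log_r.mean())                          # overall GM
    return ratings * grand_gm / panelist_gm

# Hypothetical data: two panelists using very different number ranges.
data = np.array([[2.0, 4.0, 8.0],
                 [20.0, 40.0, 80.0]])
print(rescale_magnitude_estimates(data))  # rows now share a geometric mean
```

Because magnitude estimates are ratios, the rescaling preserves each panelist’s internal ratios while removing the arbitrary choice of modulus.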
(4) Base Size. Marketing researchers are schooled in large samples of panelists, with relatively little information from each panelist (especially in product testing). Marketing researchers often feel comfortable only when the panel size exceeds 100 or more, and feel uncomfortable making decisions with smaller panel sizes. Part of this discomfort with small sample sizes comes from the feeling that with a base size of 100 few critics can complain about sampling error. Also, with a base size of 100 consumers the market researcher can “cut” the data in different ways to look at many subgroups in the population (e.g., product users, product non-users, older vs younger panelists, etc.). With base sizes exceeding 300-400 panelists, the market researcher can look at many other subgroups of panelists in the population. To the typical market researcher it is well worth limiting the scope of the project to one to three samples in order to afford this large base size. The actual data show that researchers may not need the very large base sizes recommended by market researchers. In fact, the sensory scientists may have been right all along - beyond 30-50 panelists the data stabilize, and the decisions made with a base size of 50 often parallel those made with larger base sizes. The author published an article on base size (Moskowitz 1997), showing that with base sizes of 50 one can obtain results similar to those obtained with hundreds of panelists. Indeed, in a conjoint measurement task for concept development the author again demonstrated that the results with 40-50 panelists match those obtained with 420 panelists. The study involved the reaction of consumers to test concepts, and the estimation of the “utilities” or persuasive powers of the individual elements in those concepts (Moskowitz 2000). An example of this stability appears in Fig. 4.1.
[Figure 4.1 appeared here: simple correlation (vertical axis, roughly 0.4 to 1.0) plotted against base size in sub-sample (horizontal axis, 0 to 100+).]

FIG. 4.1. SIMPLE CORRELATIONS BETWEEN THE UTILITIES OF THE ELEMENTS IN A SUB-SAMPLE (OF VARIOUS SIZES) AND THE UTILITY OF THE FULL SAMPLE OF 420 RESPONSES. Five sub-samples for each base size were pulled. In some cases the correlations were very close for a given base size, so that the points overlap.
The issue of base size can, like so many other controversies, be traced to differences in intellectual heritage. As noted before, market researchers trace their heritage to public opinion polling. Many market researchers simply want to know whether or not a product is or is not acceptable. Thus the market researchers opt for large base sizes because the large base size reduces the sampling error around the mean. In contrast, the sensory scientist wants to develop quantitative relations between stimuli and subjective responses. The relation can be developed with data from even one panelist. The large base size stabilizes the average, cancels the “noise” in the system, and allows the relation to come through more clearly.
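The kind of stability analysis summarized in Fig. 4.1 is easy to re-create in outline. The sketch below uses synthetic data, not the original study’s; the array shapes, noise levels, and function name are all assumptions.

```python
import numpy as np

def subsample_correlations(utilities, base_sizes=(25, 50, 100, 200),
                           n_draws=5, seed=0):
    """Correlate mean element utilities from random sub-samples of panelists
    with the means from the full sample, for several base sizes.
    `utilities` is a panelists x elements array."""
    rng = np.random.default_rng(seed)
    full_mean = utilities.mean(axis=0)
    results = {}
    for n in base_sizes:
        results[n] = [
            np.corrcoef(
                utilities[rng.choice(len(utilities), n, replace=False)].mean(axis=0),
                full_mean,
            )[0, 1]
            for _ in range(n_draws)
        ]
    return results

# Synthetic demonstration: 420 "panelists," 30 concept elements.
rng = np.random.default_rng(1)
true_utils = rng.normal(0, 5, 30)
panel = true_utils + rng.normal(0, 10, size=(420, 30))
for n, rs in subsample_correlations(panel).items():
    print(n, [round(r, 2) for r in rs])   # correlations rise and stabilize with n
```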
(5) The Impact of Off-the-Shelf Computer Software on the Fight for Control. It should come as no surprise that in a corporation there is a fight for control over the processes of "primary" data acquisition, viz., from consumers. As resources shrink, and as the food industry becomes ever more competitive, sensory scientists and market researchers often fight viciously with each other for the opportunity to control high-visibility projects. These projects include category appraisal and product optimization, where the consumer input is critical for strategic guidance and for new corporate initiatives. Not surprisingly, the fight for control is often spurred on by the availability of software, such as that used in experimental design and brand mapping. R&D sensory scientists, market researchers, and market research suppliers (also known as "vendors") are all emboldened by the availability of technology that could help them maintain their jobs and their status/income. This fighting is expected to continue and intensify over the next decade as the food industry becomes ever more competitive, and as the opportunities for advancement dry up. In contrast, in other industries, such as telecommunications, we do not see this type of hostile competitive behavior between the laboratory-based scientists and the market researchers, perhaps because the beckoning opportunities are so enormous that it makes little sense to compete in order to hold on to one's territory. There is more to be gained in moving toward a big future. Such a cooperative future does not appear to be the case in the food industry, at least for the next few years.
REFERENCES

MOSKOWITZ, H.R. 1997. Base size in product testing: a psychophysical viewpoint and analysis. Food Quality and Preference 8, 247-256.
MOSKOWITZ, H.R., JACOBS, B. and FIRTLE, N. 1980. Discrimination testing and product decisions. J. Marketing Res. 17, 84-90.
STEVENS, S.S. 1975. Psychophysics: An Introduction to Its Perceptual, Neural, and Social Prospects. John Wiley & Sons, New York.

ALEJANDRA M. MUÑOZ
The roles of sensory and market research professionals have always been a controversial topic. In the past, virtually all companies experienced this conflict between the two groups of professionals. However, there have been some changes in the past decade. In some companies actual partnerships exist between the two groups in regular project work. In those cases, both functions understand the value of each group's work and collaborate in projects without friction. In other companies, however, the conflicts and rivalries continue. Why
does this conflict exist in many companies? Why has it been possible for some companies to create an amiable working relationship between the two groups? These conflicts exist when professionals believe that there is an overlap of work and responsibilities. Each group believes that the other party trespasses its working boundaries and invades its professional space. Specifically, sensory/consumer scientists and marketing research professionals both work with consumers, the same source of information. Differences, conflict and even anger arise when one group believes that the other group is addressing the same issues. This closeness of mission, source of raw materials, and methods inevitably creates turf conflicts. Rivalries will continue as long as there is no clear understanding and public delineation of the expertise and responsibilities appropriate to each group. Tension will persist as long as there is no respect for each other's work. Tension will persist as long as professionals in each group fail to acknowledge, willingly or unwillingly, the value of the work done by the other group, and its importance both to the scientific and to the business communities, respectively.

In a provocative and insightful paper, Carter and Riskey (1990) discussed some of their views on the competitiveness vs. cooperation of the two groups. These authors indicated that neither group can be successful without the success of the other. Both groups should cooperate, and clearly understand and work within the purview of their accountabilities. The marketing research accountability is the consumer target, because the objective is to find and reach the consumers to whom the product is most appealing. The sensory evaluation accountability is the product target, because the objective is to determine when a company has the best product for the consumer target.

Market research and sensory consumer research rivalries often occur when both groups run guidance tests with consumers. This should not even be an issue if the accountabilities of each group are examined. As stated by Carter and Riskey (1990), sensory research should be responsible for the bulk of the product testing, given this group's accountabilities. Product tests are a small piece of the market/marketing research accountability, usually summing to only 35% of a marketing research department's efforts. The remaining 65% should be devoted to market tracking (30%), advertising, copy and promotion research (20%), and idea screening and concept testing (15%).

It is vital to recognize the true differences between studies run by the sensory scientist and those run by the market researcher. Both professionals use similar tools, but the objectives differ. In most cases, the consumer tests from the two groups should differ in objectives, philosophy, methodology, number of samples, type of recruitment, questionnaire design, data analysis and uses of data. If in fact market research and sensory consumer tests are so different, then it makes sense that both groups should work together to complement their results, as discussed below.
Stage of Development

Market research and sensory professionals should work together in the same project at different stages in order to meet different objectives. Tests run by the two should then complement each other; i.e., one test should not be considered a substitute for the other. In most cases, sensory research guidance consumer tests might well be conducted first in order to obtain preliminary sensory information. This sensory information should be used by product developers in order to improve products. Once all the sensory product issues have been addressed and the final samples made ready for national testing, the project should be handed over to market research for further testing. Ideally, this sequence should apply to many research projects. There are some exceptions, discussed at the end of this section. As Carter and Riskey (1990) indicate, a new product development process can be visualized as a chain of interrelated phases, with sensory research and marketing research cooperating and complementing, rather than competing.
Objectives

The objectives of the two groups should differ, reflecting their different business goals. Sensory scientists who execute consumer tests at the preliminary stages of a project ought to adopt a research guidance perspective. Sensory consumer research guidance tests should focus on, and be limited to, aspects of the products' sensory properties and integrity. Thus for the most part, sensory consumer tests deal with the assessment of liking/preference and sensory attributes. Given this viewpoint, sensory-based consumer tests are best utilized at the project's preliminary stages to obtain product information for product guidance, or to screen products. Market research tests are best conducted after the sensory consumer test data have been analyzed and acted upon. Sensory consumer research guidance test results should be used to screen and/or improve products. Market research test results should be used to further understand the product's sensory attributes uncovered by the sensory research, to address selected sensory issues, but mainly to address non-sensory aspects such as purchase intent, market potential, etc. It can be said that market research tests collect information that can be classified as "active" responses. Not only do market researchers measure liking attributes, but they also measure other, more decision-oriented attributes, such as purchase intent, brand comparison, etc.

Another aspect which differentiates sensory and market research tests is the perspective or framework of the research conducted by each group. Sensory scientists conduct these tests with a research perspective. The tests are usually conducted by sensory groups, and are geared to gather information or to build a database for specific research or quality control applications. Some of
these applications include the study of consumer/descriptive/instrumental data relations, or the study of relations between consumer perceptions and ingredient concentrations or sensory attributes. Given the foregoing distinctions in objectives and perspectives, the following additional differences appear to hold between consumer sensory and market research tests.
Project Stages. Sensory consumer tests are usually conducted at the project's preliminary stages, market research tests at late/final stages.

Type of Products. Sensory consumer tests may involve more prototypes and products than do market research tests. Market research tests may include improved, final prototypes and competitors' products.

Number of Samples. Sensory consumer tests may include many samples/prototypes, many of which are not yet optimized. Market research tests often examine fewer, improved samples. It is unusual for a market research test to include unimproved alternatives, although many market researchers do work at the basic sensory level in some of their projects. These early-stage projects are the exception for market researchers, not the norm.

Participants. Sensory consumer tests may be conducted with employees or naive consumers. A limited number of subjects participate (50-200). Consumers are often recruited based on product usage and on a limited number of other screening criteria. Market research tests are virtually always conducted with naive, unpracticed consumers. The number of consumers recruited may be large (100-500). Market researchers often impose more stringent recruitment criteria, such as product usage, income, household composition, lifestyle and attitudes.

Test Venue. Sensory consumer tests usually occur in only one or two markets. Market research tests may occur in more markets, because the goal is to represent the full range of consumers. It is in the test venue, and in the nature of the participants, that one sees the difference between market research and Sensory Science studies of the same product.

Questionnaires. As emphasized above, the objectives of the sensory scientist and the market researcher differ. This difference in objectives manifests itself in the structure of the questionnaire. For example, the sensory consumer tests may be confined to ratings of liking, preference and attribute diagnostics. The market research questionnaires may be
much longer because the market researcher incorporates many more attributes, such as use, image, purchase intent, brand comparison, etc.
Data Analysis. The data analysis done by sensory professionals may be more extensive than the analysis done by market researchers. The sensory professional often applies a larger number of statistical analyses, including simple inferential statistics, analysis of variance, correlations, and modeling. The majority of market researchers come to the data from the point of view of describing the results. Many, albeit not all, sensory scientists approach the data with a deeper interest in significance of differences, relations between variables, etc. (A brief sketch of such an analysis appears after this section's closing discussion.)

Expense. Because of the larger number of markets/test locations and number of consumers used, most market research tests are more expensive than sensory consumer tests.

The foregoing differences in the Sensory Science versus market research scenarios and sequence of tests apply to most routine research projects. The role of both groups can be clearly delineated in the way described above. These distinctions between the tests run by the sensory/consumer insights groups and those run by the market researchers are not as clearly defined for some projects, particularly the advanced research projects, such as product optimizations, category reviews, and studies that address consumer understanding. These projects involve a large number of consumers and a large number of products, should be conducted with naive consumers, ideally need to be conducted in one or more test locations/markets, and are expensive. Traditionally, these tests have been conducted by market researchers, mainly because of the high costs of such tests, and the need to combine the marketing and the technical perspectives in the analysis and interpretation. Market research groups with larger budgets can easily fund these projects, and work with outside vendors. However, more and more sensory/consumer insights groups are getting involved in these projects. Ideally, these complex projects should be undertaken by both groups in collaboration, allowing each group to contribute its own unique expertise. This partnership can ensure optimal results, provided that the turf issue does not destroy the collaboration.

More and more companies are building strong partnerships between sensory and market research groups. These companies should represent a model of business-oriented research for those organizations still dealing with the conflict between the two groups. A solution is possible. Sensory professionals should strive to clarify the type and value of consumer sensory research guidance tests, and foster the interaction between the sensory and market research groups.
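As a concrete illustration of the inferential analyses mentioned under Data Analysis above, here is a minimal sketch of a one-way analysis of variance on hedonic ratings. The data are simulated, and the means, panel sizes, and product labels are illustrative assumptions, not figures from the text; the scipy calls are standard.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated 9-point hedonic ratings for three prototypes, 50 consumers each.
product_a = np.clip(rng.normal(6.2, 1.5, 50).round(), 1, 9)
product_b = np.clip(rng.normal(6.8, 1.5, 50).round(), 1, 9)
product_c = np.clip(rng.normal(5.9, 1.5, 50).round(), 1, 9)

# One-way ANOVA: do mean liking scores differ across the prototypes?
f_stat, p_value = stats.f_oneway(product_a, product_b, product_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# A pairwise follow-up, e.g., prototype A versus B
t_stat, p_ab = stats.ttest_ind(product_a, product_b)
print(f"A vs. B: t = {t_stat:.2f}, p = {p_ab:.4f}")
```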
REFERENCES

CARTER, K. and RISKEY, D. 1990. The roles of sensory research and marketing research in bringing a product to market. Food Technol. 44(11), 160, 162.
MAXIMO C. GACULA, JR.

The subject of this chapter is well-known to both Sensory Science and marketing research departments, and the problems and issues need no further introduction. Moskowitz and Muñoz detailed the issues and probable solutions for understanding and integration of functions between sensory and marketing research departments. From the author's (MCG) perspective, two factors should be emphasized in the next decade.

(1) Management education about the roles of Sensory Science and marketing research in the various stages of product development. If management is not convinced of these roles, then the problems will continue to exist and fester. The different consumer product industries operate in a variety of different ways. Thus the roles should be defined with the ultimate goal of producing a winner, consistent with the way the specific company operates.

(2) The Sensory Science department is more effective in supporting company Research and Development than in supporting Marketing Research. In this supporting role, emphasis should be given to cost and risk, and to the problem-solving capabilities of the sensory evaluation department. It would be impractical and costly to conduct marketing research tests with consumers when the stimuli comprise 16 prototypes (a 2 x 2 x 2 x 2 factorial) involving four ingredients. It would be difficult for marketing research to solve problems dealing with product defects.

The recent paper by Van Trijp and Schifferstein (1995) is excellent reading material for management (Item 1 above). From this paper, the author (MCG) expanded and modified the content of one of its tables; the modified table is shown in Table 4.1. Some of these contrasts are also mentioned by Moskowitz and Muñoz. Perhaps the most hotly debated issue is the item on Criterion: Internal vs. External Validity. One reason is that one often must extend the internal results of a study to achieve external validity. Experience shows that some Research Guidance Panel results are externally validated and some are not, especially for newly introduced products. Thus, test results under
this criterion cannot be generalized, and one should stay within its bounds.
TABLE 4.1.
R&D SENSORY SCIENCE VERSUS MARKETING RESEARCH APPROACHES

Item: Primary focus
R&D Sensory Science approach: Focus on product and production processes; sensory properties of products; sensory optimization.
Marketing research approach: Consumer behavior; matching expectations; packaging; label; advertising and positioning; confirmation; purchase intent; perceived quality; demographics.

Item: Criterion
R&D Sensory Science approach: Internal validity; results within the confines of R&D pending release to large-scale consumer test.
Marketing research approach: External validity; results can be generalized to target population.

Item: Panelists
R&D Sensory Science approach: Trained or expert for Descriptive Analysis; in-house and/or consumer Research Guidance test.
Marketing research approach: Naive consumers; panelists representative of the target population; importance of sample size; location of test.

Item: Product
R&D Sensory Science approach: Core attributes of product, both physical and sensory aspects; R&D prototypes; reformulate if sensory defects are found before release to marketing consumer test.
Marketing research approach: Augmented product to enhance packaging, brand name, price, advertising and distribution; generic enhanced for durability, utility, and convenience; R&D-screened prototypes used.

Item: Terminology
R&D Sensory Science approach: Specialized terminology; "Word Meaning" study to understand consumer terminology.
Marketing research approach: Consumer terminology; Focus Group.

Item: Test methods
R&D Sensory Science approach: Strictly controlled under R&D environments; use of intensity scales; use of hedonic for Research Guidance Panel Test (Internal Validity).
Marketing research approach: Moderately controlled or product-use conditions; use of hedonic, JAR scales, and purchase intent.
One item in Table 4.1 that needs further discussion is Terminology. The sensory scientist is constantly faced with the choice of attributes to be used in Research Guidance Panel tests. Is the meaning of the attribute understood in the same way by the different consumers in the study? Aside from the Focus Group, a quantitative method known as Word Meaning Analysis reported by Jones and Thurstone (1955) is useful to answer this question. This method quantifies the meaning of a word in order to understand perception of products or stimuli
94
VIEWPOINTS AND CONTROVERSIES IN SENSORY SCIENCE
during evaluation. Word Meaning Analysis is extremely useful in understanding consumer terminology - a key to a successful sensory evaluation program - and in questionnaire development, as discussed in Chap. 11. Gacula and Washam (1986) applied this method to scaling word anchors for off-flavor in a shelf-life study. Table 4.2 shows the results of a Word Meaning study for the word "mildness." What does mildness mean to the consumer? Of the 109 words and phrases used in the study, those particular words/phrases that describe both ends of the scale appear in this table. The rating scale was an unstructured scale six inches in length, where 0 = not mild and 6 = extremely mild. From Table 4.2, the words "harsh" to "gritty" can be used to describe a "not mild" perception, and "conditions skin" to "baby-soft" can be used to describe the extremely mild perception. An important marketing application of this result would be in advertising and in claim substantiation. As indicated by Moskowitz in Chap. 5, understanding the meaning of the word contributes to the validity of the data.
TABLE 4.2.
MEANING ANALYSIS OF THE WORD "MILDNESS"

Word/Phrase    Mean score
Harsh          0.59
Burns          0.67
Rough          0.71
Irritated      0.71
Itches         0.71
Scaly          0.73
Wrinkled       0.76
Brittle        0.77
Sandy          0.77
Gritty         0.80

Scale: 0 = not mild, 6 = extremely mild; sample size N = 150.
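Computationally, a Word Meaning study reduces to a mean rating per word or phrase, sorted to find anchor candidates. A minimal sketch using the low-end entries of Table 4.2 follows; the selection cutoff of 1.0 is an illustrative choice, not a value from the text.

```python
# Mean "mildness" ratings from Table 4.2 (0 = not mild, 6 = extremely mild).
mean_scores = {
    "harsh": 0.59, "burns": 0.67, "rough": 0.71, "irritated": 0.71,
    "itches": 0.71, "scaly": 0.73, "wrinkled": 0.76, "brittle": 0.77,
    "sandy": 0.77, "gritty": 0.80,
}

# Words whose means sit near the low end of the 0-6 scale are candidates
# for anchoring the "not mild" end; 1.0 is an illustrative cutoff.
low_end = sorted((m, w) for w, m in mean_scores.items() if m < 1.0)
for m, w in low_end:
    print(f"{w:<10s} {m:.2f}")
```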
There are other items not covered in Table 4.1, such as scaling issues, which are dealt with in Chap. 9, and many other issues encountered in practice by sensory scientists. It is hoped that this chapter will foster continued serious and honest dialogue between Sensory Science and Marketing Research departments. The detailed discussions by Moskowitz and Muñoz should clarify the existing issues, so that they can be seriously addressed and resolved in the next decade or so.
REFERENCES

GACULA, JR., M.C. and WASHAM II, R.W. 1986. Scaling word anchors for measuring off-flavor. J. Food Quality 9, 57-65.
JONES, L.V. and THURSTONE, L.L. 1955. The psychophysics of semantics: An experimental investigation. J. Appl. Psychol. 39, 31-36.
VAN TRIJP, H.C.M. and SCHIFFERSTEIN, H.N.J. 1995. Sensory science in marketing practice: Comparison and integration. J. Sensory Studies 10, 127-147.
CHAPTER 5

VALIDITY AND RELIABILITY IN SENSORY SCIENCE

HOWARD R. MOSKOWITZ

One of the key tenets of research is that when the researcher performs the same action he should obtain the same results. This is known as reliability, or reproducibility. Another key tenet is that the research should accurately reflect what it purports to measure. This is known as validity: the research process truly measures what it says it measures. Reliability does not mean validity. These two very important concepts in science are altogether different, yet they are linked in the minds of researchers.

Reliability at the gross level is fairly easy to achieve after aggregating data across panelists. Care must be taken when running the study that the same steps are followed, especially if these steps influence the results. In most laboratories the practitioners are taught about the obvious factors that can influence reproducibility. Some of these factors are the selection of panelists, the correct preparation of stimulus materials, the correct explanation of the scale and attributes, the correct sampling of product, the randomization of the product to avoid systematic bias, and the like.

At the individual level matters may become more complex. Averages across many people smooth and reduce disturbing secondary influences that affect reliability. On an individual level reliability may not be as clear, and may be much more difficult to establish, compared to the reliability at the group or aggregate level. Indeed, at the individual level, much more data may be needed from a single panelist in order to demonstrate reliability. The need for the additional data comes from the fact that at the group level the many data points from different people cancel out the noise, allowing the basic relation to emerge. At the individual level there are typically no such massive numbers of data points to allow error variability to cancel itself out.

Validity is a much more nebulous concept, and harder to demonstrate. What does it mean to say that a sensory test is "valid"? For example, assume that the panelist is instructed to rate degree of liking. How does one establish that the rating of liking for a variety of products is really valid? What does validity of the liking scale mean? At the easiest level one can interrogate the panelist in order to determine whether or not the panelist understands the meaning of the attribute. If the panelist does not understand or cannot define the attribute, then perhaps we might be somewhat worried that the liking scale data are not valid, because the panelist does not seem to understand the terms. But what happens if
the panelist understands the meaning of the term liking? What does the liking rating mean? Does it mean that the panelist will choose the higher-liked product over the lower-liked product all the time? Does it mean that the panelist will distribute the selection of products in some way proportional to the relative liking ratings assigned to two or more products? Does it mean that a higher-liked product will sell more in the market? Does it mean that the higher-liked product will generate fewer complaints? Or does it mean that the higher-liked product will be rated as having more of other characteristics such as "quality"? Rarely do sensory scientists have to think about the validity of their measures, perhaps because they are not in contact with the final marketing measures. Sensory scientists are always on guard to demonstrate the reliability of their measures. Indeed, the first check of "validity" that a marketer makes is to determine whether the data are "reliable"; viz., did the same answer emerge this time as last time? Yet one of the key aspects is overlooked - do liking ratings predict future behavior such as market performance? The relation of liking to sales is clearly an important area, and a relation that, if established, also establishes the validity of the liking scale. There are a variety of topics falling under the general rubric of validity.
Face Validity - the attribute scale appears to measure what it says it measures. For instance, when it comes to liking, the rating scale dealing with liking should require the panelist to rate liking or something that appears to relate to liking, such as purchase intent or quality. A scale that asks about another attribute, such as novelty, would not possess the face validity necessary for someone to believe that the scale is really dealing with likes and dislikes. The scale is simply not face-valid. In truth the "novelty" scale might perfectly well predict degree of liking, but no one would accept the data, because on the surface the relation between novelty and liking simply does not appear to have validity at all.

Construct Validity - the attribute being measured really captures the essence of what is being studied. In simpler terms, if the attribute is "liking," then does the liking scale really capture the construct of "product acceptability"? This is a particularly vexing question, because in Sensory Science it is easy to ask the panelists many questions, but really hard to know what is meant by many of the attributes. For descriptive panel work construct validity is simpler - the researcher uses reference standards, and defines the attribute in terms of the reference standard. That is, the attribute is defined in terms of the sensory characteristic present in the reference standard. The scale value is the magnitude of the sensory perception, which can be compared to the magnitude of the attribute present in the reference standard. It may well be that the reason for the popularity of descriptive analysis early in the growth of modern Sensory Science
is that descriptive analysis enjoys both face validity and construct validity, whereas other scaling tools (e.g., liking ratings) possess no such construct validity. It is appealing to research scientists in R&D, ignorant of the psychophysics literature, to point at a set of reference standards and claim that the descriptive analysis is based upon such objective, concrete standards.
Predictive Validity - the attribute scale predicts other behaviors, such as performance of the product in other tests of different types, e.g., performance of the product in the marketplace. Predictive validity is probably the most important type of validity for sensory scientists. Corporations keep sensory scientists on staff because, presumably, the data provided by these analysts predict performance in other tests. These data thus inform early development, putting that development on the right business track. To the degree that the sensory scientist provides data that predict performance in other tests, which in turn predict market performance, the sensory scientist will be perceived as providing valid data. To the degree that the sensory scientist provides data that bear no obvious relation to later results, the sensory scientist will be perceived as providing irrelevant, or even worse, invalid data. Unfortunately, it is quite difficult for the sensory researcher to provide solid data that have great predictive validity, especially since the researcher works with relatively small groups of panelists, and tests a large number of products, with each product rated on many scales. It is difficult to tie this sensory-oriented data directly to market performance. Thus the criterion of predictive validity for sensory research is no longer the relation to market performance, but rather the perception of the other corporate members (e.g., marketing, general management) that the sensory researcher provides valid data to guide development. Predictive validity for sensory research truly becomes a matter of perception, not reality.

In the End - Is Validity Really Just a Matter of Opinion?
S.S. Stevens at Harvard University was fond of stating that "in the end validity is a matter of opinion" (Stevens 1966). As startling, confrontational, and dismissive as this assertion may sound, it has merit. Validity and reliability, two frequently used words in the corporate environment, obey different "rules of the game." Reliability is easy to demonstrate. Various methods can generate a statistical index that shows the reproducibility of the data. Whether the data are "correct" or "incorrect" is not at issue. Reliability is primarily a statistical concept; methods to achieve reliability are primarily methodological. There is no appeal to truth. Indeed, if the researcher makes the same fundamental mistake again and again, the researcher is doing things reliably, albeit incorrectly.
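The contrast drawn earlier between individual-level and aggregate-level reliability can be illustrated with a small test-retest simulation. This is a minimal sketch with invented numbers (panel size, product count, and noise level are all assumptions): each panelist is noisy on retest, yet the panel means are almost perfectly reproducible.

```python
import numpy as np

rng = np.random.default_rng(2)

n_panelists, n_products = 50, 8
true_intensity = np.linspace(2, 8, n_products)   # latent product differences

def run_session():
    # Each panelist's rating = true intensity + sizable individual noise.
    return true_intensity + rng.normal(0, 2.0, (n_panelists, n_products))

test, retest = run_session(), run_session()

# Test-retest correlation for each panelist, then averaged
r_individual = np.mean(
    [np.corrcoef(test[i], retest[i])[0, 1] for i in range(n_panelists)]
)
# Test-retest correlation of the panel means (the aggregate level)
r_aggregate = np.corrcoef(test.mean(axis=0), retest.mean(axis=0))[0, 1]

print(f"individual-level r ~ {r_individual:.2f}; panel-mean r ~ {r_aggregate:.2f}")
```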
In contrast, validity is more complex. At its simplest level validity invokes the relation between the data that one collects in the current study and other data that are assumed to be correct. Predictive validity is a good example of this type of relation. Validity is established if one can use the current data to predict performance on another criterion test. At a more complicated level validity requires that one truly understand the meaning of the data that one collects, so that the questions asked in fact relate to some underlying "truth." Many sensory scientists can acquire data to demonstrate predictive validity. Few sensory scientists can acquire data to demonstrate construct validity, because it may be hard even to formulate the proper criterion of validity, and in turn devise the proper operational tests to confirm or to disconfirm validity.
REFERENCE

STEVENS, S.S. 1966. Personal communication.
MAXIMO C. GACULA, JR.

In this chapter, Dr. Moskowitz expressed his ideas on the validity and reliability of rating scales, in particular the liking scale, as used in sensory evaluation of products. As far as I know, no published work is available on this subject. It is a challenge, since there is no wrong response for hedonics. Moskowitz formalized three types of validity in Sensory Science: face validity, construct validity, and predictive validity. First, let us consult Webster's dictionary. One definition of validity is the state of being "supported by objective truth or generally accepted authority." For reliability, it is the state of being "suitable or fit to be relied on." In validation studies of test methods, reliability is defined as a measure of the degree to which a test can be performed reproducibly within and among laboratories over time (National Institute of Environmental Health Sciences 1997). This definition can easily be adapted to Sensory Science. I agree with Moskowitz that reliability does not mean validity. The sensory scale and method used may be reliable for a given application, yet the results may not be valid. That is, the results may be precise/repeatable but not accurate for the intended purpose. However, how do we measure accuracy in Sensory Science? The closest measurement of accuracy would be a reference standard in descriptive analysis. Without a standard or reference point, accuracy cannot be measured. From the definition, reliability has two components: repeatability (within-laboratory or within-test variability) and reproducibility
(between-laboratory or between-test variability). These components are simple to calculate, and one can refer to ASTM (1990, 1992) publications. Some of Moskowitz's statements on the validity of a sensory test to rate degree of liking are open for discussion. If a sensory test is an accepted standard procedure, then by definition it is a valid test. Perhaps it is the sensory attribute asked in the questionnaire that is invalid or inappropriate. Are there analytical or instrumental data that can validate degree of liking? Using predictability of results as a measure of liking validity is another issue of great interest to sensory professionals. There is no wrong answer to hedonic questions (like/dislike), which contributes to the reluctance to adopt a predictive validity criterion. Thus predictive validity and construct validity may not apply for hedonics. In practice, however, the norms about a product or product category are used as a measure of validity. Hence one can make statements about face validity and predictive validity.
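A minimal sketch of the repeatability/reproducibility calculation follows, in the spirit of the ASTM variance-components approach cited above. The layout (one sample, three labs, four replicates) and all measurement values are invented for illustration; the mean-square formulas are the standard ones for a balanced one-way random-effects model.

```python
import numpy as np

# Replicate measurements of one sample in three labs (invented data).
labs = {
    "lab_A": [5.1, 5.3, 5.0, 5.2],
    "lab_B": [5.6, 5.8, 5.7, 5.5],
    "lab_C": [5.2, 5.1, 5.4, 5.3],
}
values = np.array(list(labs.values()))   # shape: (p labs, n replicates)
p, n = values.shape

lab_means = values.mean(axis=1)
grand_mean = values.mean()

# One-way ANOVA mean squares
ms_within = ((values - lab_means[:, None]) ** 2).sum() / (p * (n - 1))
ms_between = n * ((lab_means - grand_mean) ** 2).sum() / (p - 1)

s_r = np.sqrt(ms_within)                          # repeatability sd
var_lab = max((ms_between - ms_within) / n, 0.0)  # between-lab variance
s_R = np.sqrt(s_r**2 + var_lab)                   # reproducibility sd

print(f"repeatability s_r = {s_r:.3f}, reproducibility s_R = {s_R:.3f}")
```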
REFERENCES

ASTM. 1990. Standard Practice for Use of the Terms Precision and Bias in ASTM Test Methods. American Society for Testing and Materials, Publication E177-90a, West Conshohocken, Penn.
ASTM. 1992. Standard Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method. American Society for Testing and Materials, Publication E691-92, West Conshohocken, Penn.
National Institute of Environmental Health Sciences. 1997. Validation and Regulatory Acceptance of Toxicological Test Methods: A Report of the ICCVAM. NIH Publication No. 97-3981. Research Triangle Park, NC.
CHAPTER 6

THE INTERFACE BETWEEN PSYCHOPHYSICS AND SENSORY SCIENCE: METHODS VERSUS REAL KNOWLEDGE

HOWARD R. MOSKOWITZ

Throughout its history, Sensory Science has enjoyed a profitable, occasionally symbiotic relation with psychophysics, that specialty of experimental psychology linking physical stimulus and subjective sensory perception. The reason for this close relation is quite simple - Sensory Science studies the consumer response to specific stimuli. The only difference is that the sensory scientist works with consumer products, whereas the psychophysicist has the luxury of working with consumer products, with model systems, or with consumer products that have been altered to exhibit interesting properties, albeit not necessarily commercial ones. Sensory Science and psychophysics share many aspects.
Are Sensory Methods the Same as Psychophysical Methods?

The intertwining of Sensory Science and psychophysics begs the question whether or not the methods used by sensory scientists can be classified as variations on psychophysical methods. One simple answer is yes. Psychophysical methods ask the panelist to act as a measuring instrument. Examples of the measurement may be assessing whether or not two stimuli match each other, or assessing the subjective magnitude of a sensory impression or a feeling about a stimulus. The key differences are probably the subject matter and the point of view. Traditional or classical psychophysics deals with the properties and performance of the sensory system. The stimuli tend to be simple, pure, and easy to control, although in the early days of psychophysics the stimuli often were complex, everyday things that were "ecologically valid." The stimuli in psychophysics are often incidental, chosen simply on the basis of convenience and ability to control. In contrast, the stimuli in sensory evaluation are the major concern. The stimuli tested by sensory scientists are complex, food-related, and relevant to everyday life. The subject or panelist is simply a means for a bioassay of the stimulus. The actual functioning of the sensory system is not of particular interest to the sensory scientist, other than contributing to understanding how to run a better test and how to understand the response to the stimulus.
Convergent Evolution - Similar Methods But Different Intellectual Histories
Despite the similarities between Sensory Science and psychophysics, the two disciplines enjoy radically different intellectual histories. Experimental psychology has a long and distinguished history, beginning with Fechner and Wundt in the 1800s and evolving for nearly a century and a half. A great deal of experimental psychology in general, and psychophysics in particular, was done with a scientific world-view in mind. This world-view looked for invariances - general, unchanging rules of perception. Thus, many of the early procedures developed by psychophysics were created to discover these general rules assumed to lurk within reach of the researcher. If the data could be encompassed by an equation, then the data were of even more interest. Psychophysics is thus permeated with the search for organizing principles and "laws." Many of its methods, such as the cumulation of just noticeable differences to erect a psychological scale of magnitude, may seem archaic by today's sophisticated thinking (Fechner 1860).

As a consequence of these attempts to understand the fundamental basis of perception, psychophysicists eventually created a corpus of knowledge now part of the public, refereed scientific literature. To a great extent the research reported by these scientists accumulated into a body of knowledge about how perception functions. The research is substantive - dealing with the topic of perception. Undergraduates and graduate students alike were to become familiar with the findings of these experiments. The actual procedures by which these findings were made were to become secondary, of interest in the experimental psychology laboratory, but not of particular interest to others with a basic interest in substantive matters. The education of students concentrated on the interconnections among these findings insofar as those data shed light on the way we perceive stimuli.

We can contrast the foregoing with the development of Sensory Science. For many years Sensory Science comprised a potpourri of methods with relatively little content beyond the momentary application for which the method was used. Indeed, even the methods themselves were not well understood. They seemed to be crafted half out of science, half out of convenience, with many methods having relatively little background in scientific theory. There was little science in the early methods. The focus was on measurement, without theory, in order to help make decisions, or in many cases to discover some physical correlate of a private subjective experience. Looking backwards today, we can see that the history of Sensory Science is the history of a set of test methods that found their use as adjuncts to more substantive research, especially in food science and chemistry. The objective of chemical research was to synthesize new molecules having interesting properties
- with the side issue being the taste or smell quality of the molecule and the threshold of taste (at which point the taste or smell was either just barely discernible, or the quality was just barely distinguishable). The objective of food science research was to understand the physical properties of food, and only secondarily to understand the sensory properties of those same foods, and the interrelation of the sensory and the objective domains.

To be fair, in those early days Sensory Science was not considered a profession in itself. Many researchers felt that just about anyone with a sensory system in working order could describe and scale the sensory properties of the food. Many of the practitioners were not professionally educated. Many of the reports in the early food science and cosmetic literature give the impression that the Sensory Science data were collected in an informal, almost haphazard way, comprising principally a description of the product characteristics, and little else. Just the words "taste testing" should suffice to indicate the relatively low esteem in which the field was held. With practitioners known affectionately as "taste testers" in the kitchen, it is no wonder that the field failed to develop. It was simply not sufficiently respected - no matter how much we try to deny this poor heritage.

The Rapprochement - 1970s and Later
Psychophysics and Sensory Science began to approach each other in the late 1960s and early 1970s, as a consequence of the appearance on the scene of many experimental psychologists, and the growing interest of psychologists in the chemical senses. The late 1960s witnessed the disappearance of job opportunities in academic psychology, and the inevitable migration of experimental psychologists into other areas where they could earn a living. Furthermore, in the 1960s, the chemical senses were the stepchildren of sensory researchers, who had concentrated on the higher senses of vision and audition. Taste and smell were largely ignored, except perhaps by a few researchers (most notably Lloyd Beidler at Florida State University and Carl Pfaffmann at Brown University). By 1970, however, these forward-looking scientists had produced a cadre of well-educated, scientifically grounded students and post-doctoral fellows. Many of the students did their research in physiology, but a number became psychophysicists. The early 1970s saw the introduction of psychophysical methods to Sensory Science (e.g., magnitude estimation; Moskowitz and Sidel 1971), and the participation of experimental psychologists in food science meetings (e.g., the Institute of Food Technologists). Experimental psychologists had typically attended these meetings, but it was only in the early 1970s that psychologists became highly visible. At that time it seemed natural for psychophysicists to
interact with food scientists and sensory scientists, especially for research tasks involving the study of the sensory responses to food.

Part of the accommodation of psychophysics with Sensory Science came from the impetus provided by the U.S. Army, first at the Quartermaster Corps in Chicago, and later at the U.S. Army Natick Laboratories in Natick, MA. The U.S. Army had always been interested in the acceptance of food, since food affects nutrition and morale. It was important for the army to understand what foods were liked and what foods were disliked. As part of its ongoing research program the army began to fund extensive food research programs, including within those programs the study of food acceptance on the one hand, and the psychophysics of taste and smell on the other. Dr. Harry Jacobs in particular deserves mention and praise for the vision to hire a group of then young, newly minted psychophysicists working in the chemical senses. This cadre of young researchers, including the author and two others, Linda Bartoshuk and Herbert Meiselman, was to become part of the Department of Defense food research program. Jacobs' vision was to create a world-class organization of sensory researchers concentrating on taste, smell, and food preferences.

Sensory scientists outside of experimental psychology, however, were both welcoming and rejecting. They welcomed the interest and implicit professional validation by the psychophysicists, for, as mentioned above, the sensory scientist traditionally was viewed as a clerical individual who simply ran tests. Sensory scientists rejected what they considered to be novel ideas promoted by the psychophysicists, such as magnitude estimation scaling, because these were new, untested, and uncomfortable. Sensory scientists were more comfortable with procedure and method than with substance and a corpus of knowledge. It would be 15 years, until the middle/late 1980s, before sensory scientists would truly welcome psychophysical thinking into the field, perhaps because by then a large number of psychophysicists had already become active.
REFERENCES

FECHNER, G.T. 1860. Elemente der Psychophysik. Breitkopf und Härtel, Leipzig, Germany.
MOSKOWITZ, H.R. and SIDEL, J.L. 1971. Magnitude and hedonic scales of food acceptability. J. Food Sci. 36, 677-680.
MAXIMO C. GACULA, JR.

This chapter deals with historical scenarios of the role of psychophysics in Sensory Science. For students and sensory scientists just starting a career in
Sensory Science, Moskowitz's review is worth reading, for I believe this subject is of great importance to Sensory Science. Sensory Science, or sensory evaluation, is an interdisciplinary field, the most important contributors being food science, chemistry, psychology, statistics, and physiology, together with the subject-matter areas of application, i.e., dairy, cosmetics, foods, beverages, flavors and fragrances, wines, textiles and materials. As a result, today's leaders have varied educational training from fields that contribute strongly to the development of Sensory Science.

Moskowitz mentioned the U.S. Army Quartermaster Corps in Chicago, which showed interest in food acceptance studies accommodating psychophysics and Sensory Science. It is worth mentioning that it was at the U.S. Army Quartermaster Corps that David Peryam, in collaboration with the famous psychophysicists L.V. Jones and L.L. Thurstone, developed the 9-point hedonic scale of like/dislike (Peryam and Girardot 1952; Jones et al. 1955). By the 1980s, the role of statistics in Sensory Science was made clearly evident by the publication of books by statisticians and experimental psychologists (Gacula and Singh 1984; O'Mahony 1986). Also in the 1980s, the knowledge that we learned from the interface of psychophysics, statistics, and Sensory Science led to the birth of two journals - Journal of Sensory Studies, published by Food & Nutrition Press, and Food Quality and Preference, published by Elsevier Science Ltd. The interfaces among the various fields will continue in the decades to come.
REFERENCES

GACULA, JR., M.C. and SINGH, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, San Diego.
JONES, L.V., PERYAM, D.R. and THURSTONE, L.L. 1955. Development of a scale for measuring soldiers' food preferences. Food Res. 20, 512-520.
O'MAHONY, M. 1986. Sensory Evaluation of Food, Statistical Methods and Procedures. Marcel Dekker, New York.
PERYAM, D.R. and GIRARDOT, N.F. 1952. Advanced taste-test method. Food Eng. 24, 58-61, 194.
CHAPTER 7

DESCRIPTIVE PANELS/EXPERTS VERSUS CONSUMERS

HOWARD R. MOSKOWITZ

The issue of, or more realistically the debate about, experts versus consumers may well be the most argument-provoking topic in Sensory Science. A great deal of the controversy comes from the varied intellectual histories of sensory scientists. Some of the controversy comes from financial interests of sensory scientists. Some of the controversy comes simply from a reluctance to change procedure. Quite often the debate generates more heat than light. Occasionally the debate is fascinating. Occasionally, however, the debate is pointless, because there are no data offered to back up the positions espoused by the debaters.
Intellectual History

Sensory scientists often trace their intellectual heritage to experts, such as the perfumer, the wine taster, the tea taster, the coffee taster, the brewmaster, and other industry experts. In the early days of Sensory Science there was no specific training for an expert. Yet, through practice, indoctrination, and just by being around wines, teas, etc., the expert learned the meaning of attributes, learned how to taste products and score them, and observed the nuances of what made a good product versus what made a poor product. Over time this corpus of knowledge, never really codified but ever-present in the minds of the practitioners of a particular subject discipline, transformed the individuals into experts. They were experts because of practice, but also because other people in the company deferred to their judgment.

The popularity of the A.D. Little Flavor Profile, beginning in the 1940s and extending until today, provided an additional impetus for accepting the expert panel (Caul 1957). Researchers at A.D. Little Inc., a science-based consulting company in Cambridge, MA, placed great emphasis on the training program. The subtext in all of their messages was that the consumer, untrained in Sensory Science, could not and should not be relied upon to provide reliable, valid data. The psychophysics literature was not considered to provide sufficient proof that, in fact, consumers could do a good job in scaling. The Flavor Profile was followed by other general systems for eliciting attributes, training panelists, and then monitoring their performance. The most noteworthy of these are the QDA (Quantitative Descriptive Analysis) system (Stone et al. 1974) and the Spectrum system (Meilgaard et al. 1987).
Are Experts More Sensitive ... and Anyway ... Does It Matter?
There is a commonly held misconception that experts are experts because their sensitivity is higher than that of the average consumer. This translates to a lower detection threshold (where the panelist can just detect the presence of the stimulus), or a lower recognition threshold (where the panelist can just correctly recognize the quality of the stimulus). Indeed, quite often one sees in the published literature words to the effect that the panelists were screened for "sensory acuity," or low threshold. However, there is no evidence from the psychophysics literature that a person with a lower threshold perceives stimuli more accurately. That is, for many stimuli there is no correlation between threshold and perceived magnitude.

A more productive way to talk about sensitivity may be based upon performance with scales that measure suprathreshold stimulus intensities. The important question is whether or not experts can use these scales "better" than untrained panelists. The word "better" is in quotes to indicate that the definition may be subjective. For instance, with training, can the expert point reproducibly and validly to a reference standard to represent the sensory quality, ensuring that all experts operate within the same world of description? Furthermore, can the expert point reproducibly and reliably to a reference intensity level, ensuring that all experts scale the magnitude alike? We know from the psychophysics literature that with as few as 10 panelists, one can recover a reliable sensory response function (e.g., the power function) relating physical magnitude to sensory judgment. Does this mean that a group of consumers can act as experts, at least on an aggregate level, where one need not look at individual data (see Stevens 1975; Marks 1974)?
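Recovering the power function from a small panel's magnitude estimates is a one-line regression in log-log coordinates. A minimal sketch with simulated data follows; the exponent of 1.3 (roughly typical of sucrose sweetness in the psychophysics literature), the concentrations, the panel size, and the noise level are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sucrose concentrations (percent w/v) and magnitude estimates from a
# 10-person panel; S = k * phi**beta, with beta near 1.3 for sweetness.
phi = np.array([2.5, 5.0, 10.0, 20.0, 40.0])
panel = 2.0 * phi**1.3 * rng.lognormal(0.0, 0.15, (10, phi.size))

# Magnitude estimates are conventionally averaged geometrically.
geo_mean = np.exp(np.log(panel).mean(axis=0))

# Fit log S = log k + beta * log phi; the slope recovers the exponent.
beta, log_k = np.polyfit(np.log(phi), np.log(geo_mean), 1)
print(f"estimated exponent beta = {beta:.2f}")
```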
Can Expertise Be Developed Simply on the Basis of Experience?

Although the label of expert has been given to panelists who go through a formalized training program, is expertise really a matter of experience? For example, consider wine tasting. Is a wine expert simply an individual with a great deal of experience tasting and talking about wine? In wine tastings quite often the various experts, or at least participants, will use different terms to describe the same wine, especially if these experts have not gone through a standardized training. Does this use of different terms make them non-experts? That is, does the lack of agreement among experts in the use of terminology reduce these individuals to the category of consumers? If not - then what makes them experts?

Repeated sensory experience with a stimulus can change the nature of what one perceives (Moskowitz and Gerbers 1974), even if the experience is passive. We are all familiar with the experience of hearing more and more things in a musical piece as we hear the music again. We hear nuances that we missed before, because we did not pay attention to them. For wine tasting these nuances
may be the essence of the "expertise" - viz., the ability to notice and to describe different characteristics. For many foods the development of expertise comes from a group exercise, where the sensory notes are first identified by a few individuals, then described more fully, and finally represented by products that act as standards. At the start no member of the panel is an expert ... and indeed all too often the product category is new for the panelists. Only after the entire panel has been polled does the panel leader "buckle down" to the job. The job requires the panel to identify the key terms, develop physical reference standards for these terms, and then begin training/practice so that the panelists can recognize and name the qualities and the sensory intensities represented by these standards. Expertise does arise from experience. However, it takes training and discipline to create a descriptive system using that experience. Furthermore, the role of the panel moderator or trainer focuses on the need to make the individual's experience public and accessible. Thus we may say that expertise combines one's own experience of sharpened attention with an enhanced ability to communicate that which is perceived.
REFERENCES

BAZEMORE, R.A., ROUSEFF, R.L., NORDBY, H., BOWMAN, K., GOODNER, K. and JELLA, P. 1997. Discrimination of grapefruit varietal differences with an electronic nose equipped with metal oxide sensors. Seminars in Food Analysis 2, 239-246.
CAUL, J.F. 1957. The profile method of flavor analysis. Advances in Food Research, 1-40.
DIJKSTERHUIS, G. and PUNTER, P. 1990. Interpreting generalized procrustes analysis "analysis of variance" tables. Food Quality and Preference 2, 255-265.
MARKS, L.E. 1974. Sensory Processes: The New Psychophysics. Academic Press, New York.
MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1987. Sensory Evaluation Techniques, Chap. 8, pp. 119-142. CRC Press, Boca Raton, Fla.
MOSKOWITZ, H.R. 1976. Multidimensional scaling of odorants and their mixtures. Lebensmittel-Wissenschaft und Technologie 9, 232-238.
MOSKOWITZ, H.R. 1999. Inter-relating data sets for product development: The reverse engineering approach. Food Quality and Preference 11, 105-119.
MOSKOWITZ, H.R. and GERBERS, C. 1974. Dimensional salience of odors. Annals of the New York Academy of Sciences 237, 3-16.
STEVENS, S.S. 1975. Psychophysics: An Introduction to Its Perceptual, Neural and Social Prospects. John Wiley & Sons, New York.
STONE, H., SIDEL, J.L., OLIVER, S., WOOLSEY, A. and SINGLETON, R. 1974. Sensory evaluation by quantitative descriptive analysis. Food Technol. 28, 24-34.
ALEJANDRA M. MUÑOZ
Do We Need Both Populations and Why?

For most sensory professionals, the tasks performed by consumers and by descriptive panels/experts are regarded as clearly different (Stone and Sidel 1993; Muñoz and Chambers 1993; Muñoz 1997; Lawless and Heymann 1998). However, the discussion of their differences is worthwhile, since there has been recent debate and controversy on the role of each of these two populations in sensory testing (Moskowitz 1996, 1998; Dugle 1997; Hugh 1998; Chollet and Valentin 2001). There are many questions which arise when we consider the use of trained panelists versus consumers. What do we recruit consumers for, and what can and should they do? Why do we have experts/trained panelists? What can and should these experts/trained panelists do? Is there a need to have two populations in sensory testing? Can we just have one group of people and have them participate in all tests? If so, who should they be? Should the panelists be the consumers, since consumers represent the population of most interest to companies, the ones who buy the products, and thus the opinion that counts? Or should the panelists be the expert/trained individuals who have been trained to evaluate sensory properties and who can so technically and specifically qualify and quantify sensory properties?

This author's opinion on the topic is somewhat different from Dr. Moskowitz's views presented above. Dr. Moskowitz addresses this topic in terms of sensitivity, and concludes that there is a misconception that experts' sensitivity is higher than that of the average consumer. This author disagrees with Dr. Moskowitz about using sensitivity differences to determine the value of each group's responses. Rather, the "core of the discussion" should be focused on the type of information, specifically on the difference in detail and technical nature between descriptive and consumer information. This assessment leads to recognizing the added value of descriptive/expert data for research guidance, since descriptive data make consumer data actionable, as explained below.
Descriptive/expert panelists, because of their training, have the ability to describe the sensory properties of products accurately and in detail, minimally affected by physiological errors. Consumers may not be able to accomplish this. Therefore, this author believes that we should carefully interpret and use consumer responses. Consumers will answer each of the attribute questions we ask. However, how sure are we that (1) they understood all terms, (2) they focused on and evaluated that attribute and not others, and (3) their responses are not affected by psychological errors? As stated by Muñoz (1997), consumer attribute information:

(1) may not be technical and specific enough for research guidance,
(2) may be integrated (several product attributes are combined into one term, such as "creaminess" or "refreshing"), and
(3) may be affected not only by the intensities of the products' characteristics, but by other factors, such as liking, expectations, physiological errors, etc.

This author believes that we do need both sets of data, and that through consumer-descriptive and consumer-analytical data relationships (covered in detail in Chap. 21), consumer responses can be compared to the technical responses of a trained panel or other analytical data in order to: (1) unveil those attributes that consumers may not understand and/or may interpret differently (and thus be aware of the risks in using those consumer data for research guidance), (2) unveil the singular attributes that form a given consumer "integrated" attribute, and (3) provide more "actionable" research guidance by decoding consumer responses into more technical descriptive information. This author holds a different opinion than Dr. Moskowitz because she addresses the differences not in terms of sensitivity, but in terms of how "actionable" the resulting data are. Unfortunately, consumer data are not always actionable; thus descriptive data are needed. For example, consumers evaluate (and should evaluate) "creaminess," "refreshing," "home-made," etc. However, what do product developers do if the results indicate that a product is not "creamy enough"? Naturally, the consumer is the one who should provide that information, but how do we interpret these responses? What should a product developer do to make the product more "creamy"? If, however, we have a descriptive panel who can evaluate the individual/singular characteristics of the product, we would be able to know that "creaminess" is related to (for example) butterfat and "diacetyl" aromatics, oily/greasy mouthfeel, and thickness. The product developer can act on this information.
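The "decoding" step just described can be sketched as a simple regression of the consumer attribute on the descriptive attributes. The sketch below is illustrative only: the products, ratings and attribute names are hypothetical, and the chapter prescribes the general approach rather than this particular model.

```python
import numpy as np

# Hypothetical means for six products: descriptive-panel ratings on three
# singular attributes, and consumer ratings of "creaminess".
descriptive = np.array([
    [4.2, 3.1, 5.0],   # butterfat flavor, oily/greasy mouthfeel, thickness
    [6.8, 5.5, 7.2],
    [2.9, 2.0, 3.5],
    [5.5, 4.8, 6.1],
    [7.4, 6.2, 8.0],
    [3.6, 2.8, 4.2],
])
creaminess = np.array([3.8, 6.5, 2.5, 5.2, 7.1, 3.4])

# Ordinary least squares: creaminess ~ b0 + b1*butterfat + b2*mouthfeel + b3*thickness
X = np.column_stack([np.ones(len(creaminess)), descriptive])
coefs, *_ = np.linalg.lstsq(X, creaminess, rcond=None)

for name, b in zip(["intercept", "butterfat flavor", "oily/greasy mouthfeel",
                    "thickness"], coefs):
    print(f"{name:>22s}: {b:+.3f}")
```

The fitted coefficients indicate which singular, actionable attributes drive the integrated consumer term, which is precisely the research guidance the text argues consumer data alone cannot supply.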
Therefore, consumer data are not better than descriptive data, and vice versa. Both kinds of data are important, and together they provide the best information. Many authors have examined the relationship between the two data sets and the benefits of comparing/combining them (Shepherd et al. 1988; Greenhoff and MacFie 1994; Muñoz et al. 1996; Helgensen et al. 1997; Pagliarini et al. 1997; Muñoz 1998; Elmore et al. 1999; Malundo et al. 2001). Most sensory professionals acknowledge a distinct difference between these two populations and the role of each in sensory testing. Therefore, most sensory practitioners consider that both populations are needed to conduct the gamut of tests in sensory evaluation. Each of them performs different tasks. Let's examine the definitions of consumers and experts/trained panelists to discuss their roles in sensory testing.
Consumers
The consumer is an untrained individual, the user or potential user of a product or service, who purchases and/or uses the product based on its benefits. In sensory testing we recruit consumers based on their product usage, and on other criteria important for the study: age, gender, income, having children, etc. (Resurreccion 1998). A consumer uses the product with different frequencies and thus is classified as a heavy, medium or light user. This classification reflects not only the frequency of use of the product, but also the level of familiarity with it. It is (often) believed that a heavy user is very familiar with the product and may be more discriminating toward small variations/differences in its properties.
Trained/expert Panelists
The trained/expert panelists - also called judges, assessors, or descriptive panelists - are individuals (local residents or employees) who are trained in sensory method(s) and sensory properties (ASTM 1992). At that point these individuals stop being considered "naive" consumers and become "experts/trained panelists." This author prefers to use the terms trained panelists, judges, assessors, or descriptive panelists to refer to this population, and to reserve the word "expert" for an individual who is experienced and knowledgeable in a certain area or product category (e.g., perfumer, brewmaster, winemaker). Experts usually operate by themselves or with a small group of individuals who share similar, but not necessarily the same, views on the topic or product category.
Role of Each Population
As these traditional definitions indicate, these two groups of people comprise two very different populations. From the sensory point of view, the two populations are used for different sensory tasks. The trained panelists/assessors cannot be considered the "true, naive" consumers because they are trained to evaluate products in a technical and detailed way. These panelists have learned product attributes and evaluation techniques that may be very different from the ones used by naive consumers (Muñoz and Chambers 1993). In addition, they have been trained to focus on product attributes differently than consumers do. For example, trained panelists may handle products in a very specific way in order to increase sensitivity and/or separate the perception of attributes more easily (e.g., control of the number of manipulations or of evaluation periods/times in texture evaluations). Therefore, it is inappropriate to consider these trained individuals to be "naive" consumers. Their responses are not characteristic of those from consumers, since the trained panelists are influenced by the knowledge they have obtained in training. In addition, the more training they receive, the farther apart from consumers these trained panelists become. Consequently, experts/trained panelists cannot (and should not be asked to) provide typical consumer responses, such as liking, preference, purchase intent, or any other consumer response. Thus, descriptive trained/expert panelists are the population used to provide product evaluations that are:
(1) free of (or minimally influenced by) biases and psychological factors,
(2) detailed and technical, and
(3) usable to better interpret consumer responses.

Consumers are naive, untrained individuals who have been recruited to participate in a test because they are users of the product or service tested and meet additional criteria set by the researcher. These product users are the only people who are able to provide information on their likes and dislikes of the product, since they purchase and use it regularly. They know what they like and dislike and what they want in a product. Most consumers are also able to give information on the reasons for their likes and dislikes. Some sensory professionals prefer not to ask consumers questions on attributes and limit themselves to asking only liking and preference questions. The underlying rationale is that consumers are unable to provide reliable product direction, since they have difficulty understanding product attributes and scales (Stone and Sidel 1993; Husson et al. 2001). However, most researchers add attribute/diagnostic questions to consumer questionnaires in order to understand consumer perceptions of these attributes.
Consumers can provide attribute information, as long as the questionnaire is properly structured and the words are carefully selected. Attribute-rich questionnaires have been a standard practice followed by all market researchers and most sensory professionals. Attribute/diagnostic information is then used for product guidance. In designing a consumer questionnaire one has to be aware of the limitations that consumers have in providing product information (Muñoz 1997). (1) Consumers are not trained and thus are unable to express their opinions in a technical way. Therefore the researcher is limited to using simple consumer language. (2) Because of the nature of the consumer language used (simple and "integrated"), the information obtained may not be specific enough, may not be technical enough, and may be insufficient for product guidance. (3) Some consumer responses may be misleading for product guidance (if consumers do not understand one or more attributes included in the questionnaire). The results of those attribute ratings may indicate a direction, but it may be the wrong direction.
Descriptive-consumer data relationships, which are discussed in Chap. 21, are often sought in order to overcome some of these limitations. This practice involves linking consumer information to descriptive data in order to decode consumer responses. From this author's perspective, both populations are needed in a research guidance program. Consumers and expert/trained panelists have different perspectives and frames of reference for the evaluation of products, as summarized in the table below. However, the combined information provides the "complete" picture.
POPULATION          LIKING/PREFERENCE                           ATTRIBUTES
Trained Panelists   Do not/should not provide any               Technical/singular and detailed
                    consumer information                        (with no/minimal biases)
Consumers           The only population that can and            Can provide attribute information
                    should provide the liking/preference       (caution in interpreting is needed
                    information                                 because of biases, context effects,
                                                                possible misinterpretation,
                                                                integrated terms, etc.)
REFERENCES
ASTM. 1992. ASTM Manual series MNL 13. Manual on descriptive analysis testing. R. Hootman, ed. ASTM, West Conshohocken, Penn.
CHOLLET, S. and VALENTIN, D. 2001. Impact of training on beer flavor perception and description: Are trained and untrained subjects really different? J. Sensory Studies 16, 601-618.
DUGLE, J. 1997. Note on "Experts versus consumers: A comparison". J. Sensory Studies 12, 147-154.
ELMORE, J.R., HEYMANN, H., JOHNSON, J. and HEWETT, J.E. 1999. Preference mapping: Relating acceptance of "creaminess" to a descriptive sensory map of a semi-solid. Food Quality and Preference 10(6), 465-476.
GREENHOFF, K. and MACFIE, H.J.H. 1994. Preference mapping in practice. In: Measurement of Food Preferences, H.J.H. MacFie and D.M.H. Thomson, eds. Blackie Academic, London.
HELGENSEN, H., SOLHEIM, R. and NAES, T. 1997. Consumer preference mapping of dry fermented lamb sausages. Food Quality and Preference 8, 97-109.
HOUGH, G. 1998. Experts versus consumers: A critique. J. Sensory Studies 13, 285-289.
HUSSON, F., LEDIEN, S. and PAGES, J. 2001. Which value can be granted to sensory profiles given by consumers? Methodology and results. Food Quality and Preference 12(5-7), 291-296.
LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food. Chapman & Hall, New York.
MALUNDO, T.M.M., SHEWFELT, R.L., WARE, G.O. and BALDWIN, E.A. 2001. An alternative method for relating consumer and descriptive data used to identify critical flavor properties of mango (Mangifera indica L.). J. Sensory Studies 16, 199-214.
MOSKOWITZ, H.R. 1996. Experts versus consumers: A comparison. J. Sensory Studies 11, 19-37.
MOSKOWITZ, H.R. 1998. Consumers versus experts in the light of psychophysics: A reply to Hough. J. Sensory Studies 13, 291-298.
MUÑOZ, A.M. 1997. Importance, types and applications of consumer data relationships. In: ASTM Manual 30. Relating consumer, descriptive and laboratory data to better understand consumer responses. A.M. Muñoz, ed. ASTM Press, West Conshohocken, Penn.
MUÑOZ, A.M. 1998. Consumer perceptions of meat. Understanding these results through descriptive analysis. Meat Sci. 40, 287-295.
MUÑOZ, A.M. and CHAMBERS, E. IV. 1993. Relating sensory measurements to consumer acceptance of meat products. Food Technol. 47(11), 118-131, 134.
MUÑOZ, A.M., CHAMBERS, E. IV. and HUMMER, S. 1996. A multifaceted category research study: How to understand a product category and its consumer responses. J. Sensory Studies 11, 261-294.
PAGLIARINI, E., MONTELEONE, E. and WAKELING, I. 1997. Sensory profile description of Mozzarella cheese and its relationship with consumer preference. J. Sensory Studies 12(4), 285-301.
RESURRECCION, A.V.A. 1998. Consumer Sensory Testing for Product Development. Aspen Publishers, Gaithersburg, MD.
SHEPHERD, R., GRIFFITHS, N.M. and SMITH, K. 1988. The relationship between consumer preferences and trained panel responses. J. Sensory Studies 3, 19-35.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices. Academic Press, New York.
MAXIMO C. GACULA, JR.
"Experts versus consumers" has been a central question for the last 40 years, particularly in the consumer product industries. Those were the years when samples of new products were displayed in a management meeting and evaluated by the meeting participants, these gatherings being called "product cuttings." Generally, participants in the higher positions, and those who are familiar or expert with the product through years of experience, decide which product prototypes will likely pass or be selected for further testing. In the fashion industry, however, the experts or designers dictate the types of products to be manufactured, and the consumer becomes the follower. Similarly, we have experts in the perfumery and wine industries, although the use of Sensory Science in the winery is becoming popular, in close collaboration with wine experts (McCloskey et al. 1996). It is a good beginning for this next decade. In practice, it is often difficult to evaluate the performance of experts against consumers because of the many factors involved in the final acceptance and purchase of a product (Allison and Uhl 1964; Gacula et al. 1986; Sheen and Drayton 1988). A product may be developed based solely on the experts' initial evaluations, and later, elegantly packaged, followed by strong advertising and reasonable pricing, and in the end become a successful product in the marketplace. However, under a properly designed study the performance of experts and consumers can be assessed. Initially, one must first define "experts" and "consumers." Muñoz and Moskowitz clearly defined the roles and differences between experts and consumers, definitions that are well understood in practice. With these definitions and their ramifications, it is really the responsibility
of the product developer (both marketing and product scientist) and the sensory scientist to interpret the results of a study. Such interpretation should be based on product types, scientific information and confirmation. We should note the general principle that ratings of sensory characteristics are best obtained from trained panels; likewise, preference/acceptance ratings are best obtained from consumers who are users of the products, as expounded by Muñoz and Moskowitz. Moskowitz presented the psychophysical aspects of experts versus consumers. Muñoz, on the other hand, nicely addressed the practice of using experts and consumers in sensory evaluation and consumer testing. They both agree that expert/trained panels should be used for rating sensory characteristics, and consumers for acceptance/liking ratings. It is informative to review some sensory evaluation work on this subject over the last 40 years to provide scientific assistance in making decisions on such an important and yet thought-provoking, controversial subject. Early studies showed general agreement in preference between a consumer panel and a laboratory-type panel. A laboratory panel consists of individuals familiar with the product. Such an agreement was substantiated by the study of Miller et al. (1955) using noodle soup, with a consumer panel consisting of 600 respondents and a laboratory panel of 50 to 75 people. Murphy et al. (1958) studied the effects of six processing methods on the flavor of Maine sardines. The laboratory panel consisted of 43 experienced judges, and the consumer acceptance survey comprised 1,000 to 1,070 tasters in 12 cities. Although the study did not prove that a laboratory panel predicts consumer preferences, it did lend support to the idea that laboratory panels have some predictive value. In two instances where the laboratory panel diverged from the consumer opinion, the laboratory panel's judgment was more on the critical side. Paired samples of 15 different food products, i.e., blackberry jam, cheddar cheese, tomato juice, canned corn, etc., were used by Calvin and Sather (1959) to compare a household consumer panel (200 families) and a student preference panel (120 to 185 students). The student panel may be considered a subset of the consumer panel, of younger ages. The study showed the following. (1) The agreement between the student and home panels was very good, r = 0.89 to 0.91, indicating that the student panel can be used as a measure of home consumer preferences; (2) Mean hedonic score (1 to 9-point scale) and percentage preference for both student and home panels correlated highly, r = 0.96, indicating that either method may be used to measure preference. In another study, Sather and Calvin (1963) compared the student panel and a trained panel for preference on dry whole milk. The results indicated that total
scores from the trained panels could be used as a basis for a scoring system that would predict consumer preference scores. The trained panel scores for cooked, oxidized, stale and astringent flavors did not satisfactorily predict preference scores, but could be used as an indication of possible flavor defects in dry whole milk. Shepherd et al. (1988) studied the relation between consumer preferences (n = 206) and trained panel (n = 15) responses using dried tomato soups; the pattern of overall preferences differed between trained and untrained consumer panelists, and thus demonstrated the inappropriateness of using trained panelists to provide measures of preference or acceptance. Gacula (1987) revisited this issue by relating laboratory and consumer panels using hot dogs with fat levels of 30, 33, 36, 39, and 42%. The panels were as follows: R&D Laboratory Panel (n = 60), locally recruited consumer panel (n = 84), Central Location Test Panel (n = 780, 77 panelists per fat level), and State University Panel consisting of students and faculty members. Owing to the difference in the experimental design and unit of measurement used among the four types of panels, the experimental data were converted to a new scale by expressing the mean values as deviations from the population mean (Gacula et al. 1971). This conversion standardizes the scale origin and does not affect the rank order of means. Then the largest negative value was added back to the data in order to make them all positive (a small sketch of this rescaling follows Fig. 7.2). Using only data on overall preference, Fig. 7.1 shows the results for the R&D Panel and State U Panel. The plot shows that the R&D Panel was able to predict the likes (larger values) and dislikes (smaller values) of the University Panel, the sample with 42% fat (0.00, 0.00) being the most disliked. The result for the R&D Panel and the Consumer Panel is given in Fig. 7.2, which shows good agreement between the two panels, particularly in disliking the 42% fat hot dog sample (0.00, 0.00) and liking the 30% fat sample (0.56, 1.06). It should be noted that a hot dog is not a particularly complex product, sensorially, and consumers are generally familiar with its sensory properties. This contributed to the good agreement between the two panels. A recent paper by Moskowitz (1996), using ratings of experts (1-9 scale) and consumers (0-100 scale), showed good agreement, as given by the high correlation between the two panels across 37 products. This finding refutes the notion that consumers are incapable of validly rating the sensory aspects of products, and thus provides evidence that consumers can be used to assess the sensory characteristics of products. Moskowitz's paper attracted critiques from Dugle (1997) and Hough (1998), which are very much welcome in the sensory community. See Moskowitz (1998) for a reply.
FIG. 7.1. PLOT OF OVERALL PREFERENCE, R&D PANEL VS. STATE UNIVERSITY PANEL (x-axis: R&D Panel; y-axis: State U Panel)
FIG. 7.2. PLOT OF OVERALL PREFERENCE, R&D PANEL VS. CONSUMER PANEL (x-axis: R&D Panel; y-axis: Consumer Panel)
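A minimal sketch of the rescaling described above (deviations from the grand mean, then adding back the largest negative value), followed by a simple between-panel correlation of the kind reported in these studies. The panel means below are hypothetical stand-ins, not the data from Gacula (1987).

```python
import numpy as np

# Hypothetical panel means for five fat levels (30-42%); only the
# procedure follows the text, not the original data.
rd_panel       = np.array([6.8, 6.5, 6.1, 5.4, 4.9])       # 9-point scale
consumer_panel = np.array([71.0, 68.0, 62.0, 55.0, 47.0])  # 0-100 scale

def rescale(means):
    """Express means as deviations from the grand mean, then add back the
    largest negative value so all rescaled means are non-negative."""
    dev = means - means.mean()
    return dev - dev.min()

rd_star   = rescale(rd_panel)
cons_star = rescale(consumer_panel)
print("R&D (rescaled):     ", np.round(rd_star, 2))
print("Consumer (rescaled):", np.round(cons_star, 2))

# Between-panel agreement, of the kind reported in the studies reviewed.
r = np.corrcoef(rd_star, cons_star)[0, 1]
print(f"Pearson r between panels: {r:.3f}")
```

The conversion shifts the origin of each scale without disturbing the rank order of the means, which is why panels rating on different scales can be compared on the same plot.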
After 40 years, the issue of experts versus consumers remains contentious. The issue still exists because the results of scientific studies could not be generalized, as shown by the studies reviewed. From my viewpoint, the results are product-dependent. Products that have been in the market for a long time would likely produce similar results using experts and consumers (users of the product), because the sensory properties of the product have been fully defined by both panels as a result of continued product use. Products with complex sensory properties may result in low correlation between ratings of experts and consumers, likely due to a lack of uniform understanding of the sensory attributes of the product. Another issue that should be discussed is the optimal order of testing. This question is expressed as follows: Which should be done first, descriptive analysis (trained panel) followed by a consumer test, or the reverse? I believe that the common practice is descriptive analysis followed by a consumer test, i.e., a Research Guidance Panel. There are no rules about the sequence of testing. Performing the Research Guidance Panel test first has some advantage in the sense that it answers the basic management question of why a particular product or formulation is liked/disliked. Then the descriptive analysis would be able to define what is in the product or formulation that contributes to liking/disliking. Thus, there is time saved in the sense that the disliked products are not subjected to descriptive analysis. Then a decision has to be made regarding how many of the liked products (top 3, top 4, etc.) are to be subjected to descriptive analysis. Such an approach would be useful in the product formulation optimization stage. At this stage several formulations are made. For example, with a 2-ingredient study the minimum recommended number of formulations is 5 (2 x 2 factorial with center point); for a 3-ingredient study the recommended number is 15 (central composite design); and for a 7-factor Plackett-Burman design the recommended number is 8. The selection of sensory attributes for descriptive analysis should be based on consumer responses. This approach should greatly reduce the number of samples or formulations for descriptive analysis work, and more importantly lead to a shorter product development cycle. This approach, which may not always apply to every study given the particular study objectives, makes good use of both trained panels and consumer panels.
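The run counts quoted above follow directly from the standard constructions. The sketch below builds each design in coded units (-1/0/+1) just to make the arithmetic concrete; it is an illustration of those standard constructions, not a recommendation of specific factor levels.

```python
import itertools
import numpy as np

# (1) 2 x 2 factorial with center point: 4 corner runs + 1 center = 5 runs.
two_by_two = list(itertools.product([-1, 1], repeat=2)) + [(0, 0)]

# (2) Central composite design, 3 factors: 2^3 cube points + 2*3 axial
# points + 1 center = 15 runs (axial distance shown as 1 for simplicity).
cube  = list(itertools.product([-1, 1], repeat=3))
axial = [tuple(s if j == i else 0 for j in range(3))
         for i in range(3) for s in (-1, 1)]
ccd   = cube + axial + [(0, 0, 0)]

# (3) Plackett-Burman, 7 factors in 8 runs: cyclic shifts of the standard
# N = 8 generator row, plus a closing row of minus signs.
gen = [1, 1, 1, -1, 1, -1, -1]
pb  = [gen[-i:] + gen[:-i] for i in range(7)] + [[-1] * 7]

for name, design in [("2x2 factorial + center", two_by_two),
                     ("central composite, 3 factors", ccd),
                     ("Plackett-Burman, 7 factors", pb)]:
    print(f"{name}: {len(design)} runs")

# Orthogonality check for the PB columns: M'M should equal 8*I.
M = np.array(pb)
assert np.array_equal(M.T @ M, 8 * np.eye(7, dtype=int))
```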
REFERENCES
ALLISON, R.I. and UHL, K. 1964. Influences of beer brand identification on taste perception. J. Marketing Res. 1, 36-39.
CALVIN, L.D. and SATHER, L.A. 1959. A comparison of student preference panels with a household consumer panel. Food Technol. 13, 469-472.
DUGLE, J. 1997. Note on "Experts versus consumers: A comparison." J. Sensory Studies 12, 147-154.
GACULA, JR., M.C., RUTENBECK, S.K., CAMPBELL, J.F., GIOVANNI, M.E., GARDZE, C.A. and WASHAM II, R.W. 1986. Some sources of bias in consumer testing. J. Sensory Studies 1, 175-182.
GACULA, JR., M.C. 1987. Some issues in the design and analysis of sensory data: Revisited. J. Sensory Studies 2, 169-185.
HOUGH, G. 1998. Experts versus consumers: A critique. J. Sensory Studies 13, 285-289.
LAWLESS, H.T. 1984. Flavor description of white wine by "expert" and nonexpert wine consumers. J. Food Science 49, 120-123.
McDANIEL, M.R. and SAWYER, F.M. 1981. Preference testing of whiskey sour formulations: Magnitude estimation versus the 9-point hedonic scale. J. Food Science 46, 182-185.
McCLOSKEY, L.P., SYLVAN, M. and ARRHENIUS, S.P. 1996. Descriptive analysis for wine quality experts determining appellations by Chardonnay wine aroma. J. Sensory Studies 11, 49-67.
MILLER, P.G., NAIR, J.H. and HARRIMAN, A.J. 1955. A household and a laboratory type of panel for testing consumer preference. Food Technol. 9, 445-449.
MOSKOWITZ, H.R. 1996. Experts versus consumers: A comparison. J. Sensory Studies 11, 19-37.
MOSKOWITZ, H.R. 1998. Consumers versus experts in the light of psychophysics: A reply to Hough. J. Sensory Studies 13, 291-298.
MURPHY, E.F., CLARK, B.S. and BERGLUND, R.M. 1958. A consumer survey versus panel testing for acceptance evaluation of Maine sardines. Food Technol. 12, 226-226.
SATHER, L.A. and CALVIN, L.D. 1963. Relation of preference panel and trained panel scores on dry whole milk. J. Dairy Sci. 46, 1054-1058.
SHEEN, M.R. and DRAYTON, J.L. 1988. Influences of brand label on sensory perception. In: Food Acceptability, pp. 89-99, D.M.H. Thomson, ed., Elsevier Applied Science, London.
SHEPHERD, R., GRIFFITHS, N.M. and SMITH, K. 1988. The relationship between consumer preferences and trained panel responses. J. Sensory Studies 3, 19-35.
CHAFI’ER 8 SAMPLE ISSUES IN CONSUMER TESTING HOWARD R. MOSKOWITZ Selecting the Sample(s) How does the researcher select the samples to test? This sounds like a simple question, but in actuality the question is quite deep and fraught with hazards all about. The question calls into play one’s point of view about the underlying motives of those involved in the project. For instance, it is a rare brand manager who doesn’t want to test the leading competitor. More than wanting to know how a sample performs on an absolute scale, the brand manager wants to know how the sample performs versus a product that has achieved demonstrable market success. The product developer wants to select samples for testing that represent the product itself, as well as the key competitors, but typically doesn’t voice any stronger opinion than that. The market researcher wants to select samples that are easy to procure, that have a reasonable market share and whose results from tests are easy to understand. The sensory scientist wants to select samples that represent an adequate sensory range. To a great degree, selecting samples in a test is a function of the nature of the test. All of these stakeholders want different tests for the simple reason that they have different agendas. There are many test designs where sample selection is simply not a problem. For instance, if the study objective is to compare current versus new product, e.g., a change has been made to a process or an ingredient, then there are only two products to assess - current and new. If the study objective is to evaluate a single product on attributes, e.g., directional scales, then there is only one product to study - the current product. It can be fairly asserted that difficulties arise in the selection of samples to the degree that the study departs from a simple study of simple products, onwards to a study whose aim it is to uncover patterns in the data leading to development or marketing insights.
Issues with In-Market Samples
A great deal of sensory/product work is done with in-market samples. The difficulty with such samples is the natural variability of the product. Unless the product is completely fabricated, as are some dry still-beverage mixes to which one adds water, the researcher will face the troubling yet inevitable variation from sample to sample. Some of this variation occurs because the sample comprises components that occur naturally. For instance, two pizzas can never
be the same, no matter how hard one tries, simply because a specific pizza is a complex array of components that can never be precisely duplicated. Other variation in samples occurs because the raw material may vary. Orange juice, for example, critically depends upon the composition of the original oranges. These oranges vary by market, and within a market by season. Thus it falls to the quality assurance engineer to create blends of orange juices in order to maintain a reasonable amount of product identity. It is impossible to maintain 100% of such identity. Finally, there are the ever-present effects of the distribution and storage systems. These systems vary from market to market, year to year, and from sample to sample in an idiosyncratic way. Storage time and temperature change product characteristics. Faced with this array of variations, what is the prudent approach that the product tester should adopt? Clearly it makes no sense to demand 100% replication of the sample, study after study. Such a demand is laudable but virtually impossible to meet. Some practitioners feel that an expert panel should evaluate the test samples prior to the actual consumer evaluation, simply in order to ensure that the sample being evaluated will represent the target product to some reasonable degree. Other practitioners try to evaluate a number of batches from different sources, and average together the data from these batches in hopes of achieving a representation of the actual product through the different samples. Still others try to factor out the variation due to sample as yet another variable. What remains after factoring out the batch variables and the panelist variables is the presumably true rating. It should be noted that this approach requires a bit of statistical legerdemain to discover, assign, and then root out all sources of extraneous variation.
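One simple version of that statistical legerdemain is an additive decomposition: treat batch and panelist as nuisance effects, estimate them from row and column means, and subtract them out. The grid of ratings below is hypothetical, purely to show the mechanics; a full treatment would use analysis of variance or variance-component methods.

```python
import numpy as np

# Hypothetical ratings: 3 batches of the same product rated by 4 panelists.
ratings = np.array([
    [6.1, 5.8, 6.4, 6.0],
    [5.5, 5.2, 5.9, 5.4],
    [6.6, 6.3, 7.0, 6.5],
])

grand    = ratings.mean()
batch    = ratings.mean(axis=1, keepdims=True) - grand   # batch (row) effects
panelist = ratings.mean(axis=0, keepdims=True) - grand   # panelist (column) effects

# Subtracting both nuisance effects leaves grand mean + residual: the
# "presumably true rating" after the extraneous variation is rooted out.
adjusted = ratings - batch - panelist
print("estimated true rating:", round(float(grand), 2))
print("batch effects:        ", np.round(batch.ravel(), 2))
print("panelist effects:     ", np.round(panelist.ravel(), 2))
print("adjusted ratings:\n", np.round(adjusted, 2))
```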
Creating Samples in the Laboratory
The growth of experimentation in sensory research has brought with it an increase in the number of "made to order" samples that are created in the development laboratory. Whereas 50 years ago the job of the sensory scientist was to measure and report that which was presented, and what was presented was positioned as a "fait accompli," today things have changed. The sensory scientist is quite often a partner in the product development process, pushing the developers to create new and different prototypes. It is the sensory scientist who is more often than not trained in the methods of experimental design, and who recognizes the need for systematic formula variation. This newfound power also brings with it responsibility. Specifically, how does the sensory scientist work with the samples that are created? Should the sensory scientist both prescribe the product prototypes and judge which specific prototypes are appropriate for the test? It is an easy matter for the sensory scientist to set up protocols in order to select products from the current market
(see Alejandra Muñoz's section following). It is not so easy to pass judgment on the selection or rejection of prototypes that have been systematically varied. The sensory scientist does not know, nor does anyone else, whether the variation and departure from specification for systematically varied products reflects poor quality control, so that criteria should be tightened, or whether the variation is simply natural variation that must be taken into account and lived with. Indeed, in this author's opinion, one of the worst possible things that a sensory scientist, or any product researcher, can do is cull through many prototypes, picking only those that represent the experimental design. The results of that study can mislead - nature is variable and must be dealt with as variable. The researcher, however, has suppressed the variability, and now works with an unnatural set of products, even within the structure of an experimental design.

It's Too Much Work To Do It - But Not Too Much Work To Do It Again
One of the big problems in product testing is the absolutely negative response to the prospect of having to create large numbers of samples. Many researchers involved with product optimization, whose stated objective is to understand the dynamics of the product, simply find it downright unpleasant to make the many prototypes that are needed. Certainly creating prototypes is not as easy as buying them. Yet, there is so much to be learned from the systematic creation of these stimuli. This negativity begins with the often-overloaded product developer. It soon, however, becomes part of the viewpoint of the sensory scientist, who believes that just as much information can be learned from the expert panel analysis of in-market products. The popular escape hatch is that a profound analysis of one or two prototypes can take the place of stimulus creation. In other words, better to armchair the easy-to-do than to do the experiment itself (Stevens 1968). This is a recipe for disaster. The problem often manifests itself as well in the nature of the products that the developer is asked to make. Most sensory scientists deal with small sets of variables (e.g., 2-3 variables at a time), which they can understand thoroughly, and which are tractable for presentation in graphics, such as contour plots. Most product developers and sensory scientists shy away from the larger-scale problems, comprising 5, 6, 7 or even 10 or more independent variables. There are simply too many variables with which to work. The inevitable outcome of this problem of avoiding the creation and analysis of samples is that R&D and sensory scientists end up with small sets of disparate experiments, each dealing with one facet of the stimulus set. A single larger set of stimuli, created perhaps with more effort, and analyzed with tools beyond the standard equal-intensity contour plots, may actually prove more valuable for the business problem at hand. In the years to come, as business
issues become more complicated, we may expect to see far more samples being created, and far less reliance on simple, tight, robust, yet not particularly informative studies on a smaller set of easy-to-make samples. Marketing and management will inevitably come to realize that it is the data from much larger sets of samples, comprising the effects of many variables, that teach principles for future product development, create new combinations of product characteristics, and reveal new segments.
Perfect Samples Versus Samples Going Through the Distribution Chain
Quite often the selection of samples can be biased through positive motives. For instance, on more than one occasion the author has witnessed the biased selection of "representative samples" from different lots. What makes this so unusual is that the samples were almost always at the higher end of the quality spectrum. That is, when corporate employees select products, there appears to be an inborn bias to select those products that match one's own internal image of the "gold standard" for the product. There is rarely, if ever, the conscious selection of products having clear defects, even though everyone on the team knows that these defects exist, and that these defects are just as much part of the current "operational" standard of the product as are products with no defects at all. Care must be exerted not to set the implicit criteria for inclusion so high that the task of selecting products eliminates most of the samples. One way to accomplish this objective is to send a set of batch samples through the distribution chain. Presumably these batch samples would meet the quality standards at the start of their trip through the chain. Rather than selecting out the products that successfully traverse the chain, the researcher might then simply select either every product, or every nth product, regardless of the nature of the defect. This strategy would then produce a more reasonable sampling of the products as they actually are.
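The every-nth selection rule is trivial to implement, and writing it down makes the point that no screening step intervenes. The lot below is hypothetical.

```python
# Hypothetical lot of units coming off the distribution chain, in order.
lot = [f"unit_{i:03d}" for i in range(60)]

def every_nth(units, n, start=0):
    """Select every nth unit, defects included - no screening allowed."""
    return units[start::n]

print(every_nth(lot, n=10))   # six units, taken exactly as they come
```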
How Many Samples per Session?
How many samples can a panelist really taste before becoming fatigued? This is a continuing question in Sensory Science, for a number of reasons. Economics dictates that the panelist test as many products as possible, in order to obtain as much data as possible from the same panelist. If the panelist can test all of the products, then the researcher can estimate both the "product" effect and the "panelist" effect in the test, and remove the panelist effect, leaving only the pure product effect. In a sense, by having the panelist test all of the products the researcher can suppress some of the inevitable noise in the test. Having the panelist test many products in a single session, or in a set of sessions, enjoys the additional benefit of allowing the researcher to develop relations between the liking ratings assigned by that particular panelist and other
relevant variables, such as sensory intensity. Each particular sensory attribute will, in turn, generate its own curve for the panelist. The strategy of having each panelist generate an individual sensory-liking curve has been used by the author to create a very useful scheme for "sensory preference segmentation," based upon the location of the individual's optimal liking along a set of sensory attributes (Moskowitz et al. 1985). Psychophysical researchers have, for the past 60 years or even longer, submitted panelists to time-intensive experiments in which the panelists are exposed to many simple test stimuli. These stimuli were originally lights and tones, but the resurgence of interest in the chemical senses has generated long evaluation sessions to assess odorants and tastants (Moskowitz 1981; Moskowitz and Klarman 1975). These studies have worked quite well in terms of execution. When the panelists are paid there does not seem to be any report of fatigue. The studies also generate solid, valid, reliable data. Validity has been established by including within the test set an array of systematically varied test stimuli that, in other and simpler tests, have generated a specific curve relating the sensory attribute rating and the physical stimulus. The same curve reappears, even when this reference set is embedded within the context of a larger set of other stimuli, leading the reader to conclude that the panelists have not lost any ability to discriminate stimuli, despite the fact that the panelists are exposed to a larger test set. The recovery of the known relation occurs even when the test stimuli differ in many qualities, and when the panelist assigns many attribute ratings to a single stimulus (e.g., profiles the sensory intensity level on various qualities, as well as rates overall liking). Why then is there a continuing skepticism on the part of applied researchers in the commercial realm, when the basic scientific work never appears to concern itself with fatigue, or at least sets up the test conditions so that fatigue is avoided? We can compare this happy state of affairs in basic research to what happens in applied research, conducted by a market researcher or a sensory scientist. For the most part applied researchers deal with real food, with real consumers and with real attributes, so there is a realistic product rather than a sugar solution. Yet, there is as much heat as light generated in discussions about the panelist's ability to rate a large set of stimuli in a test session. No data are adduced in favor of the conservative position - viz., that a panelist can only test two or three stimuli before losing his sensitivity. The argument for the conservative position is rarely substantiated by published data, but is vehemently argued by the practitioner, based upon ill-defined "experience." In the applied business environment quite often it is unnecessary to scientifically demonstrate the validity of one's statements, biased or not. It is often simply a matter of who has the budget, that individual's particular biases, and the desire of others not to interfere. The academic world, happily, is not subject to that particular problem,
or at least not to that particular problem expressed in those particular terms. Academics are delighted to argue with each other, using empirical data as the weapons. Some of the "experience" with so-called fatigue may come from self-reports of individuals such as marketing professionals, who, during the course of an extended "cutting" with many prototypes, complain about fatigue. Yet, the scientific literature belies this experience. First, the auditory and visual systems have no problem with fatigue, especially when it comes to food. Second, the kinaesthetic system has no problem either, unless the food is so hard to chew that it actually takes several minutes of chewing before swallowing. We are thus left with the chemical senses - taste and smell. For taste, one requires only a short rest period between samples in order to regain sensitivity. It requires an extensive, continual flow of a constant stimulus over the tongue for an extended period before the person reports that the taste impression has ceased. This type of extended stimulation can only be achieved by artificial forms of stimulation that force a continuing stream of stimulus over the receptor, quite unlike what happens in real life (Meiselman 1971). Despite the reality that people can and do function quite well even while tasting many samples in a test session, there must be some reasonable number of samples beyond which the panelist really does fatigue, and no longer performs quite as well as he performs with fewer samples. How many samples does it take to tire the motivated panelist - 5? 10? The answer is not clear, and resides in the nature of the product, and the motivation of the panelist, as well as in the test conditions. In the worst situation, the panelist may (1) test four highly spiced products, one after the other, in descending order of spiciness, with the spiciest tested first and the least spicy tested last; (2) rush the evaluation and not let the mouth rest; (3) carry the residual of one product into the testing of the next; and (4) in the end lose sensitivity because there has been no time to recover. One might conclude in this type of situation that the panelist cannot test four highly-spiced foods. One would be right - under these far from optimal conditions, the panelist simply cannot test the four samples and assign accurate ratings. Let's modify the evaluation activities in light of what is known about sensory functioning, and with the desire to maximize the amount of valid information we obtain. First, let us have the panelist rate the same four samples, randomized in order of spiciness. Force the panelist to wait five minutes between samples, and both to rinse with water and to use a cracker to remove any solid residue. The panelist may not like the wait, but the panelist can perform this task quite well. The result will reveal no appreciable loss in sensitivity. There may be a minor change in the threshold under such a situation, but in the real world of product testing we deal with suprathreshold stimuli, where the sensory characteristics are clearly distinguishable.
In situations concerning large sets of products (e.g., 20+) it is clearly more difficult to expose a single panelist to the full, large set. One could recruit panelists to return for a second day, but that requires some incentive, whether the incentive comes from money (a reward for the panelist to participate), or from orders, rank or coercion (punishment of the panelist for not participating). In general, it is hard to work with larger sets of products. One strategy is to determine the maximum number of samples that a panelist can taste in a single session without truly losing sensitivity or, more importantly, losing interest. Extended sessions for which the panelist is paid can last as long as four hours, with the payment concomitantly high. In those four hours the panelists can rate a dozen sausages, sixteen or more soft drinks, etc. As long as the panelist is sufficiently motivated to do the task, the panelist can evaluate what may seem to be an awfully large set of samples, yet do the evaluations in a valid fashion. Validity here can be defined as the ability of the panelist's ratings to track the physical changes in the product. One can validate these sensory ratings because the panelist acts as a measuring instrument. It is difficult to validate liking ratings, because there is no a priori measurement of validity, or at least an external correlate, as there is in sensory profiling. Table 8.1 (Moskowitz 2000) shows a list of products that the author has subjected to this type of analysis (viz., correlation of sensory attribute rating with physical stimulus level), and for which he has discovered that the panelist can rate the products, assigning answers that appear to be valid.
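This kind of validity check can be sketched as a psychophysical power-function fit: if the ratings still track the stimulus, the fitted exponent stays stable and the log-log correlation stays high. The concentrations and ratings below are hypothetical, not data from Table 8.1.

```python
import numpy as np

# Hypothetical magnitude estimates of sweetness at five stimulus levels.
conc   = np.array([2.0, 4.0, 8.0, 16.0, 32.0])    # stimulus level (%)
rating = np.array([9.0, 15.0, 24.0, 41.0, 66.0])  # mean panel rating

# Fit S = k * C^n by linear regression in log-log coordinates.
n, log_k = np.polyfit(np.log(conc), np.log(rating), 1)
r = np.corrcoef(np.log(conc), np.log(rating))[0, 1]
print(f"exponent n = {n:.2f}, k = {np.exp(log_k):.2f}, r = {r:.3f}")
# A stable exponent and a high r across sessions indicate that the panel
# is still tracking the stimulus, i.e., no meaningful loss of sensitivity.
```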
Strategies: Blocking Versus Total Randomization
If the panelist evaluates all of the products in a fixed order, then there is likely to be bias. The degree of the bias is unknown. Yet, we know from practical experience that there are carryover effects, so that the experience with one product affects the experience with another product. The researcher usually does not have the finances to test all of the products in every possible permutation so as to minimize the order effect. Nor can the researcher restrict studies to the limited number of experimental designs in which order has been balanced across all products. All too often there are other considerations, such as different numbers of products that must be accommodated, even if there are no readily available designs to cover the randomization of such products. Statisticians are happy to provide the researcher with different designs to eliminate order bias, but usually at the price of severely limiting the number of products and the order of the products. With roots in the psychophysical tradition, this author considers the issue of stimulus order in a different way. Order, it is agreed, can play a secondary disturbing role, hiding the effects of the products and introducing its own bias. The psychophysicist's tactic to remove the order bias is simply to randomize the order of products, so that each
panelist tests the products in a different order. If the panelist can only test a subset of the full set of products, then each panelist should test this randomized set, subject to the constraint that all samples are seen an equal number of times. Within the limits of reasonableness the samples are distributed so that pairs of samples are never tested together too frequently. The statistical considerations of perfect balance can be modified and reduced to a set of considerations that can be accommodated in a practical, cost-effective manner (a sketch of one such scheme follows Table 8.1). For most stimuli tested by sensory scientists, however, the effects of carryover and order, beyond the product tried first, are relatively minor. A short rest to remove residual material and allow re-sensitization will work quite well.

TABLE 8.1. MAXIMUM NUMBER OF PRODUCTS THAT SHOULD BE TESTED (MAX), MINIMUM WAITING TIME BETWEEN SAMPLES (WAIT), PRE-RECRUIT, FOUR HOUR, CENTRAL LOCATION TEST
Source: Moskowitz (2000)
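The randomization tactic described above can be sketched with a cyclic (Latin-square) rotation, under which each product appears in every serving position equally often across a block of panelists. This is an illustrative sketch only; it balances position, not first-order carryover, which would call for a Williams-type design.

```python
import random

def balanced_orders(products, seed=None):
    """One random base order plus its cyclic rotations: a Latin square in
    which every product occupies every serving position exactly once."""
    rng = random.Random(seed)
    base = list(products)
    rng.shuffle(base)
    n = len(base)
    return [base[i:] + base[:i] for i in range(n)]

# Four panelists, four products; repeat the block for more panelists.
for order in balanced_orders(["A", "B", "C", "D"], seed=7):
    print(order)
```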
SAMPLE ISSUES IN CONSUMER TESTING
133
Occasionally difficulties arise when the samples require preparation, so that they cannot be perfectly randomized. There may not exist a proper randomization scheme that can accommodate the particular peculiarities that ensue when one of these five situations occurs:
(1) The products require different preparation times.
(2) One or more products may be missing from the shipment.
(3) The products may require different types of preparation.
(4) The product has an unusually strong taste that requires that it be tested later in the sequence.
(5) There are a limited number of samples of one product (e.g., a hard-to-create prototype), but unlimited quantities of the other prototypes.

In each of these situations the prudent course of action accepts a randomization scheme that attempts to balance the total frequency of appearance of all of the samples and minimizes any fixed order of appearance. Each sample has the opportunity to appear equally often in every position. In concluding this section, one can evaluate the true effect of the number of samples and the order of samples by procedures such as the analysis of variance. A good determination of the number of products that a single panelist can evaluate in one session requires each panelist to go through a set of products on different days, within the same product category, with the evaluation comprising two products on one day, 10 products on another day, etc. The specific products should be randomized across panelists. The one-way analysis of variance provides a measure of the signal-to-noise ratio of the product versus the number of samples. Another good test, this time looking at the order effect, uses the two-way analysis of variance, with one factor being the order and the other factor being the product. If the same panelists evaluate the products in a limited set of orders, one order per day, with the orders randomized, then one can determine the effect of order and of product. The author's bet is that the product differences will be far greater than the order differences.
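A minimal sketch of that two-way analysis, with one rating per order-by-product cell (hypothetical numbers) and the interaction serving as the error term:

```python
import numpy as np

# Hypothetical mean ratings: rows = serving order (1-4), columns =
# product (A-D), one observation per cell.
y = np.array([
    [6.4, 5.1, 7.0, 4.8],
    [6.1, 5.3, 6.8, 4.6],
    [6.3, 4.9, 6.9, 4.9],
    [6.0, 5.2, 6.7, 4.7],
])

a, b  = y.shape
grand = y.mean()
ss_order   = b * ((y.mean(axis=1) - grand) ** 2).sum()
ss_product = a * ((y.mean(axis=0) - grand) ** 2).sum()
ss_error   = ((y - grand) ** 2).sum() - ss_order - ss_product

ms_order   = ss_order / (a - 1)
ms_product = ss_product / (b - 1)
ms_error   = ss_error / ((a - 1) * (b - 1))
print(f"F(product) = {ms_product / ms_error:.1f}")
print(f"F(order)   = {ms_order / ms_error:.1f}")
# The expectation voiced above: F(product) should dwarf F(order).
```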
Strategies: The Training Sample
Psychophysicists studying sensory processes often use training samples, although not necessarily by that name. These training samples are relevant in two different ways. (1) They introduce the panelist to the task. Quite often panelists really don't know what is expected of them. Training samples, or better "orientation samples," whose data can be discarded, introduce the panelist to the task. By
using a training sample along with the questionnaire, the researcher ensures that the panelist understands the task, learns how to complete the questionnaire, and can clarify aspects of the evaluation that are unclear. That introduction is useful, and can improve the data quality. (2) The data from the training sample are typically discarded, so that they never enter into the calculation. In many situations the training sample or first sample is biased. For example, in product tests the first sample tends to be assigned a higher rating than it would be assigned in the middle of a set of samples, and also exhibits higher variability in ratings across the panelists (viz., as indexed by the standard deviation). By using a training sample and eliminating its ratings the researcher can reduce the variation. Training samples are not, however, always called for. There are some practical considerations that dictate no training sample. One of these considerations, in particular, occurs when the most meaningful response to the product occurs on the first exposure. If the product is a truly new product, then the first time that the panelist is exposed to the product provides the most relevant data. The second exposure would provide less relevant data. A training sample in such a situation would defeat the purpose by eliminating that initial exposure. One creative solution for this dilemma requires the panelist to practice with a product, but not one of the test products, in order to give the panelist some experience with the questionnaire and the test situation. That experience should not bias the results, unless the questions "tip off" the panelist about what to look for in the real test product. If these key questions are themselves important, then they should not appear for the training product either. As a consequence the training situation would involve a stimulus different in nature from the test stimulus, rated on attributes that are neutral. The training stimulus and the questionnaire would give no hint about the attributes to be used in the actual evaluation. They would serve only as a "warm-up."
REFERENCES
MEISELMAN, H.L. 1971. Effect of presentation procedures on taste intensity functions. Perception and Psychophysics 10, 15-18.
MOSKOWITZ, H.R. 1981. Psychophysical scaling and optimization of odor mixtures. In: Odor Quality and Chemical Structure, pp. 23-256, H.R. Moskowitz and C.B. Warren, eds. American Chemical Society, Symposium Series #148, Washington.
MOSKOWITZ, H.R. 2000. R&D-driven product evaluation in the early stage of development. In: New Food Products for a Changing Marketplace, pp. 277-328, A.L. Brody and J.B. Lord, eds. Technomic Publishing Co., Lancaster, Penn.
MOSKOWITZ, H.R. and GERBERS, C. 1974. Dimensional salience of odors. Annals of the New York Academy of Sciences 237, 3-16.
MOSKOWITZ, H.R., JACOBS, B.E. and LAZAR, N. 1985. Product response segmentation and the analysis of individual differences in liking. J. Food Quality 8, 168-191.
MOSKOWITZ, H.R. and KLARMAN, L. 1975. The tastes of artificial sweeteners and their mixtures. Chemical Senses and Flavor 2, 411-422.
STEVENS, S.S. 1968. Personal communication.
ALEJANDRA M. MUÑOZ
Sample issues have a great impact on the quality and the cost of consumer tests. Too many unnecessary samples not only increase the cost of the test, but may create other problems such as unnecessary fatigue of the consumers, carryover effects, etc. Conversely, an incomplete sample set will not provide the information required for the project. Sample issues, such as the number and characteristics of samples for the test, have to be carefully assessed in the design of a consumer test.
Selection of Samples for Testing
The sensory/consumer scientist can be faced with one of the following scenarios when the test samples are selected:
Scenario 1. The necessary test samples are discussed in a planning/preliminary meeting. The project objective and the consumer questionnaire are discussed, and all project participants interact to make decisions on the samples needed to meet the project objectives (Lawless and Heymann 1998). Ideally, all parties should participate in this meeting to present their objectives and use their diverse expertise to make product decisions. The characteristics of the test samples are discussed, and the samples are either acquired or produced to meet the specifications discussed in the meeting. This is the ideal scenario and provides the most benefits to the project.

Scenario 2. The project leaders, such as the product developers/chemists or marketing/market researchers, produce or obtain a set of samples to be consumer tested. There is no scheduled planning meeting to discuss the project
objectives or the samples needed. The sensory/consumer scientists are not involved at the early project stages, and do not have input into the test design and the needed sample set. They are involved at later project stages, when decisions have already been made on the number and type of samples. Sensory professionals design the test based on the set of available products. Obviously, scenario 1 is the best option, and sensory professionals who do not work with their teams under these circumstances on a regular basis need to discuss the benefits of their early involvement with project leaders and managers. When scenario 2 occurs, sensory professionals need to screen and assess the available products to be able to recommend a test design. Ideally, this review should also be conducted with all team members. In this product review, the products presented for testing are assessed vis-à-vis the project objectives. The test is designed based on the available sample set. Often, this test design is not ideal. All caveats and possible limitations need to be discussed up front. Whenever possible, if the available set does not address all objectives, new/additional test samples can be acquired or produced and added to the existing sample set. A review of the test samples is always recommended, even under scenario 1 described above. Below is a discussion of some of the common circumstances that sensory professionals may face in reviewing and selecting test samples.

(1) Duplicate Products. In the sample set there may be products that present very similar sensory characteristics. Sensory professionals can easily provide this assessment, when applicable. Duplicate products should be eliminated. The cost and quality of the test are affected by additional unnecessary samples.

(2) Unnecessary Products in Set. In the sample set there may be products that do not meet the project objectives and need to be eliminated from the sample set. Sometimes, unnecessary samples may be presented by the test requester because a product review to eliminate inappropriate samples has not been completed. Other times, the test requester may add products to the main sample set to obtain information for another project or test objective. These products should be eliminated, since they affect the quality of the test and the type of information obtained for the other samples. If deemed appropriate, and if the test design allows, these additional sample(s) may be evaluated at the end of the test. If tested last, these additional samples do not have an effect on the quality of the main test samples evaluated first, and thus on the main study. However, the evaluation of these additional products may be affected by the first and most important
set of products. The resulting data for these additional samples need to be carefully assessed.
(3) Products that Do Not Meet Project Objectives. Many times the products presented for testing are inadequate for the test objectives. Samples may be missing, may not be different enough, may not present the sensory characteristics of interest, etc. Under this scenario, sensory professionals need to recommend that new samples be produced/acquired. When not enough time has been allotted prior to the test execution, the request for additional samples may be problematic. The test may need to be postponed. If the test is conducted anyway, the limitations imposed by the inadequate test samples need to be discussed succinctly.

Number of Samples Included in Test

There are projects (e.g., optimization, category appraisal projects) that will generate a large number of samples (above 10). The product review discussed above should be conducted in order to select the needed samples, and the best design to test this large number of samples, including the best sample presentation, the number of samples a consumer will evaluate, etc. (Muñoz et al. 1996; Moskowitz and Marketo 2001). The maximum number of samples evaluated by consumers depends on the type of product, the task required of the consumer, the total time allowed for the participant to evaluate products, and the rest periods given in between samples. Both physiological and psychological fatigue need to be considered when determining the maximum number of products to be evaluated by a consumer. There are many advantages to having consumers evaluate all samples. In this case, the sensory professional needs to determine whether the consumer is able to evaluate all samples in one session, or whether more sessions are needed to complete all sample evaluations. However, often the best test strategy is the evaluation of a subset of samples, using a balanced incomplete block design (BIBD) (Cochran and Cox 1957).

(1) Product Type. For bland foods (e.g., crackers, some soups and beverages) fatiguing effects are not an issue, and a relatively large number of samples can be evaluated in a session by consumers. Fewer samples per session should be presented when evaluating other types of foods, such as foods with high intensity of certain flavors or chemical factors (e.g., spicy or bitter foods), and most personal care products. There are some products that require one session for every product or pair of products evaluated by consumers (e.g., hair care and dental products, lotions and creams).
(2) Task Required. Fewer products per session can be presented when there is an involved task to be completed by the consumer. This may include completing long questionnaires or several questionnaires (e.g., product-related surveys, attitude surveys, etc.), completing several evaluations of the same product, preparing products, or completing involved handling and evaluation protocols (e.g., the evaluation of dishwashing and laundry detergents).
(3) Total Available Time and Rest Periods. In most multi-product tests, rest periods decrease stimulus fatigue effects, and therefore allow more products to be evaluated within a session. Rest periods in between samples should be as long as possible to reduce fatigue and adaptation. In addition, the use of rinsing agents (Allison et al. 1999) further decreases stimulus adaptation and minimizes carryover effects, thus allowing more products to be evaluated in a session.
To determine the number of samples, the way they will be evaluated (the test design), and the type of rinsing agents and rest periods needed in a test, the sensory professional needs to screen and become familiar with the products. Generally speaking, small sample sets are typically used for products that fatigue the senses. Potential carryover effects should be anticipated and compensated for. When carryover effects are suspected, it is recommended that a larger consumer database per sample be used, to allow for data segmentation when order effects are found. In addition, special test designs should be considered. MacFie et al. (1989) discussed designs to balance the effect of order of presentation and first-order carryover effects in consumer tests.

When multiple products can be evaluated, a decision has to be made regarding the use of balanced incomplete block designs (BIBD) (i.e., the consumer evaluates only a subset of all products) (Cochran and Cox 1957; Wakeling and Buck 2001), or complete block designs (i.e., the consumer evaluates all test products). The use of a BIBD is the best way to test a large number of samples from the psychological and physiological point of view. Consumers only evaluate a subset of samples and do not get mentally and physiologically fatigued. The disadvantage of using BIBDs is that the study becomes more expensive, since a larger number of consumers need to be recruited and compensated to complete the test. When the consumer evaluates all products, the test can be conducted in one or several sessions. The test becomes less expensive than using a BIBD. However, the consumer may become mentally and physiologically fatigued, and the quality of the data may be affected. When the test is conducted in only one session, the sensory professional needs to ensure that enough rest periods are provided to the consumer. Conducting the test in more than one session alleviates the problem of fatigue. In this case, most professionals assume no
session effect in the statistical analysis, and all samples across test days are compared. Different designs and presentation schemes should be assessed to select the best one for the sample set to be tested (Gacula and Singh 1984; Gacula 1993; Wakeling and Buck 2001; Wakeling et al. 2001). To summarize, the best decision on the number of samples is made when the sensory professional reviews the sample set prior to making any decision on the test design. It is difficult to recommend a maximum number of samples that a consumer can evaluate, since this decision is product dependent, as explained before. Generally speaking, in cases where multiple samples can be evaluated (i.e., mild food and beverage products), a consumer could evaluate four to seven samples per hour, if the questionnaire is not excessively long.
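For readers who want to see the pairwise balance property of a BIBD made concrete, the following short Python sketch is offered purely as an illustration (the product count, subset size, and product names are invented, and the code is not taken from any of the cited authors). It builds the simplest "unreduced" BIBD by treating every possible subset of k products as one consumer's block, and then verifies that every pair of products co-occurs equally often. Catalogued BIBDs such as those in Cochran and Cox (1957) achieve the same balance with far fewer blocks.

    # Unreduced BIBD: every k-subset of the t products is a block (one
    # consumer's serving set). Every pair of products then co-occurs in the
    # same number of blocks, so all pairwise comparisons are equally supported.
    from itertools import combinations
    from collections import Counter

    t, k = 6, 3                                  # 6 products, 3 samples per consumer
    products = ["P%d" % i for i in range(1, t + 1)]

    blocks = list(combinations(products, k))     # C(6,3) = 20 blocks

    pair_counts = Counter()
    for block in blocks:
        for pair in combinations(block, 2):
            pair_counts[pair] += 1

    print(len(blocks), "blocks of size", k)
    print("pair replications:", set(pair_counts.values()))  # {4}: every pair co-occurs 4 times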
Lot and Code Uniformity

Another important sample issue to consider in consumer testing is the uniformity of the products, since large amounts of product are needed in consumer tests. The within-sample variability may be easier to control for samples produced as prototypes or in the pilot plant. However, for some products the opposite occurs, and prototypes may present a large within-sample variability, since their production is not an established and controlled process. When commercial products are used, all possible attempts should be made to ensure that the same code or production date is used for the complete test. This may be difficult to achieve in some cases, when a very large amount of product is needed. The test administrator should contact warehouses and attempt to get the same product code/production date. In addition to the above precautions, the within-sample variability must be assessed before the test is designed and conducted. At that time, decisions regarding sample selection, purchasing and production are made to achieve the desired product uniformity. Alternatively, information on the lack of product uniformity needs to be taken into consideration in the data analysis and interpretation.
Pretesting/Trial

Whenever possible, sensory professionals should conduct a small-scale test (pretest) prior to the execution of the actual consumer test, to ensure that all test conditions are adequate (Resurreccion 1998). This pretest assesses all test conditions, ensuring that the actual test will be executed as planned. The sample issues discussed herein, as well as other test design parameters, are checked in this trial. Upon completion of this pretest, modifications can be incorporated to optimize the consumer test execution.
REFERENCES

ALLISON, A.A., CHAMBERS, E., MILLIKEN, G.A. and CHAMBERS, D.H. 1999. Effects of interstimulus rinsing and time on measurements of capsaicin heat in tomato salsa. J. Sensory Studies 14, 401-414.

COCHRAN, W.G. and COX, G.M. 1957. Experimental Designs. John Wiley & Sons, New York.

GACULA, JR., M.C. 1993. Design and Analysis of Sensory Optimization. Food & Nutrition Press, Trumbull, Conn.

GACULA, JR., M.C. and SINGH, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, San Diego.

LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food. Chapman & Hall, New York.

MACFIE, H.J.H., GREENHOFF, K., BRATCHELL, N. and VALLIS, L. 1989. Designs to balance the effect of order of presentation and first-order carryover effects in hall tests. J. Sensory Studies 4, 129-148.

MOSKOWITZ, H.R. and MARKETO, C. 2001. Selecting products for category appraisal studies. Fewer products do almost as well as many products. J. Sensory Studies 16(5), 537-549.

MUÑOZ, A.M., CHAMBERS, E. IV. and HUMMER, S. 1996. A multifaceted category research study: How to understand a product category and its consumer responses. J. Sensory Studies 11, 261-294.

RESURRECCION, A.V.A. 1998. Consumer Sensory Testing for Product Development. Aspen Publishers, Gaithersburg, MD.

WAKELING, I.N. and BUCK, D. 2001. Balanced incomplete block designs useful for consumer experimentation. Food Quality and Preference 12(4), 265-268.

WAKELING, I.N., HASTED, A. and BUCK, D. 2001. Cyclic presentation order designs for consumer research. Food Quality and Preference 12(1), 39-46.
MAXIMO C. GACULA, JR.

I mostly agree with the presentations of Muñoz and Moskowitz, which are based on many years of experience in sensory and consumer tests of various types of products. The inclusion of the psychophysical basis of sensory evaluation in Moskowitz's section should be educational to both sensory scientist and market researcher. The following delineates and expands my thoughts on the issues.
Shelf-Life Samples

Moskowitz mentioned the distribution chain of products, which is perhaps the most important factor in the determination of shelf-life. Products subjected to a shelf-life study should simulate the production of the product and its distribution chain, including distance of travel and storage conditions during travel, especially for perishable products. In shelf-life studies the selection of samples starts at the time of production, when the samples are randomly obtained within a production shift, tagged, followed through the distribution channel up to when the tagged products reach the store, and finally purchased off the shelf with prior arrangement with the store management. In this type of study, production and environmental variability should not be controlled, as they are part of shelf-life. Doing it this way yields shelf-life estimates that are robust, which should be the ultimate goal in shelf-life studies.

The shelf-life of a product is printed on the package. The information is generally given by various names, depending on the product, such as "use by date," "freshness date," etc. In procuring samples for consumer testing, it is important that the code dates of the products to be tested be numerically close to each other, regardless of the lots they came from. Lot variability should be expressed in the estimate of shelf-life because this is a part of reality that should not be controlled.
Presentation of Samples

In the absence of a statistician, the logical presentation of samples is total randomization. It is assumed that by randomization, order or carryover effects will cancel out. As stated by Moskowitz and Muñoz, carryover effects are product dependent, i.e., highly spiced or flavored products have high carryover effects. In the estimation of product differences, the estimated differences are confounded with carryover effects. If carryover effects are approximately uniform from one presentation order to the next, then there is no problem with the total randomization method, because these effects cancel each other. The problem occurs when carryover effects are unequal from one order of presentation to the next. In this case, the estimates of product differences are biased, the magnitude of which is unknown. The estimates of carryover and treatment/product effects for a two-product test can be easily calculated. The statistical procedure is described by Gacula et al. (1999) for case studies of data from deodorancy, toothbrush, and male shaving products, respectively. The extent of carryover effects in personal care products has been reported by Gacula et al. (1986). Although carryover effects were found to be not statistically significant in three data sets (dishwashing products, household cleansing products, solid antiperspirant), which supports Moskowitz's beliefs, the presence of carryover effects in
the data as a systematic and/or random source of variation should be recognized in the design of sensory and consumer testing studies.

In most consumer studies, several products are evaluated and the issue of the number of samples to be evaluated per session becomes important. This issue gets even more complicated because not all samples or products are evaluated in one setting. The balanced incomplete block design (BIBD) as mentioned by Muñoz has been used in practice, but is not recommended because of the problem of context effects. Sensory integration varies to a great extent between individuals. As a result, sensory data are relative, skewed and difficult to replicate. "Relativeness" calls for a control or reference sample in each evaluation setting. It is important to remember that the presence of a reference sample contributes to a large extent in stabilizing sensory data. The problem of context effects can be minimized by modifying the BIBD with the incorporation of a reference/control sample in every block, which was recognized over 20 years ago (Gacula 1978, 1993). In my view, the BIBD augmented with control should be used instead of the traditional BIBD. The augmented BIBD has been successfully used in the consumer product industries. In some applications, the reference sample is identified to the respondents, which is acceptable.

An important study by Mayer and Mulder (1989) comparing the statistical efficiency of incomplete block designs against randomized complete block designs in sensory evaluation experiments showed that with three and four samples per session, incomplete block designs were 31% and 2% more efficient, respectively, than randomized complete block designs. When five or more samples were tested, the incomplete block designs were markedly less efficient. It can be added that this result can be statistically verified: the statistical efficiency of the BIBD decreases as the design approaches the randomized complete block design (all samples evaluated by a respondent in one setting). We need more published studies similar to that of Mayer and Mulder (1989), and hopefully more will appear in the next decade.

The problem of balancing the order of sample presentation is another important issue. For two products, it is not an issue, as the orders can be either AB (sample A followed by sample B) or BA. For more than two samples, the Latin square designs discussed in many textbooks (Cochran and Cox 1957; Gacula and Singh 1984) are used to balance the order of sample presentation. In particular, the paper by MacFie et al. (1989) gave designs to balance the effect of presentation order and first-order carryover effects for 4 to 16 samples. In consumer testing, respondents come at random to the test site, and the implementation of the augmented BIBD and the Latin square should present no serious problem.
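As an illustration of the order-balancing idea (a sketch of the standard Williams construction, not a design reproduced from MacFie et al. 1989), the following Python fragment generates, for an even number of products n, a set of n serving orders in which every product appears once in each serving position and immediately follows every other product exactly once; odd n requires a pair of such squares.

    # Williams Latin square for even n: first row 0, 1, n-1, 2, n-2, ...,
    # remaining rows are cyclic shifts. Each product occupies every serving
    # position once, and each ordered pair "i immediately followed by j"
    # occurs exactly once, balancing first-order carryover.
    def williams_square(n):
        first, lo, hi = [0], 1, n - 1
        for pos in range(1, n):
            if pos % 2 == 1:
                first.append(lo); lo += 1
            else:
                first.append(hi); hi -= 1
        return [[(x + shift) % n for x in first] for shift in range(n)]

    for order in williams_square(4):    # 4 products labeled 0-3, 4 serving orders
        print(order)
    # [0, 1, 3, 2] / [1, 2, 0, 3] / [2, 3, 1, 0] / [3, 0, 2, 1]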
Other Sample Issues

Moskowitz mentioned training samples. This author (MCG) is a proponent of Total Quality. As such, the author believes that training samples may sanitize the data and may not reflect the natural variability of the product and consumer reactions under actual use. Total Quality ideas emphasize that uncontrolled variability during product use is a reality and should not be controlled, in order to make the product evaluation results more robust. Robust results are more repeatable and realistic.

Use of "duplicate products" as indicated by Muñoz is a good practice for estimating pure error in the analysis of variance (Aust et al. 1985; Gacula 1987). The data from duplicate samples are used to estimate pure error, making the test statistics more powerful. The use of "unnecessary products" in a test would be open to question - how valid and reliable would the data collected be, since the ratings would be relative to the main products of the test?

There are products that can be manufactured under laboratory conditions. Generally, the laboratory production facilities are almost a "carbon copy" of the plant facilities. Then there would be no problem using these samples for consumer testing work. This route is usually used in ingredient optimization studies that utilize various types of experimental designs, i.e., central composite design, fractional or full factorial, etc.

Perhaps one of the difficult choices in sample preparation is "blinding." Products in the market have obviously different packaging designs, different product shapes, different sizes, and brand names identified on the package. The effects of physical differences between products on consumer responses produce biased results when not concealed, as is well-documented in the literature (Gacula et al. 1986; Holstius and Paltschik 1983; Moskowitz 1983). For antiperspirant and toothbrush products, blinding is accomplished by taping over the brand names; however, other physical differences remain, though they are not as biasing as brand identification.

As a final remark on sample issues, the BIBD augmented with control and the Latin square are the recommended designs in either laboratory or consumer testing situations. A statistician should be consulted if some subsets of the design fail to meet design requirements.

REFERENCES

AUST, L.B., GACULA, JR., M.C., BEARD, S.A. and WASHAM II, R.W. 1985. Degree of difference test method in sensory evaluation of heterogeneous product types. J. Food Science 50, 511-513.
COCHRAN, W.G. and COX, G.M. 1957. Experimental Designs. John Wiley & Sons, New York.

HOLSTIUS, K. and PALTSCHIK, M. 1983. Brand names and perceived value. European Research (UK) 5, 151.

GACULA, JR., M.C. 1978. Analysis of incomplete block designs with reference samples in every block. J. Food Sci. 43, 1461-1466.

GACULA, JR., M.C. and SINGH, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, San Diego.

GACULA, JR., M.C., WASHAM, R.W., BEARD, S.A. and HEINZE, J.E. 1986. Estimates of carry-over effects in two-product home usage consumer tests. J. Sensory Studies 1, 47-53.

GACULA, JR., M.C., RUTENBECK, S.K., CAMPBELL, J.F., GIOVANNI, M.E., GARDZE, C.A. and WASHAM II, R.W. 1986. Some sources of bias in consumer testing. J. Sensory Studies 1, 175-182.

GACULA, JR., M.C. 1987. Some issues in the design and analysis of sensory data: Revisited. J. Sensory Studies 2, 169-185.

GACULA, JR., M.C. 1993. Design and Analysis of Sensory Optimization. Food & Nutrition Press, Trumbull, Conn.

GACULA, JR., M., DAVIS, I., HARDY, D. and LEIPHART, W. 1999. Carry-over effects in sensory evaluation, Case studies. pp. 142-147, P.R. Killeen and W.R. Uttal, eds., Proc. 15th Annual Meeting of the International Society for Psychophysics, Tempe, Ariz.

MACFIE, H.J., BRATCHELL, N., GREENHOFF, K. and VALLIS, L.V. 1989. Designs to balance the effect of order of presentation and first-order carryover effects in hall tests. J. Sensory Studies 4, 129-148.

MAYER, D.G. and MULDER, J.C. 1989. Factors influencing the efficiency of incomplete block designs in sensory evaluation experiments. J. Sensory Studies 4, 121-128.

MOSKOWITZ, H.R. 1983. Product Testing and Sensory Evaluation of Foods. Food & Nutrition Press, Trumbull, Conn.
CHAPTER 9

HEDONICS, JUST-ABOUT-RIGHT, PURCHASE AND OTHER SCALES IN CONSUMER TESTS

HOWARD R. MOSKOWITZ

Affective scales come in a variety of types. One type measures degree of acceptance, pure and simple. The traditional nine-point hedonic scale represents one of these simple scales. The panelist is instructed to record the degree to which he or she likes or dislikes the product. There are the usual dissenters from this simple balanced scale who feel that the scale may need unbalancing, such as to give greater discrimination power for the acceptable products, where the "action is." Other dissenters may feel that another scale with more or with fewer scale points is better (e.g., a seven-point scale, because panelists never use the 1 and 9; or conversely an eleven-point scale, for the exact same reason). Whether the scale is bigger or smaller, magnitude estimation (Moskowitz and Sidel 1971) or category (e.g., Peryam and Pilgrim 1957), the simple liking scale provides exactly what it promises - degree of acceptance - providing, of course, that the critic believes that panelists can validly scale how much they like or dislike a product.

The standard purchase intent scale, usually comprising 5 points from "definitely not buy" to "definitely buy," deals with an entirely different question. The panelist is asked to state a behavioral intent - would the panelist buy or not buy the product. By putting the action into the scale the researcher begins to deal with behavior, and not just with an attitude. Of course, products that score high on purchase intent scales may, in fact, not be purchased when they are marketed. The failure of the purchase intent scale to predict actual purchase is more a validity issue than a reflection on the scale per se.
Average Ratings Versus Percent Using a Scale Point

The hedonic scale is considered to provide a measure of the "amount of" or "degree of" liking. The hedonic scale emerged from food research at the U.S. Army Quartermaster Corps in Chicago. It is pretty straightforward to scale liking - the panelist likes or dislikes the sample, and to what degree. The hedonic scale is easy to use, intuitive, and makes no pretense of measuring anything other than an individual's liking of the product.

The purchase intent scale is more problematic. This scale often has five points, ranging from "definitely would not buy" to "definitely would buy." Although on the surface the purchase intent scale seems simple, it is intuitively more difficult to understand. According to what criterion is the panelist supposed
to judge purchase intent? If we combine the product with a very high price, then the typical panelist may say that he loves the product (high liking) but would never buy the product (at the price offered). In contrast, if we combine a very low price with a very poor product, then quite often the panelist would say that he doesn't really like the product, because it achieves a low hedonic value, yet the very low price attached to it makes the product a bargain. The purchase intent scale is not strictly tied to the product. It brings along other, cognitive factors as well. The reader should note that similar cognitive problems pervade other scales, such as appropriateness for a specific end use. A product may be very well-liked in and of itself, but that high degree of liking does not transfer to appropriateness for every occasion.

Researchers working with the hedonic scale typically use efficient statistics, such as the mean and the standard deviation, which are based upon the metric values. That is, the actual scale value itself, and the average of the scale values, are important. Even down at the level of a single panelist the scale is meaningful, because it shows how a particular person feels about a particular product. Of course, conventional wisdom typically prescribes a larger, more representative base size (e.g., 30+ panelists, for the law of large numbers to begin operating).

Consumer researchers working with the purchase intent scale analyze it differently from the way that they analyze the liking scale. Often, and for the past 40 years, consumer researchers have used the percentage of panelists scaling a product as 4 (probably purchase) or 5 (definitely purchase) as the statistic of interest. Occasionally the researcher may also calculate the bottom two boxes, corresponding to probably and definitely not purchase, and incorporate these into the analysis. The researchers thus convert the metric value of the scale to an incidence value - viz., the proportion of the panelists who use each scale point. To some degree we can trace this use of the scale back to a different intellectual tradition - the sociology tradition, where interest focuses on the proportion of people who would do something, rather than on the intensity of the feeling of those people. It should come as no surprise that the purchase intent scale is used when the ultimate goal is to estimate purchase behavior, and less frequently used when the goal is to guide product developers.

To some degree this difference in intellectual history is problematic. Quite often consumer researchers are asked to guide product developers. It is difficult for a product developer to make much sense out of the percentage of panelists who feel that the product is "extremely good" or "definitely would buy," versus panelists who feel that the product is "moderately good" or "probably would buy." That is, incidence statistics confuse a product developer, because they do not give clear direction. We will see similar types of problems emerging for product developers who have to use "just about right" scales. Metric information, not incidence information, is useful to guide product developers.
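The contrast between the two analysis traditions can be made concrete with a small Python sketch; the ratings below are invented, and the point is only that the hedonic scale is summarized by a metric mean while the purchase intent scale is summarized by a top-two-box incidence.

    # Metric summary of a 9-point hedonic scale vs. incidence summary of a
    # 5-point purchase intent scale (1 = definitely would not buy,
    # 5 = definitely would buy). All ratings are invented for illustration.
    from statistics import mean, stdev

    hedonic = [7, 8, 6, 9, 7, 5, 8, 7, 6, 8]
    intent  = [4, 5, 3, 5, 4, 2, 5, 4, 3, 4]

    print("mean liking = %.2f (sd %.2f)" % (mean(hedonic), stdev(hedonic)))

    top_two = sum(1 for x in intent if x >= 4) / len(intent)
    print("top-two-box = %d%% probably/definitely would buy" % round(100 * top_two))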
The Just-About-Right Scale - Do Panelists "Know" Their Ideal Point?

One of the most popular scales for product development is the "just right" scale. This scale comprises two anchored ends (far too little; far too much) and a middle point (just right). The scale (abbreviated as JAR) is widely used to identify problems with products, and to guide development. When the scale is used in the field, for data collection, it appears to engender few problems. Consumers appear to be able to use the scale correctly, aver that they have no problems understanding the concept of "too much" versus "too little," and can often answer intuitively. Analytically, the JAR scale appears to allow for statistical manipulation, analyses of differences, tests of significance, etc. Sometimes the researcher centers the scale by subtracting the midpoint - a manipulation that makes the scale easier to work with, but does not have any impact on the meaning of the numbers.

The JAR scale is widely used by market researchers as well as sensory scientists. The market researcher, working with large samples of panelists, typically reports the results in terms of the proportion of panelists who say that the product has too much of a characteristic versus the proportion of panelists who say that the product has too little of a characteristic. However, it is perfectly legitimate to report JAR data in terms of the magnitude of the deviation from "just right." In management meetings the JAR scale is easy to understand, and makes a nice presentation. The product is presented as the column, and the rows correspond to the attributes. It is easy to see where the product appears to be out of specification, at least according to the consumer. When a group of products is arrayed with the attributes as rows and the products as columns, it is easy to identify which particular products have specific problems.

So far all is well and good. The problem comes about when the researcher analyzes the data, and then attempts to formulate a recommendation based upon the results. Exactly what do ratings on the JAR scale mean? Does the panelist have available an internal "ideal" product against which the test product can be compared? After all, in reality the JAR scale requires the panelist to perform three tasks simultaneously - evaluate the product, compare the product to an internal "ideal" product, and then report what characteristics are out of kilter. Another problem arises when the developer attempts to create a product that has few deviations from "just about right." Is this product highly acceptable? That is, when the developer follows the researcher's suggestions and creates a product that is "on target," will this product be as good as it can be? Will this be the optimum product?

When working with the JAR scale the researcher and the product developer are likely to encounter two categories of problems: biases, and contradictions/tradeoffs.
Biases. When the panelist rates a product using the JAR scale, all attributes are not created equal. For example, when rating the "authentic chocolate" flavor of a cookie, no product ever possesses too much "authentic chocolate flavor." Some rare products may be scored in the middle (just about right), but the odds are quite high that across many cookies the vast majority of chocolate cookies will not have the requisite amount of the attribute. Consequently, the report will come back to the developer that the product needs "more authentic chocolate flavor." Many novice product developers will add more chocolate flavor until the cookie becomes extraordinarily bitter tasting. A similar problem emerges when the developer works with the fattiness of meat. Ask panelists about fattiness and most will say that today's products are simply too fatty, and that they would prefer products with less fat. Many sausages would be rated as having too much fat. The ideal point for fat is lower than the level currently available. Yet the novice product developer soon discovers that, despite the JAR scale, the panelists actually prefer the products with more fat.

Contradictions and Tradeoffs. When the panelist rates a product using the JAR scale, the panelist rates each attribute separately. That is, when asked about the flavor, the panelist may respond that the flavor is too strong. When asked about other qualities of the flavor, the panelist may say that the flavor is too weak. To the panelist, and on an attribute-by-attribute basis, there are no possible contradictions, since the panelist scales each attribute separately. A product developer faced with the JAR data must make tradeoffs, identifying what particular product features need to be delivered, and what product features, although off-target, need not be delivered in the final product.

The contradictions and tradeoffs can be seen most clearly in experimental designs where the panelist does not know that the ingredients or process conditions have been systematically varied. In some instances (Moskowitz 1994; Moskowitz 2001), the panelist rated the product on the JAR scale on a variety of attributes. The panelist also rated the product on liking. By creating the product model the researcher could identify the combination of ingredients that maximized liking. This optimal combination did not generate estimated JAR values of 50 (just about right) on an anchored 0-100 scale (0 = far too little, 50 = just right, 100 = far too much). Furthermore, by using the method of reverse engineering it was possible to identify a product that delivered an expected profile of 50 on the different JAR scales. This was not the optimally accepted product, although it was an acceptable product.

In general the JAR scales appear to work best when the attribute is truly descriptive, and does not have evaluative aspects. As an example, the attribute "real chocolate" is probably evaluative, even though it sounds very much like a sensory descriptive attribute. The attribute "darkness" is virtually always a sensory descriptive attribute. Typically texture and appearance attributes do best
with JAR scales, whereas flavor attributes, and especially emotion-laden flavor attributes, tend to do poorly, although not always - it is a function of the product and the attribute. Attributes that do poorly in some way have a judgmental or hedonic aspect attached to them. Although people may like fatty products or salty products, it is deemed socially inappropriate to express this desire, resulting in an incorrect JAR value. In contrast, it is neither positive nor negative to prefer a larger-sized product to a smaller-sized product.

REFERENCES

MOSKOWITZ, H.R. 1994. Food Concepts and Products: Just In Time Development. Food & Nutrition Press, Trumbull, Conn.

MOSKOWITZ, H.R. 2001. Sensory directionals for pizza: A deeper analysis. J. Sensory Studies 16, 583-600.

MOSKOWITZ, H.R. and SIDEL, J.L. 1971. Magnitude and hedonic scales of food acceptability. J. Food Science 36, 677-680.

PERYAM, D.R. and PILGRIM, F.J. 1957. Hedonic scale method of measuring food preferences. Food Technol. 11, 9-14.
ALEJANDRA M. MUÑOZ

Acceptance/Liking Scales

In most sensory consumer questionnaires acceptance/liking (also called hedonic) scales are included. In some cases, preference is asked with or without acceptance/liking scales. The most well-known and researched hedonic scale is the 9-point hedonic scale (Peryam and Pilgrim 1957). A considerable amount of research was done in structuring this scale, in terms of the number of categories and the anchors in each category. Currently, the 9-point hedonic scale continues to be used in its original form. However, other versions of this popular scale have also been used. Valid variations of the original scale include: fewer categories, fewer word anchors, and a different format (category versus line). Some scales may include several modifications (e.g., 7-point hedonic line scale). In addition, the scale has been modified for children's tests. In this chapter, the main modifications and caveats are discussed.

Fewer Categories. This modification is supported by many sensory scientists, and criticized by others (Parducci and Wedell 1986; Riskey 1986).
The most popular shortened versions of the 9-point hedonic scale are the 7- and 5-point hedonic scales (Fig. 9.1). In these scales the "dislike extremely" and "like extremely" categories have been eliminated. Furthermore, there are practitioners who use a three-point hedonic scale.

(a) 7-point scale: Like very much / Like moderately / Like slightly / Neither like nor dislike / Dislike slightly / Dislike moderately / Dislike very much

(b) 5-point scale (with all anchors): Like very much / Like moderately / Neither like nor dislike / Dislike moderately / Dislike very much

(c) 5-point scale (with end anchors only): Like very much [ ] [ ] [ ] [ ] [ ] Dislike very much

FIG. 9.1. CONTROVERSIAL VARIATIONS OF THE 9-POINT HEDONIC SCALE
Some researchers prefer to use fewer categories in the hedonic scale, since they believe that consumers do not use as many as nine categories to rate acceptance. Some of these researchers believe that consumers may even get confused by too many categories. The 7-point scale has wide acceptance because there are fewer categories, and the end anchors, like/dislike extremely, are eliminated. Some practitioners believe that consumers rarely use the end anchors, since very few products are extremely liked or disliked by consumers. Many of the children's hedonic scales developed have fewer than 9 categories (Head et al. 1977; Chen et al. 1996).

Practitioners using fewer categories should be careful with the anchors they use. Most of the research in the development of the 9-point hedonic scale was focused on the anchors, to ensure that each category was equidistant (Peryam
and Pilgrim 1957). It is therefore inappropriate to use scales where this principle is violated. Figure 9.2 shows inappropriate hedonic scales. Some of the mistakes in these scales are the lack of equidistant categories, unbalanced scales (the number of liking categories is not the same as the number of disliking categories), and the lack of uniformity in the anchors used.

(a) Like extremely / Like very much / Like slightly / Neither like nor dislike / Dislike slightly / Dislike very much / Dislike extremely

(b) Like extremely / Like slightly / Neither like nor dislike / Dislike slightly / Dislike extremely

(c) Like extremely / Like very well / Like moderately / Like slightly / Neither like nor dislike / Dislike slightly / Dislike moderately / Dislike strongly / Dislike extremely

(d) Like extremely / Like very much / Like moderately / Like slightly / Neither like nor dislike / Dislike

FIG. 9.2. EXAMPLES OF INAPPROPRIATE HEDONIC SCALES
Number of Word Anchors. Occasionally, hedonic scales can be structured without all word anchors (Fig. 9.3 a-d). This practice is definitely recommended when the appropriateness of each word/anchor is questioned. Bendig and Hughes (1953) discussed the effect of the amount of verbal anchoring in scales. Scales with only end anchors are appropriate, as long as these end anchors are true opposites. Sensory professionals should ensure that this requirement is met (Riskey 1986). With the use of only end-scale anchors, the lack of equidistant categories is avoided, and scales can be expanded to more points (e.g., 10- or 11-point scales) (Fig. 9.3 c and d).
[Figure 9.3, not reproduced here, illustrated six accepted variations: (a)-(d) scales carrying only end anchors ("dislike extremely" to "like extremely"), including numbered 0-10 category scales; (e) a hedonic line scale structured with all nine verbal anchors from "dislike extremely" to "like extremely"; and (f) a totally unstructured hedonic line scale with end anchors only.]

FIG. 9.3. EXAMPLES OF ACCEPTED VARIATIONS OF HEDONIC SCALES
Omission of the Intermediate Point. Some practitioners prefer not to use the intermediate point in the hedonic scale ("neither like nor dislike"). This practice is only possible when fewer word anchors are used, especially when only the end anchors are placed on the scale (e.g., Fig. 9.3 a-c). Practitioners who omit the middle anchor believe that this anchor may be biasing to consumers. For example, some consumers who do not want to commit to an answer may select the "neither like nor dislike" category more frequently. Jones et al. (1955) provided evidence that the presence of this category decreased the
efficiency of the scale since the neutral category encourages a certain degree of complacency in judgments; i.e., people use this category to express a marginal stimulus without cognitive repercussions (Gridgeman 1961). In reporting data using a scale without a midpoint, researchers should not reach any conclusions regarding the meaning of “a midpoint.” For example, a score of five (in a 9-point hedonic scale) should not be interpreted as “neither like nor dislike,” since this scale category was omitted.
Hedonic Line Scale. The 9-point hedonic scale was developed as a category scale. Some practitioners who prefer line scales use a hedonic line scale, shown in Fig. 9.3 e,f. A line scale can be used with the other modified scales discussed above (Lawless 1977; Rohm and Raaber 1991; Hough et al. 1992). A line scale anchored with all the words of the original 9-point hedonic scale (i.e., a structured hedonic line scale) is in effect a 90-point scale, since consumers are able to mark their response anywhere on the scale (Fig. 9.3e). Therefore, scores with decimals are obtained (e.g., a score of 7.4). This author's personal experience is that consumers choose the categories anchored with words or numbers, and not the points in between anchors. In this case the structured hedonic line scale is used as a category scale, like the original 9-point hedonic scale. Some sensory/consumer scientists prefer to use a totally unstructured hedonic line scale (Fig. 9.3f). Other professionals oppose this practice, since they believe that this scale is too abstract and consumers have difficulties with its use.

Children's Scales. Children's hedonic scales are a modification of the 9-point hedonic scale. Some of the modifications made for children's scales include: the use of pictorial faces instead of numbers, of fewer categories (e.g., 5 or 7 categories), and of different anchors. For example, a well-known and extensively used scale is the one described by Kroll (1990). She reported that the proposed modified scale worked better with children than the 9-point or facial scales. The anchors for this published scale are: super good, really good, good, just a little good, maybe good or maybe bad, just a little bad, bad, really bad, super bad. Head et al. (1977), Birch (1979), Kroll (1990) and Chen et al. (1996) have proposed and used several hedonic scales for children.

Other Acceptance/Liking Scales

There are other types of scales that sensory professionals use that do not have "liking" anchors. In general, consumers interpret and use them as liking scales. Examples include:
love/hate scales
acceptable/not acceptable scales
quality scales
These scales have been used when practitioners believe that consumers relate better to words besides liking (love/hate, good/bad) when evaluating certain products. For example, the acceptable/not acceptable and the quality scales can be used in testing products that are generally disliked, such as medications and pharmaceutical products. A manufacturer of cough syrups may want to test two different flavors to find out which one is liked better. In such situations it is advisable to use an acceptable/not acceptable or a quality scale, since differentiation between products may only be obtained with these scales. A hedonic scale may only indicate that all products are disliked, and thus fail to show any difference among products. Another variation of a liking scale is the FACT (food action rating) scale developed by Schutz (1965). This scale combines liking, frequency of consumption and motivation-related statements.

Sensory scientists should be aware that the data from the above scales reflect consumer liking responses, even though other anchors are used. It is also important that sensory scientists carefully design these scales to ensure balanced scales (equal positive and negative categories), appropriate anchors, and equidistant categories. Schutz and Cardello (2001), Rousset and Martin (2001), and Bergara-Almeida et al. (2002) have recently published new ideas and/or scales to measure consumer acceptance.
Intensity Scales Used in Consumer Testing

There are two types of intensity scales used in consumer questionnaires: absolute intensity scales and just-about-right scales (Fig. 9.4). Not all sensory professionals use intensity scales. Some believe that consumers are unable to rate attribute intensities (Stone and Sidel 1993). Some practitioners have concerns not only about the ability of consumers to understand product attributes (see Chap. 10), but about their ability to rate intensities. It is claimed that consumers are not trained and that their intensity scores are misleading, regardless of the scale used. For example, it is well-known that consumers frequently give a low or a high score for some attributes, regardless of their intensity and the scale used. This is the case for attributes that are either desirable or very negative in a product. For example, attributes perceived to be "negative or unhealthy" by consumers are consistently scored too high. Consumers may like the intensity of the attribute (they may even want it higher), but they are compelled to indicate that the intensity is too high. Attributes falling
in this category are fatty/greasy, sweet, salty, rich, etc. Conversely, desirable and positive attributes in a product may always be rated too low. Consumers want higher intensities, and any amount is never enough. Examples of these attributes include garnish amount, chocolate chips, buttery, cheesy, moisturizing, fresh, etc.
[Figure 9.4, not reproduced here, illustrated: (a) absolute intensity scales, including a low-to-high line scale and a 1-10 category scale anchored from "none" to "high"; (b) an absolute intensity scale paired with a "dislike extremely" to "like extremely" liking scale; and (c) five-point just-about-right scales, anchored either "much too weak" / "somewhat too weak" / "just right" / "somewhat too strong" / "much too strong," or "definitely not sour enough" / "somewhat not sour enough" / "just right" / "somewhat too sour" / "definitely too sour."]

FIG. 9.4. DIFFERENT SCALE APPROACHES FOR COLLECTING ATTRIBUTE INTENSITY INFORMATION FROM CONSUMERS
Sensory/consumer scientists should unveil these consumer inconsistencies after several tests have been completed for a given product category. Alternatively, qualitative research techniques and consumer-descriptive (or consumer-analytical) data relationships can shed light on and clarify these results. In general, those sensory professionals who ask consumers to rate intensities should be careful in interpreting these intensity results, and use other methods and data to unveil any inconsistencies.
Absolute Intensity Scales. The absolute intensity scales used in consumer testing are uni-polar scales (e.g., 0-10, none to extreme, low to high). They are similar to scales used in other sensory tests (e.g., descriptive tests). These scales can be categorical (e.g., 5-, 9- or 10-point numerical scales, category scales with verbal anchors) or line scales (structured or unstructured) (Fig. 9.4a). Variations of these scales are also acceptable (e.g., a 7-point verbal scale). Unlike the just-about-right scales these scales are not bipolar, and they are used to record the intensities of product attributes as perceived by consumers.

Some practitioners criticize the use of these scales. It is argued that the absolute intensity scores provided by consumers do not have much meaning, since consumers do not have reference points for their scoring. Instead, these practitioners prefer to use appropriateness or just-about-right scales. In multi-product tests, it is appropriate to use absolute intensity scales, since consumers can easily record the relative differences between two or more products using these scales. To compensate for the claimed disadvantage of absolute intensity scales, they can be used together with liking scales (Fig. 9.4b). The combination of liking and intensity scales offers information similar to that provided by the just-about-right scale. With both scales, the intensity measures provided by consumers can be put into context and easily interpreted (e.g., whether the high or low attribute intensity is liked or disliked).

Just-About-Right (JAR) Scales. These scales indicate the appropriateness of a given attribute intensity (i.e., the intensity of a product's attribute relative to the consumer's perceptual ideal level). The JAR scales are also regarded as a combination of liking and intensity questions. They are bipolar scales, where the midpoint is anchored with the category "just about right" and each side indicates high or low intensity of the attribute. Figure 9.4c shows examples of just-about-right scales. These scales are very popular in industry, since they are used for formulation/reformulation guidance (Epler et al. 1988; Vickers 1988; Shepherd et al. 1991). When the outcome of the test indicates that the attribute's intensity is "just about right," it is assumed that no product changes are needed and that consumers like that attribute intensity. If the scores fall on either side of the
scale (too low or too high), the results are taken to guide the change needed: to lower or increase the attribute intensity. The main advantages of these scales are: consumers can easily use them, they provide direction for formulation and reformulation, and they are easily understood by researchers and management. There are, however, several concerns with these scales.
Type of Data. It is important to study the generated data carefully, because of the nature of JAR scales, mainly their bipolarity. Some practitioners tend to treat these data as interval data and proceed to use parametric statistics (i.e., averages, standard deviations, ANOVA, t-tests) without assessing the properties of the raw data. This practice is very dangerous, since the data may in fact not be uni-modal. Figure 9.5 shows the danger involved in averaging these data without preliminary inspection of their properties. The bi-modality shows two populations: one that found the attribute intensity too high, and another that found the intensity too low. The data should not be averaged, since this "average" is misleading. This metric does not represent the average population response, since none of the participants chose the category "just right." This simple plotting or a similar simple analysis is recommended prior to any analysis of JAR data, to unveil any bi-modality. If the data seem to be uni-modal, it is appropriate to average the results and use parametric statistics. When the data are bi-modal, as shown in Fig. 9.5, the data have to be treated differently. In this case, some practitioners simply report frequency distributions or the percentages across the three sections of the scale (i.e., how many people indicated the attribute intensity to be just about right, too high, and too low). Alternatively, non-parametric statistical tests are applied (Stone and Sidel 1993; Lawless and Heymann 1998). The ASTM task group E18.04.26 was formed to develop a standard guide to discuss the use, benefits and risks associated with the use of JAR scales. One of the most important tasks of this group is to present the various approaches to analyzing JAR scale data, such as McNemar, chi square, ANOVA, regression, penalty/drop analysis tests, etc. (ASTM 2003).

Just-About-Right Category. The JAR scale is unique because of its anchors, especially the middle anchor. There are advantages and disadvantages arising from this unique middle category. In one respect, it provides information on the appropriateness of an attribute. In another, the middle category may be a venue for uncommitted consumers to choose the "just right" category more frequently. Concerns regarding the middle category have been discussed by several authors (Jones et al. 1955; Gridgeman 1961).
[Figure 9.5, not reproduced here, plotted the frequency of responses across the five JAR scale categories, from "too little" through "just right" to "too much," showing the bimodal distribution discussed above.]

FIG. 9.5. EXAMPLE: DISTRIBUTION OF RESPONSES USING A JAR SCALE (5 POINTS)
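The preliminary inspection recommended above is easy to automate. The Python sketch below uses invented ratings constructed to mimic Fig. 9.5: it tabulates the frequency distribution of a 5-point JAR scale and shows how the mean can sit near the scale midpoint even though no respondent chose "just right."

    # Frequency distribution of 5-point JAR ratings (1 = much too little,
    # 3 = just right, 5 = much too much) before any averaging. The ratings
    # are invented to reproduce the bimodal pattern of Fig. 9.5.
    from collections import Counter
    from statistics import mean

    jar = [1, 2, 2, 1, 2, 4, 5, 4, 5, 4, 5, 2, 1, 4, 5, 2, 4, 1, 5, 4]

    counts = Counter(jar)
    for category in range(1, 6):
        print("category %d: %2d  %s" % (category, counts[category], "#" * counts[category]))

    # The mean (3.15) sits near the midpoint although nobody chose "just right":
    print("mean = %.2f | 'just right' responses = %d" % (mean(jar), counts[3]))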
Sensory scientists should be aware of this quandary. It is recommended that consumers be made aware of the need to use all of the scale categories in scoring products. This can be accomplished in the orientation prior to the test.
Anchors. The use of JAR scales requires that the attribute anchors be carefully selected. There are two common versions of this scale (Fig. 9.4c). Some practitioners like to use opposite anchors (e.g., soggy and crispy), while others prefer to use the same term for both sides of the scale. Those practitioners using opposite attributes should be very careful when structuring the scale. Many inappropriate versions of the JAR scale exist because of the misuse of anchors. For example, incorrect end anchors are much too salty/much too sweet, or much too soft/much too crispy. These pairs of attributes are not opposites. True opposite words/terms (e.g., soft/firm, light/dark, dull/shiny) should be chosen when a JAR scale with opposite words is structured. Bipolar scales with opposite words/terms are easily structured for appearance and texture attributes. Often, it is more difficult to find opposite words for flavor and fragrance attributes. In this case the use of the same word for both end anchors is recommended.

Number of Categories. One of the main disadvantages of the JAR scales is the number of scale categories. The five-point just-about-right scale is the version most frequently used. This scale has two categories for the high and two categories for the low intensity section, respectively. Two intensity categories
cannot provide sufficient information on intensity. Therefore, sensory practitioners should not use JAR scales to collect data on attribute intensities or product differences. Absolute intensity scales should be used if the objective is to study intensity differences and similarities among products. The JAR scales should be used mainly to investigate the intensity of a product's attribute relative to the consumer's perceptual ideal level. Often, researchers use the 3-point JAR scale (or merge the scores for the low and the high scale categories, respectively). These researchers understand that JAR scales should not be used to record differences, but rather the appropriateness of attribute intensities. Therefore, the information for the 3 categories (i.e., too low, just right, too high) is considered sufficient.

In summary, is it best to use JAR or absolute intensity scales? The choice depends on the researcher's preference. Both have advantages and disadvantages. A sensory professional should keep in mind the limitations of each type of scale in designing consumer questionnaires, and choose the best scale for his/her applications.
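As one concrete example of the analysis approaches mentioned in connection with the ASTM task group, the Python sketch below illustrates a simple penalty ("mean drop") analysis on invented data: the 5-point JAR ratings are collapsed into the three sections of the scale, and the mean liking of each off-target group is compared with that of the "just right" group.

    # Penalty ("mean drop") analysis sketch: collapse 5-point JAR ratings into
    # three groups and compare mean liking against the "just right" group.
    # All (jar, liking) pairs are invented for illustration.
    from statistics import mean

    data = [(3, 8), (3, 7), (2, 6), (1, 4), (3, 8), (4, 6),
            (5, 4), (3, 7), (2, 5), (4, 5), (3, 8), (1, 5)]

    groups = {"too low": [], "just right": [], "too high": []}
    for jar, liking in data:
        key = "too low" if jar < 3 else "too high" if jar > 3 else "just right"
        groups[key].append(liking)

    jr_mean = mean(groups["just right"])
    for name in ("too low", "too high"):
        share = len(groups[name]) / len(data)
        drop = jr_mean - mean(groups[name])
        print("%-9s: %2d%% of consumers, mean liking drop = %.2f"
              % (name, round(100 * share), drop))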
Purchase Intent Scales Purchase intent scales (definitely would buy - probably would buy - may or may not buy - probably would not buy - definitely would not buy) are not used by many sensory professionals, yet frequently used by market researchers. However, often sensory professionals are asked to include purchase intent scales in their questionnaires. Sensory practitioners do not typically use this scale since it requires that the consumer express a decision on the product, which goes beyond the realm of the product’s sensory properties. Generally speaking, other factors have to be taken into consideration in order to answer a purchase intent question, such as price, convenience, advertising, promotions, etc. (Vickers 1993; Solheim and Lawless 1996; Guinard and Marty 1997; Bower and Turner 2001). The response obtained for purchase intent questions when no other information is provided to the consumer, may in fact reflect only a liking response. In most cases, statistics are not used to analyze purchase intent scales. Practitioners are mainly interested in frequency distributions, particularly for the “top two” boxes (definitely would buy, probably would buy).
REFERENCES

ASTM. 2003. Standard guide for the use, benefits and risks associated with the use of JAR scales. ASTM, Pennsylvania (in preparation).
BENDIG, A.W. and HUGHES, J.B. 1953. Effect of amount of verbal anchoring and number of rating scale categories upon transmitted information. J. Experimental Psychol. 46, 87-90.

BERGARA-ALMEIDA, S., APARECIDA, M. and DA SILVA, A.P. 2002. Hedonic scale with reference: performance in obtaining predictive models. Food Quality and Preference 13(1), 57-64.

BIRCH, L.L. 1979. Dimensions of preschool children's food preferences. J. Nutr. Education 11, 77-80.

BOWER, J.A. and TURNER, L. 2001. Effect of liking, brand name and price on purchase intention for branded, own label and economy line crisp snack foods. J. Sensory Studies 16, 95.

CHEN, A.W., RESURRECCION, A.V.A. and PAGUIO, L.P. 1996. Age appropriate hedonic scales to measure food preferences of young children. J. Sensory Studies 11, 141-163.

EPLER, S., CHAMBERS, E. IV. and KEMP, K.E. 1988. Hedonic scales are a better predictor than just-about-right scales of optimal sweetness in lemonade. J. Sensory Studies 13, 191-197.

GRIDGEMAN, N.T. 1961. A comparison of some taste-test methods. J. Food Sci. 16, 171-177.

GUINARD, J. and MARTY, C. 1997. Acceptability of fat modified foods to children, adolescents and their parents: Effect of sensory properties, nutritional information and price. Food Quality and Preference 8(3), 223-231.

HEAD, M.K., GIESBRECHT, F.G. and JOHNSON, G.N. 1977. Food acceptability research: comparative utility of three types of data from school children. J. Food Sci. 42, 246-251.

HOUGH, G., BRATCHELL, N. and WAKELING, I. 1992. Consumer preference of Dulce de Leche among students in the United Kingdom. J. Sensory Studies 7, 119-132.

JONES, L.V., PERYAM, D.R. and THURSTONE, L.L. 1955. Development of a scale for measuring soldiers' food preferences. Food Res. 20, 512-520.

KROLL, B.J. 1990. Evaluating rating scales for sensory testing with children. Food Technol. 44(11), 78-80, 82, 84, 86.

LAWLESS, H.T. 1977. The pleasantness of mixtures in taste and olfaction. Sensory Processes 1, 227-237.

LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food. Chapman & Hall, New York.

OLSEN, S.O. 1999. Strength and conflicting balance in the measurement of food attitudes and preferences. Food Quality and Preference 10(6), 483-494.
PARDUCCI, A. and WEDELL, D.H. 1986. The category effect with rating scales: number of categories, number of stimuli, and method of presentation. J. Experimental Psychol.: Human Perception and Performance 12, 496-512.
PERYAM, D.R. and PILGRIM, F.J. 1957. Hedonic scale method of measuring food preferences. Food Technol. 11, 9-14.
RISKEY, D.R. 1986. Use and abuses of category scales in sensory measurement. J. Sensory Studies 1, 217-236.
ROHM, H. and RAABER, S. 1991. Hedonic spreadability optima of selected edible fats. J. Sensory Studies 6, 81-88.
ROUSSET, S. and MARTIN, J.F. 2001. An effective hedonic analysis tool: Weak/strong points. J. Sensory Studies 16, 643-661.
SCHUTZ, H.G. 1965. A food action rating scale for measuring food acceptance. J. Food Sci. 30, 365-374.
SCHUTZ, H.G. and CARDELLO, A.V. 2001. A labeled affective magnitude (LAM) scale for assessing food liking/disliking. J. Sensory Studies 16, 117-159.
SHEPHERD, R., FARLEIGH, C.A. and WHARF, S.G. 1991. Effect of quantity consumed on measures of liking for salt concentrations in soup. J. Sensory Studies 6, 227-238.
SOLHEIM, R. and LAWLESS, H.T. 1996. Consumer purchase probability affected by attitude toward low-fat foods: liking, private body consciousness and information on fat and price. Food Quality and Preference 7, 137-143.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices. Academic Press, New York.
VICKERS, Z. 1988. Sensory specific satiety in lemonade using a just right scale for sweetness. J. Sensory Studies 3, 1-8.
VICKERS, Z. 1993. Incorporating tasting into a conjoint analysis of taste, health claim, price and brand for purchasing strawberry yogurt. J. Sensory Studies 8, 342-352.
MAXIMO C. GACULA, JR.

A dialogue between our basic senses and the objects around us is quantifiable. That quantity is what we perceive; it is affected by several psychological factors, which makes it a difficult quantity to measure. Of course, the measurement of subjective judgments has always been a challenge, often creating controversies in the 1900s that resulted in the development of the psychophysical laws as we know them today (Weber, Fechner, Thurstone, Stevens). Edited books by Carterette and Friedman (1974) and McBride and MacFie (1990) are
valuable reading on this subject, defining the role of psychological studies in the development of today's sensory evaluation methods. It is interesting that at the present time we still face issues of subjective judgment, but in different contexts - specifically, the application of psychophysical judgment and measurement to the real world. In his book, Poulton (1989) reviewed the scientific evidence on the different psychological biases encountered when quantifying subjective judgments. Of these biases, affective scales are most predisposed to context effects, such as the centering bias. In this chapter, Moskowitz and Muñoz discussed the three most commonly used affective scales in sensory evaluation and consumer testing: hedonic, purchase intent, and just-about-right (JAR) scales. Moskowitz's section touches on both the underlying psychophysical and application aspects, whereas Muñoz's section deals with the sensory evaluation aspects of the three types of affective scales.
Hedonic Scale

The classic hedonic scale is exemplified by the 9-point like/dislike scale (Peryam and Girardot 1952; Jones et al. 1955). Years of study and application have demonstrated that this scale is optimal for gathering hedonic data. It is simple to administer and easily followed by the respondent. In practice, the questions often raised about the hedonic scale, as pointed out by Moskowitz and Muñoz, concern the balanced nature of the like/dislike categories, the number of categories, and the unequal psychological distance between the category intervals. There is no published scientific evidence that an unbalanced hedonic scale is better than the balanced scale. It can be argued that decreasing the number of dislike categories is somewhat equivalent to leading the respondents to score the sample as prescribed by the researcher. The unbalanced scale also assumes that many, if not all, of the sensory attributes in a product are liked, which is obviously incorrect. If we consider the hedonic scale as a "ruler," then the balanced nature of the scale should be left alone. What is the optimal number of categories on a scale? Scaling issues have been studied over the last 50 years, both by psychologists and by consumer product sensory scientists. Raffensperger et al. (1956) suggested an 8-point balanced scale to measure tenderness. In general, rating scales with a small number of categories do not provide enough range for discrimination in many consumer products. On the other hand, scales with a large number of categories may exaggerate perceived differences and destroy the nature of the scale continuum.
Another point of practical importance is the amount of "transmitted information" resulting from scales with varying numbers of categories. Transmitted information is one way to look at a scale's ability to discriminate between different stimuli. Unfortunately, there is no known published work on this subject in the consumer product industries. Therefore we resort to psychological studies done over 40 years ago, hoping that the results can be generalized. Garner and Hake (1951) developed the statistical method for measuring transmitted information from category scales. Later, Bendig and Hughes (1953) used the method to compare rating scales comprising 3, 5, 7, 9, and 11 categories. In this study a total of 225 college students rated themselves on their knowledge of 12 foreign countries. Results indicated an increase in the absolute amount of transmitted information as the number of scale categories increased. In particular, the 9-point scale generated more transmitted information than did scales comprising a smaller number of categories, and was recommended by these researchers. More transmitted information resulted with a scale comprising 11 categories, but those results were less reliable. Furthermore, increased verbal anchoring of the rating scale (verbally anchored either in the center, at both ends, or at both center and ends) resulted in a slight increase in the information transmitted by the scale. This is a valuable finding, because in Sensory Science most scales are verbally anchored. In particular, let us look at Fig. 9.6, which presents re-graphed data from results reported by Jones et al. (1955) for the hedonic scale. High values for transmitted information indicate discriminating responses to the food items included in the study. That is, the data show distinct and different distributions of responses for the various foods, with a high degree of agreement among the ratings for each. Figure 9.6 shows that the potential amount of information increases as the number of category intervals increases from a low of 5 to a high of 9 categories. In relation to the issues in this chapter, the Jones et al. (1955) study concluded the following. (1) Longer scales, up to nine intervals, tend to be more sensitive to differences among foods. These results support the findings of Bendig and Hughes (1953) favoring the 9-point scale. (2) Elimination of the neutral category appears to be beneficial. (3) Balance, i.e., an equal number of positive and negative intervals, is not an essential feature of a rating scale.
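Transmitted information is simply the mutual information between stimulus and response categories. The following is a minimal sketch of the Garner and Hake (1951) computation; the contingency table is hypothetical and included only to make the formula concrete.

    import numpy as np

    def transmitted_information(table):
        # T = H(stimulus) + H(response) - H(stimulus, response), in bits
        p = table / table.sum()
        def entropy(q):
            q = q[q > 0]
            return -np.sum(q * np.log2(q))
        return entropy(p.sum(axis=1)) + entropy(p.sum(axis=0)) - entropy(p.ravel())

    # Hypothetical counts: 3 foods (rows) rated on a 5-point scale (columns)
    table = np.array([[20,  8,  2,  0,  0],
                      [ 3, 12, 10,  4,  1],
                      [ 0,  1,  5, 11, 13]], dtype=float)
    print(f"T = {transmitted_information(table):.2f} bits")

The more distinct the response distributions are across stimuli, the higher T becomes; this is the sense in which a longer, well-anchored scale can transmit more information.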
FIG. 9.6. TRANSMITTED INFORMATION IN RELATION TO NUMBER OF SCALE INTERVALS (y-axis: transmitted information, 0.21-0.28; x-axis: number of scale categories, 4-10). 5, 7 (lower), 9: balanced scale with neutral category. 6, 8 (lower): balanced scale with no neutral category. 7 (upper): unbalanced, no neutral category. 8 (upper): unbalanced, with neutral category.
Are the results invariant across different numbers of categories? The answer is yes, and this supports the current practices reviewed by Muñoz. The data shown in Fig. 9.7, re-graphed from Gacula (1987), show that the 7-point category scale exhibits a linear relation with the 9-point scale on an individual panelist basis. Thus, a similar direction of results would be obtained using either the 7-point or the 9-point scale. Using mean ratings, Parducci and Perrett (1971) obtained similar findings for the 6- and 9-point category scales for judging sizes of squares. However, one must consider the skewness of the stimulus distribution in relation to the number of categories on the scale. Parducci and Wedell (1986) reported that skewness varies inversely with the number of categories (Figs. 9.8, 9.9, 9.10). In these figures, the upper line signals negative skewness and the lower line signals positive skewness. The lines should be relatively straight for low degrees of both types of skewness of the stimulus distribution, and in these figures the 9-point scale is favored over the 3- and 5-point scales.
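The per-panelist check underlying Fig. 9.7 amounts to a simple linear regression of one scale on the other. A minimal sketch (the ratings below are hypothetical, not Gacula's data):

    import numpy as np

    # One panelist's ratings of six products on both scales (hypothetical)
    nine_point = np.array([2, 4, 5, 6, 8, 9])
    seven_point = np.array([2, 3, 4, 5, 6, 7])

    # Fit seven_point = slope * nine_point + intercept
    slope, intercept = np.polyfit(nine_point, seven_point, 1)
    r = np.corrcoef(nine_point, seven_point)[0, 1]
    print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.3f}")

A high correlation with a stable slope across panelists indicates that the two scales order the products in the same way.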
FIG. 9.7. REGRESSION LINES FOR 4 PANELISTS ILLUSTRATING THE LINEAR RELATION BETWEEN THE 7- AND 9-POINT SCALES (y-axis: 7-point scale; x-axis: 9-point scale). The diagonal line indicates the one-to-one correspondence between scales.
FIG. 9.8. THREE-POINT SCALE (x-axis: log stimulus width).
FIG. 9.9. FIVE-POINT SCALE (x-axis: log stimulus width).
FIG. 9.10. NINE-POINT SCALE (x-axis: log stimulus width).
Another important practical issue raised by Muñoz is the distance between category intervals. The distance can be computed as outlined in the book by Gacula and Singh (1984). Consider the data in Table 9.1. "Data 1" was extracted from the estimated scale values in the paper by Jones et al. (1955), and "Data 2" was copied from the Gacula and Singh book. Data 1, with over 900 respondents, shows that the ends of the scale have larger interval widths than the middle portion of the scale. For Data 2 (N = 96 respondents), the widths tend to be equal, except for the interval between categories 7 and 8. Similar results of unequal
interval width were reported by Cloninger et al. (1976), who studied the 5-, 7-, 9-, 10-, and 15-point scales. Graphing the results in Table 9.1 in terms of psychological and physical scales clearly shows the practical significance of the unequal interval width issue. Since the scale values were derived from +4 (greatest like) to -4 (greatest dislike), the psychological scale was obtained as: 5 + derived scale value. Figure 9.11 depicts the graph for Data 1, generally showing equality of interval width from categories 3 to 7.
TABLE 9.1. CATEGORY INTERVAL WIDTH FOR THE HEDONIC SCALE.
FIG. 9.11. LINEAR RELATIONSHIP BETWEEN THE PHYSICAL SCALE AND PSYCHOLOGICAL SCALE FOR HEDONICS (y-axis: psychological scale; x-axis: physical scale). Due to unequal interval width, the two scales did not coincide.
The graph for Data 2 appears in Fig. 9.12. The psychological scale was calculated as: [1 + cumulative average width]. Although it is linear, the interval
width is shorter. Being linear, the physical scale will still validly show the perceived order of the product stimuli. For information purposes, the result for the 7-point off-flavor scale used in shelf-life studies is given in Fig. 9.13. The psychological scale was also calculated as: [1 + cumulative average width].
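Both transformations are easy to reproduce. A minimal sketch (the derived scale values below are hypothetical; the published Data 1 and Data 2 values are not reproduced here):

    import numpy as np

    # Hypothetical derived (Thurstonian) scale values for a 9-point hedonic
    # scale, ranging from -4 (greatest dislike) to +4 (greatest like)
    derived = np.array([-4.0, -2.9, -2.0, -1.2, -0.4, 0.4, 1.2, 2.1, 3.2])

    # Data 1 style: psychological scale = 5 + derived scale value
    psych1 = 5 + derived

    # Interval widths between adjacent categories
    widths = np.diff(psych1)

    # Data 2 style: psychological scale = 1 + cumulative width
    # (with several data sets, the widths would first be averaged)
    psych2 = 1 + np.concatenate(([0.0], np.cumsum(widths)))

    print("widths:", np.round(widths, 2))  # wider at the ends, as in Data 1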
FIG. 9.12. LINEAR RELATION BETWEEN PHYSICAL SCALE AND PSYCHOLOGICAL SCALE FOR HEDONICS (y-axis: psychological scale; x-axis: physical scale).
FIG. 9.13. LINEAR RELATION BETWEEN PHYSICAL SCALE AND PSYCHOLOGICAL SCALE FOR OFF-FLAVOR (1 = none, 7 = very strong off-flavor; y-axis: psychological scale; x-axis: physical scale).
Note the almost equal interval widths for categories 2 through 6, ranging from 0.52 to 0.61 (Fig. 9.13). Both ends of the scale have wider intervals: 0.88 between categories 1 and 2, and 0.84 between categories 6 and 7. These observations are similar to those for Data 1 on the hedonic scale. When we apply statistics to sensory data, we assume that the subjective distances between category intervals are approximately equal, and that they are not necessarily equal to 1, as the assigned physical scale implies. Experience shows that in practice, product evaluation scores fall in the 3 to 7 range on a 9-point scale, and similarly for other scales. That is, the extreme categories are least used: respondents tend to avoid the extremes of rating scales, a phenomenon known in practice as the "endpoint effect." In addition, products tested in consumer studies are mostly acceptable, so the scores are likely to occur in the middle portion of the rating scale. As a result of these effects, it is sometimes stated that the 5-point scale is functionally a 3-point scale, the 9-point scale functionally a 7-point scale, and so on. In this section of the chapter, we have shown that the hedonic scale is linear, that the optimal number of categories is 9, and that the middle portion of the scale has approximately equal interval widths. Therefore this author (MCG) suggests that the current issues surrounding the 9-point category hedonic scale should be put to rest.
Just-About-Right Scale

Muñoz described the various aspects and ramifications of the JAR scales. These ramifications are mostly based on intuition, ease of use of the scale, ease of understanding the results, and the objectives of the study. The JAR scale is mostly used by market research and sometimes in tests run with the Research Guidance Consumer Panel. The JAR scale can be construed to measure both intensity and acceptance, in the sense that the just-right category located in the middle of the scale denotes an acceptance rating for a sensory attribute, whereas the categories above and below it denote intensity ratings. The reference point during evaluation is internal to the respondent. This is likely the reason why the JAR scale is sensitive to context effects, i.e., the centering bias or range effect. A JAR score for a sample is likely to change when that sample is compared against different samples in another experiment; thus a JAR result is quite often experiment-specific. See Johnson and Vickers (1988) for a good example of centering bias in determining the optimum level of sweetness in lemonade. From the standpoint of design of experiments, context effects can be minimized by including a control or reference sample in a multiproduct comparison, using the balanced incomplete block design with a reference sample in every block (Gacula 1978; Gacula and Singh 1984), as sketched below. The internal reference "ideal" product indicated by Moskowitz would then be present in all comparison subsets of the incomplete block design.
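A minimal sketch of such a block layout (the sample names are hypothetical): each block pairs a subset of the test samples with the reference. Taking all subsets gives the unreduced balanced design; in practice a smaller balanced subset of blocks may be chosen.

    from itertools import combinations

    def blocks_with_reference(test_samples, k, reference="CONTROL"):
        # Every k-subset of the test samples, augmented with the reference
        # so that the reference appears in every block (after Gacula 1978)
        return [list(block) + [reference] for block in combinations(test_samples, k)]

    for block in blocks_with_reference(["A", "B", "C", "D"], 2):
        print(block)  # e.g., ['A', 'B', 'CONTROL'] ... 6 blocks in all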
The basic question about the JAR scale is its relation to the hedonic scale, which is a direct measure of acceptance (like/dislike). The JAR scale is considered by practitioners to be a measure of both intensity and acceptance in a single response (Stone and Sidel 1993). A study that addresses this question, using sweetness in lemonade, is reported by Epler et al. (1988). The results show that the hedonic scale was the better predictor of optimal sweetness. Other studies (McBride 1985; Shepherd et al. 1985) have shown agreement between the hedonic and JAR scales under different experimental conditions. In my view, I would recommend the hedonic scale followed by descriptive analysis to determine the magnitude or intensity of sensory attributes. In other words, the strategy would be to determine acceptance and intensity separately rather than combining them in a single response, as the JAR scale does. Such an approach eliminates the problem of interpreting the "top two boxes" or "bottom two boxes," the conventional measures usually used in the statistical analysis of the JAR scale. These measures do not provide the degree-of-intensity information useful to product developers. Furthermore, the "just-right" category could be interpreted as a reference point for intensity (McBride 1985) rather than acceptance. Perhaps this is one of the reasons why the JAR and hedonic scales often do not agree. Another important issue is the method of statistical analysis, a complex topic beyond the scope of this book.
Purchase Intent Scale

As stated by Moskowitz and Muñoz, this scale is mostly used in marketing research, where it is most appropriate. There is no added utility in using this scale in guidance panel tests, because the purchase intent question is generally asked after product development has been completed, the resulting formula optimized, and the prototype submitted for a marketing research consumer test. As is generally known, purchase intent responses are influenced by several factors, e.g., the price sensitivity of the consumer, advertising, packaging, availability, brand loyalty, product quality, etc. Because of these factors and other behavioral influences, the criteria that respondents use to render purchase intent judgments, and the way they integrate them, are often unknown. As such, sensory scientists assisting product developers should not be involved in gathering purchase intent information. Marketing researchers and other behavioral model scientists are most qualified to obtain this information, and information obtained in this way is more actionable by management.
REFERENCES

BENDIG, A.W. and HUGHES, J.B. 1953. Effect of amount of verbal anchoring and number of rating scale categories upon transmitted information. J. Experimental Psychol. 46, 87-90.
CARTERETTE, E.C. and FRIEDMAN, M.P. (ed.). 1974. Handbook of Perception: Psychophysical Judgment and Measurement, Vol. II. Academic Press, New York.
CLONINGER, M.R., BALDWIN, R.E. and KRAUSE, G.F. 1976. Analysis of sensory rating scales. J. Food Sci. 41, 1225-1228.
EPLER, S., CHAMBERS, E. IV and KEMP, K.E. 1988. Hedonic scales are a better predictor than just-about-right scales of optimal sweetness in lemonade. J. Sensory Studies 13, 191-197.
GACULA, JR., M.C. 1978. Analysis of incomplete block designs with reference sample in every block. J. Food Sci. 43, 1461-1466.
GACULA, JR., M.C. and SINGH, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, San Diego.
GACULA, JR., M.C. 1987. Some issues in the design and analysis of sensory data: Revisited. J. Sensory Studies 2, 169-185.
GARNER, W.R. and HAKE, H.W. 1951. The amount of information in absolute judgments. Psychological Review 58, 446-459.
JOHNSON, J. and VICKERS, Z. 1988. Avoiding the centering bias or range effect when determining an optimum level of sweetness in lemonade. J. Sensory Studies 2, 283-292.
JONES, L.V., PERYAM, D.R. and THURSTONE, L.L. 1955. Development of a scale for measuring soldiers' food preferences. Food Res. 20, 512-520.
MCBRIDE, R.L. 1985. Stimulus range influences intensity and hedonic ratings of flavour. Appetite 6, 125-131.
MCBRIDE, R.L. and MACFIE, H.J.H. 1990. Psychological Basis of Sensory Evaluation. Elsevier Applied Science, London.
PARDUCCI, A. and PERRETT, L.F. 1971. Category rating scales: Effects of relative spacing and frequency. J. Experimental Psychol. Monographs 89, 427-452.
PARDUCCI, A. and WEDELL, D.H. 1986. The category effect with rating scales: number of categories, number of stimuli, and method of presentation. J. Experimental Psychol.: Human Perception and Performance 12, 496-512.
POLLACK, I. 1952. The assimilation of sequentially encoded information. Amer. J. Psychol. 6, 266-267.
POULTON, E.C. 1989. Bias in Quantifying Judgments. Lawrence Erlbaum Associates, London.
RISKEY, D.R. 1986. Use and abuses of category scales in sensory measurement. J. Sensory Studies 1, 217-236.
SHEPHERD, R., FARLEIGH, C.A., LAND, D.G. and FRANKLIN, J.G. 1985. Validity of a relative-to-ideal rating procedure compared with hedonic rating. In: Progress in Flavour Research, J. Adda, ed. Elsevier, Amsterdam.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices. Academic Press, San Diego.
CHAPTER 10

ASKING CONSUMERS TO RATE PRODUCT ATTRIBUTES

HOWARD R. MOSKOWITZ

Attributes

Attributes constitute the fundamental form of information by which the researcher taps into the mind of the consumer. We often think of attributes as a list of terms to be rated by a panelist. We might better and more productively think of attributes as the keys to the mental kingdom of sensory perception. Through attributes we can understand how the panelist perceives the product. Through attributes we can learn rather quickly why a particular product is acceptable or not acceptable. Through attributes we can discover holes in a product category - locations in the sensory "space" where no products yet exist, and where it may be advantageous to locate a product. Attributes come in a variety of shapes, sizes, and forms. The attributes typically cover the different senses. Occasionally the attributes may be quite general (e.g., overall taste), sometimes a little more specific (e.g., sweetness), and occasionally far more specific (e.g., the specific sweetness profile corresponding to aspartame). The topic of creating sensory attributes enjoys a long, controversial, conflict-ridden history, dealt with in a number of ways in this volume, especially in the topic of "experts versus consumers." The controversy is reminiscent of Jonathan Swift's "Battle of the Books," a well-known piece of literature in the history of ideas. We will talk about consumers in this chapter. However, it is worth contrasting experts with consumers, and then moving on to consumers and attributes. Experts are individuals who have been taught a descriptive language. An expert can be trained - no one who is normally sensitive to stimuli and cognitively able to understand the nuances of language can really be excluded from the category of those trainable for expertise. Typically the training of an expert comprises teaching the individual a new descriptive language, namely attributes, along with some type of anchoring system, namely a set of references, so that the terms are the same from individual to individual, and occasionally a scaling system. We can contrast this with consumers. The conventional consumer study presents the respondent with a list of terms or phrases, and instructs the respondent to rate these terms on a specific scale. There is no training, although there may be an explanation. Most consumers report little or no problem with
the task, unless the terms are meaningless. The lack of reported problems does not mean that the consumer respondents understand the terms and the scale; rather, it means that the task is performed rather easily. With this introduction to the consumer world, let us now go more deeply into the attributes that a consumer might use in a study.
What Do We Know About Sensory Attributes?

We would do well to look at sensory attributes from a historical perspective. A century ago the basic objective of Sensory Science, then called psychophysics, with regard to sensory attributes was to understand the basic attributes of perception. In the early part of the Twentieth Century, the Structuralist School of psychology led by E.B. Titchener looked to the introspective examination of one's own sensations in order to identify sensory attributes. These attributes were the basic ways by which the human being was thought to structure and categorize his sensory experience. It was assumed at that time that through introspection one could identify the basic dimensions of perception. Thus one reads not only of the four basic tastes and the pitch and loudness of tones, but also of the complexity of olfactory and visual surface perception. For some senses, like hearing, it seemed easy to disentangle the different sensory qualities. For others, like olfaction, this simple attempt proved frustrating. Not that there was a mass of unanalyzable sensations - just the opposite: there was a plethora of different sensations, and it was unclear which ones were related to each other. The history of psychological and philosophical research into sensory attributes was eloquently and forcefully presented by E.G. Boring in his famous 1942 book "Sensation and Perception in the History of Experimental Psychology" (Boring 1942). Psychological research has led to different schools of research and practical application. A host of experimental psychologists after Titchener spent years trying to understand the different sensory attributes of perception, with the hope that this knowledge would provide them with an understanding of how the sensory system operates. Much of this history appears in the psychology journals of the first part of the Twentieth Century. For the most part, beyond the act of introspection as a valid way to understand what we perceive, the applied researcher has little to gain from this extensive literature. We can pick up the history of research into food and other related attributes by looking at the different types of studies and papers that appeared during the middle third of the Twentieth Century. The reader should note that these dates are deliberately left vague, since in science there are rarely clear demarcations separating one intellectual period from the next.
The Last Third of the Twentieth Century, and the Proliferation of Attribute Systems - What Can We Learn from Consumer Research?

The last third of the Twentieth Century produced two vastly different groups of individuals interested in the sensory attributes of foods. One camp of researchers was interested in the creation of measurement systems for perception and for private sensory experience. These were at once psychophysicists in experimental psychology, statistical psychologists interested in mapping and modeling perception from a quantitative viewpoint, and food scientists interested in understanding the features of a food product beyond the simplistic notion of flavor strength. Many of these systems were developed with the expert in mind, but eventually many of the attributes found their way into consumer studies through the efforts of marketing researchers. We can learn a great deal about consumer-relevant attributes by understanding how these pioneers in descriptive analysis developed their systems. The real history of attributes in product testing and sensory evaluation comes from the food scientists and their professional brethren, whether the brewmaster, the winemaker, or even the perfumer. All of these professionals focused on the attributes of sensory perception. They were especially interested in the senses of taste and smell, and to a lesser degree in the senses governing the perception of texture. These three groups of people needed to describe a complex perception without the benefit of a standard system. They spent a great deal of time arguing about the basic dimensions, but just as much time creating descriptive systems for quantifying these perceptions. It should come as no surprise that each of these professional groups developed not one but several different systems for classifying the perceptions and then scoring them. Brewmasters often proffered their own systems - although there were attempts to create comprehensive systems. Each winemaker had his own system, often unwittingly parodied today when people act like experts at a wine tasting. Various descriptive systems for wine, with different attributes, may talk about the same characteristic. Unless one has had experience in tasting wine and working with these different systems, one will not know that terms from different systems actually refer to the same sensory attribute. Finally, each perfume company has its own proprietary system for describing the attributes of a fragrance. Some of this comes from the culture of the individual perfume house, and some naturally arises from the fact that a perfume is a complex olfactory sensation that begs for and needs extensive description. What is fundamental to these three groups of people is their continuing struggle to come up with a list of agreed-upon terms that could be used later on to communicate with product development. Consumer researchers face the same problem.
Researchers have tried to cope with the need for description. Some researchers, like the late Andrew Dravnieks, attempted to create general, or large-scale systems for the description of olfactory attributes, recognizing that these would have to be practical, and could be based on theory (Dravnieks 1974). The large-scale systems as general descriptors were doomed to enjoy a relatively short lifetime, because no general system can account for the vast number of different odor sensations in a single product category. That is, a system that attempts to be general, and to apply across categories, cannot easily work within one category, where there may be just as many descriptors, albeit for a limited set of products. Other researchers have just given up hope of finding a general set of descriptor terms, either for all odor perception, or even for a certain product category. Rather than relying upon a fixed set of terms, these researchers have adopted a more pragmatic stance. They have recognized that the descriptive profile is simply a tool by which to create a numerical “signature” of a product. To these researchers the unspoken organizing principle is that as long as the panelist can behave in a manner that is similar to other panelists, one can analyze the sensory profile for an odorant, as obtained from a group of panelists. One can then compare two or more odorants in order to determine the degree to which these specific odorants differ from each other. The nice thing about the pragmatic approach is that it works, even though there is no claim about any theoretical justification for including or excluding specific odor descriptors. Today (early 21st Century) much of the brouhaha about attributes has abated. Consumer researchers are far more concerned with using the attributes than with finding the most appropriate attributes. Even those who create long lists of attributes recognize that the selection of attributes is just as subjective as the use of attributes, and that there are no primary attributes for a specific product category. The attributes are used to understand the product, always with the recognition that: (1) the list is probably incomplete - it’s hard to get every single attribute pertaining to a product, (2) the use of the attributes is subjective and may require training - we all live in different perceptual worlds, or at least show significant inter-individual variability, and
(3) the ultimate use of the attributes is pragmatic - usually for statistical reasons, or to communicate something about the stimulus.
Quantifying Attributes

Modern-day psychophysics, as presented by S.S. Stevens and his associates and students (Marks 1974; Stevens 1975), is founded on the premise that the untrained human being can act as a valid judge of sensory magnitude. The psychophysics literature is replete with articles that show the functional relation between the physical stimulus and the rated intensity. Indeed, direct scaling of sensory magnitude, by whatever scale one uses, has been demonstrated again and again to generate lawful relations between the physical intensity of the stimulus and the rated intensity. The psychophysical studies cover children (Bond and Stevens 1969), brain-damaged individuals (Jones et al. 1978), and individuals in a variety of cultures (Moskowitz et al. 1975). The studies need not use numbers; rather, they can use other stimuli whose magnitude the panelist adjusts so that its perceived intensity matches the sensory intensity of the criterion stimulus being judged. Studies have used tones and noises of different sound pressure levels (Moskowitz 1971a) and physical forces applied to a handgrip dynamometer (Berglund 1974). The studies have used limited scales with defined endpoints, or open-ended scales (magnitude estimation) (Marks 1968). There have been literally thousands of publications on the topic of perceived sensory magnitude, beginning first, as in all science, with a study of the basic rules, namely the psychophysical functions relating the physical stimulus level to the perceived level. Once those early studies picked the low-hanging fruit, later studies used psychophysical scaling to study changes in sensory perception under different conditions (e.g., mixtures, body states such as hunger and satiety, etc.). Other problems addressed by these scaling studies included the impact of different types of instruments used to present the stimuli (Meiselman 1971), different solvents for odorants (Moskowitz, unpublished), and the effects of the range of the stimuli on the psychophysical function (Marks 1974). All in all, direct scaling over the past 50 years has proved that untrained consumer panelists can validly judge the sensory magnitude of stimuli, whether they test a range of products in an evaluation session or rate only one product. Given this history, it continues to come as a great surprise to this author (HRM) that many individuals, whether practitioners in Sensory Science or data users, aver that the untrained consumer panelist simply cannot judge the sensory magnitude of a product. The statement is never put as boldly as that - it is simply assumed that without training the panelist cannot validly rate the perceived magnitude of a stimulus. The statement is all too often made in a rather hushed, secretive voice, rather than being openly stated in a public meeting. This issue, of course, is closely intertwined with the belief that the only valid measuring instrument is a well-trained panelist or "expert," and that a consumer
can only report whether or not he likes or dislikes a product. The data from classical psychophysics show that panelists, trained or untrained, can rate both the sensory intensity of a stimulus and the overall liking of that stimulus, whether in the same session (Moskowitz and Klarman 1975) or in different sessions (Moskowitz 1971a). Furthermore, the data clearly show that ratings of sensory magnitude accurately track the physical concentration (viz., high linear correlation), whereas ratings of liking show an inverted-U-shaped curve (at least for the liking of sugar sweetness versus concentration; Moskowitz 1971b). All in all, it appears that much of the belief that a panelist cannot validly rate sensory intensity comes from a bias on the part of the researcher, rather than from scientific evidence.
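The inverted-U pattern lends itself to a simple worked example. A minimal sketch (all numbers hypothetical): fit a quadratic to mean liking ratings and locate its peak, the hedonic optimum.

    import numpy as np

    # Hypothetical sucrose concentrations (molar) and mean liking ratings
    logc = np.log10(np.array([0.0625, 0.125, 0.25, 0.5, 1.0]))
    liking = np.array([4.2, 5.6, 6.8, 6.3, 4.9])

    # Fit liking = a*logc^2 + b*logc + c; an inverted U implies a < 0,
    # with the peak ("bliss point") at logc = -b / (2a)
    a, b, c = np.polyfit(logc, liking, 2)
    print(f"estimated optimal concentration: {10 ** (-b / (2 * a)):.2f} M")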
Can Panelists Scale Multiple Attributes in a Single Session, at the Same Time?

It is clear from the psychophysical literature that panelists can rate one attribute at a time and that their ratings "track" physical concentrations. Nevertheless, the issue continues to arise as to whether, during the course of a product assessment, the untrained panelist can shift attention from attribute to attribute in a complex stimulus. The argument goes along the following lines: "it may be easy if the stimulus is uni-dimensional, such as the sweetness of sugar in water, but it may be hard if the stimulus comprises multiple aspects that must be attended to in turn." Even Stevens (1968) firmly believed that the untrained panelist could not easily shift his attention from sweetness to liking. Stevens' preconceived notions influenced the author (HRM) to run parallel studies, wherein the respondent would evaluate sensory attributes in one study and hedonic attributes in the other. The issue was not the ability to scale, but rather the expected confounding effects from cognitive factors. Evidence for this overriding belief in "one attribute at a time" can be seen in the nature of the prodigious output of Stevens and his co-workers and students since the 1950s. Most of the traditional psychophysical scaling looking for sensory laws of perceived magnitude investigated one attribute at a time. Looking back 35 years, it is rather remarkable how the opinions of one's professor could so influence the design of these early studies on the sense of taste. One disproof of Stevens' conjecture comes from rather straightforward experiments run by the author in the fields of taste and smell. For taste, the studies comprised mixtures of two artificial sweeteners in 42 combinations, as well as five glucose stimuli. The panelists rated sweetness and bitterness. Panelists had no problem rating the sweetness and bitterness of the components. Furthermore, the data showed clearly that for the five glucose solutions interspersed and thus tested randomly among the 42 stimuli comprising artificial
sweeteners, panelist ratings generated the same reproducible curve for glucose sweetness versus concentration. This is clear evidence that panelists can rate mixtures and simple stimuli on multiple attributes, and that their ratings track known variations in the stimulus, despite the randomization of the stimuli done to disguise the order of magnitude of concentration (Moskowitz 1972; Moskowitz and Klarman 1975). A second type of study comes from the analysis of mixtures of odorants in the vapor phase (Moskowitz 1982; Moskowitz et al. 1977). In one study (Moskowitz 1982) there were two odorants, each at four concentrations (16 mixtures, four pure stimuli comprising odorant A, four pure stimuli comprising odorant B). The panelists rated the 24 stimuli in randomized order on a large number of attributes. The ratings showed clearly that panelists could easily rate the 24 stimuli, that their ratings correlated highly with odorant concentration, and that panelists had no problem properly distinguishing among the components by means of their attribute ratings. That is, one could see clearly from the ratings of the four pure stimuli of each odorant component that panelists could distinguish both odor quality and odor intensity. The notion that the panelist could assess more than one attribute at a time was not, of course, limited to the domain of psychophysics. Rather, multi-attribute rating systems abounded in food science. The Flavor Profile (Caul 1957), the QDA system (Stone et al. 1974), and the Spectrum system (Meilgaard et al. 1987) all called for multiple attributes to be rated. Furthermore, Bartoshuk and colleagues (e.g., Bartoshuk 1975), working on the perception of complex tastes, always had panelists rate the amount of each of the four basic tastes in a test stimulus.
Is There a "Proper Scale" for the Sensory Attribute?

Before the 1950s the issue of scaling perceptual magnitude was treated in an off-hand manner. Scaling was considered to be of secondary importance. As long as the scale was sufficiently large to accommodate multiple stimuli of clearly different magnitude, it seemed neither particularly relevant nor fruitful to discuss the nature of the scale used. For the most part, the research prior to the work of S.S. Stevens looked at scales as convenient numerical tools. Most of the analysis used linear regression and correlation, so that the major role of scaling was simply to get a set of numbers. The actual scale values and their properties were not germane. Psychophysical studies of scale properties, and the search for the proper sensory metric and for "sensory laws," passionately attacked this laissez-faire attitude. The 1950s and early 1960s were formative years in modern-day psychophysical scaling. A great deal of effort was expended to understand the way panelists assign numbers to match the perceived intensity of a stimulus. The psychophysicists searched for the "appropriate" or "proper" scale for measurement,
and did not really care much about the applications of such scales in the everyday world. There was, however, interest in erecting a proper scale of brightness and another of loudness, because these were scales of interest to the Optical Society of America and to the Acoustical Society of America, respectively. In the research done a half century ago, four of the "burning" issues were (1) the number of points that the scale should have, (2) whether the scale needed to be labeled at each point, (3) whether the panelist needed a reference standard against which to adjust the ratings, and (4) whether the panelist needed to have an assigned first number (the modulus). The researchers of 50 years ago had an agenda. It was not scaling, but rather the search for sensory regularities, sensory laws. Since the researcher used an array of physical stimuli with known magnitudes, it was easy to determine how well the panelist ratings matched the physical stimulus. Going one step further, the researcher could also develop a mathematical model relating physical magnitude and subjective magnitude. Here are four clear outcomes from this research that impact the product-testing world today:
Valid Ratings Emerged Without Undue Training. Panelists certainly could scale the perceived magnitude of a stimulus. Of that there was no doubt. Panelists expressed no problems when asked to assign the ratings. The panelist had to be instructed as to what to do (viz., the "rules of the game"), but there was no indication that the panelist needed extensive training.

Lawful Psychophysical Relations Emerged. If the panelist used one of the methods (called magnitude estimation), the outcome was an equation of the form: Log(ME) = Log(B) + n(Log Stimulus Intensity). This is written more frequently as the "power law" of sensory magnitude: ME = B(Stimulus Intensity)^n. By itself the power law is not particularly interesting, since anyone can fit logarithmic equations to the data as long as there are no zeros. What was remarkable, however, was that the exponent n, the slope of the equation in log-log coordinates, was a reproducible parameter from study to study for the same physical stimulus (e.g., the loudness of sounds, with physical magnitude being sound pressure; the sweetness of sucrose, with physical magnitude being molarity).
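Recovering the power-law parameters from data amounts to a straight-line fit in log-log coordinates. A minimal sketch (concentrations and magnitude estimates hypothetical):

    import numpy as np

    conc = np.array([0.125, 0.25, 0.5, 1.0])  # stimulus intensities (e.g., molar sucrose)
    me = np.array([4.1, 8.5, 19.0, 42.0])     # geometric-mean magnitude estimates

    # Fit log(ME) = log(B) + n*log(intensity); the exponent n is the
    # reproducible parameter, while B depends on the units and the modulus
    n, logB = np.polyfit(np.log10(conc), np.log10(me), 1)
    print(f"n = {n:.2f}, B = {10 ** logB:.1f}")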
Possibility of a True Ratio Scale. Given the repeatability of this power function (or log-log) relation, many psychophysicists began to accept the fact that the magnitude estimation method was, in fact, the correct method for scaling sensory magnitude (italics HRM). There would be later discussions about the nature of the so-called "ratio scale" values generated by magnitude estimation.

Spreading the Faith. Other researchers, outside of psychophysics, did not necessarily accept the method of magnitude estimation as the correct scaling procedure. However, the lack of universal acceptance certainly did not deter psychophysicists from adopting magnitude estimation quite widely, for many uses.

Should There Be a Single Standardized Scale of Intensity for All Sensory Attributes?

One of S.S. Stevens' favorite aphorisms was that "standardization is the last refuge of the unimaginative." Psychophysicists look for general laws, and vary their methods to seek these laws. It is knowledge that is the goal, and standardization that is the necessary evil. In contrast, business-oriented researchers are very fond of standardization, because the creation of standards reduces uncertainty and promotes harmony. Sensory scientists, belonging to an area of business that has been recognized only in the past three decades, now face the question of methods standardization. It is three decades since Sensory Science became a committee (E-18) in the American Society for Testing and Materials (ASTM). One of the continuing issues of committee E-18 is the standardization of language. Scales, however, have not been a major issue up to now. In the 1950s Franklin Dove (1953) proposed that sensory magnitude be standardized in terms of specific ratios from threshold, in units that he called D-Units. The D-Units make sense at first glance, but in fact are hard to develop and to interpret. Experimental psychologists, especially modern-day psychophysicists, use the magnitude estimation scale in a wide array of applications, primarily for the study of sensory functioning. Magnitude estimation has been used in business applications to measure sensory intensity (Moskowitz 1983), but it enjoys less attention now because it is difficult to administer in the field (requiring a lengthy orientation) and difficult to analyze (requiring specialized programs). If there are to be standard scales of sensory magnitude, then what should they be? Magnitude estimation has been used extensively (Fishken 1988; Moskowitz 1983), but in actual execution it causes field problems and takes longer to execute than one might like, especially if one must operate within a budget. Although psychophysicists occasionally use loudness of noise as a measure (Stevens 1966), and sensory scientists often use unmarked line scales
(Baten 1946), these are impractical scaling procedures. For practical purposes the scales should be numerical, not analog. It is hard to carry around sound potentiometers to present noises of different loudness, and almost impossible to field studies with large numbers of panelists, all of whom must adjust their own machine so that the loudness of the sound matches the characteristics of the products. The electronics alone make this prohibitive. Although sensory scientists swear by unstructured line scales as being bias-free, it is hard to see how data can be efficiently and inexpensively obtained by unstructured line scales unless every panelist uses these line scales in conjunction with a computer. We are left, then, with the fixed-point category scale. Over the past 20 years the author has used anchored 0-100 scales and has found little problem with panelist comprehension. Everyone is familiar with test scores that range from 0 to 100, or with percents. Furthermore, the scale is robust, providing reliable answers if the same products are tested by the same panelists at different times. The scale is also valid, because the ratings of sensory magnitude "track" the physical changes in the product in studies where the panelist rates a set of products that have been systematically varied on one or several physical characteristics. Other practitioners feel that panelists cannot use the 100-point scale, and instead believe that a standardized scale should comprise fewer points, and perhaps points that are all labeled, rather than featuring anchors at the two ends only. Labeling appears to be a favorite practice among sensory scientists, although there is no clear evidence that labeling does anything to the ratings. A 9-point scale, a 7-point scale, etc., is more attractive to these practitioners, if a standardized scale were to be adopted. The only problem is that with so few scale points and with many stimuli, there are too few scale points to cover the sensory variation. Imagine the case with 12 clearly different stimuli. With a 9-point scale the panelist must assign two perceptibly different stimuli the same scale value, because the panelist has run out of numbers. In this scenario, with few scale points, the 12 stimuli would be assigned different numbers only if each of the panelists made different errors (viz., assigned different pairs of stimuli the same scale value). With all panelists acting identically (assigning the same two stimuli the same scale value), there would be 12 stimuli allocated to only 9 scale points, and no scale differences between certain pairs of stimuli. This is clearly a paradox - viz., perfect reliability, known discrimination, yet failure to evidence this discrimination. The upshot of the attempt to standardize rating scales is that, in the long term, each practitioner will adopt the scale that is most convenient to use. The odds are high that there will be no single agreed-upon scale, simply because in Sensory Science there is no consensus as to what is a valid scale of magnitude and what is an invalid scale. In contrast, for most other scientific disciplines
there are scales that are clearly valid, and scales that one can argue to be invalid for one or another reason.
REFERENCES

AMERINE, M.A., PANGBORN, R.M. and ROESSLER, E. 1965. Principles of Sensory Evaluation of Food. Academic Press, New York.
BARTOSHUK, L.M. 1975. Taste mixtures: Is mixture suppression related to compression? Physiology and Behavior 14, 310-325.
BATEN, W.D. 1946. Organoleptic tests pertaining to apples and pears. Food Res. 11, 84-85.
BERGLUND, U. 1974. Dynamic properties of the olfactory system. Annals New York Academy of Sciences 237, 17-27.
BOND, B. and STEVENS, S.S. 1969. Cross modality matches of brightness to loudness by 5 year olds. Perception and Psychophysics 6, 337-339.
BORING, E.G. 1942. Sensation and Perception in the History of Experimental Psychology. Appleton-Century-Crofts, New York.
CAUL, J.F. 1957. The profile method of flavor analysis. Advances in Food Res., 1-40.
DOVE, W.F. 1953. A universal gustometric scale in D units. Food Res. 18, 427-453.
DRAVNIEKS, A. 1974. Personal communication.
FISHKEN, D. 1988. Marketing and cost factors in product optimization. Food Technol. 42(11), 138-140.
JONES, B.P., BUTTERS, N., MOSKOWITZ, H.R. and MONTGOMERY, K. 1978. Olfactory and gustatory capacities of alcoholic and Korsakoff patients. Neuropsychologia 16, 323-337.
MARKS, L.E. 1968. Stimulus range, number of categories, and the form of the category scale. American J. Psychol. 81, 467-479.
MARKS, L.E. 1974. Sensory Processes: The New Psychophysics. Academic Press, New York.
MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1987. Sensory Evaluation Techniques, pp. 119-142. CRC Press, Boca Raton, Fla.
MEISELMAN, H.L. 1971. Effect of presentation procedures on taste intensity functions. Perception and Psychophysics 10, 15-18.
MOSKOWITZ, H.R. 1971a. Intensity scales for pure tastes and taste mixtures. Perception and Psychophysics 9, 51-56.
MOSKOWITZ, H.R. 1971b. The sweetness and pleasantness of sugars. Amer. J. Psychol. 84, 387-405.
MOSKOWITZ, H.R. 1972. Perceptual changes in taste mixtures. Perception and Psychophysics 11, 257-262.
MOSKOWITZ, H.R. 1982. Psychophysical scaling and optimization of odor mixtures. In: Odor Quality and Chemical Structure, H.R. Moskowitz and C. Warren, eds. ACS Symposium Series No. 148, 23-56, Washington.
MOSKOWITZ, H.R. 1983. Product Testing and Sensory Evaluation of Foods: Marketing and R&D Approaches. Food & Nutrition Press, Trumbull, Conn.
MOSKOWITZ, H.R., DUBOSE, C.N. and REUBEN, M.J. 1977. Flavor chemical mixtures: a psychophysical analysis. In: Flavor Quality, Objective Measurements, R. Scanlan, ed., pp. 29-44, American Chemical Society, Washington, D.C.
MOSKOWITZ, H.R. and KLARMAN, L. 1975. The tastes of artificial sweeteners and their mixtures. Chemical Senses and Flavor, 411-422.
MOSKOWITZ, H.R., SHARMA, K.N., KUMARIAH, V., JACOBS, H.L. and SHARMA, S.D. 1975. Cross cultural differences in simple taste preferences. Science 190, 1217-1218.
STEVENS, S.S. 1966. Matching functions between loudness and ten other continua. Perception and Psychophysics 2, 5-8.
STEVENS, S.S. 1968. Personal communication.
STEVENS, S.S. 1975. Psychophysics: An Introduction to its Perceptual, Neural and Social Prospects. John Wiley & Sons, New York.
STONE, H., SIDEL, J.L., OLIVER, S., WOOLSEY, A. and SINGLETON, R. 1974. Sensory evaluation by quantitative descriptive analysis. Food Technol. 28, 24-34.
ALEJANDRA M. MUÑOZ

The design of consumer questionnaires requires careful consideration of both the questions/attributes and the scales to be included. A detailed discussion of consumer scales and scale issues was presented in Chap. 9. This chapter deals with the subject of attributes - specifically, the types of questions that consumers are asked in quantitative tests. The topic is very relevant, since in a quantitative study the researcher selects the attributes to be included in the consumer questionnaire. Consumers who participate in a quantitative test are asked to rate these pre-selected attributes. Thus the strategy for collecting product information from consumers is very different from the one followed in qualitative studies. In qualitative studies, consumers discuss products using their own language and their own number and type of terms. This opportunity is not provided to the consumer in quantitative tests, except in sections where open-ended questions are included. Therefore, the discussion of attributes is key
for quantitative tests, because attributes are selected by the researcher and consumers merely address them. First, practitioners are advised to use all available tools and historical documentation to decide on the attributes to include in a questionnaire. This means that, for new product categories and whenever possible, focus groups should be conducted to obtain some information regarding the consumer lexicon used to describe the product category or the products tested. Second, for existing products, previous consumer questionnaires and studies should be reviewed and taken into consideration. Third, and very important, the project objective and the nature of the products should be assessed in deciding on the attributes to include in the questionnaire. Practitioners have different philosophies regarding the inclusion of attributes in a consumer questionnaire (Stone and Sidel 1993; Husson et al. 2001). In general, there are three different philosophies among sensory professionals regarding the attributes to be included in a consumer questionnaire. (1) Asking Only Overall Liking Questions. Some professionals believe that consumers can and should be asked only about overall liking (Stone and Sidel 1993). This type of consumer questionnaire is rather simple, since it has only one question: overall liking. Other general liking questions may be added to the questionnaire, such as liking of the overall sensory dimensions: liking of appearance, flavor, fragrance, texture and skinfeel. Professionals who include only overall liking questions, without any attribute questions, believe that consumers are not capable of providing any attribute liking or intensity information. They consider that consumers provide answers to attribute questions without understanding the meaning of those attributes. It is also believed that adding any attribute questions to the questionnaire biases the consumer. Specifically, these professionals believe that the overall liking responses are influenced by the attributes mentioned, and that consumers may then focus on attributes to which they might not otherwise have paid attention. In addition, sensory professionals holding this point of view consider that consumers are unable to understand many product attributes, and thus may provide unreliable information. Researchers belonging to this school of thought rely on descriptive analysis results. The relation between the expert's descriptive data and the consumer's overall liking rating provides product attribute information and research guidance to product developers or chemists. (2) Asking Overall and Attribute Liking Questions. Other professionals go a step further, and ask additional liking questions on specific attributes, such as liking of the strawberry flavor, liking of the crispness, liking of the absorbency, etc. (Daw 1997). No attribute intensity questions are asked (e.g., intensity of strawberry flavor, crispness intensity, etc.).
Professionals holding this second opinion, who thus include overall liking and attribute liking questions in the consumer questionnaire, do not believe that attribute questions bias the overall consumer liking. More information is obtained from consumers in these tests, since the results show which product attributes consumers like and dislike. These professionals, however, do not include attribute intensity questions. Their belief is that consumers are unable to provide intensity information, since they are not trained; consumers are thought to lack any frame of reference for using an intensity scale to provide intensity/strength information. Therefore, the intensity information obtained from consumers is considered unreliable. These professionals also use descriptive data to obtain information on the levels that consumers like or dislike.
(3) Asking Overall and Attribute Liking, and Diagnostic Questions. Another group of professionals includes all of these questions: overall liking, liking of general dimensions (e.g., overall texture liking), liking of specific attributes, and attribute intensity questions. There are many researchers who ask consumers attribute intensity questions (Bower and Whitten 2000; Moskowitz 2001; Prost et al. 2001; Fillion and Kilcast 2002). Attribute intensity questions, also known as diagnostics, address the perceived intensity of sensory attributes (e.g., thickness intensity, chocolate flavor intensity, oily/greasy intensity, clean and softness intensities, shininess intensity, etc.). Professionals who ask overall liking questions, liking of attributes, and attribute intensity questions (this author among them) hold beliefs contrary to the two points of view discussed in (1) and (2) above:

a. The overall liking response of consumers is not appreciably biased by the addition of attributes. Most consumers are very determined about their likes and dislikes. Their responses do not change when attributes are asked.

b. Consumers are able to understand product attributes, as long as this terminology has been well thought out and properly selected. Attributes chosen should be consumer terms, i.e., terms that consumers understand and use to describe the product. No complex and/or technical attributes should be asked (e.g., fracturability, resilience, benzaldehyde).

c. The majority of consumers are able to use a scale, especially if they have participated in a brief orientation where the use of the scale has been explained and demonstrated, and if the scale is properly structured (see Chap. 9). There will be the expected high variability in consumer attribute intensity data. However, the variability will not be any greater than that encountered in other consumer data sets (overall liking, liking of attributes, etc.), since people like product characteristics at different levels.

d. The most complete information is obtained when consumers answer all of these types of questions. In addition, if descriptive analysis is used to understand consumer data, any misleading consumer information from attribute questions can be unveiled (see Chaps. 7 and 21).
In summary, the three points of view (in actuality, research philosophies) described above all have followers. The majority of sensory professionals fall into group 3, and ask overall liking, attribute liking, and diagnostic questions.
REFERENCES

BOWER, J.A. and WHITTEN, R. 2000. Sensory characteristics and consumer liking for cereal bar snack foods. J. Sensory Studies 15, 327-345.
DAW, E.R. 1997. Relationship between consumer and employee responses in research guidance acceptance tests. In: ASTM Manual 30: Relating Consumer, Descriptive and Laboratory Data to Better Understand Consumer Responses. A.M. Muñoz, ed. ASTM Press, West Conshohocken, Penn.
FILLION, L. and KILCAST, D. 2002. Consumer perception of crispness and crunchiness in fruits and vegetables. Food Quality and Preference 13(1), 23-39.
HUSSON, F., LE DIEN, S. and PAGES, J. 2001. Which value can be granted to sensory profiles given by consumers? Methodology and results. Food Quality and Preference 12(5-7), 291-296.
MOSKOWITZ, H.R. 2001. Interrelations among liking attributes for apple pie: Research approaches and pragmatic viewpoints. J. Sensory Studies 16, 373-391.
PROST, C., LE GUEN, S., COURCOUX, P. and DEMAIMAY, M. 2001. Similarities among 40 pure odorant compounds evaluated by consumers. J. Sensory Studies 16(6), 551-565.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices. Academic Press, New York.
MAXIMO C. GACULA, JR.

The Sensory Evaluation department is typically requested by the product developer to conduct a Research Guidance Panel (RGP) test.
Likewise, a similar request can be made to the Marketing Research Department (MRD) when the prototype is finalized for a large consumer test. These scenarios are commonly seen in the consumer products industry. In both RGP and MRD tests, the first step is critical: deciding what attributes to ask the consumers about, and what measurement scale to use in order to obtain the required information. Dr. Moskowitz did a very good review of sensory attributes that combines both the psychological and Sensory Science aspects, and this author has nothing to add. The review provides sensory scientists with some guidelines for choosing attributes for particular product categories. As Moskowitz pointed out, the choice of attributes is often pragmatic. Perhaps the greatest challenge for sensory scientists and marketing researchers is the choice of measurement scales. Affective scales were discussed in Chap. 9. Moskowitz touched on trained panelists and multiple attributes per evaluation. I and many sensory practitioners agree that panelists can rate multiple attributes per evaluation, on both intensity and liking, in one questionnaire. Moskowitz cited early psychophysical studies showing that untrained panelists can do intensity evaluation. In fact, a recent study by Gonzalez et al. (2001) on cheese hardness showed no difference in average ratings between untrained and expert (trained) panels. It is important that sensory research on this subject be published, as results using various products and product categories have differed. Thus, results of studies comparing untrained versus trained panels cannot be generalized, because results are highly dependent on product types, experimental situations, the discrimination ability of the panel, and other factors. See Chap. 7 for more discussion of this subject. It is to be reiterated and emphasized that the purpose of training is to make the panelists uniform and consistent in their use of measurement scales, as exemplified by the various descriptive analysis techniques.

I agree that magnitude estimation, which generates a ratio scale, is a valid measurement of sensory magnitude for a given stimulus or product. However, it was not accepted in practice because of tradition, the need to change the way of thinking of both management and research personnel, and logistic problems in field implementation, as indicated by Moskowitz. Magnitude estimation is eloquently discussed in various publications (Moskowitz 1985; Lawless and Heymann 1998; Stone and Sidel 1993). An excellent point that Moskowitz brought up is the number of categories in the scale. This becomes an issue because it is difficult to institute a standard scale like a "ruler." Magnitude estimation is a classic standard scale with a true origin, like a ruler, but was not accepted in practice for the reasons stated above. So we resort to various types of scales with different lengths. One of the practices that we see today is the reduction of the length of the rating scale, a practice with which I do not agree. As Moskowitz pointed out, how can one accommodate 12 stimuli or products on a 9-point scale? In some of my work, I occasionally put a horizontal slash between categories of a 9-point scale in order to give 18 points on the scale. This provides more options for the panelists during evaluation, especially when the products being assessed have various degrees of sensory differences.

As to the statistical treatment of sensory data, there should be no standard method. The type of statistical analysis should be based on the experimental design, data collection, product types, inherent characteristics of the data, and other factors. Most statisticians perform exploratory analyses using two or more methods, as time permits, then decide on the method that meets, or approximately meets, the basic assumptions of the statistical technique and the data characteristics. An example of this is the use of multivariate methods, such as partial least-squares, ridge regression, etc., to account for the correlations that exist in most sensory data. Perhaps the closest to a standard method of analysis is the use of the paired t-test for the analysis of paired comparison experiments.

Muñoz described sensory practices in asking consumers for product evaluation: (1) asking only overall liking, (2) asking overall liking and specific attribute liking, and (3) asking diagnostic questions in addition to overall and specific attribute liking. As observed, these practices have not been explicitly covered in most Sensory Science textbooks. I fully concur with Muñoz that practice (3) has the most followers, and the author is one of them. There is scientific evidence in both psychological and Sensory Science studies that consumers can effectively perform practice (3) in one questionnaire (Moskowitz 1981; Moskowitz 1995; Moskowitz 1997). The sensory evaluation textbook by Lawless and Heymann (1998) shows an example that combines hedonic and attribute intensity questions in one questionnaire. It should be noted that experience with the product and attribute understanding are important factors for the successful use of this type of questionnaire. Because the resultant hedonic and intensity data will be highly correlated, an appropriate statistical method that accounts for multicollinearity is suggested, e.g., principal component regression, partial least-squares regression, or ridge regression.
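For illustration, a multicollinearity-aware analysis of the kind suggested above might be sketched as follows in Python with scikit-learn. This is a minimal sketch, not an established procedure: the data, attribute names, the ridge penalty alpha and the number of PLS components are all hypothetical choices.

    import pandas as pd
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.linear_model import Ridge

    # Hypothetical consumer data: overall liking plus correlated diagnostics.
    df = pd.DataFrame({
        "sweetness":       [6, 7, 5, 8, 4, 6, 7, 5],
        "flavor_strength": [6, 8, 5, 8, 4, 7, 7, 4],  # correlated with sweetness
        "thickness":       [4, 5, 6, 3, 6, 4, 5, 6],
        "overall_liking":  [6, 8, 5, 8, 4, 7, 7, 5],
    })
    X = df[["sweetness", "flavor_strength", "thickness"]]
    y = df["overall_liking"]

    # Ridge regression shrinks the coefficients, stabilizing them when the
    # predictors are collinear.
    ridge = Ridge(alpha=1.0).fit(X, y)
    print("ridge coefficients:", dict(zip(X.columns, ridge.coef_.round(2))))

    # Partial least squares first projects the correlated attributes onto a
    # few latent components, then regresses liking on those components.
    pls = PLSRegression(n_components=2).fit(X, y)
    print("PLS R^2:", round(pls.score(X, y), 2))

Either approach avoids the unstable coefficient estimates that ordinary least squares can produce when hedonic and intensity responses are highly correlated.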
REFERENCES

GONZALEZ, R., BENEDITO, J., CARCEL, J. and MULET, A. 2001. Cheese hardness assessment by experts and untrained judges. J. Sensory Studies 16, 277-285.
LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food, Principles and Practices. Chapman & Hall, New York.
MOSKOWITZ, H.R. 1981. Sensory intensity versus hedonic functions: Classical psychophysical approaches. J. Food Quality 5, 109-138.
MOSKOWITZ, H.R. 1985. New Directions for Product Testing and Sensory Analysis of Foods. Food & Nutrition Press, Trumbull, Conn.
MOSKOWITZ, H.R. 1995. One practitioner's overview to applied product optimization. Food Quality and Preference 6, 75-81.
MOSKOWITZ, H.R. 1997. A commercial application of RSM for ready to eat cereal. Food Quality and Preference 8, 191-201.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices. Academic Press, San Diego.
CHAPTER 11

QUESTIONNAIRE DESIGN

HOWARD R. MOSKOWITZ

The questionnaire is perhaps the most important part of the study. The questionnaire constitutes the means by which the researcher interacts with the panelist, and it conveys the panelist's response to the stimuli. Depending upon one's intellectual history and research predilections, questionnaire development may be excruciatingly exact or quite loose. To make matters worse, there are no set rules for good or bad questionnaires. There are rules of thumb, and indeed in some situations the questionnaire clearly fails to acquire the data needed to make the decision. There are also loudly voiced opinions, occasionally, but not often, accompanied by hard fact. This section presents the author's (HRM) point of view. There is little in the way of published information on questionnaires, although from time to time there might be an article on the nature of attributes. Thus, a lot of the wisdom in creating questionnaires emerges from the individual's experience in the field, rather than from the published scientific literature.
Are There Rules of Thumb for Questionnaire Design?

When a person tests a food, the sensory inputs occur in a fixed sequence: visual appearance comes first, followed by aroma, then taste/flavor, then texture, and finally aftertaste. If the panelist has to work the product manually, then texture by feel may also emerge at the same time as, or shortly after, visual appearance. A similar order occurs for health and beauty aids as well. Although it is not mandatory to follow the sequence in which the attributes appear, it is a good idea to do so, simply because the panelist otherwise has a difficult time remembering the attributes. If the attributes appear in the questionnaire in a jumbled order, then the panelist must first experience the product in its entirety, and afterwards rate the attributes in the order that they appear on the questionnaire. That review is difficult, especially since the panelist must focus attention on different attributes in an unnatural sequence. Given this sequence of activity, the more prudent approach lays out the attributes in the order that they naturally appear, in order to minimize re-tasting and maximize the quality of the data obtained. Furthermore, the attributes for appearance should comprise all of the appearance-related terms, so that once the panelist finishes with appearance he need not return to appearance, but can move forward to aroma, then to taste/flavor, and then to texture, respectively. In all cases it is prudent to observe a person experiencing the product, and then array the attributes in the questionnaire so that they follow the sequence of activities. The foregoing applies only to sensory characteristics experienced while the panelist is in direct contact with the product. For other attributes (e.g., those of an image nature), the panelist may answer after having experienced the product. Indeed, for some attributes the panelist may need to experience the product several times before being able to rate the product on the attribute. An example of this is "efficacy" in an underarm deodorant. One experience with the deodorant may not suffice to provide the necessary information.
Where Should the Liking Rating Be Asked in the Questionnaire?

In most consumer research the key question is the evaluative question, which in simplest terms is "good or bad." In other situations the operative phrase is "did the product pass or fail." Where should this overall attribute be placed? Should it be placed at the start of the evaluation, for example just after the first bite, when the panelist has had an opportunity to perceive the appearance, the aroma, the taste/flavor and the texture, respectively? Should the overall rating appear at the end of the questionnaire, after the panelist has had an opportunity to attend to all of the product characteristics, and can now devote full attention to the product? If there has been any single "most frequently asked" question about attributes and questionnaires, then this might well be it. The topic arises again and again, often interminably. It never seems to go away, nor does it appear to be cogently answered. It may well serve as the source of heat for many an argument. No one appears to be afraid of proffering an opinion. In truth there is no "right answer." Ask a dozen practitioners to place the liking question in the "right place" in the questionnaire, and more than likely one will find that each practitioner places it in a different place. Furthermore, each practitioner has a good reason why the liking rating is placed where it is placed. Less dogmatic researchers will own up to the fact that they really don't know where the liking rating should be located. Probably the best answer to this question is that the overall rating, e.g., liking or purchase intent, can be put virtually anywhere in the questionnaire, provided that the panelist has had the opportunity to sense, although not necessarily to rate, the different sensory characteristics of the product. If there is a fear of changes in overall liking as the panelist proceeds through the evaluation (e.g., for chewing gum, whose taste changes with ongoing chews), then the prudent researcher will instruct the panelist to rate overall liking at various time intervals. No solution is without its critics. If the panelist rates all of the attributes first and only then rates overall liking, then the critic can argue that the panelist may be biased by the ratings assigned prior to the overall rating. If the panelist rates liking first, and subsequently rates all of the other attributes, then the critic can argue that the panelist will try to "justify" the overall rating by having all other attribute ratings, and especially attribute liking ratings, confirm the overall rating. In market research terms this is called the "preference justification effect," and it rears its head when the panelist must make a preference decision between two products. It seems hard to justify a preference for one product overall if the other product is preferred on all of the attributes. Hence the panelist, ever trying to be rational and consistent, will justify the preference judgment by rating the preferred product higher on many evaluative criteria.
How Long Should a Questionnaire Be?

Questionnaires, like government bureaucracies, have a way of growing larger and larger over time. If one takes an historical perspective, then it appears that the early questionnaires were quite short, often limited to questions about the panelist's overall liking and a few key attributes. Psychophysical research uses short questionnaires, often one or two questions, requiring the panelist to attend to a specific part of the external environment and to focus on one or two responses. The survey research profession, and especially market researchers who execute product tests in the commercial environment, exhibit a strong, and indeed increasingly militant, tendency to work with much larger questionnaires. It is usual for a product questionnaire to contain upwards of 20 attributes, and questionnaires comprising 40, 50, 60 or even 70 attributes are not uncommon. Length may be a surrogate for emotional assurance that valid results will emerge from the study. There is no optimal length of a questionnaire. Since, however, the questionnaire is the key link to the consumer (the user interface) and the conveyor of information (the communication channel), there is a continuing tug-of-war between making the questionnaire short, thus pleasing the panelist, and making the questionnaire long, thus acquiring a great deal of information. In the worst of situations the questionnaire comprises a laundry list of attributes, many of which are not particularly well thought out. For instance, in an attempt to cover all of the attributes the researcher may end up asking the panelist to return again and again to the same attribute, expressed in two, three or four different ways. If truth were known, nothing so irritates a panelist as answering the same question, posed in different ways. In the most obsessive situations the researcher may end up having the panelist evaluate an attribute in three ways: amount of the attribute, liking of the attribute, and a directional evaluation (too much, just right, too little). Whether this array of questions adds substantially more information is moot. We do know, however, that this triptych of questions rapidly becomes very repetitive, very annoying, and sometimes counterproductive, as the irritated panelist refuses to continue the task, and either walks out or stops paying attention and gives random ratings instead. There is little research on the absolute length of the questionnaire and how it affects the quality of the data. We know, however, that panelists become bored with longer questionnaires when the stimulus stays the same. Panelists also become bored if they must answer the same question posed in different ways. Panelists may pay more attention to many products, each with its own short questionnaire, than to one or two products tested with a long questionnaire, even when the total time is the same. When developing questionnaires, therefore, the researcher should try to think about the task from the panelist's point of view. Are there too many questions? Are the questions redundant? Are the questions meaningful? Does an interviewer have to explain the questions, or are the questions self-explanatory? To the extent that the researcher can make the task fun, lighten the panelist's load, and obtain information quickly and painlessly, the researcher will have created a better questionnaire.
Consumer Fatigue - Questionnaire Versus Product Effects?

We often hear that the panelist is fatigued in a study, and simply cannot test more than a few products or concepts. We dealt with this above in the section on sensory fatigue and how many products a panelist can really test. This section deals with fatigue caused by the questionnaire. First of all, if there is true sensory fatigue, then one should not continue the test; it makes no sense to do so. Yet in more ordinary circumstances panelists don't appear to exhibit much fatigue. Sensory research also shows that the notion of "sensory fatigue" applies when the sensory system is stressed far more than occurs under normal situations. For a person to lose the sense of smell, even temporarily, for even a few seconds, requires the person to be exposed to a constant, unwavering stimulus for an extended period. This constancy of stimulation cannot happen when the person drinks a beverage, unless the beverage is made to flow at a constant rate and temperature over the tongue. The constancy of stimulation for odor requires exposing the panelist to the odorant in a room filled with a relatively high concentration of the odorant. This again does not occur in ordinary flavor evaluation, unless the panelist sits in an enclosed chamber and the odor stimulus is made to fill the room. What we are seeing in these cases of fatigue is a response to a boring situation. It is well-known that children complain of fatigue when they don't want to do something. Yet, let a friend come over and this very fatigued child arises, filled with energy, to play with abandon for several more hours. Similarly, if a person is interested in what he is doing (e.g., eating many items at a banquet) there is rarely, if ever, a report of sensory fatigue.
Let us now generalize this boredom and fatigue to panelist-questionnaire interactions. What happens when a panelist says he has lost his sensitivity? Much of it is boredom and anger. The fault with many questionnaires lies in their repetitiveness, as well as in the way that they are administered. The task seems interminable, so the panelist complains. If the interviewer reads the attributes from a questionnaire in a monotonous voice, then the task becomes even more boring, and the panelist complains sooner. The prospect of listening to the same redundant questions again and again suffices to irritate even the most willing panelist. For the most part, it would appear that the fatigue reported is really a case of boredom.
ALEJANDRA M. MUÑOZ
The design of a consumer questionnaire requires special attention, since it is one of the most critical aspects of a consumer test. Often, the importance of this task is overlooked. The design of a sound questionnaire will determine the quality of a consumer test and its data. Lawless and Heymann (1998) suggest making a flowchart when designing a questionnaire, which can be very detailed and include all skip patterns, or may simply list the general issues in order. Consumers are paid to complete a test, to evaluate the test product(s) and express their opinions. Therefore, to get paid, they are asked to complete a questionnaire. This means that consumers will complete that questionnaire regardless of the order, the type, and the clarity of its questions. In addition, most of the time consumers are forced to answer all questions in a questionnaire. Therefore, it behooves the sensory professional to invest sufficient time in designing the best questionnaire to obtain the best data quality. Lawless and Heymann (1998) present a good discussion of the topic and include a section on rules of thumb for questionnaire construction. There are controversies and discussion points regarding the structure and types of questions in a consumer questionnaire. A discussion of some of these points follows.
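For illustration, the flowchart with skip patterns mentioned above can be encoded directly. The following is a minimal sketch in Python; the question IDs, wording and skip rule are hypothetical, not taken from any published questionnaire:

    # Each entry holds the question text and a rule mapping the answer to the
    # next question ID; None ends the interview (a simple skip pattern).
    FLOW = {
        "Q1": {"text": "Did you taste the product?",
               "next": lambda ans: "Q2" if ans == "yes" else None},
        "Q2": {"text": "Overall, how much did you like it? (1-9)",
               "next": lambda ans: "Q3"},
        "Q3": {"text": "Any comments?",
               "next": lambda ans: None},
    }

    def walk(flow, answers):
        """Walk the flow using pre-recorded answers; return the question path."""
        qid, path = "Q1", []
        while qid is not None:
            path.append(qid)
            qid = flow[qid]["next"](answers[qid])
        return path

    print(walk(FLOW, {"Q1": "yes", "Q2": "8", "Q3": "none"}))  # ['Q1', 'Q2', 'Q3']

A consumer answering "no" to Q1 would skip the remaining questions entirely, which is exactly the kind of routing a detailed flowchart is meant to capture.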
Position of the Overall Liking Question

This question is asked very frequently: should the overall liking question be positioned first or last in a questionnaire? There is also a debate about the effect of the position of this question on the data obtained. This issue becomes important only when attribute questions are asked. In general, there is no definitive answer as to where the overall question should be positioned in the questionnaire. The decision depends on the type of product, the way it is applied/consumed, and the perspective of the sensory professional. Belson (1981), Sorensen (1984), Fowler (1995), Earthy et al. (1997) and Levy and Koster (1999) discuss important issues related to questionnaire design, and to the position and number of attributes.
Positioned as First Question. Many sensory professionals prefer to place the overall liking attribute as the first question in the questionnaire. The rationale behind this preference is: (1) The overall liking question is the most important question; therefore, if placed first, the consumer will answer it carefully and with his/her full concentration. (2) If placed first, the question captures the consumer's first reaction to the product, which is the most realistic and true consumer response desired. (3) If placed first, the overall response is not influenced by the attribute/diagnostic questions asked later in the questionnaire. This last point may only be true for the first product evaluated; ratings of ensuing products will be influenced by the attribute questions asked about the first product.
Positioned as Last Question. Other practitioners prefer to position the overall question last. This practice is endorsed principally by those professionals who believe that the consumer should pay attention to and answer all attribute/diagnostic questions before deciding how much a product is liked overall. Another case where it may be better to position the overall question last is when consumers must manipulate or use/apply a product before a decision is (or can be) made on liking. Such is the case for some foods like chewing gums, for products with lingering basic tastes or feeling factors (bitter and hot products, such as chocolates, peppermint, salsas, etc.), and for personal or home care products, such as lotions and creams, hair styling agents, surface cleaners, etc. Consideration of the position of the overall question is important for multiproduct tests. When more than one sample is evaluated in a monadic sequential presentation, then after the first product is evaluated, subsequent ratings are affected by the earlier products seen and the attributes that have been rated. The effects of the attributes can only be overcome by placing the overall question at the end of the questionnaire, so that the influence of the attribute ratings affects all products equally (ASTM 1998).
Order of Attribute Questions

Two considerations are taken into account to decide on the order of attributes in a questionnaire: the order in which the attributes are perceived, and their importance in the product/test. In general, attributes need to be addressed in the order in which they are perceived, to avoid re-tasting or re-applying products. Appearance characteristics should be asked about first, since the product is consumed or used when assessing other attributes, and no sample may be left with which to answer the appearance questions. Also, for consumers specifically, and untrained people in general, the appearance has an effect on other sensory attributes; appearance should therefore be evaluated before other attributes. After the appearance questions, the other sensory dimensions (flavor, fragrance, texture) are addressed. In foods it makes no difference whether flavor or texture attributes are asked after appearance. In some cases it may be better to ask flavor before texture because: (1) Flavor is more fatiguing than texture; if flavor questions are asked first and consumers need to re-taste products, then texture will not be affected by being addressed second. (2) Generally speaking, consumers are more familiar with flavor than with texture characteristics; addressing flavor first may make consumers comfortable with the questionnaire and test. Within flavor or fragrance attributes, the order in which the attributes are asked may not be important. However, in texture and skinfeel evaluations, the attributes should be asked in the order perceived. For example, initial/first-bite texture characteristics in foods, such as firmness and denseness, should be asked about before chewiness, greasiness and toothpacking (which are perceived during chewing or after swallowing). Thickness and absorbency in lotions and creams (evaluated during application) should be asked about before oily/greasy. When the position of the attributes is not driven by order of appearance, the decision on the order of attributes should then be based on importance. It is a fact that people, particularly untrained panelists and consumers, pay more attention to questions asked at the beginning of a questionnaire than to those asked last. This is particularly true for very long questionnaires. Therefore, the most important/relevant questions should be asked first (Resurreccion 1998). Moskowitz (2001) studied the position of attributes in a questionnaire and found that the attributes placed last in a questionnaire showed as much ability to differentiate products as did the attributes placed first. However, the ratings were lower for attributes rated last.
Number of Questions (Attributes)

Another consideration in the design of a consumer questionnaire is the number of questions. How many questions can a consumer answer without getting tired or bored? Again, there is no conclusive answer to this question. Generally speaking, this author believes that consumer questionnaires should be as short and concise as possible. Resurreccion (1998) indicates that the questionnaire should ask the minimum number of questions needed to accomplish the project objectives. Sensory professionals should assess which attributes are really needed and also make sure that there are no redundancies. However, for certain products or tasks, there may be a need to design long questionnaires. If the researcher includes a liking and an attribute diagnostic question per attribute, and there are about 20-25 relevant product attributes, then there may be as many as 40-50 questions in the questionnaire. This is not unreasonable for some products and tasks. As long as the questionnaire is properly formatted, as discussed below, questionnaires with 40-50 questions are acceptable. This author objects, however, to longer-than-needed questionnaires. This occurs when the same questions are asked in different parts of the questionnaire, and when many related questions (different words, same meaning) are asked (unless it is a research project geared to investigate consumer vocabulary). It is the author's experience that consumers handle long questionnaires well when they have been given enough time to complete them, when the questionnaire is properly formatted, when attributes are clearly understood, when they are shown the type of questionnaire they will be asked to complete prior to the test, and when they have a chance to ask any questions in an orientation held before the test begins. It needs to be emphasized that some researchers have different views, are appalled by the idea of asking too many questions, and prefer to design questionnaires with one to ten questions.
Questionnaire Format

The format of the questionnaire is often overlooked and requires attention when designing the questionnaire. Practitioners should pay attention to the appearance of the questionnaire. Questionnaires should not look cluttered, or have small or unclear print. In addition, the type and format of the scales should be consistent throughout the questionnaire. Consumers have an easier task completing the questionnaire when it has been properly formatted. If the format of the questionnaire is consistent from page to page, then consumers will concentrate on the product evaluation rather than wasting time mentally adapting to a new format on each page. The types of scales used throughout the questionnaire affect its format, and the ease or difficulty with which the consumer completes it. Practitioners should choose the hedonic and attribute diagnostic scales they prefer and use the same type of scales throughout the questionnaire. Consumers get confused by many scale formats in the same questionnaire.

Preference and Liking Questions

A discussion point among sensory professionals is the appropriateness of collecting hedonic/liking and preference responses in the same study/test. Liking and preference are two distinct questions and provide different information. Liking, which is asked using a hedonic/liking scale, provides information on the degree of liking. Preference, if asked as a forced choice question, provides information on which product is preferred (chosen) over others. Liking questions yield interval data. Preference, asked as a forced choice question, yields nominal data. Both tests should be conducted when the two consumer responses are needed. A liking question should always be asked when the objective is to gather information regarding the degree of acceptance/liking of a product, and the way it compares to other products in terms of liking. Most consumer tests, and thus questionnaires, include liking/acceptance questions. Preference should be asked when a head-to-head comparison of two or more products is needed. Except for projects that require only preference (e.g., some claim support studies), preference should not be the only question asked, unless researchers already know how well a product or set of products is liked. Researchers should never extrapolate liking information from preference responses. For example, there may be a difference in preference between two products, yet neither product may be liked. The inclusion of both liking and preference questions, however, provides information both on the degree of liking of the products and on the product(s) preferred. First, sensory/consumer scientists should assess whether in fact both consumer responses are required. Once this is confirmed, it should be decided whether one or two tests should be completed. Two separate tests are optimal, since "true" independence can only be achieved through two individual tests. Unfortunately, this is not possible for everyone and every test. In that case this author considers it appropriate to ask both types of questions in the same test and include both questions in one questionnaire. It is recommended then that a separate questionnaire, and if possible a separate set of products (with different codes), be used when both acceptance and preference are included in the same test. This practice creates some degree of independence between the two tests. Currently, a standard guide is being prepared by the ASTM Task Group E18.04.05 to discuss issues related to acceptance and preference, and their inclusion in a consumer questionnaire (ASTM 2003). When both questions are asked, a sensory/consumer scientist should decide which question to ask first.
Herskovic et al. (1997) discussed the placement of the preference question in a consumer test conducted for orange juice in one city/market. The "no preference" option was also investigated. Herskovic et al. found that the "preference forced choice first" condition (i.e., when preference is asked first) yielded the strongest discrimination between products. The "no preference first" condition was the least discriminating on preference and acceptability attributes. In an attempt to replicate the orange juice study conducted by Herskovic et al. and establish its generality across products and consumers, Kuesten (1998) and Ennis (1998) analyzed the data of a study with wheat thin crackers. This effort was coordinated through the ASTM task group on acceptance-preference (ASTM 2003). In addition, Ennis (1998) analyzed the orange juice data generated by Herskovic et al. (1997) using the beta-binomial model. Kuesten (1998) concurred with Herskovic et al. that the "preference forced choice first" condition was the most sensitive in showing product differences. She also reported that asking acceptance first and preference last was least discriminating. Ennis' analysis did not support these conclusions and found no evidence that sensitivities differ with different positions of the preference question. Based on these results, a definite recommendation on the placement of the preference question cannot be given; more research is needed across products, locations, tasks and data analyses.

The Issue of the "No Preference" Category

An often debated issue when dealing with preference tests is the structure of the paired preference test (Gridgeman 1959; Odesky 1967; ASTM 1998; Ross 1969). Most sensory professionals include the preference question as a forced choice question; viz., consumers are forced to choose the product they prefer, even if they expressed equal liking for both. However, other practitioners believe that consumers should not be forced to choose a product if there is no preference. In that case, the question includes a "no preference" category. Most sensory professionals believe that asking the question as a forced choice question is the sound way to study preference. Consumers make (or are forced to make) a decision on preference, and researchers are not left with the responsibility of handling "no preference" responses. In addition, when the preference test is run as a forced choice test, the data can be analyzed statistically using the binomial distribution (which deals with the two outcomes: product A preferred, product B preferred). It is this author's opinion that consumers can understand how important it is to make a decision in the test, if an adequate explanation is provided. This explanation should be given during an orientation prior to the test. Consumers should be told that they will be asked to make a choice and to select a product even if they cannot easily choose one.
Consumers should also be told that it is legitimate to guess and to choose either product when having difficulty making a decision. They should also be advised that the data analysis performed later will handle their guessing responses, and that this analysis is easier with their "definite" preference responses. Usually, most consumers given this explanation will complete the forced choice preference test without a problem. Consumers who are still unwilling to make a choice while completing the test should be subtly pressed one more time. Only if a consumer refuses to choose a product at that point are they allowed to indicate "no preference." It is this author's opinion that following the foregoing practice yields a small number of "no preference" responses. The practitioner who conducts a test that includes the "no preference" category must deal with more issues. The test now differs from the forced choice test, both from a behavioral and from a statistical point of view. Consumers are now allowed to indicate "no preference." This means that with small differences in preference, consumers may now choose the "no preference" category when in fact they would have chosen one of the products had they been forced. The power of the test decreases (no significant differences are found) for products with small differences. The main issue with the inclusion of the "no preference" category is how to handle the "no preference" responses. When the number of "no preference" responses is small, the researcher has several options: (1) disregard the "no preference" data (this practice decreases the power of the test, and a real preference outcome may be missed); (2) split the "no preference" responses equally between the two products, assuming that people who did not have a preference would have randomly chosen a product when forced; (3) split the "no preference" responses proportionally to the preference pattern for the two products, i.e., the product with more preference responses gets assigned more of the "no preference" judgments, in the proportion of the difference between the two products. Each of the three options presents some problems, and the outcome may be questionable, since different sensory professionals handle the "no preference" judgments in different ways. In the end, however, the manipulation of the "no preference" responses may not be viewed as such a problem when this proportion is relatively small (e.g., 10%). There may not be a great comfort level if the practitioner has to make a decision on how to handle "no preference" data when the percentage is much higher.
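For illustration, the three options above can be written out and compared directly. The following is a minimal sketch in Python; the counts are hypothetical, and an exact binomial test against the 50:50 null is applied after each allocation:

    from scipy.stats import binomtest

    # Hypothetical paired preference counts out of 100 consumers.
    pref_a, pref_b, no_pref = 48, 42, 10

    def allocate(a, b, npref, rule):
        """Return (a, b, n) after handling the no-preference votes."""
        if rule == "drop":          # option 1: disregard no-preference data
            return a, b, a + b
        if rule == "split_equal":   # option 2: split equally between products
            return a + npref / 2, b + npref / 2, a + b + npref
        if rule == "split_prop":    # option 3: split proportionally to votes
            share_a = a / (a + b)
            return a + npref * share_a, b + npref * (1 - share_a), a + b + npref
        raise ValueError(rule)

    for rule in ("drop", "split_equal", "split_prop"):
        a, b, n = allocate(pref_a, pref_b, no_pref, rule)
        p = binomtest(round(a), round(n), p=0.5).pvalue  # exact test vs. 50:50
        print(f"{rule:12s} A={a:5.1f} B={b:5.1f} p={p:.3f}")

Running the three rules side by side makes visible how much the conclusion can depend on the practitioner's choice, which is precisely the concern raised above.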
Special considerations are needed for superiority claim support. For the purpose of supporting an unqualified preference claim, "no preference" responses should not be allocated, whether in total or in part, to the advertiser desiring to claim preference. Regardless of the significance level of the preference, if the percentage of "no preference" responses is 20% or more, an unqualified preference claim should not be made. In these cases, the preference claim should be made in terms of "those who expressed a preference" (ASTM 1998). The reader is referred to the literature that discusses the analysis of preference responses (Glenn and David 1960; Rao and Kupper 1967; Draper et al. 1969; Odesky 1967).
Open-ended Questions

Open-ended questions, as the name implies, are questions that allow consumers to provide product information using their own words. This information is therefore qualitative in nature. Common open-ended questions include:

What specifically did you like (dislike) in this product?
What appearance (flavor, fragrance, feel) characteristics did you like the most in this product?
Why did you prefer this product?
How did you cook (apply) this product?
Please provide your comments regarding...

Open-ended questions have advantages and disadvantages; therefore, some practitioners always include them in questionnaires, while others never do. Their main advantage is that they are a venue through which consumers can provide product information using their own words and comment on those product characteristics that they consider relevant or worth mentioning (for good or bad reasons). These consumer comments can therefore provide additional insights. They help to support quantitative information and often help clarify data not easily understood by the researcher. Another advantage is that consumers can provide comments on attributes missing from the questionnaire. The main disadvantage of open-ended questions is the effort and time needed to decode and tabulate the information. The time invested may not pay off in those cases where no new information is obtained through these questions. Additionally, the person decoding and summarizing the information needs to be oriented to be able to properly categorize and summarize the responses. There is merit in including at least one open-ended question in the questionnaire if the practitioner can afford the time to summarize this information. It can provide new insights, confirm the quantitative results, provide an opportunity for consumers to express some opinions in their own words, or collect important consumer comments on attributes in the questionnaire. Additionally, some attributes missed by the researcher might be cited in open-ended questions.
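For illustration, a first-pass tabulation of open-ended comments can be automated before the oriented person groups and interprets them. The following is a minimal sketch in Python; the comments are hypothetical:

    from collections import Counter

    # Hypothetical answers to "What specifically did you like in this product?"
    comments = [
        "liked the strong strawberry flavor",
        "too sweet but nice strawberry flavor",
        "crisp texture, although the flavor was a bit weak",
    ]

    stopwords = {"the", "a", "but", "was", "too", "bit", "although"}
    words = [w.strip(",.") for c in comments for w in c.lower().split()]
    counts = Counter(w for w in words if w not in stopwords)
    print(counts.most_common(5))  # most frequently mentioned terms

Such a frequency count only supports, and cannot replace, the careful categorization described above; synonyms and context still require human judgment.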
REFERENCES

ASTM. 1998. Standard E 1958-98: Standard Guide for Sensory Claim Substantiation. In: ASTM Annual Book of Standards. ASTM, West Conshohocken, Penn.
ASTM. 2003. Standard guide on acceptance-preference (Task Group E18.04.05). (In preparation).
BELSON, W.A. 1981. The Design and Understanding of Survey Questions. Gower Publishing Co., Brookfield, VT.
DRAPER, N.R., HUNTER, W.G. and TIERNEY, D.E. 1969. Analyzing paired comparison tests. J. Marketing Res. 6, 477-480.
EARTHY, P.J., MACFIE, H.J.H. and HEDDERLEY, D. 1997. Effect of question order on sensory perception and preference in central location trials. J. Sensory Studies 12(3), 215-237.
ENNIS, D.M. 1998. ASTM acceptance-preference study to determine the sensitivity of preference versus acceptance measures when order varies and the no preference option is given. Presented at the ASTM Task Group E18.04.05 meeting.
FOWLER, F.J. 1995. Improving Survey Questions: Design and Evaluation. Sage Publications, Newbury Park, Cal.
GLENN, W.A. and DAVID, H.A. 1960. Ties in paired-comparison experiments using a modified Thurstone-Mosteller model. Biometrics 16, 86-109.
GRIDGEMAN, N.T. 1959. Pair comparison, with and without ties. Biometrics 15, 382-388.
HERSKOVIC, J.E., SLEDZIESKI, L., SHAW, J.R. and ADEYEMI, B. 1997. Effects of a "no preference" option and of preference question placement on consumer preference tests. Paper presented at the IFT (Institute of Food Technologists) 1997 Annual Meeting.
KUESTEN, C. 1998. ASTM acceptance-preference study to determine the sensitivity of preference versus acceptance measures when order varies and the no preference option is given. Presented at the ASTM Task Group E18.04.05 meeting.
LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food, Principles and Practices. Chapman and Hall, New York.
LEVY, C.M. and KOSTER, E.P. 1999. The relevance of initial hedonic judgements in the prediction of subtle food choices. Food Quality and Preference 10(3), 185.
MOSKOWITZ, H.R. 2001. Interrelations among liking attributes for apple pie: Research approaches and pragmatic viewpoints. J. Sensory Studies 16, 373-391.
ODESKY, S.H. 1967. Handling the neutral vote in paired comparison product testing. J. Marketing Res. 4, 199-201.
RAO, P.V. and KUPPER, L.L. 1967. Ties in paired-comparison experiments: A generalization of the Bradley-Terry model. J. American Statistical Assoc. 62, 194-204.
RESURRECCION, A.V.A. 1998. Consumer Sensory Testing for Product Development. Aspen Publishers, Gaithersburg, MD.
ROSS, I. 1969. Handling the neutral vote in product testing. J. Marketing Res. 6, 221-222.
SORENSEN, H. 1984. Consumer Taste Test Surveys: A Manual. Sorensen, Corbett, OR.
MAXIMO C. GACULA, JR.

Completing a questionnaire in private about consumer products is not as difficult as filling out questionnaires dealing with social issues, such as drug use, racial issues, and other threatening subjects. Most consumer product work, dealing with foods, beverages, cosmetics, textiles, over-the-counter medicinal products, and others, uses a structured questionnaire design in order to measure people's attitudes and behavior. The questions asked are not threatening. Threatening questions used in social issues research are very bias-prone; the answers are subject to change depending upon the experimental situation. Yet it is an art to ask sensory questions about consumer products. Sensory-related questions often require careful thought because they encompass the five basic human senses: taste (food flavor, sweetness, sourness), smell (food aroma, fragrance intensity), touch (fabric friction, drag, skinfeel), hearing (food crispness/crunchiness, squeakiness), and sight (packaging design, product color). Important books containing guidelines on questionnaire design that every sensory practitioner should have are those by Moskowitz (1983), Lawless and Heymann (1998), Resurreccion (1998), and Meilgaard et al. (1999).

The development of questionnaire design is rooted in the social and behavioral sciences, the various theories of which are now used in other areas, such as marketing and Sensory Science. Sudman and Bradburn (1982) defined three components of measuring attitudes. These components can be translated into what we know today in sensory and consumer testing: (1) the affective component, exemplified by the well-known hedonic question, the acceptance question, and the preference question; (2) the cognitive component, exemplified by intensity questions pertaining to sensory attributes, i.e., questions dealing with what is in the product; and (3) the action component, exemplified by purchase intent. The Just-About-Right scale discussed in Chap. 9 contains both the affective and cognitive components, a combination that creates various viewpoints and controversies in its current use in sensory and consumer testing. Muñoz and Moskowitz expressed their practices, opinions and reservations on the most common problems encountered in developing a questionnaire design for a particular product category: location of overall liking, length of questionnaire, consumer fatigue, open- versus closed-ended questions, order of attribute evaluation, and the "no-preference" category on the rating scale. In this section, the author (MCG) brings out his viewpoints and opens up some opportunities for future research work.
Placement of Overall Liking

There is no published scientific work on the correct placement of overall liking in a questionnaire. In general practice, the placement of overall liking depends on the product and the objectives of the study. Within a company, the rules are pretty much set as to the format of the questionnaire design. For food products evaluated at one sitting, overall liking can be either the first question or the last question in the design. For products that are used over a period of time at home, such as antiperspirants, razor blades, etc., I prefer to place the overall liking question at the end. It is hypothesized that overall liking or overall evaluation of consumer products is a process of sensory integration of various product properties. Given this point of view, ratings of the degree of overall liking would be dependent on these properties. Hence the overall liking question should be placed at the end of the questionnaire. In terms of a statistical model, one can simply write
Overall liking = f(diagnostic sensory attributes)

where f refers to the functional form of the relation between overall liking (the dependent variable) and the diagnostic attributes (the independent variables). Unless there is a compelling reason for a specific placement of overall liking, my rule is to make overall liking the last question in the design.
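To illustrate the model above, one simple functional form, offered here only as a hypothetical example, is the linear model: Overall liking = b0 + b1x1 + b2x2 + ... + bkxk + e, where x1, ..., xk are the ratings on k diagnostic attributes, the coefficients b1, ..., bk indicate how strongly each attribute drives liking, and e is random error. In practice f may well be nonlinear, e.g., with quadratic terms for attributes that have an intermediate "just right" level.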
Length of Questionnaire

Questionnaire length and respondent fatigue present us with another set of issues. For diagnostic attributes, i.e., sensory intensities, important information to have at the beginning is the correlation matrix of the attributes, which can be obtained from historical data. Attributes that are highly correlated, either positively or negatively, can be represented by, say, the two or three attributes most important to the product (a sketch of such a correlation screen appears at the end of this section). In most cases, through years of experience and knowledge of the functions of the ingredients in a given formula, product developers can provide guidance on the choice of attributes for inclusion. If this information is responsibly utilized, then questionnaire length should not be an issue. Sensory fatigue is not really an issue; the issue is the amount of time involved in performing the task, as consumers are reluctant to be tied up for a long time. A study by Vickers et al. (1993) on yellow cakes showed that neither the presence of key attribute questions nor the length of the questionnaire affected the value or the sensitivity of the judges' overall liking scores. In this study, 450 consumers participated, using a 2 x 2 factorial design in which the samples were modified to create texture and flavor flaws in the product. For new products for which the sensory attributes are not fully known, a focus group can be used to obtain the needed sensory attributes, as illustrated by McNeill et al. (2000) for peanut butter. In that study, the focus group provided a vocabulary for the development of a quantitative consumer test questionnaire, and increased understanding of consumer language for peanut butter. The focus group method can be used for other consumer products as well. The reader is referred to the edited focus group book by Morgan (1993) and to the book by Resurreccion (1998), which devotes a chapter to focus groups.
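As a sketch of the correlation screen mentioned above, the following minimal Python example flags highly correlated attribute pairs in historical data; the attribute names, ratings and cutoff are hypothetical:

    import pandas as pd

    # Hypothetical historical ratings: one row per consumer, one column per attribute.
    df = pd.DataFrame({
        "sweetness":    [7, 8, 6, 9, 5, 7],
        "sugary_taste": [7, 9, 6, 9, 4, 7],  # nearly duplicates "sweetness"
        "thickness":    [4, 6, 5, 3, 6, 4],
    })

    corr = df.corr()  # attribute-by-attribute correlation matrix
    threshold = 0.8   # arbitrary cutoff; choose to suit the product category
    for i, a in enumerate(corr.columns):
        for b in corr.columns[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                print(f"{a} / {b}: r = {r:+.2f} -> consider keeping only one")

Pairs flagged this way are candidates for consolidation, shortening the questionnaire without losing much information.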
Order of Attributes in the Questionnaire

The appropriate order of diagnostic attributes in a questionnaire is known, and the author (MCG) agrees completely with the rationale given by Moskowitz and Muñoz and with that given in the books by Stone and Sidel (1993), Lawless and Heymann (1998), and Meilgaard et al. (1999). That is, the order is determined by the sequence of detection/appearance of the sensory attributes during product use. For instance, in a shaving test the first attribute observed would be closeness of shave, followed by nicks and cuts; thus closeness should be the first attribute in the questionnaire, followed by nicks and cuts. Overall evaluation or overall liking should be the last question asked, in line with the statistical model given earlier.
Preference Rating Scale

The author (MCG) contends that the "no-preference" category should be one of the choices on the preference rating scale. The inclusion of no-preference adds more information to the data, which needs no further explanation, as this is widely known in the sensory community. In most consumer studies the sample size is generally sufficient, so including this response category should not create a statistical problem in the analysis or in the interpretation. It is assumed that the no-preference response denotes equality of preference between the two samples presented. This is important because the statistical procedures used to analyze the preference question with a "no-preference" choice are based on this assumption. These procedures are: (1) equally splitting the no-preference counts between the two samples, and (2) splitting them proportionately according to the preference counts of each sample. Thus it is important that the preference question be simple, with sufficient instructions, properly worded, and that the respondents fully understand the meaning of the "no-preference" choice. The statistical procedures stated above appear not to have been experimentally assessed. Consumer product companies have volumes of historical data that can be used to examine these procedures, including the exclusion of "no-preference" counts from the statistical analysis. At the present time, the field is burgeoning with arrays of statistical and computer techniques that provide opportunities to investigate these procedures. For instance, preference data with a "no-preference" choice can be subjected to the bootstrapping technique (computer simulation), and Bayesian statistics might be incorporated in the analysis. It is hoped that bringing this idea to the reader will pose a challenge for us within the next decade.
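For illustration, one possible bootstrap treatment of preference data with a "no-preference" choice is sketched below in Python. The vote counts are hypothetical, and this is only one of many ways such a simulation could be set up, not an established procedure:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical responses: 1 = prefer A, 0 = prefer B, -1 = no preference.
    votes = np.array([1] * 48 + [0] * 42 + [-1] * 10)

    def prop_a(sample):
        """Preference share for A among those expressing a preference."""
        expressed = sample[sample >= 0]
        return expressed.mean()

    # Bootstrap: resample consumers with replacement, recompute the share.
    boots = [prop_a(rng.choice(votes, size=votes.size, replace=True))
             for _ in range(5000)]
    lo, hi = np.percentile(boots, [2.5, 97.5])
    print(f"A's share among expressed preferences: {prop_a(votes):.2f} "
          f"(95% bootstrap CI {lo:.2f}-{hi:.2f})")

A confidence interval that excludes 0.50 would suggest a real preference for A; the same resampling framework could be extended to compare the splitting rules discussed above.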
REFERENCES

LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food, Principles and Practices. Chapman and Hall, New York.
McNEILL, K.L., SANDERS, T.H. and CIVILLE, G.V. 2000. Using focus groups to develop a quantitative consumer questionnaire for peanut butter. J. Sensory Studies 15, 163-178.
MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1999. Sensory Evaluation Techniques. CRC Press, Boca Raton, FL.
MORGAN, D.L. ed. 1993. Successful Focus Groups. Sage Publications, Newbury Park, Cal.
MOSKOWITZ, H.R. 1983. Product Testing and Sensory Evaluation of Foods, Marketing and R&D Approaches. Food & Nutrition Press, Trumbull, Conn.
RESURRECCION, A.V.A. 1998. Consumer Sensory Testing for Product Development. Aspen Publishers, Gaithersburg, MD.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices. Academic Press, San Diego.
SUDMAN, S. and BRADBURN, N.M. 1982. Asking Questions, A Practical Guide to Questionnaire Design. Jossey-Bass Publishers, San Francisco.
VICKERS, Z.A., CHRISTENSEN, C.M., FAHRENHOLTZ, S.K. and GENGLER, I.M. 1993. Effect of questionnaire design and the number of samples tasted on hedonic ratings. J. Sensory Studies 8, 189-200.
CHAPTER 12

CHOICE OF POPULATION IN CONSUMER STUDIES
HOWARD R. MOSKOWITZ

Sensory scientists and market researchers share a long and quite often antagonistic history when it comes to the choice of the population in consumer studies. Historically, sensory scientists were relegated to the position of the "low-cost supplier" of consumer data. What this meant, and often still means, is that in company after company, university after university, the pool of "consumer" panelists comprised the employees, students, faculty, and whoever else was available and had not otherwise been subjected to formal training in product evaluation (training which, in turn, would make them eligible for the expert panel). Quite often the very proud, cost-conscious and public-relations-oriented management in food companies, and now in health and beauty aids companies, built the test facility in the workplace, whether at corporate headquarters (a favorite venue because it could serve as a showplace) or at the factory. More often than not the test facilities doubled both as a laboratory for data acquisition and as a showplace to demonstrate to management that the R&D group was au courant in new ways of understanding consumers. The research management, e.g., the vice president of R&D, would recruit a staff of research scientists and clerical assistants to man this facility. The panelists were chosen from among the employees. In the more organized facilities each panelist would be classified in terms of the products that he or she liked and disliked, in order to avoid recruiting a panelist who did not like a specific product. Thus, the manager of Sensory Science, forced to deal with in-house consumer panelists, would at least be able to justify the selection of the appropriate category of individual, if not the true consumer. These "consumer panels," or, better phrased, untrained employee panels, functioned quite well. They kept costs down because there were no outside costs for recruiting the panelists or paying for their time. There were other costs, however, hidden from the accounting sheets, such as the real cost of the panelist's time, because the panelists were on the corporate payroll. Often the real costs of these panelists were far higher than the accounting sheets revealed, because the costs of interrupting one's work, the costs of the actual time involved, etc., were never really factored into the calculations. What appeared to be a bargain price in using "in-house" panelists often turned out to be no bargain at all.
There were and there remain other difficulties with in-house panelists, not the least of which is bias.
(1) Product Familiarity. One bias comes from familiarity with the product, which produces a sort of "quasi-expert": not formally instructed in the evaluation of the product, but knowing a lot more than an untutored, ordinary consumer would. An exceptionally strong bias, for instance, occurs in cigarette work, where the in-house panel absolutely understands and recognizes the "signature" of a particular product, i.e., its specific profile of sensory characteristics. Every cigarette has a signature, which soon becomes obvious to the in-house panel with experience.

(2) Panelist Over-Participation. Another bias is over-exposure to the product. With so many free panelists, it is tempting to test the same person again and again.

(3) Incorrect Demographics. A third bias is the inability to work with a stratified sample of consumers. The in-house panelists comprise the full array of individuals available to the researcher, but they do not show the geo-demographic variation of a larger consumer panel.

(4) No Children. A fourth bias is the inability to work with children, since the employees are all adults.
(5) Subtle Professionalization. A recurrent bias comes from the fact that, despite all precautions and loud assertions to the contrary, some individuals evolve into professional respondents. They become professionalized because they enjoy the break from work, the reward (cookie and coffee), and the camaraderie that inevitably develops between the participant and the researcher. They volunteer again and again, satisfying the in-house researcher, who is happy to have another body to fill the quota.
The So-Called "Church Panel" - A Compromise Solution

In recent years the R&D sensory scientist has sought to expand testing capacity by branching out into a small-scale consumer research function. Part of this function consists of a set of panelists chosen from a local affinity group, such as a church, thus giving rise to the name church panel. The panelists are not formally screened to participate, although most researchers insist that the panelists in these studies be at least non-rejectors of the product. Does the church panel work in practice? It certainly can be executed properly, the costs are substantially lower than for other methods, and the panel
provides data that can be used successfully. The problem is that no one really knows whether or not the church panel represents a false economy, for these six reasons:

(1) Representation. The relatively low cost of doing the research must be balanced against the fact that the panelists often do not really represent the population to which the product will be marketed.

(2) Scope of Evaluation. The panelists will not evaluate more than 2-4 products, precluding larger scale studies and sensory segmentation.
(3) Interest. The panelists may or may not be motivated to do the work, and may feel compelled to participate because of social pressure.
(4) Production Line Mentality. Companies that use church panels begin to pride themselves on the ability to do work more cheaply than they otherwise could, and proclaim loudly that the church panel provides better data, or at least the necessary data.

(5) Dumbing Down Research. In companies where the church panel is the preferred method for obtaining research, the projects are "dumbed down" to the level of question that church panel data can answer, rather than being expanded to deal with more strategic issues.

(6) Growth Blocking. Gresham's law for research can be restated in this case as: "cheap research drives out expensive research." Since the fruits of product and concept research cannot be immediately seen except in the most dire of corporate circumstances, internal staff relying upon the financials of the research use the low cost to make church panels the standard method of testing. From there the panel approach becomes standardized, driving out anything that does not fit into its easy, neat purview of low-cost, low-effort, simplistic research.
Can There Ever Be a Random Sample of Panelists?

With all of the problems emanating from the in-house panel, how then can the sensory scientist and the market researcher ever develop a truly random sample, or at least a representative sample? The real answer is that the researcher can never develop a true random sample. Randomness means that everyone has an equal chance of being selected for the study, and that selection is made purely by the researcher. That is, the selected
panelist must participate. This is clearly impossible, for the following three reasons:

(1) Slavery Has Been Abolished. Except perhaps for college sophomores required to participate in research experiments, a panelist can refuse for whatever reason. We are left with a sample of individuals willing to do the task, because they are interested in participating, like the product, want the reward, and so forth. The sample is already not random.

(2) Drop-outs Are Permitted. Panelists who begin the study often do not finish it. The study is biased in favor of individuals who like to participate, or at least who feel guilty not participating.
(3) Not Everyone Can Be Invited (Sampling). It is impossible to sample the entire population because many panelists are hard to reach. Higher economic classes tend to participate less frequently than lower economic classes. Even paying panelists doesn’t necessarily ensure that one has sampled all classes equally.
The news is not altogether bad, however. The issue of a random sample is probably not particularly relevant at the early stages of product development, where most sensory researchers operate. There are no clear data showing the sensory preferences of hard-to-reach people to differ from those of easy-to-reach people. Indeed, one of the findings of sensory preference segmentation is that these segments appear, intertwined, in all parts of the population, and are not easily defined by conventional classifications such as age, income, gender, market or product usage. If sensory preference patterns are hard to correlate with conventional demographics, then the odds are that sampling any group of panelists will generate similar patterns of acceptance for products, at least on a blind basis. The author has found this to hold for many products. Standard classification variables do not co-vary with liking ratings. Older and younger consumers, men and women, often rate products quite similarly on a blind basis, even if these individuals use different products. One can easily confirm the foregoing statement by evaluating 10 products in a category blind, among 100 consumers, 25 in each of four markets, half male, half female, half heavy users, half light users, with an assortment of brands used most often. An analysis of variance will show these groups of panelists to exhibit similar rankings and even similar ratings of liking.
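The following minimal Python sketch illustrates the mechanics of the confirmation test just described, on simulated data. Because the data are generated with a shared product effect and no gender-specific effect, the product x gender interaction comes out non-significant; real data would decide the question.

# Simulate blind liking ratings for 10 products rated by 100 consumers and
# test whether a demographic grouping (here, gender) interacts with product.
# All data are simulated for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
products = [f"P{i}" for i in range(10)]
rows = []
for panelist in range(100):
    gender = "M" if panelist < 50 else "F"
    for p, prod in enumerate(products):
        # a true product effect shared by everyone; no gender-specific effect
        rows.append({"panelist": panelist, "gender": gender, "product": prod,
                     "liking": 5 + 0.3 * p + rng.normal(0, 1)})
df = pd.DataFrame(rows)

model = smf.ols("liking ~ C(product) * C(gender)", data=df).fit()
print(anova_lm(model, typ=2))
# A non-significant product x gender interaction means both groups rank the
# products alike -- the pattern the author reports for blind tests.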
Users Versus Non-Users - Do They Differ, and If So, How To Measure the Difference?

Consumer researchers often divide panelists by the criterion of whether or not the panelist uses a specific product (otherwise known as user versus non-user). At first glance this division appears absolutely reasonable, since the two groups of panelists differ behaviorally on the key criterion of product usage. Problems arise, however, when the researcher then attempts to correlate usage/non-usage with the properties of the product, such as acceptability, acceptance of specific sensory attributes, and the like. All too often this easy-to-create division into user versus non-user fails to correlate with the acceptance of products tested "blind" (viz., without benefit of the brand), but may correlate somewhat, though not necessarily highly, with the acceptance of the products tested "branded." The difference between users and non-users can thus be considered in terms of the following hierarchy.

(1) Sensory Intensity Perception (Amount of an Attribute, Such As Darkness). Rarely if ever does product usage co-vary with this type of attribute, nor should it. Sensory attributes are simply ratings of the amount of a characteristic. There is no a priori reason for the intensity of a sensory attribute to vary according to the panelist's purchase behavior.
(2) Sensory Directionals (Too Much, Just Right, Too Little of an Attribute). Sensory directionals are also fairly robust, and independent of a panelist's usage pattern. The typical instance in which a sensory directional co-varies with usage occurs when the sensory attributes of the product are dramatic and noteworthy. For instance, people who use an extremely hot salsa product like the taste, whereas non-users may not. When the sensory directional deals with an intense or "signature" sensory attribute, users and non-users may differ. When the sensory directional deals with a more conventional and prosaic attribute, such as thickness of texture or darkness of color, there are usually few differences between users and non-users.

(3) Liking Attributes for a Product That Can Truly Be Tested on a Blind Basis. If the researcher can disguise the product so that it is absolutely not identifiable in terms of brand, then there may be some, but probably no dramatic, differences in blind tests.

(4) Liking Attributes for a Branded Product. In this situation there are often dramatic differences between the products rated by users versus non-users.
It is clear from branded ratings that a great deal of the rating may be ascribable to the brand name. If the product is presented along with a white card description, then the difference between user and non-user is large. If the product is presented with the actual packaging that identifies the product, then the difference between user and non-user may be even larger, perhaps because the visual cues provided by the package become far more overwhelming.
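A minimal sketch of how this hierarchy might be checked follows; the ratings are invented, and the means are chosen simply to mimic the pattern described above (little difference blind, a clear difference branded).

# Compare user and non-user liking for the same product tested blind and
# branded. All ratings are hypothetical, drawn to mimic the pattern above.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
users_blind      = rng.normal(6.4, 1.2, 60)   # brand users, blind test
nonusers_blind   = rng.normal(6.2, 1.2, 60)   # non-users, blind test
users_branded    = rng.normal(7.3, 1.2, 60)   # brand users, branded test
nonusers_branded = rng.normal(6.1, 1.2, 60)   # non-users, branded test

for label, a, b in [("blind", users_blind, nonusers_blind),
                    ("branded", users_branded, nonusers_branded)]:
    t, p = ttest_ind(a, b)
    print(f"{label:>8}: t = {t:+.2f}, p = {p:.4f}")
# Expect little or no user/non-user difference blind, a clear one branded.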
Are Geographical Differences Relevant?

In consumer studies the goal is to represent the range of consumers. One of the easy, and prima facie correct, ways to achieve representation is to sample consumers in different markets, with the objective of representing the full range of consumers. Although there is no clear relation between the "geography" of the panel (viz., market) and response (e.g., product preference), the unstated hope is that by obtaining data from different markets the researcher will have represented the universe. Geographical variation may exist, but all too often this variation hides a more profound division of consumers into different segments. These segments occur in all markets, albeit in different proportions. Unless the researcher knows about these segments ahead of time, however, the researcher must operate on a superficial level. The real differences, such as those revealed through sensory preference segments, cause apparent differences across respondent geographies simply because the segments may be present in different proportions in different markets. In some instances, however, geographical differences are relevant for at least two reasons.

(1) Different Cuisines. Consumers in different regions may exhibit different sensory preferences because the foods to which they are accustomed differ quite dramatically. For example, in the Southwest, "hot" foods are considerably hotter than foods carrying the same label in the Northeast. Consumers in the Southwest may, in fact, grow up with hotter, spicier foods.
(2) Different Products. Consumers in different regions are exposed to different brands, and may come to expect different things from the same product. For example, in the northeast United States consumers are accustomed to stronger coffees, whereas in the northwest United States consumers are accustomed to weaker coffees. In Italy, the coffee is far stronger than it is in the United States, and the Italians drink smaller portions of this stronger coffee.
Can We Over-Train Panelists?

Consumer researchers prefer to work with panelists who have not been over-exposed to products. In many consumer projects, the researcher uses a "screener," or initial questionnaire, that asks when the panelist last participated in a study. Often the panelist is rejected from participation if that participation was more recent than three months ago. The specific length of time during which the panelist is not to have participated varies by researcher, project, etc. These precautions are intended to prevent the panelist from becoming over-trained. The word over-trained may be a little too strong in this context because there is really no training involved, but rather a choreographed exposure to one product or a set of products.

Yet, if the panelist participates in the same type of study too frequently, it may well turn out that the panelist's criteria change as the panelist gains experience. The panelist may begin by attending to one aspect of the product, such as flavor, but the questionnaire may bias the panelist to attend to other aspects, such as texture, and to other specific aspects (e.g., a certain characteristic of texture). Such a focusing effect is inevitable in a study, because any set list of attributes may call attention to product characteristics to which the panelist did not previously attend. The problem becomes more severe when the panelist repeatedly evaluates the same type of product. This might be the case for a company whose employees act as participants. Eventually, the panelist changes his way of thinking about the product. Previous criteria may change, and not just for the immediate test session, but for a long time to come. At this point the consumer panelist becomes a new entity - not quite a consumer, not quite an expert. The psychological laws of perception take over and modify the panelist's criteria. Such changes in the perception of a product occur even when the stimuli are quite simple, such as odorants, where repeated exposures to a simple stimulus increase the range of sensory nuances that the panelist reports (Moskowitz and Gerbers 1974).

The same type of change will naturally occur if the panelist participates in a panel whose responses generate feedback. Feedback changes the pattern of responses. Each panelist generates a specific pattern of responses that reveals how that panelist learns the sensory characteristics of products. It is often thought, albeit without proof, that once all the panelists reach a specific level of expertise through training and feedback, they stop changing. There is no reason to believe that this is the case - that is, that experts are fixed and immutable in their expertise. The same laws of perceptual change hold when a person is an expert as when a person is an untrained consumer - repeated exposures modify the way the panelist organizes the world. Similar phenomena occur all the time in the consumer's world, where
repeated listening to a musical piece reveals new and hitherto unsuspected features of the piece, even if the listener is a musical expert.

Sensory researchers have approached this issue by creating the analytic discipline of "monitoring panelist performance." There needs to be much more in the way of understanding panelist performance, beyond a tracking system. How does the researcher discover when a panelist changes his mental organization of the stimulus material, when new dimensions of perception enter, when criteria for describing the product shift? These represent new and fertile areas for Sensory Science, but only if the emphasis is on the change in cognitive structure with experience, rather than on performance on specific, statistically measurable tasks. One might, for instance, look at panelist x product interactions to determine whether the panelist changes criteria. Or, one might look at the ranking of products by a panelist over time, to determine whether or not the panelist maintains the same criterion. A constant criterion would lead to repeatability in the ranking. To the degree that a panelist's ranking of the same set of products on a specific attribute changes, we can conclude that the panelist has changed his criterion. Unfortunately, there is yet to be developed a strong linkage between the statistical analysis of panelist performance and verbalization of the underlying criteria. That is, people know how they rank and rate products in tests, but often cannot verbalize the criteria that they use.

Dividing the Population by Sensory Preferences

The author has discussed the existence of alternative ways to divide consumers. One of these is to divide people by the pattern of their preferences (Moskowitz and Krieger 1998). The observation was made over a period of 15 years that individual differences in product acceptance could be traced to factors beyond the conventional geo-demographics of age, income and market, and beyond the conventional marketing classifications, such as brand used most often. Rather, there appeared in the population groups of consumers with radically different sensory preferences. This topic has been dealt with in Chap. 2, on international research. It is important to note, however, that the sensory segmentation hypothesis provides an alternative way to screen respondents for sensory studies. Since these individuals vary so much in the pattern of what they like, it is important to ensure that any panel of respondents comprises these different segments. These segments are distributed throughout the population, so understanding their preferences and how to communicate with them provides the marketer and the product developer with a powerful organizing principle and a practical tool in the industry. Satisfying a sensory preference segment has the potential of creating a far more deeply satisfied customer.
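A minimal sketch of such a segmentation follows, using simulated liking ratings and an off-the-shelf clustering routine (k-means). The segmentation work described here was actually based on curves relating sensory level to liking, so this is an illustration of the idea, not of the method itself.

# Cluster panelists by the pattern of their liking ratings across products.
# Two latent segments with opposite patterns are simulated for illustration.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n_panelists, n_products = 120, 8

pattern_1 = np.linspace(3, 8, n_products)    # likes high-impact products
pattern_2 = pattern_1[::-1]                  # likes low-impact products
true_seg = rng.integers(0, 2, n_panelists)
liking = np.where(true_seg[:, None] == 0, pattern_1, pattern_2) \
         + rng.normal(0, 0.8, (n_panelists, n_products))

segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(liking)
for s in (0, 1):
    print(f"segment {s}: n = {np.sum(segments == s)}, "
          f"mean liking profile = {liking[segments == s].mean(axis=0).round(1)}")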
Sensory Preference Segments - How Can We Find These Consumers?
If there truly exist sensory preference segments transcending markets, and even countries, then how do the researcher and marketer locate these consumers? This is both a research question and a marketing question. The researcher must find these people because they represent different types of panelists with whom to conduct studies, allowing the researcher to understand more profoundly the nature of the differences between the segments. The marketer, in contrast, now knows that there exist groups in the population with substantially different sensory preferences. Thus it makes a great deal of sense to create products that appeal to these segments, rather than products that appeal to everyone, but in a marginal fashion only. From a marketing perspective, it is far better to have a limited population of consumers who love one's product and continue to re-purchase it. The product features can be fine-tuned to that segment. In the author's experience, a substantially greater profit can be had from creating products designed for a specific target audience.

Conventional methods divide consumers using well-established criteria. Yet they do not work for sensory segments. It is fairly easy to divide consumers on the basis of the market in which they live, and it is becoming increasingly easy to identify consumers showing specific purchase patterns. These consumers, however, do not share common sensory preferences. Thus, dividing consumers by these conventional marketing indices does very little to identify the segments. Yet, if the most promising differences among consumers are due to sensory preferences, then how can the researcher identify these groupings of individuals without repeating the entire study for each person in order to discover the segment to which the panelist belongs? This is not an easy question to answer. There are at least three strategies, of different complexity.

(1) Polarizing Products as Test Stimuli. This approach involves evaluation of one or two test stimuli by panelists. These test stimuli are chosen to be maximally polarizing - e.g., highly acceptable to panelists in one segment, and unacceptable to the other segment. For instance, assume that the sensory segmentation occurs on the basis of heat and particulates, with one segment loving the heat and particulates, and the other segment hating them. The strategy would be to present the panelist with one product having a substantial amount of both physical attributes, and another product having a low level of both features. Panelists belonging to the high impact segment should like the high impact sample and dislike the low impact sample. Panelists in the other segment should dislike the high impact sample and like the low impact one. One needs simply to use those two differentiating products. One might even go so far as to use only one product - the high impact product - and
obtain the liking rating. It is important, however, that the product be truly polarizing, and that there be only two segments. With three or more segments matters become more difficult; the segments may not be opposite to each other, but rather may exhibit different patterns of preferences.

(2) Regression-Type Procedures. Other, more statistically based methods can identify panelists in the population, as long as one can develop a relation between the sensory preference segments and information available about the panelist. The statistical methods go beyond simple cross-tabulations, which at the most basic level do not provide the discriminating power needed to differentiate these segments. Statistical approaches often involve one or another variant of multiple discriminant analysis. The researcher works with a variety of stimuli classified into discrete groups (e.g., acceptable/unacceptable), and attempts to predict this classification by some weighting rule applied to the different independent variables. Typically these independent variables are physical measures of the product, but they can be aspects of the panelist instead.
(3) Data-Mining Procedures. Recent advances in statistics, used more in the social sciences, provide a possible solution to the problem. The general approach is called "data mining," and a specific method is CART (classification and regression trees; Salford Systems 1994). One version of data mining follows these three steps:

a. Step 1 - Run the study with panelists, who both evaluate the products and complete an extensive classification questionnaire dealing with their geo-demographic characteristics, media habits, etc.

b. Step 2 - From the product evaluation ratings, divide panelists into the different segments, based upon the pattern of preferences (and specifically upon the curves relating sensory attribute level to liking).

c. Step 3 - Develop a decision tree, such as that shown in Fig. 12.1. The decision tree shows the relation between classification variables that can easily be asked in the next screening questionnaire and membership in a sensory preference segment. Typically, segment membership is not determined by one classification variable alone, and indeed quite often the pattern is unclear. Yet the statistical programs can create a potentially useful decision tree that classifies a person into one of the segments more accurately than randomly assigning the person to a segment, or assigning everyone to the most populated segment, would.
FIG. 12.1. EXAMPLE OF A DECISION TREE (three nodes, each branch assigned to Class 1 or Class 2)
Whether the researcher uses discriminant function analysis, CART (Salford Systems 1994), or any of the other tools available now, the power of high-speed computing makes it possible to identify individuals in these segments from profiles other than those obtained from direct product tests. The key is that there is finally a way to better target product development, using sensory segmentation as an organizing principle. Furthermore, these statistical decision systems act as enabling devices to identify an individual as a member of one of a limited number of such segments.
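The following is a hedged sketch of the Step 3 idea, using a CART-style decision tree from a modern open-source library. The screener variables, data and accuracy comparison are all hypothetical, and the sketch stands in for, rather than reproduces, the CART software cited above.

# Fit a CART-style decision tree that predicts sensory-segment membership
# from easy-to-ask screener variables. Variables and data are hypothetical.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
n = 400
X = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "income_bracket": rng.integers(1, 6, n),
    "eats_spicy_weekly": rng.integers(0, 2, n),
    "heavy_user": rng.integers(0, 2, n),
})
# Assume segment membership is only loosely related to the screener answers
segment = ((X["eats_spicy_weekly"] + (rng.random(n) > 0.6)) >= 1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, segment, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print(export_text(tree, feature_names=list(X.columns)))
base_rate = max(y_te.mean(), 1 - y_te.mean())   # "most populated segment" rule
print(f"tree accuracy {tree.score(X_te, y_te):.2f} vs. base rate {base_rate:.2f}")

The accuracy comparison against the base rate mirrors the criterion in Step 3: the tree earns its keep only if it beats assignment to the most populated segment.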
REFERENCES

MOSKOWITZ, H.R. and GERBERS, C. 1974. Dimensional salience of odors. Annals of the New York Academy of Sciences 237, 3-16.
MOSKOWITZ, H.R. and KRIEGER, B. 1998. International product optimization: a case history. Food Quality and Preference 9, 443-454.
SALFORD SYSTEMS. 1994. CART for Windows, Version 2.0. San Diego.
ALEJANDRA M. MUÑOZ

The consumer population to be used in a consumer test is among the most important test design issues to assess. Consumer tests, as the name says, should be conducted using the actual consumer, i.e., the user or potential user of the product, who is naive in product testing and who meets other recruitment criteria (Stone and Sidel 1993; Lawless and Heymann 1998). In addition, a group of consumers is selected as a sample of some larger population about which the researcher hopes to draw conclusions (Resurreccion 1998). This discussion is intended to cover two main points: the selection of the consumer population, and the assessment of the actual consumer population after the test is completed. We tend to overlook the importance of this second assessment, made upon completion of the test.
The Selection of the Consumer Population

There are several issues that sensory/consumer scientists need to consider in selecting the consumer population for a given test. Among the most important ones are:

Qualifications and needed recruitment criteria for the test
Cost considerations
Special consumer populations
A brief discussion of these issues follows.

(1) Choice of Population Based on Diverse Recruitment Criteria

The recruitment of consumers is most often completed by having consumers answer a series of questions addressing diverse recruitment/qualification criteria (McDermott 1990; Carter and Riskey 1990). The more specific the required qualifications, the more difficult it is to find those consumers (the incidence level may drop), and therefore the more expensive it becomes to recruit the desired consumers. Therefore, it is important to assess the qualifications that are really required for the test. It is important to apply all the needed qualification criteria to ensure that the appropriate respondents participate. However, there may be times when some criteria are not needed, and an unnecessarily higher test cost can be avoided.
a. Product Usage. This is the most important qualification criterion. All consumer tests need to include this requirement, even when employees participate. Participants need to have familiarity with the category and have
the proper frame of reference and experiences for the assessment of the test products. When new products are tested, this criterion cannot be met. However, other qualifications should be considered, such as usage of similar product categories (e.g., snack users if a new snack will be tested), other presentations (e.g., users of a shelf-stable version if a similar frozen product will be tested), similar flavors, fragrances, etc.
b. Frequency of Use. This criterion determines whether heavy, medium or light users are recruited. Sensory/consumer scientists need to assess the degree to which this qualification is needed; heavy users are not required in every project. Projects that may require the participation of heavy users include ingredient and process substitution, claim substantiation, comparison to competitors and benchmarks, product matches, and the establishment of Quality Control sensory specifications. Projects involving key brands may also require the recruitment of heavy users. Higher recruitment costs may be incurred if heavy users are recruited; therefore, the sensory practitioner has to determine the true need for heavy users in the test. The market researchers should be consulted to obtain usage information. This information aids in determining the cut-off points for product usage.

c. Gender. This criterion is very important when the product's use requires that one gender be recruited. Otherwise, most studies require a mix of both genders. Realistically, however, all consumer studies tend to include a larger proportion of females, since they are more available to participate in studies.

d. Age. Generally speaking this is not a very strict criterion, unless the project requires it. This parameter is only controlled if a product is geared toward consumers of a specific age (e.g., children, the older population, etc.). Otherwise, it is common practice to have all consumers between the ages of 18 and 55/60 participate.

e. Other Criteria. There are many other criteria that can be added to the screening questionnaire used for recruitment. Yet sensory professionals should assess whether these other criteria are needed. Criteria such as income, children in the household, level of education, etc., may not need to be controlled in sensory studies. Consequently, the higher costs needed to find people with extra qualifications can be avoided.
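A minimal illustrative screener, with hypothetical field names and cut-offs, might look as follows; the point is that only the criteria the test genuinely needs are checked.

# Qualify a respondent only on needed criteria: product usage, frequency of
# use (when heavy users are required), and age. Cut-offs are assumptions.
def qualifies(respondent, require_heavy_user=False):
    """Return True if the respondent meets the recruitment criteria."""
    if not respondent.get("uses_category"):          # a. product usage
        return False
    uses_per_month = respondent.get("uses_per_month", 0)
    if require_heavy_user and uses_per_month < 8:    # b. frequency of use
        return False
    if not 18 <= respondent.get("age", 0) <= 60:     # d. age bracket
        return False
    # Income, education, etc. are deliberately NOT screened (criterion e):
    # every extra criterion lowers incidence and raises recruiting cost.
    return True

print(qualifies({"uses_category": True, "uses_per_month": 10, "age": 34}))  # True
print(qualifies({"uses_category": True, "uses_per_month": 2, "age": 34},
                require_heavy_user=True))                                   # False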
(2) Choice of Population Based on Budget and Available Resources

Recruiting and having actual consumers participate in a test is one of the main reasons for the relatively high cost of consumer tests. Unless the company has its own test facility, an outside contractor needs to be hired to conduct the consumer tests in order to:

Recruit the consumers needed
Provide the facility where the test is conducted
Administer the test
Provide other services, as needed

Consumer tests are expensive since the contract research agency has to be paid for these services. The consumer incentives paid to participants for their involvement increase the total costs. In general, most sensory/consumer insights groups that conduct sound consumer tests hire outside contract research agencies or use a consumer pool (administered by themselves or by a contract research agency). Groups that do not have the budget to conduct consumer tests with naive consumers on a regular basis must find other alternatives for obtaining consumer information; the most common one is to use employees. A discussion of the issues involved in running consumer tests with employees, using a consumer pool, and using an outside contractor follows.
a. Employee Consumer Tests. The most economical alternative for conducting consumer tests is to use company employees. Sensory professionals need to understand the advantages and limitations of this practice. The main advantage of using employees for consumer tests is the low expense incurred in running the test: there are no costs for recruitment or facility rental, and possibly no consumer incentives to pay. The main disadvantage of this practice is the risk of collecting non-representative and biased information. Employees have definite biases that affect their responses (Resurreccion 1998). First, product developers/chemists and technical employees directly related to the product, process or project should not be used; they would be the worst participants, and they should not be recruited for any consumer test. Regardless, employees have other biases. They know that company products are being tested and/or that information for a company project is being collected. This bias may cause these consumers to be less critical of products, and thus they may not represent the true consumer response. In addition, employees tend to be used frequently in consumer tests and, with frequent participation over time, no longer represent the naive, inexperienced user. Consequently, employees
should not be used in critical consumer tests for final product decisions, nor in preference tests. Some practitioners are totally opposed to conducting consumer tests with employees. While there is agreement that this is not the best option, it is acknowledged that useful information can be obtained from running in-house consumer tests, especially under budget restrictions. As this author discussed in Chap. 3, it is better to provide some sound information than no information. Test requesters will try to obtain data under other, less optimal conditions anyway. In this author's opinion, employee consumer tests are valid provided that:

these tests are conducted in the early stages of projects (mainly geared to screening out the worst prototypes based on their acceptance)
actual or potential product users are recruited among employees (a product usage questionnaire needs to be completed)
employees who are familiar with the project or the product tested, trained panelists, and upper management are not asked to participate
ideally, administrative personnel with no experience in product testing and without a technical background are recruited
b. Participants from a Consumer Pool. The second most economical alternative available to sensory/consumer scientists for conducting consumer tests is to build and use their own consumer pool. This consumer pool may be administered by the sensory/consumer insights group or by an outside agency. When the consumer pool is administered by the sensory/insights group, the tests may be conducted in-house, yet employees are not used for the test. This alternative offers a great improvement over the use of employees, since local residents/consumers are recruited and used for testing. The costs are still much lower than conducting the tests through an agency, since no facility rental or recruitment costs are paid. The main expenditures incurred in using a consumer pool are the following. If the sensory/insights group manages the pool: (a) the initial expense of building the pool, (b) the administrative expenses and time involved in managing the facility, the facility's personnel, and the consumer pool, (c) the resources to execute the test, and (d) the consumer incentives.
If a contract research agency manages the pool: (a) the initial expense of building the pool, (b) the agency’s cost to cover the facility’s
use/rental, and their personnel to manage the test, and (c) the consumer incentives. The disadvantages of this alternative are that consumers participate in consumer tests frequently, and that only local opinions are collected. The naiveté of these consumers is questionable after frequent participation. To alleviate this situation, the sensory professional should keep a record of the consumers' participation and limit their involvement within a calendar year. At the same time, new consumers should be added to the pool on a regular basis to assure a sufficiently large number of naive consumers in the pool.
c. Consumers Recruited Through an Outside Agency. When an agency is hired to conduct a consumer study, the sensory professional needs to be aware of the type of consumer pool/base used by that consumer testing facility and the level of the participants' naiveté. In addition, the sensory researcher should check the reputation of that facility to ensure that it completes sound recruitment and test execution. This brief discussion does not address all the caveats and considerations involved in hiring an outside agency; only a few issues related to the recruitment and use of the consumer population are addressed herein. When working with a contractor, it is recommended to include a screening question that addresses maximum participation. Agencies do not pay close attention to this issue unless it is requested. This practice should eliminate consumers who have participated in a large number of consumer tests. The reality is that there are many consumers who make a living out of participating in consumer tests, and who locate consumer facilities in their vicinity so as to participate in tests on a daily or weekly basis. Those consumers should be excluded from tests. We researchers conducting consumer tests need to understand that our consumers are not truly "naive." Except when a recruitment is specifically geared to completely naive individuals (i.e., those who have not participated in any testing before), consumer tests are conducted with people who have participated in consumer tests on several occasions, and who know about product evaluations and the scaling of product attributes. Consumers are motivated by the incentives and want to participate in as many consumer tests as possible. Therefore, as mentioned before, sensory professionals should include some quality control checks to detect and eliminate consumers with frequent participation who have become experts.
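One such quality control check might be a simple participation log, sketched below; the yearly cap and rest period shown are assumptions, not established norms.

# Cap each consumer's tests per calendar year and enforce a rest period
# between tests. The limits shown are illustrative assumptions.
from datetime import date, timedelta

MAX_TESTS_PER_YEAR = 4
MIN_REST = timedelta(days=90)   # e.g., the three-month rule mentioned earlier

participation = {}              # consumer_id -> list of past test dates

def may_participate(consumer_id, today):
    past = participation.get(consumer_id, [])
    this_year = [d for d in past if d.year == today.year]
    if len(this_year) >= MAX_TESTS_PER_YEAR:
        return False
    if past and today - max(past) < MIN_REST:
        return False
    return True

participation["C001"] = [date(2024, 1, 10), date(2024, 5, 2)]
print(may_participate("C001", date(2024, 6, 1)))   # False: within 90 days
print(may_participate("C001", date(2024, 9, 1)))   # True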
(3) Special Populations

Some sensory/consumer scientists may face the challenge of recruiting and working with special populations, depending on the test products. The consumer literature has given little attention to this topic, and only a few professionals face these special situations. In addition, many of the researchers working in industry cannot publish or discuss these challenges for reasons of confidentiality. This section is meant to discuss only a few examples where special recruitment and test considerations are needed. More importantly, this discussion is meant to make sensory professionals aware of the challenges that others face, and to draw attention to the fact that established protocols and methods need to be modified and adapted to special situations.

a. Tests Involving Dual Users. Some tests require dual users (e.g., testing of baby products or pet products). One of the users is the consumer who uses/eats the product, i.e., the baby/child or animal, and the other user is the parent or the owner of the pet. The response of this second "consumer" is very important, since the adult needs to like the product in order to purchase it for the actual consumer (child, pet). The adult assesses some components of the product, while the actual user consumes/uses the product. The adult will focus on some of the sensory attributes of the product and on the response of the actual user. For example, a pet owner cares about the smell, the ease of handling the package and product, and the appearance (visual and texture) of the product. A parent likewise is exposed to the aroma or fragrance, the ease of handling the package and product, the appearance (visual and texture), and other sensory characteristics (e.g., skinfeel of a lotion or shampoo, texture of a food perceived when dispensing the product, the actual flavor/texture of baby food, etc.) while giving/handling the product to the baby/child. Ideally, information about these products requires the input of both users. Often, the manufacturers of these products tend to focus only on the adult, who has the purchasing power. Many of these products are developed and/or improved solely on the basis of the adult's input. However, it is important that some studies be conducted with the actual user to determine ultimate acceptance.

b. Tests of Medications and Drugs. The challenges and difficulties involved in the testing of medications and drugs, and the need to adapt
common recruitment and testing procedures to these situations, should be appreciated. Pharmaceutical companies and physicians are interested in understanding the sensory properties of the drugs they manufacture and prescribe. It is assumed that patients are more likely to maintain compliance with the drug that offers the best sensory characteristics. Very few publications have addressed this topic (Silvers 1993; Gerson et al. 1999). In terms of the selection of the population, the topic covered in this chapter, the special challenges include:

patients'/participants' needed qualifications
determination of the participants' health condition required for the study
recruitment resources
information delivered to patients/participants
needed legal documentation and consent forms

When working with medications and drugs, the selection of the recruitment criteria, the sources of participants, and the actual recruitment require special care. Since finding qualified participants may be difficult, a considerable amount of time for their recruitment must be allotted. There are many other challenges involved in the design and actual execution of tests of medications and drugs. This topic is outside the scope of the discussion herein and will not be addressed.
c. Tests Involving the Elderly. Similar to tests involving medications and drugs, studies involving the elderly pose many challenges to the sensory/consumer scientist. The main challenges are the design of the test and the interpretation of the data, since it has been very well-documented that sensory perception and sensitivity change with age (Chauhan et al. 1987; Tepper and Stoerr 1991; Schiffman 1991; Laska 2001). In terms of the selection of qualified participants, several issues need to be considered:

the targeted age bracket
other required qualifications/recruitment criteria, such as health issues, dietary constraints, lifestyles, etc.
the effect of conditions of the elderly on the test, and thus the need to recruit for or avoid such conditions (e.g., sickness and health conditions, dentures, etc.)
recruitment resources
transportation to test facilities for CLT studies
show rate and attrition
motivation
The Assessment of the Actual Consumer Population upon Test Completion

Generally, sensory/consumer scientists take adequate care and pay considerable attention to the establishment of recruitment criteria, the actual recruitment, and thus the selection of the required population for consumer studies. However, little attention is given to the assessment, upon test completion, of the actual consumer population that participated in the study. For some studies, one of the a priori objectives may include the analysis of the consumer population in more detail, such as the study of special consumer segments that the recruitment deliberately established. In these cases, the consumer population is segmented based on specific qualifications or recruitment criteria, such as age, geographical location, gender, brand usage, frequency of use, income, etc. This author contends that consumer data contain more information than we extract from the direct general or pre-established segmentation analyses. Even when the study's objective does not call for a study of specific segments, sensory professionals should inspect and analyze the data to determine the presence of subgroups based on the actual consumer responses. Minimally, the sensory professional should assess any possible segmentation based on overall liking, or based on all responses, to discover segments/subgroups and patterns through conventional segmentation analyses (Moskowitz et al. 1985; Vigneau et al. 2001).

As Moskowitz mentions in this chapter, individual differences in product acceptance can be traced to factors beyond the conventional geo-demographics of age, income and market, and beyond the conventional marketing classifications, such as brand used most often. Rather, there may be groups of consumers with radically different sensory preferences. We sensory professionals and market researchers need to unveil these groups. The study of consumer segments, and the business decisions that result from such an assessment, are key as more and more unique products and innovations enter the marketplace. Companies should expand their business by creating unique products that may appeal only to specific consumer segments. The sensory/consumer scientist must be prepared, and thus have the necessary methodological tools, to respond to this trend and business need, and to adequately test the unique products developed for specific consumer segments.
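As one minimal illustration of this post-test inspection, the sketch below asks whether overall-liking scores are better described by two subgroups than by one, using a Gaussian mixture model and the Bayesian information criterion (BIC); the data are invented, and in practice one would follow up with full clustering on all responses, as the references above describe.

# Check whether overall-liking scores hide two subgroups (lovers and
# dislikers) by comparing mixture models of 1, 2 and 3 components.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
liking = np.concatenate([rng.normal(7.5, 0.7, 80),    # hypothetical "lovers"
                         rng.normal(4.0, 0.9, 40)     # hypothetical "dislikers"
                         ]).reshape(-1, 1)

for k in (1, 2, 3):
    gm = GaussianMixture(n_components=k, random_state=0).fit(liking)
    print(f"{k} component(s): BIC = {gm.bic(liking):.1f}")
# The lowest BIC (here, two components) flags a segmented response.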
REFERENCES

CARTER, C. and RISKEY, D. 1990. The roles of sensory research and marketing research in bringing a product to market. Food Technol. 44(11), 160-162.
CHAUHAN, J., HAWRYSH, Z.J., GEE, M., DONALD, E.A. and BASU, T.K. 1987. Age related olfactory and taste changes and interrelationships between taste and nutrition. J. Amer. Dietet. Assoc. 87, 1543-1550.
GERSON, I., GREEN, L. and FISHKEN, D. 1999. Patient preference and sensory comparison of nasal spray allergy medications. J. Sensory Studies 14, 491-496.
LASKA, M. 2001. Perception of trigeminal chemosensory qualities in the elderly. Chemical Senses 26(6), 681-689.
LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food. Chapman & Hall, New York.
McDERMOTT, B.J. 1990. Identifying consumer and consumer test subjects. Food Technol. 44(11), 154-158.
MOSKOWITZ, H.R., JACOBS, B.E. and LAZAR, N. 1985. Product response segmentation and the analysis of individual differences in liking. J. Food Quality 8, 168-191.
RESURRECCION, A.V.A. 1998. Consumer Sensory Testing for Product Development. Aspen Publishers, Gaithersburg, MD.
SCHIFFMAN, S.S. 1991. Taste and smell losses with age. Contemporary Nutr. 16, 2.
SILVERS, W.S. 1993. Comparative taste evaluation of aerosolized formulations of triamcinolone acetonide, flunisolide, and flunisolide with menthol. Clin. Ther. 15, 988-993.
STONE, H. and SIDEL, J. 1995. Strategic applications for sensory evaluation in a global market. Food Technol. 49(2), 80-88.
TEPPER, B.J. and STOERR, A. 1991. Chemosensory changes with aging. Trends Food Sci. Technol. 2, 244-246.
VIGNEAU, E., QANNARI, E.M., PUNTER, P.H. and KNOOPS, P. 2001. Segmentation of a panel of consumers using clustering of variables around latent directions of preference. Food Quality and Preference 12(5-7), 359-363.

MAXIMO C. GACULA, JR.

The choice of population to be used in consumer studies is pretty much established. Moskowitz and Muñoz discussed important aspects of the choice and the arguments for and against those choices. It is important knowledge for the
novice going into a career in Sensory Science and consumer testing, and a reminder for experienced analysts and marketers. Nally (1987) reported on the implementation of consumer taste panels. In that paper, the benefits of a consumer panel over an in-house panel were discussed, including recruitment procedures, panelist training, and performance monitoring. Such discussions are also covered in the Moskowitz and Muñoz sections. Fortunately, both of them have exhaustively discussed the important factors in the choice of population used for consumer testing, and MCG's part is considerably shortened, since I fully concur with their viewpoints and practices. However, a few points need to be added.

Statistical sampling is one of the significant contributions of statistics to the study of populations; without it, one would have to gather information from every member of the population of interest, which is impossible. The various sampling techniques will not be discussed here; depending on the practitioner's background, the books by Cochran (1977) and Snedecor and Cochran (1967) are suggested. A unique characteristic of sampling members of a population for consumer testing is that they have already been screened on the basis of study requirements, e.g., age, gender, product usage, income, consumer segmentation, etc., as clearly explained by Muñoz and Moskowitz. As a result, one is sampling from a relatively homogeneous population, and likes and dislikes will not differ substantially across its members. Thus simple random sampling will suffice. In this scenario, sample size plays a major role in obtaining responses that are representative of the population of interest. In simple random sampling, every member of the defined population has an equal chance of being selected to represent the population. What happens to the sampling plan when some selected respondents decline or fail to complete the task? Nothing. Every such event should be considered purely random. Since sampling is with replacement, random sampling should simply continue, replacing those who declined to participate and thereby meeting the required sample size. The sample size is critical in obtaining reliable estimates of the population parameters estimated from the sample mean and sample variance.

It is a waste of time and resources to use in-house panelists; they should never be utilized to gather preference, acceptance, or other forms of hedonic information on product attributes. The various reasons for this contention are given in the Muñoz and Moskowitz sections. The accepted practice is to recruit outside panelists to form the so-called Research Guidance Panel, or some other similar designation. These panels are used in Research and Development environments, such as in Product Optimization studies and other design-of-experiment applications for product development.
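A minimal sketch of this sampling scheme follows; the pool, the required sample size, and the refusal rate are all invented, and the point is simply that decliners are replaced by fresh random draws until the required sample size is met.

# Draw a simple random sample from a screened, qualified pool, replacing any
# decliners with fresh random draws. Decline behavior is simulated.
import random

random.seed(7)
pool = [f"consumer_{i:03d}" for i in range(500)]   # screened, qualified pool
needed = 100

recruited, remaining = [], set(pool)
while len(recruited) < needed and remaining:
    candidate = random.choice(sorted(remaining))   # equal chance for everyone
    remaining.discard(candidate)
    declined = random.random() < 0.25              # assumed 25% refusal rate
    if not declined:
        recruited.append(candidate)

print(f"recruited {len(recruited)} of {needed} from a pool of {len(pool)}")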
Moskowitz discussed a relatively new technology, now a buzzword in both academia and industry, known as "data mining." As the name implies, it is a process of discovering patterns, characteristics, and properties in volumes of data to aid business and technical decision making. Berry and Linoff (1999) recently wrote a book on data mining with applications to marketing, sales, and customer support using the SAS software environment. Fayyad et al. (1996) edited a book on data mining with emphasis on industrial engineering applications. These books should be a good start for exploring data mining in sensory and consumer testing work. At the present time, sensory practitioners and sensory statisticians are tied up with multivariate methods for evaluating the massive volumes of sensory data. It is about time to explore data mining technology.

REFERENCES
BERRY, M.J.A. and LINOFF, G. 1999. Data Mining Techniques For Marketing, Sales, and Customer Support. SAS Institute, Cary, NC. COCHRAN, W.G. 1977. Sampling Techniques. John Wiley & Sons, New York. FAYYAD, U.M., PIATETSKY-SHAPIRO, G., SMYTH, P. and UTHURUSAMY, R. 1996. Advances in Knowledge Discovery and Data Mining. AAAI Press/The MIT Press, Menlo Park, CA. NALLY, C.L. 1987. Implementation of consumer taste panels. J. Sensory Studies 2, 77-83. SNEDECOR, G.W. and COCHRAN, W.G. 1967. Statistical Methods. The Iowa State University Press, Ames, Iowa.
CHAPTER 13
BIASES DUE TO CHANGING MARKET CONDITIONS

HOWARD R. MOSKOWITZ

Market conditions in the food (and other consumer) industries change all the time. New competitors come into the market, manufacturers attempt to change the market by introducing new products, often with radically different flavors, textures, shapes, etc., and consumers tire of products and choose new ones on their next purchase occasion. Change is inevitable. It is the engine driving new product development and marketing. Fortified with new offerings, manufacturers are always ready to pounce on newly unsatisfied consumers. Advertising may affect the market by calling attention to new features of the product, or to features that were always present but not necessarily highlighted as a "point of difference" from competitors. In all, the consumer market is dynamic, changing daily in ways that cannot be easily forecast. Changing market conditions affect sensory scientists in at least two ways.

(1) Changing Realities. Over time the panelists may change their internal reference standard. For example, as soups become less and less salty, and as panelists become used to these lower salt levels, the panelists may become increasingly sensitive to salt. They may actually perceive the salt taste to become stronger. The same story holds for sweeteners. To many, artificially sweetened products possess the "real sweet taste"; sugar-sweetened products, in contrast, are described as heavy and syrupy. One of the most pressing issues in applied product development is keeping up with this change in internal sensory references. One need only witness the enormous problems that ensued when the popular wisdom was to reduce the fat, sugar, and salt in products.
(2) Need for Updated Research Norms. Many sensory scientists use the competitive frame of products as stimuli for their research. As the products in the marketplace change, so does the frame of reference, and the norms that researchers develop may change as well. Often researchers want to know how a "high scoring" product performs versus a "low scoring" product. If the products continue to change, and to increase in acceptance, then the norms for one year may be outdated the following year, and the drift in product quality over time will certainly make the norms unstable. The continually changing market conditions make the use of normative data quite difficult, especially if the researcher wants to read the results on a long-term basis.
For example, the author has been involved in several categories that have seen dramatic changes over time. One product category is the pasta sauce market. In the early and mid-1980s this was an underdeveloped market. The panelists participating in the early studies divided into three segments (low impact thin and smooth; high impact spicy; and chunky, respectively). Over time, and with a proliferation of different products, the segmentation changed from this remarkably simple and straightforward division to a much broader, flavor-based segmentation. Much of the change had to do with the introduction of new flavors, and with the manufacturers' emphasis on flavor type as a key aspect of the product. The hitherto easy-to-understand segmentation became far more complex.

For product testing, these continually changing market conditions mean that databases must be updated on a regular basis. There is no clear number of years for which a database can be considered "valid." However, a prudent action is to re-test the product category every 2-3 years, in order to ensure that the data remain current. One should test the same products, along with the newly introduced products. Furthermore, if any products have been highly advertised, then they should be part of the new set to be tested, even if they do not command a large market share. They may, in fact, be the vanguard of a new set of products that will soon gain popularity. Furthermore, if the researcher discovers a rapid increase in the purchase and consumption of a particular product or set of products, then it may be well worth repeating part of the old study and incorporating the new data into the database.

It may be relevant to measure attitudes as well. There are some attitudinal measures instituted in corporate tracking studies. The goal of these studies is to measure the changing consumer response to the marketplace. Most of these tracking studies deal with awareness of brands and product usage. There are large-scale tracking studies that deal with sociological shifts and their possible impact on products (e.g., the Yankelovich Monitor®). Recently, the author and his colleagues have created a series of tracking studies for particular food products, using conjoint analysis to measure the impact or utility of brand, usage situation, and product description (Crave It!; Beckley and Moskowitz 2002).
REFERENCES

BECKLEY, J. and MOSKOWITZ, H.R. 2002. Databasing the consumer mind: The Crave It!, Drink It!, Buy It! & Healthy You! Databases. Paper presented at the 2002 Meeting of the Institute of Food Technologists, Anaheim, CA.
ALEJANDRA M. MUÑOZ

The dynamic consumer products market changes continuously. In general, market conditions change at either of two levels.

(1) At the most basic level, market conditions are modified by formula, process and packaging changes within product categories. Existing products are modified regularly due to product reformulation (supplier changes, changes in ingredients, etc.), product improvements, packaging innovations, etc.

(2) At the most advanced/dynamic level, market conditions change with product introductions/innovations, such as the marketing of new special/unique products, major product reformulations (e.g., new flavors, fragrances, features, benefits, etc.), and innovative advertising campaigns that change consumers' expectations.

Examples of key innovations in the past two decades include the introduction of new products/characteristics/benefits, such as chewy cookies, facial tissues with lotion, gourmet/premium products within food categories such as tea, coffee, salad dressings and snacks, anti-aging facial creams, 2 in 1 shampoo and conditioner products, etc. However, many more innovations have been introduced in the past few years at an accelerated rate (e.g., Magical Jell-O®, chocolate French fries, etc.). The "innovative nature of the innovations" varies. While some companies consider the introduction of a new flavor/fragrance or a color change a product innovation, others modify their products more radically. More striking innovations include major changes and the introduction of new sensory dimensions (e.g., new tactile and chemical feeling factor sensations in lotions and body washes, self-heating cans, bath tissue with lotion, body deodorant sheets, dental care gum, 2 in 1 toothpaste and mouthwashes, jelly filled waffles, etc.). This observation is supported by Watzke and Saguy (2001), who report that out of 24,543 new products researched by Ernst and Young and AC Nielsen, only 539 were innovative. The larger number of innovations introduced in these past few years is changing the market conditions more rapidly and radically. This phenomenon is driven by the newly emerged philosophy of "Innovate or Die" (Darlin 1997; Drucker 1999).

The above innovations and changes in market conditions modify the consumer's frame of reference, product knowledge, product preferences, expectations, etc. Therefore, every company should monitor these products,
consumer knowledge, experience and expectation changes from both the business and the technical point of view.

From the business perspective, it is important to track: (a) the complete product category and how the company's product performs in that new/modified category, (b) the new product introductions/innovations, to assess their effect on the company's existing brands and to evaluate new business opportunities, and (c) changes in consumers' expectations, wants and needs in a product category, to be able to develop and reformulate products to meet these consumer needs. The results of such monitoring provide information for business decisions, such as strategies for reformulation, product introduction, advertising, product distribution, etc.

From the technical perspective, modifications in testing methodology are triggered by innovations and new product dimensions. Some of these modifications include changes in consumer questionnaires, the product targets and references included in tests, etc.

Market researchers are generally in charge of tracking the changes in consumer expectations, wants and needs, and in advertising strategies. A sensory/consumer insights group is primarily in charge of tracking product changes. Recently, however, some sensory/consumer insights groups have had the opportunity to work closely with market research or have conducted their own research. In this way, sensory professionals have contributed to building and updating the knowledge base of changes in consumer expectations, wants and needs resulting from changes in market conditions.

Therefore, when studying and discussing modifications in the marketplace, it is important to focus on the three changes taking place: (1) the product modifications themselves (e.g., different or new sensory dimensions, benefits, etc.), (2) the resulting changes in consumer expectations, likes, wants and needs, and frame of reference, and (3) the potentially needed modifications in testing methodology.
Sensory professionals may be involved in one or more of the above changes as follows.

Changes in Product Characteristics

A sensory/consumer insights group has the best tools to monitor, and most importantly to document, product changes. Descriptive analysis is the most
valuable and useful tool to complete this task. The objective of descriptive analysis is to document the perceived sensory characteristics of existing and new products. The documentation of market changes in terms of products' sensory properties can be accomplished in two ways: the comparison and documentation of a change (a) relative to one or a few existing products, or (b) relative to a product category's database. The latter is the best approach.
(1) Comparison of Change Relative to One or a Few Products

This approach is the least desirable, since it offers valuable but limited information. It is undertaken when a company has not established a database and a system to continuously monitor the product category of interest. Companies that follow this practice only react to market condition changes, instead of periodically monitoring and documenting the product category's shifts. The approach consists of the simple evaluation of the new product(s)/innovation(s) relative to the company's own product. The products' sensory profiles are compared to assess the differences and similarities between the new and existing products, and the type and magnitude of sensory changes. Although this is a sound approach to measuring the sensory properties of the existing and new products, it does not allow the study of the complete category's properties and changes. It does not include a database of all (or the most important) products in the category, as in the case below.
(2) Comparison to a Database

This represents the best approach to monitoring and documenting the sensory changes in a product category/market. It implies the development of a set of databases that include information on the product category and the company's brands. These databases could be instrumental, sensory, or marketing driven, and encompass instrumental/analytical measurements, sensory properties, sales, and other relevant marketing information. The sensory database is discussed herein. The same sensory tools (e.g., descriptive analysis) used in approach (1) above are applied. The difference is in the number of products included and the time period covered. While approach (1) consists of the evaluation of only a few current products, this approach involves two steps: (a) the development of the database, and (b) the evaluation of new product introductions and the comparison of results to the database.
a. Building the Database

The establishment of a database is an investment. A company completes a product category review or appraisal to build this database. The sensory characteristics of the category and/or main products of interest are measured and documented. Often, this is one of the sources of information for companies to study the category and make decisions regarding the new sensory dimensions of future innovations. The reader is referred to the literature on category appraisals for information on approaches and methodology (Moskowitz 1985, 1994; Muñoz et al. 1996; Moskowitz and Marketo 2001). Some authors call this approach benchmarking (Thomson 2002).

b. Evaluation of New Product Introductions/Innovations and Comparison of Results to the Database

The impact of a new product introduction is more completely assessed when the sensory characterization of the new product introduction is compared vis-à-vis the existing database. Among others, the following is obtained:

- a complete characterization of the sensory properties of the new product introduction,
- the "position" of the new product in the category's sensory space (e.g., differences and similarities between the existing and new products in the category), and
- the new sensory dimensions and direction established by the new product introduction.
Changes in Consumer Liking, Wants and Needs

The modifications in market conditions must be tracked in order to understand the changes in the consumer's frame of reference and attitudes. This information can address two important areas.

(1) Understanding of the Modified Consumer Frame of Reference, Wants and Needs

Changes in a product category will change consumer expectations, frame of reference, wants and needs. Product characteristics or benefits once important in a category may change due to the new dimensions offered by the product innovations and highlighted by the corresponding advertising campaigns. This shift may not be apparent immediately.
A shift takes place only after enough consumers have become sufficiently familiar with the new products. These changes in attitudes, expectations, wants and needs can only be fully explored through qualitative and anthropological tools. The sensory professional involved in this exciting area must become proficient in such methods. Refer to Chap. 15 for a brief discussion of the importance of qualitative research in Sensory Science.

(2) Guidance for the Development of New Products To Satisfy New Consumer Needs

Successful innovations provide a new research area for companies that wish to develop competitive products with the new dimensions and benefits of such innovations. The support that a sensory/consumer insights group can offer in this area is paramount. The support may include:

a. The study and documentation of the consumer's new wants and needs for product guidance. This information is gathered through the research completed by market researchers and sensory professionals described in item (1) above.

b. The documentation of the sensory characteristics of new products and the company's developed prototypes. This information is obtained through descriptive analysis. Research guidance is provided to product developers for the formulation and optimization of new products.
Changes in Test Methodology

Changes in methodology may also be triggered by changes in market conditions. This applies to both consumer and analytical sensory techniques. In consumer methodology, some of the following testing parameters may have to be modified: the selection and testing of the market/product category leader and other products, the attributes included in consumer questionnaires, product presentation and usage, the concepts tested, the type of methods used, etc. In descriptive methodology, parameters that may need to be reassessed and modified include: selection of targets and controls, references, lexicons, product presentation, evaluation or application, etc. Universal panels that are trained across diverse product categories and intensity references may be less affected than product-specific panels (Muñoz and Civille 1998). A considerable
amount of retraining may be needed for product-specific panels to expand their frame of reference because of the new "sensory space" created by the new product introductions (refer to Chap. 18 and 20).

A Caveat Related to Product Comparisons
A final discussion point related to the implications of changes in market conditions is the comparison of databases and the risks associated with such comparisons. This section has emphasized the possible shift in consumer expectations, frame of reference, wants and needs as a result of innovations. Therefore, care should be exercised in merging historical consumer data with current data, or when comparing consumer responses over time for a category that underwent radical changes. Consumer expectations and the scoring of products change with this shift in the consumer's outlook. Conversely, descriptive databases created with evaluations from Universal panels using quantitative references (refer to Chap. 18 and 19) are minimally or not at all affected. In this case, with a well-trained and calibrated Universal panel, valid product comparisons can be made regardless of changes in market conditions, product characteristics, relevant attributes, etc. Therefore, professionals who believe in the benefits of developing databases through category appraisals are encouraged to include descriptive information, which can be used when the comparison of consumer results is difficult.

REFERENCES

DARLIN, D. 1997. Innovate or die. Forbes 159(4), 108-112.
DRUCKER, P.F. 1999. Innovate or die. Economist 352, 25-28.
MOSKOWITZ, H.R. 1985. New Directions for Product Testing and Sensory Analysis of Foods. Food & Nutrition Press, Trumbull, Conn.
MOSKOWITZ, H.R. 1994. Food Concepts and Products: Just In Time Development. Food & Nutrition Press, Trumbull, Conn.
MOSKOWITZ, H.R. and MARKETO, C. 2001. Selecting products for category appraisal studies. Fewer products do almost as well as many products. J. Sensory Studies 16(5), 537-549.
MUÑOZ, A.M. and CIVILLE, G.V. 1998. Universal, product and attribute specific scaling and the development of common lexicons in descriptive analysis. J. Sensory Studies 13, 57-75.
MUÑOZ, A.M., CHAMBERS, E. IV and HUMMER, S. 1996. A multifaceted category research study: How to understand a product category and its consumer responses. J. Sensory Studies 11, 261-294.
THOMSON, D. 2002. Competitive benchmarking. ASTM Symposium: Preparing Your Product for the Marathon Run: Product Maintenance. Montreal, Canada.
WATZKE, H.J. and SAGUY, I.S. 2001. Innovating R&D innovation. Food Technol. 55(5), 174-188.
MAXIMO C. GACULA, JR.

The importance of "benchmarking" to meet the changes in market demands is clearly indicated by the presentations of Moskowitz and Muñoz. As viewed by the author (MCG), benchmarking as used in the consumer products industries is simply the establishment of a reference point for product categories or services. This reference point can be a company's product or the leading product in the marketplace; alternatively, one can benchmark all products within a category or across categories. The determination of the number of products to be used for benchmarking is a critical part of the decision process. The decision on which products to include should be made by the sensory scientist, marketing researcher, and product developer. In this decision, cost is an important consideration. To reduce the cost of the study, historical data and/or descriptive analysis results can be statistically clustered (e.g., via a biplot of principal components), allowing the researcher to select two or more products from each cluster for inclusion in the benchmark study.

Understandably, a benchmark is not a permanent reference point, due to market changes. As experience shows, benchmarking is a continuing process. Products that are not continually benchmarked are likely to fail over time. Consumer product companies, marketing research firms, and other consumer testing services offer various techniques for conducting benchmark studies. In general, the technique used is tailored to the objectives of the study. Among the techniques is a consumer survey with a questionnaire dealing with the sensory properties of the named products or product category. In such a survey, no actual samples of the products are evaluated. This survey is followed by descriptive analysis of products of interest, and finally by a consumer test. The role of sensory statistics comes into play as we will see below. The relationships among the three databases (survey, descriptive analysis, consumer test) are explored in order to establish benchmark patterns to aid in decision-making. As a review:
(1) Consumer surveys provide information on what characteristics the consumer looks for in a product, which can lead to an assessment of preferences and acceptance/liking, negatives and positives, and current
norms. Technical information on developing a consumer survey questionnaire can be found in the edited publications by MacFie and Thomson (1994) and Meiselman and MacFie (1996).

(2) Descriptive analysis provides the intensity of the sensory attributes in a product; it basically answers the question, what is in the product? Applications of descriptive analysis methods to various types of products and product categories are found in a book edited by Gacula (1997). See also an ASTM publication by Hootman (1992).
(3) Consumer tests provide information on the degree of liking for the various sensory attributes, directions for product improvements, new findings, preferences, and other acceptability measures. For further information refer to the book by Resurreccion (1998).

In some situations, the consumer survey is not necessary. For example, established products, where the sensory characteristics are fully defined and understood, may not need the up-front consumer survey. In almost all cases, the descriptive analysis and the consumer test must be conducted. As stated earlier, benchmarking is a continuous process to prevent "product isolation and disappearance," as happened, e.g., with the formerly famous Breck shampoo and conditioner. The frequency of this process is dictated by the changes in market demands, mostly as a result of new products being introduced, modification of existing products in the marketplace (i.e., "new and improved"), or new technical developments related to safety and other health reasons. This is well demonstrated by the experiences of Moskowitz and Muñoz given in the previous sections.

REFERENCES

GACULA, JR., M.C. (ed.). 1997. Descriptive Sensory Analysis in Practice. Food & Nutrition Press, Trumbull, Conn.
HOOTMAN, R.C. (ed.). 1992. Manual on Descriptive Analysis Testing for Sensory Evaluation. ASTM Manual Series, MNL 13, West Conshohocken, Penn.
MACFIE, H.J.H. and THOMSON, D.M.H. (eds.). 1994. Measurement of Food Preferences. Chapman & Hall, London.
MEISELMAN, H.L. and MACFIE, H.J.H. (eds.). 1996. Food Choice Acceptance and Consumption. Chapman & Hall, London.
RESURRECCION, A.V.A. 1998. Consumer Sensory Testing for Product Development. Aspen Publishers, Gaithersburg, MD.
CHAPTER 14

SAMPLE SIZE N, OR NUMBER OF RESPONDENTS

HOWARD R. MOSKOWITZ

What is the value of good data? Should the researcher use a budget to control the research costs (obviously yes), and if so, then how large should the budget be? Can cheap research do the job as well as expensive research? And, if so, why bother with expensive research? Sample size is a key issue in most research, especially insofar as it affects cost and thus potentially research quality. A good understanding of the effects of base size on data quality can go a long way towards making the research cost-efficient and effective.

Panelists cost a great deal of money. If the researcher must work with consumers, then anything that the researcher does to reduce the number of consumers, or to reduce the effort of acquiring each consumer's data, will reduce the cost. Yet there is the inevitable tradeoff. Research is done in order to reduce risk, not to increase it. Many practitioners feel that decreasing the base size of a consumer panel will increase the risk of making an incorrect decision. Furthermore, for many researchers (but unfortunately not for all), the correct panelists are better than the incorrect panelists. That is, a large base size of the wrong people is neither as valid nor as useful as a small base of the right people. Unfortunately some researchers feel that the absolute base size itself is critical, and do not look beyond the base size.
The Rush Towards Cheap Research - Consequences of the "Web Mentality"

As we move through the early days of this 21st century we see a new trend emerging. The cost of access to consumers has decreased with the increasing penetration of the Internet. One of the key selling points for web-based research is that the Internet reaches many more people at presumably far lower prices. The market research community, for one, has adopted web-based research wholeheartedly because of this decreased cost. A consequence of this rush is the inevitable degradation of research quality as researchers strive to decrease the cost of panelists.

A great deal of the decision to use large base sizes comes from emotional sources. Many researchers are not particularly sophisticated in statistical analysis, and can be "thrown" by esoteric questions. One of these questions is "what is the confidence value, or P value, that two products significantly differ from each other?" Another is "what is the risk that one is actually rejecting a product as being less favored by panelists, when in fact the product performs as
well as the gold standard?" Since most researchers are not statistically minded, they opt for the answer that provides the greatest emotional security and the least cause for anxiety. That answer is to find statistical differences with high base sizes, because no one can complain or point a finger at a researcher who finds a significant difference. No one ever got punished by management for being obsessive about base size if the study was conducted within the budgetary constraint. Researchers tend to be more frequently punished for failing to find a difference. It is a natural corollary to this situation that the researcher will increase the base size.

All Ratings Are Not Created Equal: Most of the Useful Information Comes from the First Few Data Points or Judgments

Let us assume that the researcher has absolutely no information, or at least has no direct consumer information, although there may be previous knowledge of the category, intuition, informed judgment and the like. The first panelist to provide information provides a great deal of such information. The value of the first panelist can be substantial if there is no information. If the researcher's information is judgment, and if this judgment is given equal weight, then the first judgment provides 50% of the information needed to make an informed judgment. Each additional judgment from a panelist provides proportionately less information. The reduced value of the information comes from the diminishing ability of the incremental ratings to determine the mean score. The first judgment determines the mean score by 100%. The second judgment determines the mean score by 50%. The third judgment determines the mean score by 33%, and so forth. The 100th judgment (from the 100th panelist) determines the mean score by 1%.

This insight, provided to the author by his graduate professor (S.S. Stevens), was an astounding revelation to a 23-year-old aspiring researcher, especially because of the zeitgeist and technological prowess of the time. The statement and its insight were presented during the course of an informal conversation at Harvard University in the Laboratory of Psychophysics (1967). The scientific community and the industry were awash in newly developed computer programs that could facilitate analysis of data from many panelists with literally the punching of a few instructions on a computer card. Stevens' insight suggested a radically different way to think about data, concentrating on the information contained therein and the stability of that information, rather than concentrating on the variation of data and one's prowess in manipulating such variation to find differences and establish their likelihood in a probabilistic sense.

Given this state of affairs the researcher should be most happy with the first judgment, and show the lowest marginal increase in happiness with the 100th judgment. Yet, if researchers are asked about the base size, most would say that they feel very uncomfortable with a base size of 5, and very comfortable with
a base size of 100 or more. On another occasion, ask the researcher to compare the level of subjective comfort with a small-scale study comprising 5 panelists versus the level of comfort when making a decision based upon one's own judgment. Many researchers will answer that with such a low base size they feel uncomfortable relying on the data, and that they would rather make the decision on the basis of their own "intuition." Many will say that they feel that such a low base size of panelists may, in fact, distort the reality of the situation, and so opt for their own opinion, which in the end poses an even greater business risk.
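Although the text does not include code for this point, the 1/n arithmetic described above is easy to demonstrate. The following is a minimal sketch in SAS (the language of the simulation program in the Gacula section below); the simulated 9-point ratings and all data set names are hypothetical:

    data running_mean;
      retain mean 0;
      do n = 1 to 100;
        rating = 5 + 1.5*rannor(12345);    * hypothetical ratings on a 9-point scale;
        mean = mean + (rating - mean)/n;   * the nth rating moves the mean by 1/n of its deviation;
        influence = 1/n;                   * 100% for n = 1, 50% for n = 2, 1% for n = 100;
        output;
      end;
    run;
    proc print data=running_mean(obs=5); var n rating mean influence; run;

The update term (rating - mean)/n makes explicit that each successive judgment carries proportionately less weight in the running mean.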
Remedies To Reduce Risk: Suppressing Noise Versus Averaging Noise

There are two ways to reduce risk. One way is to control the experiment so tightly that the extraneous sources of variation are eliminated, or at least controlled. In this way the researcher can be sure that the aspects of the product being measured are real and valid. Sensory scientists who use well-controlled test booths favor noise suppression, perhaps because, at least superficially, the suppression of noise appears to be the more professional research strategy. There are no supporting data, however, showing that suppression yields better data than other methods. The suppression systems may take on many incarnations, such as white booths, isolation panels to prevent one panelist from talking to another, hidden delivery systems so that the panelist never interacts with the test administrators, etc.

Another way to reduce risk cancels out the different sources of bias by testing in a "noisy" situation. "Noisy" in consumer research may mean testing at home, allowing the product to be consumed under different conditions and with different body states. Or "noisy" may mean choosing a heterogeneous consumer panel in different markets, with different purchase habits, etc. The goal when averaging noise is to obtain a true reading of the product by ensuring that there are no confounding effects, or that the confounding effects themselves have been cancelled. Most consumer researchers opt to reduce risk by testing the product under different conditions, with different people, etc. The first order of business is to ensure that the average rating really represents an average of the "full cross section" of the population.
ALEJANDRA M. MUÑOZ

For sensory scientists, sample size is a critical parameter to consider when designing and executing any sensory test: discrimination, descriptive and consumer tests.
In selecting the sample size/number of participants, the statistical and financial implications, the representativeness of the sample, and the availability of panelists all need to be considered; specifically:

(1) The sample size has statistical and financial implications in all sensory tests.
(2) The consumer test sample size determines how representative of the consumer population that test is.
(3) The availability of panelists in internal tests affects the sample size.
Financial Considerations

The chosen sample size for a test has financial implications. Large sample sizes translate into expensive tests. This applies to both consumer and discrimination/analytical tests.

(1) Consumer Tests. Consumers must be compensated for their participation. Therefore, the more participants in the test, the more expensive the test. The impact of sample size (number of consumers) on the total cost of the test is greater in actual consumer tests than in consumer employee tests, because:
- smaller sample sizes are used in employee consumer tests than in actual consumer tests. This is common, since employee consumer tests are (or should be) used only for screening purposes or in preliminary project phases. The sample size (number of internal consumers) that is acceptable in the field ranges from 40 to 80/100 consumers.
- actual consumers are paid for their participation, while employees are not. Therefore, some companies consider internal/employee consumer tests less expensive.

With budget restrictions, a sequential testing approach is another method that can be used to optimize the sample size in consumer testing. The advantage of this approach is that less money is spent on consumer recruiting and incentives. The disadvantage is that there may be more expense in scheduling the participants and in planning the logistics of the test. In this procedure a small number of consumers (e.g., n = 40 or 50) is recruited. The data are analyzed and interpreted. If differences among products are found, the testing can stop. If no product differences are found (and differences are expected), more consumers are recruited to participate in the test. The data are merged and reanalyzed based on the new sample size, as sketched below.
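A minimal sketch of this two-stage procedure in SAS, assuming a hypothetical data set ALL_RATINGS that holds each consumer's paired ratings of products A and B (the data set and variable names are illustrative, not from the text):

    * Stage 1: analyze the first wave of consumers (e.g., n = 40);
    data wave1;
      set all_ratings(obs=40);
    run;
    proc ttest data=wave1;
      paired productA*productB;
    run;
    * Stage 2: if no difference is found (and one was expected), recruit
      more consumers, merge the waves, and reanalyze at the larger n;
    proc ttest data=all_ratings;
      paired productA*productB;
    run;

Note that reanalyzing merged data after an interim look inflates the Type I error rate somewhat, so a stricter significance criterion at each stage is a prudent refinement.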
(2) Discriminative/Analytical Tests. Most discrimination tests are conducted with employees (although, recently, an increasing number of sensory/consumer insights groups are also using consumers/local residents for these tests). The reverse is true for descriptive tests. Currently, most companies use local residents as descriptive panelists, while a few companies use employees. Financial implications are taken into consideration when deciding on the type and number (sample size) of participants to be used for discriminative tests (i.e., employees or local residents). Often, the only factor considered in this assessment is the actual incentives paid to participants. Because non-employees are remunerated, the tests are more expensive when local residents are used than when employees participate. However, other cost considerations must be taken into account when using employees. These costs include overhead costs, and the "cost" of having employees away from their main job responsibilities while participating in sensory tests. Therefore, under financial restraints, the use of a small number of panelists is favored.
Statistical Considerations: Power of the Test
From the statistical point of view, the sample size affects the power of the test. The smaller the sample size (number of participants) in the test, the lower the power of the test (its ability to detect differences where differences exist). Researchers infrequently calculate the power of a test given the test characteristics. Nor do they frequently calculate the required sample size from the critical difference, the estimated variability of the data, the desired power, and α. It is recommended that sensory professionals use sample size calculations in order to determine the required sample size for a test (Gacula and Singh 1984; Gacula 1993). Increasing the sample size makes the test more expensive. Therefore, sensory scientists need to decide what is best for their situation: (1) reaching an adequate power level and incurring higher costs, or (2) compromising on a less statistically powerful test and using fewer participants. The reader is referred to publications addressing test sample sizes and their implications. Some of these include the relevant discussion on sample size, power and risks involved in preference tests in the ASTM Standard Guide for Sensory Claim Substantiation (ASTM 1998), and the discussion on the number of subjects and the resulting statistical power in research tests by Kraemer and Thiemann (1987).
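As a rough illustration of this power/cost trade-off, the normal-approximation power of a paired comparison can be computed directly. This is a sketch only; the values of S and D below are illustrative, not taken from the text:

    data power_curve;
      s = 1.5;                       * assumed standard deviation of the difference;
      d = 0.5;                       * assumed true mean difference;
      alpha = 0.05;
      do n = 40 to 200 by 40;
        power = probnorm(d*sqrt(n)/s - probit(1 - alpha/2));  * approximate power, two-sided test;
        output;
      end;
    run;
    proc print data=power_curve; var n power; run;

Under these assumptions, power rises from roughly 0.56 at n = 40 to about 0.95 at n = 120, which quantifies the decision described above.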
Representativeness Considerations

In consumer tests, the objective is to study people's responses. It is desirable to extrapolate the findings of a consumer test to the population of consumers who purchase and use the company's products. Therefore, in order to be confident that the consumer population of interest is well represented in the test, a sufficiently large sample is required. Does a sample size of 20-50 consumers adequately represent the consumer population of interest? Larger sample sizes may be needed to assure a representative consumer population in the test.
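Sample size alone does not guarantee representativeness (recruitment matters as much), but it does set a ceiling on precision. As a hedged illustration of what small samples afford, the 95% margin of error for an estimated proportion (e.g., percent of consumers preferring a product) can be sketched as follows, using the worst-case proportion of 0.5; the numbers are illustrative, not from the text:

    data margin_of_error;
      p = 0.5;                                 * worst-case proportion;
      do n = 20, 50, 100, 200;
        moe = probit(0.975)*sqrt(p*(1-p)/n);   * 95% margin of error;
        output;
      end;
    run;
    proc print data=margin_of_error; var n moe; run;

At n = 50 the margin of error is roughly ±0.14 (14 percentage points), which helps explain the discomfort with base sizes in the 20-50 range.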
Availability of Employees
Often, the availability of employees is the factor affecting the test sample size. Even though sensory professionals may be aware of the appropriate sample size for a given test, they may not be able to recruit the required number of participants, due to unavailability and lack of commitment. In these cases sensory professionals may:

- run tests with a smaller number of participants than required, and thus sacrifice either the power or the integrity of the test, or
- be forced to use local residents for routine analytical tests (e.g., discrimination and descriptive tests).
Summary

(1) Sensory professionals feel comfortable using sample sizes of 80-200 consumers. A smaller number is adequate for preliminary project phases, for screening studies to eliminate gross negatives, and for studies with large differences in intensity and/or liking. Frequently, these are the objectives of employee consumer tests, where the sample sizes are smaller.
(2) The sample size used in descriptive tests varies. Most sensory professionals use 8-14 trained panelists. However, many researchers use fewer panelists, if they are highly trained. Chambers et al. (1981) found that a panel of three highly trained, experienced individuals performed at least as reliably, and discriminated among products as well, as a group of eight less-trained individuals. Those results suggest that the use of well-trained individuals could reduce the number of panelists necessary for sensory testing.

(3) Sensory professionals are becoming more knowledgeable about the sample size requirements for discrimination tests. The statistical parameters α and β
are assessed to determine the sample size needed for difference or similarity tests (Meilgaard et al. 1999).
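For example (an illustrative sketch only, using the standard normal approximation for binomial-based tests; the α, β and discriminator values below are assumptions, and Meilgaard et al. (1999) should be consulted for exact tables), the N required for a triangle test might be computed as:

    data triangle_n;
      p0 = 1/3;                      * chance probability of a correct choice in a triangle test;
      pd = 0.30;                     * assumed proportion of true discriminators;
      p1 = p0 + pd*(1 - p0);         * expected proportion of correct choices;
      z_a = probit(1 - 0.05);        * one-sided Type I error, alpha = 0.05;
      z_b = probit(1 - 0.10);        * Type II error beta = 0.10 (power = 0.90);
      n = ceil(((z_a*sqrt(p0*(1-p0)) + z_b*sqrt(p1*(1-p1))) / (p1 - p0))**2);
    run;
    proc print data=triangle_n; var n; run;

Under these assumptions the formula yields roughly 50 panelists; smaller assumed proportions of discriminators drive N up quickly.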
REFERENCES

ASTM. 1998. Standard E 1958-98: Standard Guide for Sensory Claim Substantiation. In: ASTM Annual Book of Standards. ASTM, West Conshohocken, Penn.
CHAMBERS, E. IV, BOWERS, J.A. and DAYTON, A.D. 1981. Statistical designs and panel training/experience for sensory science. J. Food Science 46, 1902-1906.
GACULA, JR., M.C. 1993. Design and Analysis of Sensory Optimization. Food & Nutrition Press, Trumbull, Conn.
GACULA, JR., M.C. and SINGH, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, San Diego.
KRAEMER, H.C. and THIEMANN, S. 1987. How Many Subjects?: Statistical Power Analysis in Research. Sage Publications, Newbury Park, CA.
MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1999. Sensory Evaluation Techniques, 3rd Ed. CRC Press, Boca Raton, Fla.
MAXIMO C. GACULA, JR.

In practice, we do not verify the validity of the sample size (base size) used in a study, for the following obvious reasons: the experiment has already been done, verification is costly, and we rely heavily on the correctness of the variance used in the estimation of sample size (N). Results of sample size calculations are generally an approximation. Detection of small differences for an attribute or response variable with large variability results in a large N, which is prohibitive due to cost. On the other hand, using a small N creates the risk of obtaining biased results.

First let us review the calculation of N so that we can identify the parameters that are critical in its determination.

(1) Size of difference to be detected (D). Specifying D is quite difficult because of the several types of rating scales used in practice, the varying number of categories on a scale, and other factors, as discussed in Chap. 11.
(2) Estimate of the variance or standard deviation of the difference (S), in the case of the paired comparison test. This quantity is generally obtained from historical data.
(3) Prescribed significance level of the test for detection of the difference, the most common levels being P = 0.01, 0.05, and 0.10.

In practice, these three parameters are manipulated in order to balance the cost and the risk of reaching a wrong conclusion. As a result, N must be an approximation. The simplest well-known formula for calculating N is
N = [(Zα + Zβ)² S²] / D²
where Z is the so-called normal deviate obtained from a standard normal distribution table; in this formula, Zα and Zβ are the Z values corresponding to the Type I (significance level) and Type II errors, respectively; S is the standard deviation of the attribute in question; and D is the specified difference to be detected at the prescribed Type I and Type II errors. Depending on the experimental cost and the risk that can be tolerated, the values for Z and D can be specified. Suppose we have D = A - B = 0.75 and S = 1.5 on a 9-point rating scale, and want a Type I error of P = 0.05 and a Type II error of P = 0.10 (power = 1 - Type II = 90.0%). The calculation of the power of the test as applied to claim substantiation is given in Gacula (1993). Referring to a standard normal curve table, or to a t-table with infinite degrees of freedom, the Z values corresponding to 0.05 and 0.10 are, respectively, 1.960 (two-sided null hypothesis) and 1.282 (Type II is always one-sided). Then
N = (1.960 + 1.282)² (1.5)² / (0.75)² = 42 panelists.
If we desire to detect a smaller difference, say D = 0.25, while maintaining the same confidence level and power of the test, then
N = (1.960 + 1.282)² (1.5)² / (0.25)² = 378 panelists.
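The same arithmetic can be written as a short SAS data step (a sketch that simply reproduces the two worked examples above):

    data n_required;
      s = 1.5;                          * standard deviation of the difference;
      z_a = probit(1 - 0.05/2);         * 1.960, two-sided Type I error;
      z_b = probit(1 - 0.10);           * 1.282, one-sided Type II error;
      do d = 0.75, 0.25;                * differences to be detected;
        n = round(((z_a + z_b)**2)*s**2/d**2);   * yields 42 and 378;
        output;
      end;
    run;
    proc print data=n_required; var d n; run;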
The subjectivity of the determination of sample size is indeed illustrated by the above example. If the research team is convinced that the selected values of S and D are reasonable, then cost can be factored into the choice of sample size. Another factor to be considered is which type of error is more important to control in the study, the Type I or the Type II error. If both types of errors are equally important, then we want the sizes of the Type I and Type II errors to be close to each other. As shown by the formula given above, the values of Zα and Zβ can be appropriately chosen to answer this question. For further reference on sample
size and power of the test, see Gacula and Singh (1984), Kraemer and Thiemann (1987), and Gacula (1993).

Muñoz presented various considerations that arise in practical situations in the choice of sample size. Sample sizes used by sensory practitioners have ranged from 40 to 100 respondents. The suitability of these ranges can be evaluated by computer simulation, as shown below. The author (MCG) conducted a simple simulation study using SAS (1990) in order to obtain the simulated difference D (D = A - B) between two populations, using a standard deviation of 1.57 on a 9-point hedonic scale, where the value 1.57 was based on historical consumer test data. The purpose of the simulation study was to provide practitioners with a realistic picture of the important role of S and D in the calculation of an appropriate sample size. The simulation used specified values of D from 0.0 to 1.0. Only the results for D = 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0 are reported; they are given in Tables 14.1-14.6, respectively. The significance level is based on the paired t-test.

Table 14.1 shows the results of simulating a population of differences between two products with an assumed mean difference of D = 0.0 and a standard deviation of 1.57. The first simulation run used N = 50, the second simulation run used N = 100, and so on up to N = 1000. As expected, none of the sample differences reached statistical significance. However, notice the result for N = 50, with a mean difference of -0.23 and P = 0.2208, which shows directionality in favor of the second sample and thus would be in disagreement with the true difference of zero. This result illustrates that in practice, even for very similar products, we may need at least N = 100 panelists to test the null hypothesis. In this table, the conclusion would be the same whether using 100 or more than 100 panelists. Thus cost considerations can be factored into the choice of sample size. Notice that one can obtain a mean difference of -0.23 or 0.05 from this population solely as a result of sampling variation. The simulation results may shed light on the various sample sizes stated by Muñoz, and on Moskowitz's statement that "A great deal of the decision to use large base sizes comes from emotional sources."

An application of the type of result shown in Table 14.1 is in claim substantiation for parity. This application provides confirmation of parity, and information on the level of risk, both obtained at low cost. To proceed, the parameter value S is obtained from the result of a consumer test for product parity, and the prescribed significance level is set. Then one conducts a simulation study using D = 0.0, and the result should be similar to that in Table 14.1. This approach also reduces the subjectivity of sample size calculation. It also addresses the problems and viewpoints expressed by Moskowitz on risk and by Muñoz on sequential testing. Given this type of analysis, it is predicted that within the next decade, the
technology of computer simulations will become an integral part of Sensory Science and consumer testing practices.

TABLE 14.1. SIMULATION RESULTS FOR SPECIFIED DIFFERENCE D = 0.0
For D = 0.2, we need a minimum N = 180 to detect a significant difference (Table 14.2). Notice that the simulated differences start to stabilize near D = 0.2 as N increases. For D = 0.4, we need N = 80 (Table 14.3); for D = 0.6, we need N = 60 (Table 14.4); for D = 0.8, we need N = 40 (Table 14.5); and finally, for D = 1.0, the minimum N is also 40 (Table 14.6). Thus, for the same standard deviation, the sample size needed decreases as the magnitude of the difference to be detected increases. However, one may also consider the closeness of the simulated mean difference to the true difference D. For example, for the results given in Tables 14.5 and 14.6, the
choice of N = 140 may be better for reducing risk, and could easily be acceptable to the research team. The results in these tables provide information for sensory scientists and others on the behavior of N as D varies. This information should be useful in making decisions in claim substantiation studies, where the choice of N is critical, and in the control of cost and risk. The NBC Network (Davenport and Shusman 1991) requires a minimum sample size of 500 panelists per cell in a parity claim study and 300 panelists for a superiority claim. Notice the low risk of using 500 panelists, as shown in Table 14.1.
TABLE 14.2. SIMULATION RESULTS FOR SPECIFIED DIFFERENCE D = 0.2

Simulated mean difference    Sample size N    Significance level
        -0.09                      20               0.7661
         0.21                      40               0.3446
         0.26                      60               0.1635
         0.39                      80               0.0149
         0.35                     100               0.0330
         0.32                     120               0.0350
         0.40                     140               0.0050
         0.42                     160               0.0012
         0.45                     180               0.0003
         0.43                     200               0.0003
TABLE 14.4. SIMULATION RESULTS FOR SPECIFIED DIFFERENCE D = 0.6

Simulated mean difference    Sample size N    Significance level
         0.31                      20               0.3309
         0.61                      40               0.0078
         0.66                      60               0.0006
         0.79                      80               0.0001
         0.75                     100               0.0001
         0.72                     120               0.0001
         0.80                     140               0.0001
         0.82                     160               0.0001
         0.85                     180               0.0001
         0.83                     200               0.0001
TABLE 14.6. SIMULATION RESULTS FOR SPECIFIED DIFFERENCE D = 1.0
Computer simulation techniques, which are cheap to conduct, can be used to examine doubts about consumer test results. Based on the result of a completed test, the variance and the mean difference would be the parameters used in the computer simulation study. Whatever the result, a better decision can then be made on the choice of the next step to follow; such a decision is factual and made on a scientific basis. A sample simulation SAS program is given in Table 14.7. In this program, the equation for each difference D is given by
D = μD + sqrt(variance) * rannor(seed)
where μD is the mean difference of interest and variance is the variance obtained from historical data. The rannor SAS function returns an observation (variate) generated from a normal distribution with mean zero and variance 1. To use this program, the sensory scientist needs only to enter two values: the mean difference and the variance. It is educational for the sensory scientist to vary these values, run the program, and study the results.

Moskowitz presented two ways of reducing the risk of obtaining a wrong experimental conclusion: (a) suppressing "noise" via a controlled experiment, and (b) not suppressing "noise," by conducting the experiment under normal product use. The situations in (a) and (b) affect the parameters D and S in the sample size calculation. One can obtain the same mean product difference in (a) and (b), but their variances may differ, and consequently they would have different
sample sizes. As a follower of Total Quality, and depending on the purpose of the experiment, the author (MCG) recommends the second way (b) for reducing risk.

TABLE 14.7.
SAS PROGRAM CODE FOR THE SIMULATION

    *prog chapt14.sas;
    data normal;
      retain seed 10000;
      do i = 1 to 20;
        d1 = 0.00 + sqrt(2.47)*rannor(seed);
        d2 = 0.20 + sqrt(2.47)*rannor(seed);
        d3 = 0.40 + sqrt(2.47)*rannor(seed);
        d4 = 0.60 + sqrt(2.47)*rannor(seed);
        d5 = 0.80 + sqrt(2.47)*rannor(seed);
        d6 = 1.00 + sqrt(2.47)*rannor(seed);
        if i = 1 then do; seed = 10000; end;
        output;
      end;
    run;
    proc means mean n std prt maxdec=3;
      var d1-d6;
      title "Mean d1=0 d2=.2 d3=.4 d4=.6 d5=.8 d6=1.0 seed=10000";
    run;
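The printed program generates only 20 observations per variable, whereas Tables 14.1-14.6 report runs at sample sizes from 20 up to 1000. The author's production code is not shown; the following is a hedged sketch of how a single run at a chosen N and D might be generated and tested (note that 2.47 is approximately 1.57 squared):

    %macro simrun(n=, d=);
      data sim;
        do i = 1 to &n;
          diff = &d + sqrt(2.47)*rannor(10000);  * simulated rating difference, constant seed as in the printed program;
          output;
        end;
      run;
      proc means data=sim mean n prt maxdec=4;   * prt = P-value of the t-test that the mean difference is zero;
        var diff;
      run;
    %mend simrun;
    %simrun(n=50,  d=0.0);
    %simrun(n=100, d=0.0);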
REFERENCES

DAVENPORT, K. and SHUSMAN, E. 1991. NBC Guidelines: Food/beverage preference claims research and documentation. Proceedings, NAD Workshop III, Advances in Claim Substantiation (April 29-30), pp. 61-65. Council of Better Business Bureaus, New York, NY.
GACULA, JR., M.C. and SINGH, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, San Diego.
GACULA, JR., M.C. 1993. Design and Analysis of Sensory Optimization. Food & Nutrition Press, Trumbull, Conn.
KRAEMER, H.C. and THIEMANN, S. 1987. How Many Subjects?: Statistical Power Analysis in Research. Sage Publications, Newbury Park, CA.
SAS INSTITUTE. 1990. SAS Language: Reference, Ver. 6, 1st Ed. SAS Institute, Cary, NC.
CHAPTER 15

THE USE AND CAVEATS OF QUALITATIVE RESEARCH IN THE DECISION-MAKING PROCESS

HOWARD R. MOSKOWITZ

Qualitative research presents a marked departure from the traditional quantitative approach espoused by sensory scientists. Sensory Science, however, tracing its heritage to descriptive analysis, readily appreciates qualitative research, especially when it comes time to identify what terms describe a product. Indeed, the Flavor Profile (Caul 1957), one of the first descriptive systems, can be said to be primarily qualitative in nature, because the terms emerged from discussion, and products were then rated on their intensity during the course of open discussions among the panel participants. The Flavor Profile did not use statistics, but rather relied upon discussion and consensus. Yet, over time, sensory researchers wedded to the Flavor Profile Method began to add statistical analyses. These additions would make the method more powerful and more amenable to quantitative treatment, yielding a new class of insights. The Flavor Profile represents one direction, and a minor one, in Sensory Science, which in its growth over the past 50 years has evolved from the purely qualitative model to a mixture of qualitative and quantitative aspects.

In their quest to achieve scientific recognition and acceptance, sensory scientists have primarily used quantitative, not qualitative, techniques. The emphasis has been upon methods to measure stimuli, represent those measurements graphically (e.g., relationships among attributes), and then perform statistical analyses on these data (e.g., to describe the relations; to identify differences among products, etc.). Qualitative analysis, however, encompasses far more than the development of descriptive language. Qualitative analysts probe for the reasons behind the panelist's opinions, whether these opinions are expressed on numerical ballots or in focus groups. Much of the development of qualitative research has come from market research. In recent years, Sensory Science has modeled itself a great deal on market research, including the use of qualitative procedures.

The qualitative approach uses focus groups or in-depth interviews, and lately observational techniques, in order to understand what makes a product "tick." The assumption is that through a structured interview with the consumer, such as an in-depth interview, or through a discussion with a group, the researcher can uncover aspects of the product that might otherwise be missed through conventional questionnaires. The qualitative interview enables the
researcher to create a description of the product from the consumer's viewpoint (McQuarrie and McIntyre 1986). In contrast, quantitative research works with profiles of ratings, and attempts to understand the responses to the product from a purely numerical analysis. Qualitative researchers and quantitative researchers therefore look at the product from different perspectives. It is not a matter of the number of panelists that one uses, with qualitative researchers using far fewer, but rather of what one seeks to learn about the product by interviewing the panelist.

There is an enormous amount of literature on qualitative research, ranging from creativity to the conduct of focus groups. For instance, product innovation can use creativity workshops (Geschka 1986). This type of insight-driven innovation, among other methods, is important for technology companies that must maintain a lead (McGuinness 1990; Gupta and Wilemon 1996). However, one needs only to talk to sensory practitioners to discover that they too are beginning to recognize the importance of innovation and creativity in the domain of sensory research. Qualitative research methods provide some of these innovative approaches.

In recent years sensory scientists have begun to incorporate qualitative methods in their approaches. The early qualitative methods were used to develop lists of attributes for descriptive analysis (Stone and Sidel 1985; Von Sydow et al. 1974). The qualitative methods used for attribute development were fairly rudimentary by today's standards. Other methods, such as laddering, have been introduced in order to bring more rigor to qualitative analysis. Laddering comprises a structured approach to identify underlying relations existing in the consumer's mind. The panelist is interviewed in a systematic fashion, with specific additional questions following each answer, until the interviewer can elicit a chain of answers, one leading to the next. The verbatims, comprising a recorded string of questions and answers, provide significant insight into which aspects of the product are closely correlated with each other. In addition, analytic tools have now been developed to work with these verbatims in order to construct a "map" showing how the sequence of phrases can be related. The map allows the researcher to identify "root" ideas, and from these root ideas then discover the secondary ideas.

Another approach is called by various names, such as ethnography, in-context, or observational research (Ciccantelli and Magidson 1993). In this type of research there may not even be a discussion. Rather, the researcher watches the way the panelist interacts with the product. From this interaction the researcher begins to understand what aspects of the product the panelist finds most acceptable, what aspects the panelist changes or would like to change, etc. The researcher may even "live with the panelist," either for a short or a long period of time, to understand in depth how the panelist uses the product in daily life. Observational research is not a new idea, although it is gaining in popularity due to its formalization by anthropologists, who have migrated into
the commercial realm. For many years, product manufacturers have used test kitchens, and invited panelists to use these kitchens to prepare products. By videotaping the panelists, corporate researchers have begun to understand how the panelist interacts with the food, in a way that quantitative research could never provide.

What is important about the acceptance of qualitative research tools is the recognition by the sensory researcher that there is more to understanding the panelist-product interface than simply testing, using inferential statistics, and building mathematical models. Qualitative research forces the sensory scientist to reach beyond testing and statistical reports. It forces the sensory scientist to interact with the other members of the development and marketing groups, to understand the consumer in a new way, and to broaden the horizons of what is appropriate knowledge. Qualitative research breaks the sensory researcher out of the mold of rigid test design, rigid test execution, rigid analysis, and rigid reporting, enabling the sensory researcher instead to confront a whole new aspect of the consumer. Recently, the author and some colleagues have combined the ethnographic and qualitative methods into a procedure for new product development that encompasses both the freedom of qualitative insight and the rigor of quantitative analysis (Moskowitz et al. 2002).

Despite all the positive aspects of qualitative research, it is important to point out some of the caveats that must be obeyed. Qualitative research is often best used, but not always and not only, for hypothesis formation. Results from qualitative research cannot be projected to the entire population of consumers. Despite the advances in qualitative research, the method is more art than science. There are biases in the interview, and limits to the types of analyses that can be done. It will be interesting to see how sensory researchers adapt qualitative research to product evaluations, and whether, from their own history and culture, sensory researchers can add a new dimension to qualitative research.
REFERENCES

CAUL, J.F. 1957. The profile method of flavor analysis. Advances in Food Research 7, 1-40.
CICCANTELLI, S. and MAGIDSON, J. 1993. Consumer idealized design: Involving consumers in the product development process. J. Product Innovation Management 10, 341-347.
GESCHKA, H. 1986. Creativity workshops in product innovation. J. Product Innovation Management 3, 48-56.
GUPTA, A. and WILEMON, D. 1996. Changing patterns in industrial R&D management. J. Product Innovation Management 13, 497-511.
McGUINNESS, N. 1990. New product idea activities in large technology based firms. J. Product Innovation Management 7, 173-185.
McQUARRIE, E.F. and McINTYRE, S.H. 1986. Focus groups and the development of new products by technologically driven companies: Some guidelines. J. Product Innovation Management 3, 40-47.
MOSKOWITZ, H.R., FLORES, L., BECKLEY, J. and MASCUCH, T.C. 2002. Crossing the knowledge and corporate boundaries to systematize invention and innovation. Paper presented at the ESOMAR (European Society of Marketing Research) Congress, Barcelona.
STONE, H. and SIDEL, J.L. 1985. Sensory Evaluation Practices. John Wiley & Sons, New York.
VON SYDOW, E., MOSKOWITZ, H.R., JACOBS, H.L. and MEISELMAN, H.L. 1974. Odor-taste interaction in fruit juices. Lebensmittel, Wissenschaft und Technologie 7, 18-24.

ALEJANDRA M. MUÑOZ
Focus Groups, Interviews and “Beyond”
Traditionally, qualitative research has been a tool used by marketing/market research and other fields, such as promotion, sociology, anthropology, etc. (Stewart and Cash 1985; Jacob 1988). It was not a technique used by sensory/consumer scientists in the past. Several of the books on sensory evaluation do not cover qualitative research as one of the methods used in sensory testing (Amerine et al. 1965; Stone and Sidel 1993). Qualitative research began to be used in the 1980s by some sensory practitioners. Very soon thereafter, the use of this technique grew in the field (Marlow 1987; Chambers and Smith 1991; Casey and Krueger 1994; Lawless and Heymann 1998; Resurreccion 1998).

The most common qualitative methods used by sensory professionals are focus groups (and all the variations of group sessions) and one-on-ones, or interviewing techniques. Currently, most sensory practitioners use focus groups, and many utilize one-on-one interviewing. Many sensory/consumer scientists have been trained as moderators and therefore conduct focus groups themselves. However, most practitioners use interviewers hired or trained to complete the one-on-ones, because of the time involved.

This chapter focuses on the most common qualitative research techniques used by sensory/consumer scientists, which are focus groups and interviewing techniques. However, it is important to mention that sensory professionals are getting involved in observational approaches, e.g., ethnography (Perry 1998; Woodland 2003), and in the application of other less traditional qualitative methods,
such as projective and elicitation techniques (Urbick 2003). This author acknowledged this as an exciting new research area for sensory professionals in Chap. 1. Sensory professionals are recognizing that a true understanding of consumer attitudes, wants and needs cannot be completely accomplished within research facilities and focus group rooms. Researchers involved in consumer testing are aware that consumers, although sufficiently motivated to share information, provide abundant yet somewhat narrow information through interviewing and group discussions (Muñoz and Everitt 2003). Through anthropological, projective and elicitation approaches a researcher has the ability to see the world through another's eyes, to hear the rich stories and anecdotes, and to interpret indirect consumer responses that participants often do not express in traditional interviews (e.g., through collages, memory associations, etc.). Interacting with real, living, breathing people, not abstract categories or pieces of data, researchers are able to uncover subtle patterns for a deeper understanding of motivation and behavior. From this direct contact come insights that help researchers investigate real, if unarticulated, needs (Perry 1998; Astrom et al. 2003; Urbick 2003). Anthropological research and the application of projective and elicitation techniques, because of their exploratory and holistic nature, open up new avenues of thinking about and understanding people's attitudes, beliefs, behaviors, etc. The new and exciting application of these techniques by sensory professionals provides a new venue to gather and understand consumers' unarticulated wants and needs (Muñoz and Everitt 2003).

Qualitative Research: A Market Research or a Sensory Testing Tool?

Sensory practitioners have to be cautious with the design and use of qualitative research to avoid any internal conflicts with Marketing and Market Research. Since these tools have been traditionally used by Market Research and Marketing professionals, some of these individuals/groups perceive that sensory practitioners are stepping into their territory when using qualitative research. Therefore, sensory practitioners should ensure that the objective and the scope of the qualitative tests remain in the sensory realm. See Chap. 4 for more details on the differences between sensory and market research objectives and tests.

The Uses and Misuses of Qualitative Research by Sensory Professionals

This brief discussion focuses on the two main qualitative techniques currently used by sensory/consumer scientists: focus groups and interviewing techniques. It is not intended to cover the mechanics of these techniques. The reader is referred to the literature in the area for specifics on the design and execution of these tests (Marshall and Rossman 1989; Stewart and Shamdasani
1990; Krueger 1994; Lawless and Heymann 1998; Resurreccion 1998). Caveats, uses and misuses are covered in this discussion. Qualitative research is widely used in the sensory field in the following applications:

- In-depth study of a product category
- Idea generation
- Study of consumer perceptions of products and services
- In-depth study of the attitudes towards and perceptions of new products and their sensory characteristics
- Study of perceived differences and similarities among products
- Discussion/generation of consumer vocabulary for the design of consumer questionnaires
- Information on product packaging
- Product serving (or application) and preparation procedures
- Attribute and product information useful for the design of quantitative studies
- Interpretation of quantitative results
- Consumer perceptions of and attitudes towards packaging issues
- Quick assessments/screening of products prior to quantitative tests

The applications of qualitative research techniques have also been discussed by Lawless and Heymann (1998) and Resurreccion (1998). Sensory/consumer scientists will and should continue using quantitative consumer research methods; ultimately, quantitative information is required to make product decisions. However, qualitative research techniques should be used more widely because of the advantages and unique characteristics they offer. Qualitative research allows the researcher to explore the reasons behind the consumer's responses, opinions and attitudes (the "whys" and the "whats") through direct interaction with the consumer. In addition, immediate feedback is obtained in qualitative studies: the researcher is able to probe for additional information, clarify responses and explore other points/issues. The information obtained is expressed in the consumer's own words, without the researcher's input or use of technical terms. Depending on the study objective, the sensory professional decides which technique, qualitative or quantitative, is the more appropriate. Quantitative testing provides quantitative data, but does not allow the freedom to probe for additional information. Therefore, in many projects the use of both qualitative and quantitative techniques is the best approach for a full understanding of products and consumer responses (Rossman and Wilson 1985; Muñoz et al. 1996). If both are used, the sequence of the tests must be determined.
Another very important advantage of qualitative techniques, such as focus groups, is the ability to have consumers interact with one another. This characteristic is particularly useful in idea generation, development of questionnaires, understanding of consumer terminology, and discussion of new ideas, concepts, product characteristics, etc. In addition, qualitative research provides a relatively quick turnaround of results, and the flexibility to change the interview/discussion midstream. Among qualitative techniques, focus groups are generally the more widely used by sensory professionals, since interaction among consumers is desired in many sensory projects and applications. One-on-ones are used when independent responses are needed and when consumers need the one-on-one interaction with the interviewer, as in the exploration of personal issues, in children's tests, etc.

There are some misuses of focus groups in the sensory field. The following practices should be avoided:

(1) Conducting too few groups per study,
(2) Collecting quantitative data in focus groups,
(3) Generalizing information and making decisions based exclusively on focus group information (focus groups carry biases: the participants may not be fully representative of the true consumer population because they tend to be more assertive, the consumer database is small, and the outcome depends on the moderator and on the interaction among participants),
(4) Introducing biases to the group when product/process information is known, or through poor performance by the moderator.

Some of the misuses of one-on-ones are:

(1) Using the technique exclusively as a data collection tool (i.e., reading the questionnaire to consumers and recording responses), instead of conducting an interview to explore consumer responses, probe, etc.,
(2) Recording answers without any further probing and without developing any rapport with consumers,
(3) Using a small sample size when quantitative information is in fact desired.

Qualitative research techniques add a wealth of information and value to sensory quantitative data. The proper execution of these tests and the proper use of this information provide insights not generally obtained through quantitative information.
REFERENCES

AMERINE, M.A., PANGBORN, R.M. and ROESSLER, E. 1965. Principles of Sensory Evaluation of Food. Academic Press, New York.
ASTROM, A., LANGTON, M. and ANDERSSON, H.E. 2003. Measuring the implicit nature of consumer needs and expectations and translating them into product specifications. "2003: A Sensory Revolution", 5th Pangborn Sensory Science Symposium, Boston.
CASEY, M.A. and KRUEGER, R.A. 1994. Focus group interviewing. In: Measurement of Food Preferences, (H.J.H. MacFie and D.M.H. Thomson, eds.). Blackie Academic and Professional, London.
CHAMBERS, E. IV and SMITH, E.A. 1991. The use of qualitative research in product research and development. In: Sensory Science Theory and Applications in Foods, (H.T. Lawless and B.P. Klein, eds.). Marcel Dekker, New York.
JACOB, E. 1988. Clarifying qualitative research: A focus on tradition. Educational Researcher 17, 16-24.
KRUEGER, R.A. 1994. Focus Groups: A Practical Guide for Applied Research, 2nd Ed. Sage Publications, Newbury Park, CA.
LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food, Principles and Practices. Chapman & Hall, New York.
MARLOW, P. 1987. Qualitative research as a tool for product development. Food Technol. 41(11), 74, 76, 78.
MARSHALL, C. and ROSSMAN, G. 1989. Designing Qualitative Research. Sage Publications, Newbury Park, CA.
MUÑOZ, A.M., CHAMBERS, E. IV and HUMMER, S. 1996. A multifaceted category research study: How to understand a product category and its consumer responses. J. Sensory Studies 11, 261-294.
MUÑOZ, A.M. and EVERITT, M. 2003. Non-traditional consumer research methods. Workshop presented at the "2003: A Sensory Revolution", 5th Pangborn Sensory Science Symposium, Boston.
PERRY, B. 1998. Seeing your customers in a whole new life. J. Quality and Participation 21(6), 38-43.
RESURRECCION, A.V.A. 1998. Consumer Sensory Testing for Product Development. Aspen Publishers, Gaithersburg, Maryland.
ROSSMAN, G.B. and WILSON, B.L. 1985. Numbers and words: Combining quantitative and qualitative methods in a single large-scale evaluation study. Evaluation Review 9(5), 627-643.
STEWART, C. and CASH, W.B. 1985. Interviewing: Principles and Practices, 4th Ed. William C. Brown Publishers, Dubuque, IA.
STEWART, D.W. and SHAMDASANI, P.N. 1990. Focus Groups: Theory and Practice: Applied Social Research Methods. Sage Publications, Newbury Park, CA.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices. Academic Press, San Diego.
URBICK, B. 2003. Empathography - Partnering with your consumer. In: Non-traditional consumer research methods. Workshop presented at the "2003: A Sensory Revolution", 5th Pangborn Sensory Science Symposium, Boston.
WOODLAND, C.L. 2003. Ethnographic and observational research. In: Non-traditional consumer research methods. Workshop presented at the "2003: A Sensory Revolution", 5th Pangborn Sensory Science Symposium, Boston.
MAXIMO C. GACULA, JR.

The use of qualitative test methods, such as focus groups, has multiplied in the past decade in a variety of areas, such as social science research, marketing research, and Sensory Science. The strengths and weaknesses of focus groups are well-known in sensory and product testing, in the social sciences, and in marketing research applications. Publications on qualitative methods that this author (MCG) recommends are those by Morgan (1993) and Silverman (1993) for applications in the social sciences, and Lawless and Heymann (1998) and Chambers and Smith (1991) in Sensory Science. These books contain useful information that will assist in the application of qualitative test methods in the consumer products industries. As in Sensory Science, the development of focus groups is interdisciplinary.

The reviews and viewpoints by Muñoz and Moskowitz on the use of qualitative techniques in consumer products reflect their years of experience applying focus groups. This author (MCG) agrees with their findings on both the negative and positive aspects of focus groups. It should be emphasized that, in my view, qualitative techniques such as focus groups should be used solely by researchers in product development and in the development of descriptive analysis lexicons. An example of such use is the recent publication by McNeill et al. (2000), which uses focus groups to develop a quantitative consumer questionnaire for peanut butter. The approach should be extremely useful to sensory scientists conducting Research Guidance Panels, and also to marketing researchers. In fact, a similar study was conducted by O'Brien (1993); the issue was improving survey questionnaires through focus groups, in a project studying the social relationships among gay and bisexual men at risk for AIDS. Other than these uses, results of focus groups should not be used for decision-making regarding the final product. In agreement with Muñoz and Moskowitz, the final decision-making should be based on both the quantitative and the qualitative test methods. Figure 15.1 shows the author's view of the role of these two test methods.

Briefly, the traditional focus group is simply a structured interaction between the trained moderator and a panel of 10-15 members. The moderator reports the findings of the focus group. The combined observational and traditional focus group involves videotaping the focus group participants during a directed subject discussion/brainstorming, or during actual use of products in
a test kitchen, at a bathroom sink, or in other scenarios. Concurrently, product developers, management team members, and others observe the focus group panel in session through a one-way mirror. For example, a study dealing with a toothbrush involved a one-on-one interview followed by actual use of the toothbrush; in this case the observers did not have the opportunity to watch the actual tooth-brushing, but later observed it through a videotape re-run. The observers and the moderator later met and discussed the sessions in order to arrive at the results. Language and phrases used by the focus group panel to describe a product, concepts, etc., are evaluated for similarities. A report is written that includes the contributions of both the moderator and the observers. The resulting language aids the development of the questionnaire design for subsequent use in a quantitative test method. Thus the qualitative and quantitative methods are complementary when applied to Sensory Science research.
[FIG. 15.1. A LINKAGE BETWEEN QUALITATIVE AND QUANTITATIVE TEST METHODS. The diagram links Qualitative Methods (Focus Groups, Delphi Technique and others) - via Traditional Focus Groups and Combined Observational and Traditional Focus Groups - to Quantitative Methods (use of structured questionnaires with rating scales).]
Words and phrases derived from focus groups are based on small sample sizes, but may contain large semantic differences; thus the results cannot be generalized. In contrast, quantitative methods that involve statistical sampling and power analysis have unquestioned generalizability. This is the main reason why quantitative methods, rather than qualitative test methods, are used for decision-making in sensory and consumer testing. However, an important type of information obtained from a qualitative method is the development of a new hypothesis about a concept or the product itself. This information can be
extracted from the results of the traditional focus group and the combined method given in Fig. 15.1. Interpretation and summarization of words and phrases are critical for a valid result, since the result will be used in the development of a quantitative questionnaire design. As a guide for interpreting focus group results, the seven dimensions of words reviewed and described by Toglia and Battig (1978) can be used.

(1) Concreteness. Words differ in the extent to which they describe concrete objects, persons, places, or things that can be seen, heard, felt, smelled, or tasted. This contrasts with abstract concepts that cannot be experienced by our senses.

(2) Imagery. Words differ in their capacity to arouse mental images of things or events. Some words arouse sensory experience, such as a mental picture or sound, very quickly and easily, whereas other words may do so only with difficulty or not at all.
(3) Categorizability. Words differ in the ease with which they can be put into some larger category or class. Some words obviously belong to one or more categories, whereas other words may be difficult or even impossible to categorize. (4) Meaningfulness. Words differ in their capacity to arouse other words as associates to them, or in what is termed their meaningfulness. Some words are very strongly associated with other words or are associated with a great many other words. Other words are associated very weakly with only a few words or cannot be associated with other words at all.
(5) Familiarity. Words differ in their familiarity - that is, in how commonly or frequently they have been experienced or how familiar they appear to be. Some words are very familiar, whereas others may be almost totally unfamiliar.
(6) Number of Attributes or Features. Words differ in the number of different features, attributes, and/or properties that are associated with, or constitute a part of whatever the word represents. Some words involve several different attributes, whereas other words involve very few attributes.
(7) Pleasantness. Words differ in their capacity to elicit a feeling of pleasantness. Some words induce a feeling of pleasantness in us, whereas other words evoke an unpleasant feeling.
Words elicited by the focus panel can be studied using the above dimensions in relation to the products or concepts.

REFERENCES

CHAMBERS, E. IV and SMITH, E. 1991. The uses of qualitative research in product research and development. In: Sensory Science Theory and Applications in Foods, (H. Lawless and B. Klein, eds.), Chap. 14, 395-412. Marcel Dekker, New York.
LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food, Principles and Practices. Chapman & Hall, New York.
McNEILL, K.L., SANDERS, T.H. and CIVILLE, G.V. 2000. Using focus groups to develop a quantitative consumer questionnaire for peanut butter. J. Sensory Studies 15, 163-178.
MORGAN, D.L. (ed.) 1993. Successful Focus Groups. Sage Publications, Newbury Park, CA.
O'BRIEN, K. 1993. Improving survey questionnaires through focus groups. In: Successful Focus Groups, (D. Morgan, ed.), Chap. 7. Sage Publications, London, England.
SILVERMAN, D. 1993. Interpreting Qualitative Data. Sage Publications, Newbury Park, CA.
TOGLIA, M.P. and BATTIG, W.F. 1978. Handbook of Semantic Word Norms. Lawrence Erlbaum Associates, Hillsdale, NJ.
CHAPTER 16

THE FOUR D'S OF SENSORY SCIENCE: DIFFERENCE, DISCRIMINATION, DISSIMILARITY, DISTANCE

HOWARD R. MOSKOWITZ

Difference testing is a key activity in Sensory Science. Much of Sensory Science deals with the differences between products. At the simplest level the question is whether or not two products can be considered to be the "same" or "different." At a more complex level the issue is to understand the magnitude and the nature of the differences between the products. In both cases it is vital to have sensory inputs, because machines may not be valid indicators. In a test of two products the machine may suggest, based upon the profile of measurements, that the two products differ; yet the panelist might classify these two stimuli as being the same. Or, in other cases, especially in the chemical senses, the profiles may look very much alike, differing only to a small degree, yet the panelists may state that the products are different. There is no a priori way to predict whether the two products will be classified as belonging to the same group or to different groups. The sensory scientist must do the experiment.

Testing for "all or none" differences is the conventional use of difference testing. The issue is to determine whether or not two samples come from the same batch, or whether the test sample can be considered sensorially identical to a "gold standard." The literature is replete with ways to do these types of tests. Beyond the basic methods, the literature covers very important side issues, such as methods that correct for guessing (Fisher 1956; Gacula and Singh 1984; O'Mahony 1986; Vie and O'Mahony 1989), methods that take into account hypothesized underlying sensory processes (Dessirier et al. 1999), etc.

From the scientific and philosophical viewpoints, the hardest issue raised by difference testing is the mapping of a continuous perceptual variable (or variables) into an all-or-none response variable (same/different). We do not perceive stimuli in an all-or-none fashion. Consequently, the sensory experience differs from moment to moment, even with the same product. The issue is the nature of the rule that the panelist uses to classify two products as the same or different. This may turn out to be more of a cognitive issue than a sensory issue, because it deals with the notion of "concept formation." Since we cannot experience two stimuli at exactly the same time, we are always comparing a current stimulus to our concept of the reference stimulus, and not to the reference stimulus itself. If we consider a product perception to be a concept, then difference testing
mutates into the task of determining whether or not the stimulus belongs to the concept. In recent years experimental psychologists have gone beyond the notion of all-or-none differences to the measurement of overall dissimilarity. Rather than considering two stimuli to fall into the same or different categories, the researcher instructs the panelist to rate the degree of difference. The rationale is that two products need not be either identical or different; rather, the two products can vary in their degree of dissimilarity. By scaling differences, the researcher enables the panelist to locate the products on a continuum ranging from identical to extremely different. Of course, at the end of the task it is still a judgment call whether two products achieving a specific rating of dissimilarity should be placed into the category of "same" or the category of "different." The binary decision is important because, unlike scaling, one cannot partially reject or accept a product for shipment. The action called for is "all-or-none."
Has The Focus On Difference Testing Affected The Growth Of Sensory Science?

Emphasis on a particular aspect of science exerts simultaneous positive and negative effects. The positive effect is the growth of expertise in the particular area. The focus leads to new and presumably better methods. Practitioners become experts in these methods, and they develop an understanding of the issues that they hitherto lacked. The negative effect is the change in, and the possible narrowing of, perspective, so that other and possibly more productive aspects are shunted aside. It is very easy for a researcher to fall into a rut, where all of the problems are reduced to an easy paradigm, or at least an easy way to think about them. Such ease often blinds the researcher to the bigger world about them, and to the importance of other questions. Ease often first facilitates professional development, but later cripples that same development. ("Those whom the gods wish to destroy they first give forty years of prosperity!" - an ancient Greek proverb.)

Difference testing is one area of research and execution wherein "excessive focus" has, in many ways, defined the pattern within which Sensory Science has developed. A great deal of practical research, and in turn theoretical development, focuses on the measurement of differences between products. The business objective is quite simple - maintain product quality by maintaining sensory identity. To do that requires the input of a sensory scientist who assesses the probability that a panel can discern the difference between two products. The consequence of this research and developmental focus is the refinement of methods for discovering product differences, and an entire probabilistic theory of perception (Frijters 1984).
How does one integrate this probabilistic theory into the practical world of Sensory Science? It is not yet clear how to effect such an integration, although a number of practitioners have staked out their intellectual claims to the area, and concentrate on difference testing as the key aspect of Sensory Science (e.g., Ennis and Mullen 1986; Frijters 1984; O'Mahony 1990). This writer believes that, as a consequence of this extensive focus on difference and discrimination testing, there may be a return to the intellectual world-view that "throws away the mean and processes the variability." The foregoing was an oft-stated criticism by S.S. Stevens (1966) of the Thurstonian re-scaling of judgment to erect subjective scales of perceptual magnitude (Thurstone 1927). Although the author is not so extreme as was S.S. Stevens, it is still hard to understand how the understanding of variability throws much light on the practical applications of product research. Variability is a property of the measuring instrument. Every time we spend undue time on the measuring instrument we give short shrift to that which is being measured.

Subjective Dissimilarity And Distance

At the practical, executional level, the measurement of subjective dissimilarity rather than all-or-none difference avoids forcing the panelist to partition a natural continuum of perceptions into an artificial dichotomy or trichotomy. At the more statistical, analytical end, the measurement of dissimilarity enables the researcher to map products into a geometrical space so that the distances between pairs of products in this space correlate with the perceived dissimilarities. The geometrical space may be of low or high dimensionality. A great deal has been written on the use of "mapping" products in the space (Moskowitz 1994, 2002). Whether mapping can add to the practical usefulness of dissimilarity analysis in Sensory Science remains to be seen. Mapping procedures have certainly found advocates in Sensory Science, however.

Although not totally disregarding the issue of discrimination testing, because there is certainly a need for this type of work, the author wonders whether there will be the eventual necessary payback from such discrimination testing in terms of furthering the field of Sensory Science. For instance, if we look back at the history of Thurstonian scaling, we find that it gained a great deal of favor in psychometrics and in some areas of experimental psychology. On the other hand, one wonders about the ultimate utility of the approach in product development. Has Thurstonian scaling lived up to its promise? Are researchers better equipped to understand products and to work with them because these researchers are equipped with fancy analytic techniques and powerful computation algorithms? The jury is still out. Most of the scales that researchers use deal with the direct evaluation of product properties. All too often the Thurstonian
approaches, which process error and weave from it the principles of psychological fabric, are tried in a limited situation and found too difficult to implement on a widespread basis. As a consequence, the Thurstonian scales are disregarded as being impractical. A similar sad fate has befallen the magnitude estimation scale, long a favorite of psychophysicists, but a procedure that has fallen into disuse in consumer research simply because it demands too many resources to execute.
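Returning to the mapping idea raised above: as an illustration only - not a procedure drawn from the cited works - the short sketch below projects a hypothetical matrix of averaged pairwise dissimilarity ratings into a two-dimensional product map with multidimensional scaling, so that inter-point distances approximate the rated dissimilarities. The products, the ratings, and the use of scikit-learn's MDS are all assumptions made for the example.

```python
# Illustrative sketch: mapping averaged pairwise dissimilarity ratings
# into a 2-D product space with multidimensional scaling (MDS).
# The products and the rating matrix below are hypothetical.
import numpy as np
from sklearn.manifold import MDS

products = ["A", "B", "C", "D"]
# Symmetric matrix of mean panelist dissimilarity ratings (0 = identical).
dissim = np.array([
    [0.0, 2.1, 6.5, 7.0],
    [2.1, 0.0, 5.8, 6.4],
    [6.5, 5.8, 0.0, 1.9],
    [7.0, 6.4, 1.9, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)  # inter-point distances approximate the ratings

for name, (x, y) in zip(products, coords):
    print(f"{name}: ({x:+.2f}, {y:+.2f})")
```

In practice the analyst would inspect the stress value and try more than one dimensionality before trusting such a map.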
REFERENCES

DESSIRIER, J.M., SIEFFERMANN, J.M. and O'MAHONY, M. 1999. Taste discrimination by the 3-AFC method: Testing sensitivity predictions regarding particular taste sequences based on the sequential sensitivity analysis model. J. Sensory Studies 14, 271-288.
ENNIS, D.M. and MULLEN, K. 1986. Theoretical aspects of sensory discrimination. Chemical Senses 11, 513-522.
FISHER, R.A. 1956. Mathematics of a lady tasting tea. In: The World of Mathematics 3, 1512-1520. Simon & Schuster, New York.
FRIJTERS, J.E.R. 1984. Sensory difference testing and the measurement of sensory discriminability. In: Sensory Analysis of Foods, (J.R. Piggott, ed.), 117-140. Elsevier Applied Science Publishers, London.
GACULA, JR., M.C. and SINGH, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, San Diego.
MOSKOWITZ, H.R. 1994. Product testing 2: Modeling versus mapping and their integration. J. Sensory Studies 9, 323-336.
MOSKOWITZ, H.R. 2002. Mapping in product testing and sensory analysis: A well lit path or a dark statistical labyrinth? J. Sensory Studies 17, 207-214.
O'MAHONY, M. 1986. Sensory Evaluation of Food. Marcel Dekker, New York.
O'MAHONY, M. 1990. Cognitive aspects of difference testing and descriptive analysis: Criterion variation and concept formation. In: Psychological Basis of Sensory Evaluation, (R.L. McBride and H.J.H. MacFie, eds.), 117-139. Elsevier Applied Science Publishers, Barking, U.K.
STEVENS, S.S. 1966. Personal communication.
THURSTONE, L.L. 1927. A law of comparative judgment. Psychological Rev. 34, 273-286.
VIE, A. and O'MAHONY, M. 1989. Triangular difference testing: Refinements to sequential sensitivity analysis for predictions for individual triangles. J. Sensory Studies 4, 87-104.
ALEJANDRA M. MUÑOZ

Importance of Discrimination Testing

Discrimination testing is a crucial methodology in Sensory Science. Every consumer products company conducts discrimination tests in order to answer the basic, yet paramount, question regarding the difference or similarity between/among products. From the business perspective, discrimination constitutes a key test for the numerous projects in which this question needs to be answered. Therefore, discrimination tests are used in almost every project, at different phases, when the question involves difference/similarity. These tests are critical in product improvement, product reformulation (i.e., ingredient and process substitution), claim substantiation, and other relevant projects requiring an answer regarding product difference or similarity. Discrimination tests are very popular for at least four reasons.

(1) They are powerful and sensitive tests. Products are compared under the most stringent and sensitive conditions.
(2) In most cases, a simple "yes/no" answer is obtained. The nature of these data (e.g., nominal) satisfies many managers and product developers/chemists who want the "bottom line" yes/no answer. This easily satisfied, rapidly delivered criterion may explain why many product developers/chemists frequently request a discrimination test, believing that this is the only question they need answered in a given project.

(3) They are relatively simple tests to execute.

(4) For most tests, the statistical analysis is simple and straightforward (e.g.,
the probability levels or levels of significance resulting from the statistical tests are summarized in tables); a worked sketch follows this list.
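As an illustration of point (4): for a triangle test the null hypothesis is that panelists guess, so the number of correct selections follows a binomial distribution with p = 1/3. The minimal sketch below - with an invented panel size and count, not data from any study - computes the one-sided p-value and the guessing-corrected estimate of the proportion of distinguishers (Pd).

```python
# Minimal sketch of the standard analysis for a triangle test:
# under the null hypothesis panelists guess, so the number of correct
# selections is binomial with p = 1/3. The counts here are hypothetical.
from scipy.stats import binom

n_panelists = 36
n_correct = 19          # hypothetical result

# One-sided p-value: probability of at least this many correct by guessing.
p_value = binom.sf(n_correct - 1, n_panelists, 1.0 / 3.0)

# Guessing-corrected estimate of the proportion of distinguishers (Pd).
pc = n_correct / n_panelists
pd = max(0.0, (pc - 1.0 / 3.0) / (1.0 - 1.0 / 3.0))

print(f"p = {p_value:.4f}, estimated Pd = {pd:.2f}")
```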
How "Simple" Are These "Simple" Discrimination Tests?

This author believes that we tend to underestimate the complexity of these "simple" tests. When professionals learn about this category of tests, they learn about the various types of discrimination tests, their sample presentation schemes, their data analyses, and their applications. With this outlook, most people reach the incorrect conclusion that "these simple tests are simple." However, this author contends that this methodology is complex when one examines the different issues involved in designing, executing and applying these tests. Other researchers share this viewpoint (O'Mahony 1995). Therefore, in line with the perspective of this book, this author will discuss the practical and
controversial issues of these tests, without discussing the methods per se. The reader can refer to the general textbooks that provide a comprehensive review of the different traditional discrimination methods (Stone and Sidel 1993; Meilgaard et al. 1999), and of those based on signal detection theory, Thurstonian scaling, the R-index and guessing models (Lawless and Heymann 1998). In addition, the reader should be familiar with the publications by the key scientists who have advanced the area of discrimination testing and continue to challenge us with new concepts and ideas (O'Mahony 1979; Ennis 1993, 1998; Ennis and Bi 1998; Cliff et al. 2000; Rousseau and O'Mahony 2001; Rousseau et al. 1998, 2002; Rousseau 2001). The discussion that follows attempts to address some of the complex issues involved in answering the simple question of product difference or similarity. It is not all-inclusive, but it attempts to make the reader aware of the questions one should consider when designing these tests and when interpreting and applying the resulting data, the situations wherein these tests are appropriately and inappropriately used, and the current controversies and misuses surrounding these tests.
Deciding When a Difference Test Is Not Needed Due to Extreme Product Differences and the Inability To Mask Some Sensory Dimensions

Should a discrimination test always be conducted in order to answer the question of product difference or similarity? There are some situations in which the execution of a discrimination test is not the best strategy. This section discusses two scenarios.
(1) Products Are Noticeably and/or Extremely Different. Do products need to be formally tested when the difference is obvious? This question may be rhetorical for some readers. However, a large number of discrimination tests are conducted needlessly. Some of the reasons why unnecessary discrimination tests are conducted include:

a. Some researchers do not screen samples, and thus are unable to reach these conclusions themselves.
b. Some professionals do not feel qualified to inspect/screen products and make these decisions.
c. The test needs to be conducted regardless, for political reasons; data are needed to prove and document that samples are different or similar.
Except when products need to be tested for political reasons, sensory professionals should not run unnecessary discrimination tests in order to prove the obvious. In this case resources are wasted. Team meetings should be
completed in order to make decisions on the products that must be tested, and on those that can be judged to be sufficiently different and need no testing. This is particularly true when several submissions need to be tested; testing all those products may not be needed.

(2) A Sensory Dimension/Attribute Cannot Be Masked. Often products need to be tested without the influence of an obvious difference in one of the sensory dimensions that would affect the outcome. Classic examples include testing foods or beverages without the influence of appearance differences, or testing personal care products (e.g., hair care products, lotions and creams, etc.) without the influence of their odor/fragrance. Sometimes these differences can be masked. For example, opaque containers or special lighting conditions can mask appearance differences. However, there are cases where appearance differences cannot be masked. Also, odor/fragrance differences can rarely be masked, unless panelists are given nose clips or other devices to block their noses. Discrimination tests should not be conducted when these sensory dimensions or attributes cannot be masked; the outcome would show the obvious. In addition, sensory professionals should ensure that any condition they use to mask an appearance or odor/fragrance difference does indeed accomplish the objective. Special colored or low-pressure sodium lighting, and colored cups/containers, mask some but not all differences. Sensory professionals should screen products under the conditions investigated prior to the test to ascertain whether the appearance differences are in fact masked. Discrimination tests should not be conducted if noticeable differences exist and cannot be masked.
Scenarios Wherein a Discrimination Test May Not Be the Most Appropriate Test
(1) Large Within-Product Variability. How appropriate are discrimination tests when dealing with multifaceted products (e.g., soups, prepared frozen dinners, pizzas, cereals with multiple components, etc.), or products with a large within-product variability? If a researcher conducts a discrimination test with aliquots of the same sample/batch, then a significant difference between products (representing subsamples of the same product) may be obtained! Furthermore, differences among test samples could be overshadowed by the large within-product variability. Discrimination tests may not be the most appropriate tests to conduct when dealing with these or similar situations. In these cases the use of scaling methods is a better test strategy, since the degree of difference or the magnitude/intensity of attributes can be scored.

The difference-from-control method (Aust et al. 1985) was in fact developed to deal with within-product variability.
By providing a difference-from-control scale to panelists and using one or several blind controls, one can obtain a measure of the within-product variability/difference and use that information to assess the degree to which samples differ. For example, on a 10-point degree-of-difference scale, the within-product variability may occupy a "space or magnitude of difference" of four points (0-4); differences among samples then become meaningful only when the magnitude of difference between two products is higher than four points (see the sketch below). Another way to address this problem uses descriptive analysis, wherein panelists scale attribute intensities. This information can be analyzed to determine the attributes in which the products differ and the "degree" of difference among them, by comparing the magnitude of the difference in the means, if significant.
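Here is a hedged sketch of the difference-from-control logic just described, with invented ratings: the blind control's mean rating estimates the within-product baseline, and a test sample is flagged only when its mean exceeds that baseline both statistically and by more than the agreed-upon limit. The data, panel size, and four-point limit are assumptions for illustration.

```python
# Hedged sketch of a difference-from-control decision: blind-control
# ratings estimate within-product variability; a test sample is flagged
# only when its mean rating exceeds that baseline by more than the
# agreed-upon limit. All ratings below are hypothetical.
import numpy as np
from scipy.stats import ttest_ind

blind_control = np.array([2, 3, 1, 4, 2, 3, 2, 4])  # ratings vs labeled control
test_sample   = np.array([5, 6, 4, 7, 5, 6, 5, 6])  # 0-10 degree-of-difference scale

baseline = blind_control.mean()        # within-product "space" of difference
excess = test_sample.mean() - baseline

t_stat, p_value = ttest_ind(test_sample, blind_control)
print(f"excess difference = {excess:.2f}, p = {p_value:.4f}")
# The decision combines both criteria: a statistically significant excess
# AND a magnitude above the established acceptable limit (e.g., 4 points).
```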
(2) When Additional Product Information Is Needed

Degree of Difference. Sometimes it is necessary to report the degree of difference among products. Most discrimination tests cannot answer that question, since the outcome is nominal data, or counts. In these cases a difference-from-control test should be conducted instead of a classic discrimination test.
How Do the Products Differ? (attribute information). Often test requesters need to know how products differ. Many times this question cannot be answered until the discrimination test results are analyzed. Sensory scientists then either:

a. assess the qualitative information obtained (if panelists were asked to indicate/comment on how the products differ),
b. complete a descriptive test, or
c. conduct a consumer test.
Further discussion on the execution of these tests, after a discrimination test, is covered in the topic below. However, sometimes attribute information is needed and/or requested at the onset of the test. This may occur when the products are believed or judged to be different by the project team, or because the type of products being tested poses limitations on the execution of a discrimination test (e.g., with some personal care products). The approaches taken to characterize differences include the following:

Multiple Attribute Discrimination Tests. Panelists are asked to perform multiple discrimination tests focusing on a series of attributes selected by
the scientist. The test results indicate whether the products are sufficiently different or similar in the chosen attributes. This approach may not be the best, due to several potential problems: (1) sometimes not all relevant product attributes are included, resulting in incomplete or misleading information; (2) attributes may be misunderstood differently across panelists, since they are not trained; and (3) these untrained panelists may be overexposed to the products and become fatigued, due to the re-tasting or re-application needed for the evaluation of each attribute.

Descriptive Evaluation. Products are evaluated by a trained descriptive panel and judged to be different or similar in the measured attributes. This approach is sound and provides the needed information. Sensory scientists need to decide whether a descriptive test should be conducted after the discrimination test is completed, or whether products can be evaluated directly by the descriptive panel without a discrimination test.

Consumer Testing. Often sensory scientists may choose to bypass a discrimination test altogether in favor of conducting a consumer test. This approach is followed when (1) there is a certain degree of certainty that the products differ, (2) the company is more interested in the consumer responses, including their perception of differences, or (3) the company mistrusts internal discrimination tests. In this case, information is collected on acceptance and on differences perceived by consumers. Even though this approach is sound and provides information on consumer perception, it may be an expensive strategy. Except when this strategy is followed for political reasons, it may be warranted only when the products are indeed noticeably different. Therefore, most often sensory scientists complete the less expensive discrimination test first to ensure that the products are different; if the test results show a difference, a consumer test is completed.

(3) When Limitations on Product Presentation Are Posed by the Product or Its Use/Application. The testing of personal care products falls into this category. Some characteristics of personal care products may pose difficulties in the design and execution of discrimination tests, because of:
a. the inability to apply and evaluate more than two products at a time (e.g., mascara, hair care products),
b. the need to assess product performance over time (e.g., mascara, eye shadows, hair sprays, foundations/makeup, antiperspirants, etc.), and
c. the behavior of products over time (e.g., products are similar at some stages of manipulation and use, and different at others).
These are some of the main reasons why discrimination tests are not frequently used in the personal care industry. Often the simple difference test (which requires only a pair-wise application) is conducted. However, other presentation schemes that pose no problems for foods, such as the duo-trio or triangle test, cannot be used in the evaluation of some personal care products.
Are We Interested in Product Difference or Similarity?

This author believes that in the past there have been many misconceptions and abuses in discrimination testing, mainly because of the somewhat narrow knowledge some professionals have acquired in this area. For example, most of us learned discrimination tests as "difference" tests, without being taught that there are various objectives in discrimination tests (e.g., difference or similarity). Also, we were "taught" to use P<0.05 as the cutoff point, regardless of the test objective.
"Life Beyond Discrimination Tests": Follow-Up to Significant Results

In a project sequence, discrimination tests are often planned and conducted first. In many projects the question of overall difference or similarity must be answered prior to proceeding with other tests.

(1) Reformulation Projects (Similarity Testing). When testing for similarity, the project and planned testing sequence may end after the discrimination test. If the products are demonstrated to be sufficiently similar, then the ingredient or process substitution is approved and no further testing is required. Noteworthy is the fact that sometimes consumer testing may still be completed, despite the discrimination test results. This may occur with highly visible/political projects, or when a sensory/consumer insights group has lost credibility and internal discrimination results are not trusted. If the products are not demonstrated to be sufficiently similar, the ingredient or process substitution may not be approved and, as in the scenario above, no further testing may be completed. Recommendations will be given to continue the formulation or process research efforts to produce products that are more similar.

(2) Improvement Projects (Difference Testing). When testing for difference, the testing strategy is the reverse of that followed in similarity tests, described above. If the products are not demonstrated to differ, then the ingredient or process substitution geared toward product improvement and difference may not be approved, and no further testing may be done. Recommendations will be given to continue the formulation or process research efforts to produce products that are more different. If the products are demonstrated to differ, sensory scientists may decide on any of these strategies or tests:
a. Approval of improvement efforts (process or formulation research). No additional testing is conducted if the results show a significant difference; the goal of formulating/producing a product different from the control is considered achieved.

b. Descriptive analysis. This test is conducted to document the type and magnitude of the differences.

c. Consumer testing. This test is conducted in order to investigate whether consumers can detect and/or care about the product difference, to study the
effect of the product differences, or to study the acceptance of the new reformulated/improved product.

Most frequently, improvement projects require a series of tests, and thus the planning and use of large amounts of product. Sensory professionals, therefore, should pay special attention to the planning and acquisition/storage of the amount of product needed for the complete project. Often, only enough product for a discrimination test is requested/obtained; in that case, a new batch of product is produced or acquired for subsequent tests. This can have detrimental effects on the research, since potentially different products may be tested in the various project stages, hindering the comparison of test results. Alternatively, the use of different batches (and potentially different products) should be taken into account in the data interpretation. For example, a consumer test frequently may not show a significant difference among products, "contradicting" the results of the previously executed difference test. In this undesirable scenario, sensory professionals need to explain the discrepancy; often they fail to consider the possible differences among the products used in the various tests as an explanation.
A Complex Question: Who Should the Panelists Be in Discrimination Tests: Consumers Versus Known Discriminators, Naive Versus Semitrained Versus Trained Judges?

When discrimination tests were developed, the theory of panelist selection was undoubtedly emphasized. Because of the importance and use of this methodology, procedures to test for panelists' discriminative ability were developed early on (Lombardi 1951; Bradley 1953). Sensory/consumer scientists would either:

a. screen panelists for their discriminative ability and choose only discriminators to run these tests,
b. use panelists without screening, but monitor panel performance and eliminate non-discriminators, or
c. use panelists without screening or monitoring their performance.
In the past, only naive employees were used as panelists for discrimination tests, and sensory professionals would choose any of the above three approaches in selecting, using and monitoring panelists. Currently, an increasing number of companies frequently use other sources of panelists: non-employees/consumers and descriptive/trained panelists.
(1) Use of Consumers. A recent trend toward the use of naive consumers in discrimination tests by sensory professionals has been observed. There are two reasons why this practice may be followed:
- the lack of availability of, and participation from, employees
- the desire to consider only the consumers' responses, since they are the ultimate users of the product

This is a controversial topic, since sensory professionals espouse different views regarding this practice. Whereas some professionals fully endorse and follow this practice, others (who support the use of known discriminators) vehemently criticize and oppose it. Unfortunately, no research has demonstrated the pitfalls of using consumers to perform discrimination tests. This author offers the following suggestions for when consumers are used in discrimination tests, since she endorses the belief that consumers cannot be expected to understand and adequately complete discrimination tests without some orientation and practice. Most consumers do very well in hedonic/liking and preference tests, but not in their first participation in discrimination tests. Therefore, of the options listed below, screening for discriminative ability is preferred (especially when a consumer pool has been formed to participate in ongoing discrimination tests); minimally, the second option, conducting an orientation, is recommended.
a. Screening for Discriminative Ability. This practice is especially recommended when sensory professionals will be using consumers as regular discrimination panelists. This author recommends that sensory professionals treat these "consumers" as panelists, and thus have them participate in screening exercises. Additionally, these consumers should not be used as regular consumers; their participation should be limited to analytical tests. Since consumers have a difficult time completing discrimination tests, it is recommended that they be oriented and asked to partake in a series of practice trials prior to completing the screening exercises.

b. Orientation. An orientation is recommended whenever consumers will be used as discrimination panelists, either with or without screening, since they need to be familiar with the task. A series of exercises should be designed to orient consumers to the task and make them feel comfortable in completing these tests. Orientation exercises should cover several discrimination tests, several products and sensory modalities, from simple to complex. It is recommended that appearance exercises be used first to introduce the test concepts and task. Furthermore, orientation with products other than the
ones consumers will evaluate should be included. For example, odor/fragrance discrimination tests may be completed for consumers who will evaluate foods. Conversely, food discrimination tests may be completed for consumers who will evaluate fragrance or personal care products. Finally, different product categories should be covered. For example, for a food panel, beverages and savory and sweet products should be included in the orientation, regardless of the products ultimately evaluated by the consumers.

(2) Use of Descriptive Panelists. Some sensory professionals like to use trained descriptive panelists in some discrimination tests, since these panelists have higher acuity than untrained panelists. The belief is that the likelihood of detecting perceived differences increases with the use of trained panelists. This "protection" is used by several practitioners, particularly in similarity projects, i.e., when there is the need to protect against Type II error (declaring that products are the same when in fact they are different). The principle is that if trained panelists do not find a difference, naive consumers will not find any difference. This practice has to be carefully examined when the discrimination test results, particularly similarity test results, are to be linked with consumer tests, or when interpreting the "proportion of distinguishers (Pd)" (Meilgaard et al. 1999). The results should be carefully used, since two populations are being used and compared in the data interpretation (highly trained panelists and the regular consumer, the user of the product).

Choosing a Cutoff Point Beyond Which to Declare Significance: What Are Those "P" Values?
This author has already stated her opinion on the relatively poor training in discrimination testing that the sensory community received in the past. Another weak area in this methodology is the understanding of the "P-values." Firstly, all sensory professionals should be aware that there are actually two "errors," and two probability (P) values of interest, in discrimination testing: Type I error with probability α, and Type II error with probability β. The "P" value with which most sensory professionals are familiar is α, related to the probability of committing a Type I error. This is not at all relevant when the interest is product similarity, since it is not necessary to have protection against false positives; rather, one must protect against declaring that products are the same when in fact they are different (Type II error). Secondly, we are "taught" to use P<0.05 as the "cutoff point" for testing the null hypothesis. Sensory professionals are aware that these values are related to the risk involved in decisions. Therefore, the lower the number, the
lower the risk of making wrong decisions. But why choose the value 0.05? Why not choose the value P=0.65? Sensory practitioners should be aware that the cutoff point of 0.05 is completely arbitrary. There are anecdotes describing how 0.05 was selected a century ago as a "reasonable" level that was easy for the calculations. But there is nothing "magic" about P=0.05; someone could choose a level of 0.07 as a reasonably low P-value to use. Chambers (2002) has even suggested that two levels should be chosen: one at which everyone on the research team agrees that an important and real effect has been shown (say P=0.02), and another at which everyone would agree that an effect has NOT been shown (0.35). Values in between would represent a "gray" area, where other information or input would be required to make a decision.
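To make the two error probabilities concrete, the following minimal sketch - with hypothetical parameters, not recommended values - shows how α and β trade off in a triangle test: the critical count is set from the guessing distribution, and β is then the probability of falling below that count even when a given proportion of distinguishers truly exists.

```python
# Sketch (hypothetical parameters) of the two error probabilities behind
# a triangle test: alpha is the chance of declaring a difference when
# panelists are guessing; beta is the chance of missing a real difference
# for an assumed true proportion of distinguishers (Pd).
from scipy.stats import binom

n = 40
p0 = 1.0 / 3.0                      # correct-by-chance probability
pd_true = 0.30                      # assumed true proportion of distinguishers
p1 = p0 + pd_true * (1.0 - p0)      # correct-answer probability if Pd is real

# Smallest number of correct answers giving alpha <= 0.05 under guessing.
crit = int(binom.ppf(0.95, n, p0)) + 1
alpha = binom.sf(crit - 1, n, p0)   # actual achieved alpha

# Type II error: probability of falling below the critical count anyway.
beta = binom.cdf(crit - 1, n, p1)

print(f"critical count = {crit}, alpha = {alpha:.3f}, "
      f"beta = {beta:.3f}, power = {1 - beta:.3f}")
```

Rerunning the sketch with different panel sizes shows why similarity testing demands larger panels: β shrinks only slowly as n grows.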
The Caveats of Attribute Discrimination Tests

Overall discrimination tests are the most common tests used in most projects and applications. Panelists are asked to assess how the products in the set, as a whole, resemble or differ from one another; panelists consider the multidimensionality of the products in making a decision. However, there are cases where attribute discrimination tests, which focus on an attribute of interest, are the test of choice. In attribute discrimination tests, panelists are asked to determine whether products are different or similar with respect to a chosen attribute. One has to be cautious in designing and applying attribute discrimination tests. A brief discussion of caveats and misuses follows.

(1) Does an Attribute Discrimination Test Meet the Objectives? Sensory professionals should carefully assess whether an attribute discrimination test is the appropriate choice, given the project and test objectives. The reasons why an attribute test, and not an overall test, is chosen should be reviewed. In addition, practitioners should ask themselves whether the test results will provide the needed answers. Often an attribute test is selected because of a real need for attribute information when no descriptive panel is available. The researcher then needs to ask the following questions: Is information on one attribute sufficient? Is there a need for information about more, or all, attributes? Thus, should a multiple attribute test or a descriptive test be conducted instead?

(2) The Selection of the Relevant and Appropriate Attribute(s). Attribute discrimination tests are designed by sensory professionals; thus, they are responsible for the selection of attributes. This is a key responsibility, since the choice of attributes delineates the type of information obtained. Misleading
results could be obtained with an incorrect attribute selection. Some issues to consider in the selection process include:

a. Do the products in fact differ in the attribute selected? This may be a rhetorical question for most people, but it is a key point to address. Many sensory scientists, especially those who do not believe in screening products or do not have the time to do so, would select the attribute to address the test requester's objective without screening the products. It is known that a change in an ingredient or processing condition frequently does not have an impact on the corresponding sensory attribute. Also, test requesters may erroneously believe that products differ in a particular attribute because they have information on the products' process or composition. The attribute(s) should not be chosen based exclusively on this input. Therefore, it is critical that sensory professionals screen products to ensure that differences exist and to make the best decisions on the attribute selection.
b. Is there more than one attribute that differentiates the products? Sensory professionals must know this prior to designing any test. If the products differ in more than one attribute (which happens most of the time with any reformulated product), the sensory professional must assess whether an attribute test (which addresses just one attribute) is indeed the most appropriate approach. Some necessary questions include:

i. How will the panelist respond when asked to concentrate on only one of the several differences that may exist between the products?
ii. Should a multiple attribute test be run to capture the differences in more than one attribute?
iii. Should descriptive analysis be considered, to obtain information on all attribute differences?

(3) How To Ensure That the Panelists Understand the Attribute(s). Panelists who participate in attribute discrimination tests are not trained panelists. Therefore, one has to be cautious when product attributes are addressed in discrimination tests. The sensory professional has to ask whether the attribute is simple enough for everyone, even untrained individuals, to understand. The sensory professional should therefore always consider setting up an orientation or brief training on the attribute(s) evaluated. This orientation can be very simple and short: it requires that the attribute be defined and that appropriate references be presented to demonstrate it. Panelists only need to be exposed to two or three levels of the attribute. The name, the definition, and, if applicable, the evaluation procedure should be covered in this exercise. This is a short orientation that
teaches the panelists the attribute, and thus ensures that all panelists have the same viewpoint when evaluating that product attribute.
Training Panelists on "Same" Versus "Different" Sensory Product Spaces

The product presentation scheme and the task involved in discrimination tests are designed to provide the most sensitive product comparison, and thus the most stringent conditions for detecting product differences. These conditions are needed to meet the objective and function of these tests. However, they are a drawback when working with products having a large within-product variability, and when dealing with applications, such as Quality Control (QC) measures, where it is not as important to record minute product differences. Muñoz et al. (1992) discuss how traditional discrimination tests generating nominal data are not appropriate for QC applications. Products should not be rejected because of a significant difference from the control; they should be rejected only when they have reached a level or degree of difference above the "acceptable" one. This "acceptable" level of difference, or limit, should be established through consumers' or management's input. One viable solution to this situation is the difference-from-control test (Aust et al. 1985) discussed above; here a decision is made not on nominal data ("yes/no" answers) but on the degree of difference among products.

Another alternative is to train panelists on a "same" and a "different" sensory product space and then have them make decisions based on several attribute differences or similarities. Muñoz et al. (1992) describe the characteristics of this approach and the training process involved, calling it the "in/out" method. In principle, panelists are trained on a "same/in" and a "different/out" sensory space. During training, they are presented with all the "perceivably" different products that are to be considered "in"/within production. All these products are perceivably different in one or more dimensions and to different degrees/intensities; however, panelists are to abstract that product "sensory space" as "same," or "in," or "not different." Similarly, they are presented with another set of products, much different from the "same/in" space, that are to be considered "different/out." Again, panelists are to abstract all sensory attributes and intensities in that product sensory space as "different" or "out."

In QC applications, products grouped in the "same/in" space represent variations of the product category or production that are "acceptable or tolerated." Therefore, in routine product evaluations, panelists do not concentrate on "small/trivial" differences and similarities but on the comparison of the test samples/products relative to the two sensory spaces learned: "same/in" and "different/out." Even if the samples are perceivably
different from each other, they will be categorized as "same/in" if their sensory characteristics fall within the "same/in" sensory space. The basis of this approach is that the researcher has the opportunity to define the "sensory space" of interest or relevance. Small differences, or certain attribute differences that are not important or relevant, are disregarded and are taught to the panelists as variations within the "same/in" sensory space. Thus, this approach solves a problem that arises in QC/QA applications: the rejection of a large volume of product because of small or trivial differences. The approach is commonly used in QC/QA but not in R&D applications. However, there could be many applications of this approach in situations where large within-product variability or "trivial" differences are to be ignored. Ishii and O'Mahony (1987, 1991) also have conducted studies using "in"/"out" evaluations. In their studies, participants were asked to categorize their responses into specific sensory concepts, by deciding whether stimuli were "in" or "out" of the concepts.
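To make the decision rule concrete, here is a minimal sketch in Python (numpy assumed). The descriptive profiles, the attribute set, and the nearest-centroid rule are hypothetical illustrations of the logic only, not the actual training procedure of Muñoz et al. (1992), in which the panelists themselves internalize the two sensory spaces:

import numpy as np

# Hypothetical descriptive profiles (rows = products, columns = attributes)
# for products taught as "same/in" and as "different/out".
in_space = np.array([[5.0, 3.2, 7.1],
                     [5.4, 3.0, 6.8],
                     [4.8, 3.5, 7.3]])
out_space = np.array([[7.9, 1.1, 4.0],
                      [8.2, 0.9, 3.7]])

def classify(sample):
    # Compare distance to the centroid of each learned sensory space.
    d_in = np.linalg.norm(sample - in_space.mean(axis=0))
    d_out = np.linalg.norm(sample - out_space.mean(axis=0))
    return "same/in" if d_in <= d_out else "different/out"

# A test sample that differs perceivably from every "in" product but
# still falls within the "same/in" space is not rejected.
print(classify(np.array([5.1, 3.4, 6.9])))   # -> same/in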
Abuses in Discrimination Testing

Abuses are common in sensory and consumer methods, particularly by naive sensory practitioners, or when resources are limited. Discrimination tests are no exception. Many of the abuses encountered in this methodology are due to some of the intrinsic characteristics of discrimination tests:

a. a relatively large number of panelists is needed (large sample size)
b. an overall discrimination test only provides information on overall difference or similarity
c. attribute discrimination tests only address one attribute
d. discrimination tests do not provide hedonic measures
e. the statistical analyses (e.g., binomial distribution) commonly used are appropriate for data generated from traditional tests, but do not address similarity parameters, replicated data, etc.
A few examples of the most common abuses observed in this methodology follow:
(1) Small Number of Panelists/Observations (Small Sample Sizes). Compared to descriptive tests, which only require a small number of highly trained panelists, discrimination tests require larger sample sizes (20-80 people, depending on the test and the test parameters established, e.g., α, β, etc.). Often this required large sample size becomes a problem for small companies, since the large number of employees needed cannot be recruited for routine tests. In these cases
many professionals choose to conduct their tests with local residents/consumers, use fewer panelists than needed, or replicate the tests. It is best to conduct the tests with local residents when the recruitment of employees is a problem. This author discussed the caveats of this practice and presented some recommendations above. Using fewer panelists than needed is a problem, and the sensory professional should be aware of the risks involved. In difference tests, a Type II error (missing product differences) is committed when small sample sizes are used. In similarity testing, small sample sizes cannot provide the required β levels, and thus the high confidence needed to declare that the products are sufficiently similar. These limitations may have risky implications, such as the mistaken approval of process and ingredient substitutions, or the approval of product improvements that are not sufficiently different from the control. How appropriate is it to replicate discrimination tests (Dacremont and Sauvageot 1997)? When a sensory professional is forced to replicate discrimination tests due to an insufficient number of panelists, he/she needs to use the appropriate statistical methods to analyze the data (Priso et al. 1994; Brockhoff and Schlich 1998; Bi 2002). Often practitioners use the traditional binomial test on replicated data, which is incorrect.

(2) Forcing Out Additional Information from Discrimination Tests. Different sensory and consumer test methods were developed to answer different questions. A well-established sensory/consumer insights group uses the repertoire of sensory and consumer methods to provide answers at the different project stages. Unfortunately, in new or small sensory/consumer insights groups and/or when resources are limited, the different methods are not available. Thus researchers erroneously try to extract all required information from the one or two techniques in place. Under that scenario, discrimination tests are sometimes abused as follows.
a. Extrapolating "degree of difference" from nominal data. Nominal data are obtained from most discrimination tests. These are only counts, and counts provide no information on order or magnitude. Therefore, no conclusions can be drawn regarding the degree of difference based on counts (e.g., a sample is not "twice" as different if 40 rather than 20 people give a correct response).
b. Incorrectly interpreting the qualitative information from discrimination tests (comments). Often, panelists are asked to describe how products differ. When a significant difference is obtained, these comments are summarized and assessed. This qualitative information should be carefully interpreted and used. In addition, the comments should be assessed with the right
perspective. Attention should only be given to a comment or a product characteristic that was emphasized by a sufficiently large number of participants. Also, someone experienced in sensory characteristics should review these data to determine the relationships among comments, link them, and draw conclusions based on all the comments. Some researchers incorrectly focus only on those comments that address the test variable(s), even if only a few people mention that characteristic. Often, experienced sensory professionals are not involved in the interpretation of these results; instead, they have their technicians, who may lack experience in sensory attributes, interpret the data.
c. Misusing multiple attribute discrimination tests. When attribute information is needed and a company does not have a descriptive capability, multiple attribute discrimination tests may be conducted. Including multiple attributes in a discrimination test is not "wrong." However, this practice should be carefully applied. Attributes should be carefully selected (e.g., is there duplication? are there attributes missing?), and the correct terminology should be chosen. How fatiguing is the task if panelists have to re-taste, re-smell or re-apply the products in completing multiple discrimination tests?
If multiple attribute discrimination tests are being conducted continuously because of the need for attribute information, then the development of a descriptive capability should be considered. The use of a descriptive panel to generate product attribute information is a better and more complete approach compared to completing multiple attribute discrimination tests.
(3) Collecting Hedonic or Preference Responses After Discrimination Tests. This practice is applied when companies lack the funds or other resources to conduct consumer tests. As mentioned before, a well-established sensory/consumer insights group uses the gamut of sensory and consumer tests to answer different questions. Discrimination panels (formed by employees or local residents) should be used to answer the question of product differences and similarities. Consumer tests should be used to obtain consumer hedonic or preference information. Asking hedonic/preference questions after discrimination tests is flawed. If employees are used for the test, then "untrue" consumer information is obtained: a small group of employees is not representative of the consumer population. Conversely, if consumers are used, the test is also flawed. Firstly, only a small group of consumers could have been recruited for the
discrimination test (e.g., 20-40). This sample size may be large enough for a discrimination test, but it is not sufficiently large for a consumer test. Secondly, there are also problems when naive consumers are recruited for a discrimination test followed by a consumer test. As discussed above, naive consumers have a difficult time with the discrimination task unless they have been oriented to it. As stated by this author, they need to be oriented/semitrained to be proficient at completing discrimination tests. Then, if this training is conducted, at what point do they stop being representative/naive consumers and become semitrained/proficient discrimination panelists, and thus unqualified for the consumer part of the test?

(4) Incorrect Data Analysis. Sensory professionals should be knowledgeable about the statistical tests used for the analysis of nominal data, since most discrimination tests yield nominal data. This implies knowing the correct applications and limitations of each statistical test. Therefore, one should be cognizant of the alternative analyses that are available for similarity tests (Meilgaard et al. 1999), replicated discrimination tests (Priso et al. 1994; Brockhoff and Schlich 1998; Bi 2002), etc.
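As an illustration of the traditional analysis mentioned above, the following is a minimal sketch in Python (scipy assumed; the counts are hypothetical) of the exact binomial test for an unreplicated triangle test, where the chance probability of a correct response is 1/3:

from scipy.stats import binom

n_panelists = 36        # hypothetical sample size
n_correct = 19          # hypothetical number of correct choices
p_chance = 1.0 / 3.0    # guessing probability in the triangle test

# One-sided exact binomial p-value: the probability of n_correct or
# more correct choices if all panelists were merely guessing.
p_value = binom.sf(n_correct - 1, n_panelists, p_chance)
print(f"{n_correct}/{n_panelists} correct, p = {p_value:.4f}")

# Note: this traditional binomial analysis is NOT appropriate for
# replicated tests (see Priso et al. 1994; Brockhoff and Schlich 1998;
# Bi 2002) or for similarity objectives.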
REFERENCES

AUST, L.B., GACULA, JR., M.C., BEARD, S.A. and WASHAM II, R.W. 1985. Degree of difference test method in sensory evaluation of heterogeneous product types. J. Food Sci. 50, 511-513.
BI, J. 2002. Comparison of correlated proportions in replicated product tests. J. Sensory Studies 17, 105-114.
BRADLEY, R.A. 1953. Some statistical methods in taste testing and quality evaluation. Biometrics 9, 22-38.
BROCKHOFF, P.B. and SCHLICH, P. 1998. Handling replications in discrimination tests. Food Quality and Preference 9(5), 303-312.
CHAMBERS, E. 2002. Opinion on setting the critical p level. [email protected]
CLIFF, M.A., O'MAHONY, M., FUKUMOTO, L. and KING, M.C. 2000. Development of a "bipolar" R index. J. Sensory Studies 15(2), 219-229.
DACREMONT, C. and SAUVAGEOT, F. 1997. Are replicate evaluations of triangle tests during a session good practice? Food Quality and Preference 8(5-6), 367-373.
ENNIS, D.M. 1993. The power of sensory discrimination methods. J. Sensory Studies 8, 353-370.
ENNIS, D.M. 1998. Thurstonian scaling for difference tests. IFPress 1(3), 2-3.
ENNIS, D.M. and BI, J. 1998. The Beta-Binomial model: Accounting for inter-trial variation in replicated difference and preference tests. J. Sensory Studies 13, 389-412.
GACULA, JR., M.C. 1993. Design and Analysis of Sensory Optimization. Food & Nutrition Press, Trumbull, Conn.
ISHII, R. and O'MAHONY, M. 1987. Defining a taste by a single standard: Aspects of salty and umami tastes. J. Food Sci. 52, 1405-1409.
ISHII, R. and O'MAHONY, M. 1991. The use of multiple standards to define sensory characteristics for descriptive analysis: Aspects of concept formation. J. Food Sci. 56, 838-842.
LAWLESS, H.T. and HEYMANN, H. 1998. Sensory Evaluation of Food. Chapman & Hall, New York.
LOMBARDI, G.J. 1951. The sequential selection of judges for organoleptic testing. Statistical methods for sensory difference tests of food quality. Virginia Agr. Expt. Sta. Bi-Annual Rept 2. Appendix E: 1-37.
MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1999. Sensory Evaluation Techniques, 3rd Ed., CRC Press, Boca Raton, FL.
MUÑOZ, A.M., CIVILLE, G.V. and CARR, B.T. 1992. Sensory Evaluation in Quality Control. Van Nostrand Reinhold, New York.
O'MAHONY, M. 1979. Short cut signal detection measures for sensory science. J. Food Sci. 44(1), 302-303.
O'MAHONY, M. 1995. Who told you the triangle test was simple? Food Quality and Preference 6(4), 227-238.
PRISO, H.E., DANZART, M. and HOSSENLOPP, J. 1994. A statistical analysis of difference tests with replications. J. Sensory Studies 9, 121-130.
ROUSSEAU, B. 2001. The beta-strategy: An alternative and powerful cognitive strategy when performing sensory discrimination tests. J. Sensory Studies 16, 301-319.
ROUSSEAU, B. and O'MAHONY, M. 2001. Investigation of the dual-pair method as a possible alternative to the triangle and same-different tests. J. Sensory Studies 16, 161-178.
ROUSSEAU, B., MEYER, A. and O'MAHONY, M. 1998. Power and sensitivity of the same-different test: Comparison with triangle and duo-trio methods. J. Sensory Studies 13, 149-173.
ROUSSEAU, B., STROH, S. and O'MAHONY, M. 2002. Investigating more powerful discrimination tests with consumers: Effects of memory and response bias. Food Quality and Preference 13, 39-45.
STONE, H. and SIDEL, J.L. 1993. Sensory Evaluation Practices, Academic Press, New York.
MAXIMO C. GACULA, JR.

Moskowitz brought out two important issues that sensory scientists face in the choice between discrimination testing and the use of scaling:
(1) The mapping of a continuous variable from perception into an all-or-none response variable.
(2) The rule that the panelist uses to classify two products as same or different.

There is no clear-cut rule for choosing between these two methods of sensory evaluation. In practice the choice is generally based on study objectives, the type of products, and how the results of the study will be used. Tradition and experience also contribute to this choice. In the maintenance of product quality in manufacturing, the discrimination test is frequently used to provide the necessary data, i.e., pass/fail, accept/reject. It should be recognized that the results obtained can sometimes be solely the result of product and production variability. In some applications, the degree of difference test is used instead. Consider its application to shelf-life determination using a 7-point off-flavor category scale, where "1 = none" and "7 = very strong" (Gacula and Washam 1986; Gacula and Kubala 1975). In using this procedure, one must first establish the demarcation line on the scale that divides it into "acceptable" and "unacceptable," as determined by a consumer test. Using the hot dog as the product, it has been shown that a median or average score of 3.5 represents the maximum score the product may reach and still be deemed acceptable. That is, a score ≤ 3.5 is acceptable, and a score > 3.5 is not acceptable. Such results were also supported by the pH of the product. Thus, in this application, consumer input and sources of product variability are both considered in the determination of the shelf-life of the product. Such applications can be extended to other types of products. Experience shows that a product sample may be judged unacceptable in the plant, but acceptable in a consumer test. Professionals should realize that we are surrounded by various types of variability, and that reporting the mean while discarding the variability can lead to misleading results, and would be a step backward for Sensory Science. Variability is a fundamental characteristic of sensory data and should always be an integral part of data analysis. Thus, this author (MCG) believes that assessing degree of difference by scaling provides more experimental information than discrimination testing.
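A minimal sketch of this shelf-life decision rule (Python, numpy assumed; the weekly scores are hypothetical, and the 3.5 cutoff is the consumer-derived demarcation discussed above):

import numpy as np

# Hypothetical mean off-flavor scores (1 = none ... 7 = very strong)
# for one product at successive weeks of storage.
weeks = np.array([0, 2, 4, 6, 8, 10])
scores = np.array([1.2, 1.8, 2.4, 3.1, 3.6, 4.3])

CUTOFF = 3.5  # scores <= 3.5 acceptable, > 3.5 not acceptable

acceptable = scores <= CUTOFF
shelf_life = weeks[acceptable][-1]  # last week still acceptable
print(f"estimated shelf-life: {shelf_life} weeks")  # -> 6 weeks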
Issues with Discrimination Testing

Moskowitz stated that "discrimination represents the more extreme aspect of dissimilarity measurement, thus difference and discrimination are just different parts of the same task." I fully concur, and such a statement is in fact unknowingly practiced today. The point that both belong to the same task suggests that experiments often conducted are possibly redundant. For example, when dealing with a finished product one sometimes runs a discrimination test, i.e., a triangle test, and then conducts a degree of difference test, i.e., using a rating scale, only if the discrimination test showed statistical significance between products. From my perspective, I generally recommend using a discrimination test only if homogeneous samples can be obtained, i.e., different levels of a sweetener in a liquid medium, and would not recommend it for finished products because of the complex nature of their sensory attributes. Since we do not know the expected size of the difference, and in most cases more than two samples are evaluated, it is therefore cost effective to bypass the discrimination test. The researcher and sensory scientist play a major role in making this type of decision. Can homogeneous samples be procured for discrimination testing? Instead of discrimination tests, why not use descriptive analysis, which considers various sources of variability, to obtain degree of difference? There is then no problem of "go/no go," "accept/reject," or "change/do not change" in making a business decision. From my viewpoint, such a decision should be based on a degree of difference test properly analyzed. This brings up the common question of how to statistically analyze results from similarity tests. Muñoz emphasized the role of Type I and Type II errors in similarity testing, with which I fully agree. In the following discussion, the statistical aspects will be presented. Similarity testing is the same concept as:

(1) Clinical bioequivalence, i.e., the clinical efficacy of a new drug is equivalent to that of an existing drug.
(2) Sensory equivalence (Gacula 1993), i.e., an ingredient substitution study to maintain cost.
(3) Product parity in claims substantiation (Buchanan and Smithies 1989), i.e., product A is as good as the leading brand.

For simplicity, we will use the same alphabetical letters to denote both population and sample parameters. For two products or stimuli A and B, the classical Null Hypothesis is written as H0: A = B and the alternative hypothesis is written as
Ha: A ≠ B. The classical hypothesis test is used to disprove the Null Hypothesis. However, its application to the above situations leads to difficulties of interpretation, because the desired result in similarity testing is the acceptance of the null. Hence the use of the traditional tests of significance no longer applies, or is at least questionable. In past decades, several papers have been published addressing this issue as applied to clinical trials, with both agreements and disagreements in their findings. Overall, it has been suggested that the confidence interval test is more appropriate for this purpose (Westlake 1972; Metzler 1974; Shirley 1976; Blackwelder 1982). Meilgaard et al. (1999) recommended classical hypothesis testing for similarity tests, but allow the Type I and Type II errors to vary. A practical advantage of the confidence interval test is that it provides a range of difference between A and B that one can use to conclude equivalency. In another context, Chow and Liu (1998), using the Schuirmann (1987) idea of interval hypotheses, recast the classical hypotheses as two one-sided tests:

H01: A - B ≥ U, product A superior to B
Ha1: A - B < U, A is not superior to B

H02: A - B ≤ L, product A inferior to B
Ha2: A - B > L, A is not inferior to B
where L and U are meaningful clinical lower and upper limits in a particular study. In this situation we have two Null Hypotheses, and equivalence is shown by rejecting the Null Hypotheses of inequivalence: rejection of both H01 and H02 leads to the conclusion of clinical equivalence. However, the specific values of U and L to use in sensory evaluation will be a difficult choice, if not impossible or impractical. Is -0.25 to 0.25 a logical sensory difference for equivalence? That is, L < A - B < U becomes -0.25 < A - B < 0.25. Let D = A - B; if D < 0.25, then A is not superior to B; similarly, if D > -0.25, then A is not inferior to B. If both occur, then one can conclude sensory equivalence. From my perspective, it is suggested that the confidence interval method be used to analyze similarity-test data. Although the formula for computing a confidence interval (CI) is well-known, it is reviewed here. For the paired comparison design, the formula is

CI = D ± t(α, n) SE_D

where

D = Σ(A - B)/N, with N = total number of pairs of observations;

t(α, n) = the value of the t statistic at the α level of significance and n degrees of freedom (the number of pairs of observations minus 1). If the sample size is large (greater than 120), the Z value from the standard normal distribution can be used, the most common values being 1.960 for a 95% CI and 2.576 for a 99% CI, or equivalently the t value at infinite degrees of freedom;

SE_D = standard error of the difference D.

For the independent comparison (unpaired) design, the following sample estimates are computed: XA = mean for sample A, XB = mean for sample B, SEA = standard error for sample A, SEB = standard error for sample B. The confidence interval for each sample is

XA ± 1.960(SEA)
XB ± 1.960(SEB)

where 1.960 is as defined earlier. It is common practice in the medical and biological sciences to compute the CI for each sample as given above and to judge the two samples not significantly different (i.e., equivalent) when the two confidence intervals overlap. However, recent work by Schenker and Gentleman (2001) showed that this method is not optimal and has low power. As a result of this finding, it is suggested that the interval be computed directly for the difference:

CI = (XA - XB) ± t(α, n1 + n2 - 2) Sp √(1/n1 + 1/n2)

where n1 and n2 are the numbers of observations for products A and B, respectively, and Sp is the pooled standard deviation obtained from the variances of products A and B. All other terms are defined as in the paired comparison design. Quoting Nelson (1990), "confidence intervals are superior to hypothesis tests in that they not only show what parameter values would be rejected if they were used as a Null Hypothesis, but also the width of the interval gives an idea of the precision of the estimation." In summary, the following points are suggested for the statistical analysis of sensory data obtained from similarity testing:
(1) The confidence interval method should be used.
(2) Since the primary objective of similarity testing is the acceptance of the Null Hypothesis, an adequate power of the test should be prescribed.
(3) A statistically determined sample size should be used.
(4) A computer simulation should follow, using the sample statistics obtained from the results of the study. This should provide information on the validity of the sensory equivalence result.

The decision rule for the test of equivalence as stated by Gacula (1993) is as follows: "If the CI includes zero, one concludes that the two treatment means are equivalent with a confidence level of (1 - α)100%. If in particular D = 0, then the means are considered nearly equal." The four points stated above can be illustrated by an example. Consider the hypothetical descriptive analysis results for two finished products A and B, based on a 0 to 15 rating scale:

Mean A = 7.68
Mean B = 8.33
No. of judges = 6
No. of replications = 2
Variance of D = 2.10
Standard deviation of D = 1.45
Standard error SE_D = 1.45/√12 = 1.45/3.46 = 0.419
Significance level α = 0.05
t(11, 0.05) = 2.201

Although the products were evaluated one at a time, this can be viewed as a paired comparison design since both products were evaluated in one session. Substituting the above values in the CI formula, we obtain CI = 0.65 ± 2.201(0.419) = 0.65 ± 0.92, or a CI of (-0.27, 1.57). Since the CI includes zero, products A and B are sensorially equivalent. Note the many possible values of the Null Hypothesis. This is one of the reasons why we do not prove the Null Hypothesis: its value is unknown in traditional hypothesis testing. The computer simulation using SAS (SAS Institute Inc.) is such that a sensory scientist can run it competently once the SAS code is written. For our example, the SAS program is given in Table 16.1.
TABLE 16.1. SAS PROGRAM FOR THE SIMULATION RUN USING THE NORMAL DISTRIBUTION
*prog chapt16.sas;
data normal;
  retain seed1 seed2 seed3 seed4 seed5 500;
  do i = 1 to 12;
    run1 = 0.65 + sqrt(2.10)*rannor(seed1);
    run2 = 0.65 + sqrt(2.10)*rannor(seed2);
    run3 = 0.65 + sqrt(2.10)*rannor(seed3);
    run4 = 0.65 + sqrt(2.10)*rannor(seed4);
    run5 = 0.65 + sqrt(2.10)*rannor(seed5);
    if i = 1 then do;
      seed1 = 500; seed2 = 500; seed3 = 500; seed4 = 500; seed5 = 500;
    end;
    output;
  end;
run;

proc means mean n std maxdec=3;
  var run1 run2 run3 run4 run5;
  title "Simulated mean difference D for m=0.65 variance=2.10";
run;
In this program, five descriptive analyses (run1 through run5) were simulated using the mean difference of 0.65 and the variance of 2.10. These are the sample statistics that describe the population, and they are the only values the sensory scientist enters in the program. The result of sensory equivalence would be confirmed if the five simulated mean differences fall within the 95% CI of -0.27 to 1.57. The SAS output (Table 16.2) shows the simulation results.
TABLE 16.2. SIMULATED MEAN DIFFERENCE D FOR MEAN = 0.65 AND VARIANCE = 2.10

Variable    Mean D     N     Std Dev
run1        -0.171     12    1.630
run2        -0.138     12    0.984
run3        -0.213     12    1.511
run4         1.071     12    1.105
run5         0.834     12    1.817
As shown in Table 16.2, all the simulated mean differences for the five runs, ranging from -0.213 to 1.071, lie inside the 95% CI, confirming the Null Hypothesis of sensory equivalence. The results in Table 16.2 can be presented in the form of a quality control chart, as given in Fig. 16.1, using SAS/GRAPH (SAS Institute Inc. 1990). In this figure, the upper and lower dashed lines represent the 95% confidence interval limits and the middle line represents the mean D.
FIG. 16.1. SIMULATED MEAN DIFFERENCE OF EACH RUN GRAPHED AS A QUALITY CONTROL CHART
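For readers without SAS, the confidence interval and a simulation analogous to the program in Table 16.1 can be sketched in a few lines of Python (numpy and scipy assumed; the random draws will not match the SAS output exactly):

import numpy as np
from scipy import stats

# Paired-comparison CI from the worked example above.
D, sd_D, n = 0.65, 1.45, 12                # mean difference, SD, pairs
se = sd_D / np.sqrt(n)                     # 0.419
tcrit = stats.t.ppf(0.975, n - 1)          # 2.201
lo, hi = D - tcrit * se, D + tcrit * se
print(f"95% CI: ({lo:.2f}, {hi:.2f})")     # approx (-0.27, 1.57)

# Five simulated runs of 12 draws from N(0.65, variance 2.10),
# as in Table 16.1, checked against the interval.
rng = np.random.default_rng(500)
for run in range(1, 6):
    d = rng.normal(D, np.sqrt(2.10), size=n)
    print(f"run{run}: mean D = {d.mean():6.3f}, "
          f"inside CI: {lo <= d.mean() <= hi}")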
An important point in the similarity testing method is the power of the test, which directly relates to sample size N. In this method, unlike classical hypothesis testing, the Type II error (the probability of accepting the Null Hypothesis when it is false) is more important to control than the Type I error (the probability of rejecting the Null Hypothesis when it is true). In this case one can set the Type I error α = 0.50, or some other value. In theory, increasing α
increases the power of the test. Remember that Power = 1 - Type II error. Figure 16.2 shows the plot of power against sample size, produced using PASS 6.0 (NCSS 1996).
FIG. 16.2. PLOT OF POWER OF THE TEST AND SAMPLE SIZE N: TYPE I ERROR = 0.50, D = 0.65, AND STANDARD DEVIATION = 1.45
For our example with N = 12, the power of the test is 0.82. Increasing the sample size to N = 24 gives a power of 0.94. Note that power of the test denotes the ability of the test statistic to reject a false hypothesis.
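This power calculation can be sketched with the noncentral t distribution (Python, scipy assumed; a two-sided paired formulation is assumed here, so the result reproduces the PASS values only approximately):

import numpy as np
from scipy import stats

D, sd, alpha = 0.65, 1.45, 0.50   # values from the example above

def power_paired(n):
    df = n - 1
    ncp = D / (sd / np.sqrt(n))             # noncentrality parameter
    tcrit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value
    # Power = P(|T'| > tcrit) under the noncentral t distribution.
    return stats.nct.sf(tcrit, df, ncp) + stats.nct.cdf(-tcrit, df, ncp)

for n in (12, 24):
    print(f"N = {n}: power = {power_paired(n):.2f}")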
REFERENCES

BLACKWELDER, W.C. 1982. Proving the null hypothesis in clinical trials. Controlled Clinical Trials 3, 345-353.
BUCHANAN, B. and SMITHIES, R.H. 1989. Substantiating a parity position. J. Advertising Res. 29, 9-20.
CHOW, S.C. and LIU, J.P. 1998. Design and Analysis of Clinical Trials. John Wiley & Sons, New York.
GACULA, JR., M.C. 1993. Design and Analysis of Sensory Optimization. Food & Nutrition Press, Trumbull, Conn.
GACULA, JR., M.C. and KUBALA, J.J. 1975. Statistical models for shelf-life failures. J. Food Sci. 40, 404-409.
GACULA, JR., M.C. and WASHAM II, R.W. 1986. Scaling word anchors for measuring off flavor. J. Food Quality 9, 57-65.
MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1999. Sensory Evaluation Techniques, 3rd Ed., CRC Press, Boca Raton, FL.
METZLER, C.M. 1974. Bioavailability - a problem in equivalence. Biometrics 30, 309-317.
NCSS. 1996. PASS 6.0 User's Guide. NCSS Statistical Software, Kaysville, UT.
NELSON, L.S. 1990. Comments on significance tests and confidence intervals. J. Quality Technol. 22, 328-330.
SAS INSTITUTE INC. 1999. SAS/STAT User's Guide, Version 8, SAS Institute, Cary, NC.
SAS INSTITUTE INC. 1990. SAS/GRAPH Software, Version 6, SAS Institute, Cary, NC.
SCHENKER, N. and GENTLEMAN, J.F. 2001. On judging the significance of differences by examining the overlap between confidence intervals. The American Statistician 55, 182-186.
SCHUIRMANN, D.J. 1987. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokinetics and Biopharmaceutics 15, 657-680.
SHIRLEY, E. 1976. The use of confidence intervals in biopharmaceutics. J. Pharm. Pharmacol. 28, 312-313.
WESTLAKE, W.J. 1972. Use of confidence intervals in analysis of comparative bioavailability. J. Pharmaceutical Sci. 61, 1340-1341.
CHAPTER 17

REPLICATION IN SENSORY AND CONSUMER TESTING

HOWARD R. MOSKOWITZ

Replication and reliability are two areas about which sensory practitioners argue, occasionally productively, almost always interminably. These two aspects of the scientific endeavor are important factors in the goal of acquiring data on which one can act. Replication consists of running the same panelist twice or more on the evaluation task. Reliability is the statistical analysis of the results of this replication. Researchers replicate panelist data for at least two reasons.

(1) Assess Consistency. In science it is important to obtain consistent data, if only to feel that the world is orderly. From one rating it is impossible to measure the reliability of the panelist data. Only with replicate samples can the researcher determine whether or not the panelist assigns numbers reliably. Reliability is extremely important when the researcher must depend upon a small group of panelists to assess a specific product characteristic, e.g., an expert panel whose job is to evaluate "off taste." To the degree that the panelist assigns ratings reliably, one can feel sure that the measuring instrument (viz., the panelist) is reliable. It should come as no surprise, therefore, that a great deal of emphasis has been placed on replication for expert panelists, upon whose ratings many business decisions rest. It is impossible to average out random variability as one could with a large number of panelists who each evaluate the product only once. The next best thing is to reduce the random variability by working with a reliable measuring tool.

(2) Faster Convergence Towards the Mean. Reliable ratings mean that the true average can be obtained more quickly. There is a greater chance to estimate the mean value, since random error cancels out. If the panelist is not reliable, then part of the rating is valid (containing information pertaining to the mean) and part of the data is random noise. Eventually random noise will cancel out, but it will take more data to cancel it out than would be the case if there were no random noise at all.
Why Replicate Observations?

We take it for granted that replication is a good thing. For one, it allows us to cancel out extraneous noise. Second, if we want to create an individual model, then the more observations we have for any one particular individual, the better that model will be.
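Both reasons can be made concrete with a small simulation (Python, numpy assumed; the noise level and the number of products are hypothetical):

import numpy as np

rng = np.random.default_rng(7)

# Hypothetical "true" attribute intensities for 10 products, rated
# twice by the same panelist with random error in each replicate.
true_scores = rng.uniform(2, 8, size=10)
rep1 = true_scores + rng.normal(0, 0.8, size=10)
rep2 = true_scores + rng.normal(0, 0.8, size=10)

# Consistency: test-retest correlation between the two replicates.
r = np.corrcoef(rep1, rep2)[0, 1]
print(f"test-retest correlation: r = {r:.2f}")

# Convergence: averaging k replicates shrinks the random error
# of the estimated mean by a factor of 1/sqrt(k).
print("SE of a single rating:", 0.8)
print("SE of the mean of 2 replicates:", round(0.8 / np.sqrt(2), 3))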
CHAPTER 20

TRAINING TIME IN DESCRIPTIVE ANALYSIS

ALEJANDRA M. MUÑOZ

The Impact of Training Time on the Approval of a Descriptive Program

The establishment of a descriptive analysis program represents a considerable investment for a company. Funds, resources and time need to be invested at the program start-up and indefinitely throughout the life of the panel (i.e., for panel monitoring, product evaluations, review or training sessions, etc.). This investment encompasses (Muñoz 1999):

(1) Personnel's time (professionals, technicians, and other employees).
(2) Outside resources such as consultants, statisticians, technicians.
(3) Panelists' salaries, if residents from the community are used.
(4) Supplies.

Consequently, the time (which translates into cost) and the funds required for a training program play an important role in the approval of a descriptive program. If funds are limited, then a comprehensive yet expensive training program requiring a long time to complete may not be approved. In addition, if project work needs to be initiated promptly, long training programs will not be approved. Management and the professionals involved in the approval of the descriptive program need to know the details and the return on this investment, including the benefits and uses of the descriptive program. A sensory professional requesting approval for the program needs to summarize and present this information to management. One of the most important elements, the total time required to train a panel, should be worked out for management. The steps involved in training a panel, which determine the total training time, are as follows (Schwartz 1975; Keane 1992; Muñoz et al. 1992b; Muñoz and Civille 1992; Stone 1992; Stampanoni 1994):

(1) Recruitment and screening of panelists.
(2) Program development.
   a. Selection of training and practice exercises.
   b. Development and documentation of the training program's protocols.
   c. Preparation of references and products to be produced by chemists/product developers.
(3) Selection of the training site.
(4) Acquisition of materials such as samples, references, utensils.
(5) Training of panelists.
   a. Training sessions.
   b. Practice sessions.
(6) Validation study.
(7) Reorientation if needed.
For most programs, steps 2, 4, and 5 are the most time-consuming steps in the process; thus these steps have the greatest impact on the time required to train a panel.
Factors Affecting the Panel Training Time

How long does it take to train a panel? The duration of a training program depends on the type of panel to be trained (Muñoz 1995, 1999). In this author's opinion, there are five elements that determine the duration of a descriptive training program. Sometimes it is difficult to separate these factors, since some of them are confounded. The five factors influencing the training time are: type of descriptive method chosen, type of panel (universal versus product specific), type of product category, sensory dimensions/number of attributes (for a product specific training), and level of training desired.

(1) Descriptive Method. As described in Chap. 18, the Free Choice Profile method requires the shortest training time, which represents this method's biggest appeal (Williams and Langron 1984; Williams et al. 1981). In fact, there is almost no training involved. Each panelist chooses and uses the descriptors he/she wants, thus eliminating the time needed to develop and standardize the lexicon, review references, and learn descriptive concepts. The training time required for a QDA (Quantitative Descriptive Analysis) panel is also relatively short. Stone and Sidel (1998) indicate that with focused recruiting, screening and training, an operational panel could be developed in as little as 3-4 weeks. This short program set-up can be completed because of the following:
a. QDA panels are product specific, hence the training is completed for one product category alone.
b. Limited time is spent on the development of the descriptive lexicon, since it is based on non-technical consumer language. Therefore, the extensive time required to review references and teach panelists technical language is not necessary.
c. No intensity references are used, thus no time is invested in reviewing and learning these references.
Flavor and Texture Profile programs, and the derivatives of these methods (e.g., Spectrum), require longer training periods (Caul 1957; Brandt et al. 1963; Keane 1992; Muñoz and Civille 1992; Muñoz et al. 1992b; Meilgaard et al. 1999) since:

a. The training programs are universal, hence several product categories are covered in the training program.
b. The developed lexicons are technical, requiring the presentation of many more references and longer discussion periods to teach panelists these terms.
c. Intensity references may be used, thus increasing the complexity of the training program and the time involved.

Muñoz and Bleibaum (2001) discussed the origins, differences and controversies of the fundamental descriptive analysis methods in a workshop presented at the 4th Pangborn Sensory Science Symposium. The reasons behind the differences in the time involved to train a QDA® panel versus a Profile/modified Profile panel were covered.

(2) Types of Panel: Universal Versus Product Specific. A universal panel is trained when more than one product category (such as lotions and creams, soaps, hairstyling agents) is included in the training. Once the training program is completed, a universal panel is able to evaluate products from the product categories covered in the training (Muñoz and Civille 1998). A product specific training focuses on one product category alone, such as salad dressings. In this type of program, the products shown for ballot development, as well as all references, belong exclusively to one product category. Therefore, the training time is much shorter in a product specific than in a universal training program. For example, a universal training program including five product categories will take approximately three to four times longer to implement than a product specific program, where only one product category is addressed.

(3) Type of Product Category for a Product Specific Training. The time required to train a product specific panel depends on the type of product category addressed. Training programs for complex categories like tobacco, chocolate, and meats are longer than those for simple categories like potato chips. More references need to be shown, more concepts are covered, more discussions take place, and additional practice time is needed for complex product categories (Muñoz 1999).
(4) Sensory Dimensions or Number of Attributes. The sensory dimensions (i.e., appearance, fragrance, skinfeel, flavor, texture) or the number of product attributes that need to be covered in the training program affect the duration of the program. Sometimes a Profile panel may be trained only on flavor or texture. The training time is shorter when only one dimension (such as flavor) is covered, rather than all of the product's sensory dimensions. In addition, the more attributes to learn, the longer the training period. R&D programs are longer, since panels are required to learn and rate all product attributes. R&D evaluations usually include as many attributes as needed to describe a product category in detail. Depending on the type of descriptive method chosen and the product category, the list of attributes for R&D evaluations can range from 15 to 50 attributes. Other programs, such as QC or shelf-life panels, require less detailed evaluations. A QC/shelf-life panel focuses only on the product's critical attributes. The list of attributes in a QC evaluation may range between 2 and 15 attributes (Muñoz et al. 1992a). Therefore, the training time is shorter for these panels. In Chap. 18 the different approaches followed in the lexicon/language formation of comprehensive R&D versus attribute (e.g., QC, shelf-life) panels were discussed. This step is critical in determining the needed training time.
(5) Level of Training Desired. One of the critical factors affecting the duration of a training program is the level of training desired. Very short and general training programs only produce semi-trained panels. Long training programs are required to produce a well-trained and calibrated panel. In long and involved programs more product references are shown, more detailed discussions are held, and more evaluations and practice sessions are completed. The more practice, feedback and calibration, the better trained the panel becomes (Muñoz 1999). This effort translates into more involved training programs. In summary, many factors play a role in determining the length of a descriptive training program (ASTM 1992). In general, given all the factors discussed above, the training time can vary as follows: for QC or shelf-life programs, from a few hours to two or three weeks; for an R&D panel, from two to three weeks (Stone and Sidel 1998) to six months (Caul 1957; Brandt et al. 1963; Schwartz 1975; Keane 1992; Muñoz et al. 1992b; Muñoz and Civille 1992).
REFERENCES
ASTM. 1992. ASTM Manual Series MNL 13. Manual on descriptive analysis testing, R. Hootman, ed. ASTM, West Conshohocken, Penn.
BRANDT, M.A., SKINNER, E.Z. and COLEMAN, J.A. 1963. The Texture Profile Method. J. Food Sci. 28, 404-409.
CAUL, J.F. 1957. The profile method of flavor analysis. Advances in Food Res. 7(1), 1-40.
KEANE, P. 1992. The Flavor Profile. In: ASTM Manual Series MNL 13. Manual on descriptive analysis testing. ASTM, West Conshohocken, Penn.
MEILGAARD, M., CIVILLE, G.V. and CARR, B.T. 1999. Sensory Evaluation Techniques, 3rd Ed., CRC Press, Boca Raton, FL.
MUÑOZ, A.M. 1995. Descriptive Analysis Techniques to Evaluate Flavors. Presented at: Rutgers' Advances in Flavor Research and Technology Symposium. Woodbridge, NJ.
MUÑOZ, A.M. 1999. Different approaches for training a descriptive panel. What do I invest and what do I get in return? Presented at: Sensiber 99. II Simposio Iberoamericano de Análisis Sensorial. Mexico City, Mexico.
MUÑOZ, A.M. and BLEIBAUM, R.N. 2001. Fundamental Descriptive Analysis techniques. Exploration of their origins, differences and controversies. Workshop presented at "2001: A Sense Odyssey", 4th Pangborn Sensory Science Symposium, Dijon, France.
MUÑOZ, A.M. and CIVILLE, G.V. 1992. Spectrum Descriptive Analysis Method. In: ASTM Manual Series MNL 13. Manual on descriptive analysis testing. ASTM, West Conshohocken, Penn.
MUÑOZ, A.M. and CIVILLE, G.V. 1998. Universal, product and attribute specific scaling and the development of common lexicons in descriptive analysis. J. Sensory Studies 13, 57-75.
MUÑOZ, A.M., CIVILLE, G.V. and CARR, B.T. 1992a. Sensory Evaluation in Quality Control. Van Nostrand Reinhold, New York.
MUÑOZ, A.M., SZCZESNIAK, A.S., EINSTEIN, M.A. and SCHWARTZ, N.O. 1992b. The Texture Profile. In: ASTM Manual Series MNL 13. Manual on descriptive analysis testing. ASTM, West Conshohocken, Penn.
SCHWARTZ, N.O. 1975. Adaptation of the Sensory Texture Profile method to skin care products. J. Texture Studies 6, 33-42.
STAMPANONI, C.R. 1994. The use of standardized flavor languages and the Quantitative Flavor Profiling technique for flavored dairy products. J. Sensory Studies 9, 383-400.
STONE, H. 1992. Quantitative Descriptive Analysis (QDA). In: ASTM Manual Series MNL 13. Manual on descriptive analysis testing. ASTM, West Conshohocken, Penn.
STONE, H. and SIDEL, J.L. 1998. Quantitative Descriptive Analysis: developments, applications and the future. Food Technol. 52(8), 48-52.
WILLIAMS, A.A., BAINES, C.B., LANGRON, S.P. and COLLINS, A.J. 1981. Evaluating tasters' performance in the profiling of foods and beverages. In: Flavour '81, pp. 83-92, P. Schreier, ed. Walter de Gruyter, Berlin.
WILLIAMS, A.A. and LANGRON, S.P. 1984. The use of free-choice profiling for the evaluation of commercial ports. J. Sci. Food Agric. 35, 558-568.
MAXIMO C. GACULA, JR.

What is the basis for determining adequate training time for descriptive analysis? This is a fundamental question that sensory scientists encounter in practice. Muñoz provided an excellent picture of this important area. Although covered by Muñoz, this author feels that the success of descriptive analysis depends on four crucial factors (Gacula 1997).

(1) Training and Experience of the Judges. Training is product-dependent because sensory attributes vary among products. For example, attributes for lotion products differ from those of wines. The length of training also depends on the product, within a descriptive method. Some products require longer training than others. An experienced judge, by virtue of product research exposure and product usage, should not be considered a trained judge, because the individual was not specifically trained in scaling procedures, attribute definition and other aspects of product-related training.
(2) The Descriptive Analysis Panel Leader. The panel leader or program administrator plays a critical role in the establishment and maintenance of a descriptive analysis panel, particularly in maintaining the motivation of panel members.

(3) The Execution of the Sensory Test. Strict adherence to standard operating procedures must be maintained in the choice of control and reference standards, the conduct of the test, and the choice of test design.
(4) Long-term Commitment by the Company Management. This factor is the prime mover for a successful sensory program. Development of a descriptive analysis program, as everyone knows, requires ample time and special physical facilities that require capital investment.
The training times for the widely used descriptive analysis methods, as compiled by Hootman (1992), are summarized in Table 20.1. Specific references for these methods are given in this important ASTM publication. In addition, Profile Attribute Analysis, an extension of the Flavor Profile Method which incorporates scaling of sensory attributes, appears in this table. Finally, Table 20.1 presents information for Free-Choice Profiling, which has the unique property that the panel members have the freedom to use their own sensory terms/words to describe the stimulus or products (Williams and Langron 1984). The training time varies among the methods, with some commonalities. This is likely due to the differing philosophies that characterize each method. Unfortunately, there is no published work on this topic at the present time; it should be a challenge in the next decade.
TABLE 20.1. TRAINING TIME FOR WIDELY USED DESCRIPTIVE ANALYSIS METHODS

Descriptive Analysis Method                   Number of Panelists   Training Time
Quantitative Descriptive Analysis             8-15                  --
Sensory Spectrum Method                       12-15                 3-4 months
Flavor Profile / Profile Attribute Analysis   Minimum of 4          About 6 months
Texture Profile                               6-10                  4-6 months
Free-Choice Profiling                         10-15                 --
The training time depends on the particular product or stimulus being evaluated; therefore, even for one descriptive analysis method, the numbers for a particular product cannot be generalized to all products. Experience in the use of the method and familiarity with the product category play a major role in determining the length of training time. An issue not clearly addressed, or perhaps only indirectly addressed, is how to determine the endpoint of training time. The Quantitative Descriptive Analysis method reports a minimum of 65% panelist discrimination ability using paired comparison and duo-trio tests. For rating scale responses, this author (MCG) believes that the simple quality control technique described in Chap. 18 will suffice. The endpoint of training time is reached when all panel members are inside the lower and upper limits of the control chart.
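A minimal sketch of this endpoint check (Python, numpy assumed; the panelist means and the choice of 2-sigma limits are hypothetical illustrations of the control-chart idea):

import numpy as np

# Hypothetical mean ratings of the same reference sample by 8 trainees.
panelist_means = np.array([6.1, 5.8, 6.4, 6.0, 7.9, 6.2, 5.9, 6.3])

center = panelist_means.mean()
sd = panelist_means.std(ddof=1)
lcl, ucl = center - 2 * sd, center + 2 * sd   # control limits

outside = (panelist_means < lcl) | (panelist_means > ucl)
print(f"limits: ({lcl:.2f}, {ucl:.2f})")
print("panelists outside limits:", np.flatnonzero(outside) + 1)
# Training could be judged complete once no panelist falls outside.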
REFERENCES

GACULA, JR., M.C. ed. 1997. Descriptive Sensory Analysis in Practice. Food & Nutrition Press, Trumbull, Conn.
HOOTMAN, R.C. ed. 1992. Descriptive Analysis Testing for Sensory Evaluation. ASTM Manual Series: MNL 13, ASTM, West Conshohocken, Penn.
WILLIAMS, A.A. and LANGRON, S.P. 1984. The use of free-choice profiling for the evaluation of commercial ports. J. Sci. Food Agric. 35, 558-568.
CHAPTER 21

CONSUMER-DESCRIPTIVE DATA RELATIONSHIPS IN SENSORY SCIENCE

HOWARD R. MOSKOWITZ

The interrelation of databases, especially those that contain consumer attribute ratings, is of growing importance to sensory scientists. Some of this interest comes from their often newly acquired responsibility for early stage consumer testing, even before the market researcher becomes involved. Sensory Science, traditionally having concentrated on expert panels, is now being given the opportunity to interrelate consumer data with available expert panel data. Furthermore, since only the sensory scientist has access to expert panel data, the sensory scientist stands to gain a great deal from, as well as contribute a great deal to, the interrelations that can be developed.
Psychophysical Based Stimulus-Response (S-R) Models

The easiest way to create these models correlates two variables, much as was described in the chapter on relating experts to consumers. For many years researchers were content to establish the existence of a positive relation by simply correlating two data sets. Indeed, for many of the initial studies in Sensory Science, such as those reported in the Journal of Food Science from its inception to the middle 1970s, it sufficed to report the magnitude of the correlation between the sensory and the objective measures. Perhaps the reason for this satisfaction was the state of development of Sensory Science. Up to the 1970s, most food scientists were content to discover the existence of relations involving physical/chemical data, but did not generally use these data to create psychophysical relations. It was the pioneering work of such food scientists as the late Amihud Kramer (1976) and John Powers (1976) that forced food scientists into the search for relationships between variables, not just correlations. The migration of psychophysicists from the realms of pure psychological research to product-oriented research changed the nature of the game. Psychophysicists search for quantitative relations and, more importantly, seek appropriate, relevant functions that describe the relation between two variables, or between one subjective variable and a host of independent physical variables. These equations are models. They may be simple curve fits (e.g., multiple linear regression relating multiple formula variables to attribute ratings (Schutz 1983)), or they may be actual equations that describe the underlying transformation of physical variables to subjective response attributes (Cussler et al. 1979).
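As a concrete instance of such a psychophysical function, the following sketch (Python, numpy assumed; the data are hypothetical) fits Stevens' power law, R = kC^n, by linear regression in log-log coordinates:

import numpy as np

# Hypothetical data: physical concentration vs. mean sensory intensity.
conc = np.array([0.5, 1.0, 2.0, 4.0, 8.0])        # e.g., % sweetener
intensity = np.array([3.1, 4.2, 5.9, 8.3, 11.4])  # mean panel ratings

# log R = log k + n log C, so ordinary least squares on the logs
# recovers the exponent n and the constant k.
n_exp, log_k = np.polyfit(np.log(conc), np.log(intensity), 1)
print(f"exponent n = {n_exp:.2f}, constant k = {np.exp(log_k):.2f}")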
Foods are more complex than the simple stimuli used by the basic researcher. How does the researcher create a model for a complex food, e.g., relating the descriptive attributes of the food to physical properties on the one hand, and to overall liking on the other? These are the two key issues today: the psychophysical question (relation to physical variables) and the psychological question (key drivers of a complex subjective response). As noted above, correlations simply will not do. When it comes to the physical variables, there is a host of physical variables that may correlate with the sensory rating. Which of these physical variables is best? For aroma and flavor, there are literally hundreds of aroma chemicals that can be isolated from the gas chromatogram and mass spectrogram. Which are the relevant flavor chemicals that underlie the perception of an attribute? Correlations will be low. Furthermore, the only time that single chemical components co-vary strongly with a single sensory attribute is in the case of so-called character impact compounds, which by themselves generate the specific sensory impression, such as amyl acetate for banana aroma or nootkatone for grapefruit. In an article written almost a quarter of a century ago, the author (Moskowitz 1979) stated that it would be impossible to understand the psychological laws of "why an odor has a given smell" if all we know are the physical components and what each component smells like. We simply do not know the algebra of combination. Furthermore, the algebra is not simple at all: the order in which the odorants combine to generate a compound stimulus affects the predictability of the odor quality of that compound stimulus. Similar problems in the combination of basic qualities pervade the perception of textural attributes (Moskowitz and Kapsalis 1974), although these problems seem neither to be widely recognized nor discussed. The foregoing paragraphs can be summed up quite simply. We are quite far from developing a model that shows the transformation of a physical stimulus of complex nature into a sensory percept. This type of psychophysical consumer model, therefore, awaits the future researcher, and provides an untapped field of enormous opportunity.
Psychological Based Response-Response (R-R) Models

Another type of model relates consumer responses either to expert panel ratings or to the ratings of the same consumers on other attributes. A good example is the relation of liking, as the dependent variable, to sensory attributes rated either by experts or by consumers. This is known as R-R or response-response analysis. R-R analysis is a favorite of sensory scientists because it reveals which attributes drive liking, which attributes drive image attributes, etc.
From the author's experience, relatively few manufacturers develop consumer-based models. Thirty years ago, as noted above, the primary statistic used was the correlation coefficient, answering the question of what attributes linearly co-vary with liking. Computational power was limited, and the researcher did not want to spend the effort to create a quantitative model. It sufficed to state that a specific attribute correlated highly with liking. Perhaps the state of the art was sufficiently undeveloped that the notion was unthinkable that quantitative sensory models could be of any use beyond scientific publications. One can understand the chagrin of many researchers when the only variables that continued to correlate with overall liking were other liking attributes and image attributes, respectively. This occurs because the fundamental relation between sensory attribute level and liking is a quadratic function. Given the quadratic function, linear correlations often cannot perform very well. Regression modeling, originally predicated on linear terms only, expanded the predictor set by adding more terms in order to explain more variability. One modification by Schutz (personal communication 1982) was to perform a factor analysis on the sensory attributes, identify the key factors, and then choose the specific sensory attributes most highly correlated with the factors. When the modeling was done with linear terms alone, the model had to suggest extremes of sensory attributes corresponding to optimal acceptance. When the modeling was done with quadratic terms, quite often some of the sensory attributes showed optimal liking in the middle range of sensory magnitude. Modeling such as the foregoing, using sensory attributes as predictors, may have made sense a decade or more ago, but in recent years this type of simplistic modeling has been replaced by modeling using factor scores. Approaches such as Partial Least Squares (Kermadec et al. 1997) and Reverse Engineering (Moskowitz 1999) use factor scores, rather than sensory attribute values, as independent variables, for four different reasons:

(1) Statistical Independence. The factor structure comprises statistically independent variables, rather than correlated variables. Even Schutz's use of sensory attributes that correlate most highly with factors does not remove multicollinearity, whereas factor scores eliminate multicollinearity and ensure that the predictors are statistically independent.

(2) Parsimony. The factor structure is relatively parsimonious. Rarely are there more than 3-5 independent factors. However, if truth be known, the researcher should use more independent factors, since some of these single, "unique" factors may add significant predictive power to the model.
(3) Comprehensiveness. The factors "span the space." That is, with a reasonable factor solution many of the attributes are accounted for, which would not be the case when looking only at the sensory attributes most correlated with the factor scores.

(4) Non-Linearities. The factors can be used in both linear and squared forms, as well as in cross terms (e.g., factor 1 × factor 2). Given that there is a quadratic relation between a sensory attribute and liking (Moskowitz 1981), it stands to reason that the model should allow quadratic terms. These quadratic terms often do not appear in standard modeling (viz., sensory attributes predicting liking), but they appear easily in factor modeling.
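A schematic sketch of factor-score modeling (Python, numpy assumed; the data are synthetic, and principal-component scores stand in here for factor scores):

import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 20 products x 6 descriptive attributes, with
# liking highest at an interior level of the first attribute.
X = rng.normal(size=(20, 6))
liking = 6.0 - (X[:, 0] - 0.5) ** 2 + rng.normal(0, 0.3, size=20)

# "Factor" scores via SVD of the centered attribute matrix; the
# scores on the first two components are orthogonal predictors.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
F = Xc @ Vt[:2].T

# Quadratic model in the factor scores: linear, squared, and cross
# terms, which permit an interior (mid-range) optimum for liking.
Z = np.column_stack([np.ones(len(F)), F, F ** 2, F[:, 0] * F[:, 1]])
coef, *_ = np.linalg.lstsq(Z, liking, rcond=None)
print("model coefficients:", np.round(coef, 2))

A negative coefficient on a squared term signals an optimum inside the sensory range rather than at its extremes, which is exactly the behavior the quadratic formulation is meant to capture.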
After All Is Said and Done - What Should Modeling Accomplish?

What is the ultimate goal of modeling? Is it to create equations that describe the data, using as few terms as possible? Or is it to use the equations to identify some theoretically important, underlying parameters of behavior? That is, if we consider the model to represent some aspect of behavior, then the parameters of the model may be interpretable. An example: the slopes relating attribute liking to overall liking show the relative importance of the sensory inputs (Moskowitz and Krieger 1995). Or, in the end, is the goal of the model simply to describe relations between variables, in the manner of a convenient shorthand, and then to use the model as an instrument to relate one set of variables to another? Researchers, and sensory researchers among them, take themselves too seriously. It may well turn out that in the end the key benefit of modeling (whether S-R or R-R) is to enable practical applications of the model, such as the interrelation of consumers, experts and instruments (in reverse engineering), or the identification of which particular sensory attributes drive liking.

REFERENCES

CUSSLER, E., KOKINI, J.L., WEINHEIMER, R.L. and MOSKOWITZ, H.R. 1979. Food texture in the mouth. Food Technol. 33(10), 89-92.
KERMADEC, F.H.D., DURAND, J.F. and SABATIER, R. 1997. Comparison between linear and non-linear PLS methods to explain overall liking from characteristics. Food Quality and Preference 8, 395-402.
KRAMER, A. 1976. General guidelines for selecting objective tests and multiple regression application. In: ASTM STP 594, Correlating Sensory and Objective Measures, pp. 48-55, J. Powers and H. Moskowitz, eds. ASTM, West Conshohocken, Penn.
MOSKOWITZ, H.R. 1979. Odor psychophysics and sensory engineering. Chemical Senses and Flavor 4, 163-184.
MOSKOWITZ, H.R. 1981. Sensory intensity versus hedonic functions: Classical psychophysical approaches. J. Food Quality 5, 109-138.
MOSKOWITZ, H.R. 1999. Inter-relating data sets for product development: The reverse engineering approach. Food Quality and Preference 11, 105-119.
MOSKOWITZ, H.R. and KAPSALIS, J.G. 1974. Towards a general theory of the psychophysics of food texture. Proceedings of the Fourth International Congress of Food Science and Technology, Madrid, Spain.
MOSKOWITZ, H.R. and KRIEGER, B. 1995. The contribution of sensory liking to overall liking: An analysis of six food categories. Food Quality and Preference 6, 83-90.
POWERS, J.J. 1976. Experiences with subjective/objective correlations. In: Correlating Sensory and Objective Measurements - New Methods for Answering Old Problems, pp. 111-122, J. Powers and H. Moskowitz, eds. ASTM STP 594, West Conshohocken, Penn.
SCHUTZ, H.G. 1982. Personal communication.
SCHUTZ, H.G. 1983. Multiple regression approach to optimization. Food Technol. 37, 47-62.

ALEJANDRA M. MUÑOZ
Importance of Consumer-Descriptive Data Relationships

Sensory professionals design and conduct quantitative consumer and descriptive tests at different project stages and for different objectives. Consumer data are collected to study consumer acceptance, preferences and opinions about the products tested. In contrast, descriptive studies are conducted to obtain a qualitative and quantitative characterization of the sensory properties of products as perceived by a descriptive/expert panel. In the past decade there has been an increased interest in relating the two sets of data, so as to create consumer-descriptive models and/or study their data relationships (Greenhoff et al. 1994; McEwan 1996; Muñoz et al. 1996; Helgensen et al. 1997; Kermadec et al. 1997; Pagliarini et al. 1997; Popper et al. 1997; Muñoz 1998; Elmore et al. 1999; Guinard et al. 2001; Malundo et al. 2001). These data relationships enable the researcher to understand or predict one data set based on the other. Usually, the data set that is easier or less expensive to obtain is used to understand or to predict the more costly data set. In general, consumer-descriptive data relationships may be developed with one of these two objectives (Muñoz 1997):
• to better understand and interpret consumer responses through descriptive data
• to predict consumer acceptance and other consumer responses based on descriptive data

The development of consumer-descriptive predictive models is straightforward. In this author's opinion, the task is mainly a statistical endeavor, which focuses on regression analysis techniques (Neter et al. 1996). Models are developed to predict a consumer response, not to understand or interpret consumer responses. This discussion focuses primarily on research associated with the first objective stated above. Data relationships are not needed for every project. First, not all consumer responses need to be interpreted and decoded. Consumers are able to rate attributes and reliably provide product information, as long as the attributes are simple and clear (see Chap. 10). Second, once a data relationship study has been completed for a given product category, all projects for that category can use, and thus benefit from, the data relationship previously created. Data relationship studies should be considered research projects that are completed only periodically (Muñoz et al. 1996). The results can be used for several years, until a substantial change occurs in the category, the market conditions or the consumer perceptions (see Chap. 13). A new study/update is needed at that time. The rationale behind building consumer-descriptive data relationships is to utilize the best component of each data set (Muñoz 1997):

(1) Consumer data are the sole source of information on liking/acceptance and preference. Only the consumers, users of the product, can validly judge whether products are liked or disliked. Consumers are also able to rate product attributes and provide reliable information, as long as the attributes are simple, easy to understand, and not ambiguous. However, there may be some limitations with attribute/diagnostic information obtained from consumers, as explained below.
(2) Descriptive data provide technical and more precise qualitative and quantitative sensory information about the product. Descriptive panelists are trained to evaluate sensory properties and can describe the product's characteristics in technical and precise ways. In addition, one consequence of their training is that panelists evaluate products in an unbiased way, unaffected by contextual and psychological factors (e.g., stimulus and logical errors, halo effect, contrast effect, etc.). However, because of their training, they should not be asked to provide liking/preference information. They simply
cannot do so in the manner that the more appropriate consumer panel can. Given the above, linking consumer to descriptive data allows the researcher to select the best and most reliable component of each data set. The consumer data provide the necessary input on liking. The descriptive panel provides the necessary technical description of the product, and thus can be used to interpret the consumer responses. In tandem, the two data sets provide the specific guidance for
formulation/reformulation.

Resulting Information

The information that consumer-descriptive data relationships can provide includes (Muñoz 1997):

(1) "Actionable" Product Guidance for Product Formulation/Reformulation. The guidance is considered "actionable" since the direction for change is given in descriptive, not consumer, terms. The consumer information indicates the needed change (e.g., increase consumer liking). The descriptive information shows the specific type of change (i.e., the product attributes to modify and the magnitude of that change, expressed in technical/descriptive terms). Many researchers obtain this guidance through the exclusive use of consumer information. In that approach, instead of linking consumer liking to descriptive/expert attribute data as above, consumer liking is related to consumer attribute information. This is possible but may present some risks. Attributes included in a questionnaire may fall under one of three categories:

Category #1: attributes that consumers understand and reliably evaluate.

Category #2: attributes that consumers do not understand.

Category #3: attributes that consumers interpret differently than intended or confuse with other attributes.
Therefore, the direct use of consumer attribute information should be assessed, since:

• consumer attribute information in Category #1 can be used directly for product decisions
• consumer attribute information in Category #2 should not be used; these terms should be changed in and/or eliminated from consumer questionnaires, since they may be too technical, integrated with other nonsensory issues, etc.
• consumer attribute information in Category #3 can be used, but should first be decoded and interpreted using descriptive results. This analysis unveils the product attributes that consumers are actually responding to when rating these attributes.
(2) The Drivers of Liking or Product’s Critical Attributes. These are the attributes that affect consumer acceptance, expressed in descriptive terms. (3) Predictions. Models that predict consumer responses (e.g., liking, chocolaty) based on descriptive data.
(4) Drivers of Consumer Attribute Responses. Product attributes (expressed in descriptive terms) that signal consumer responses of interest (e.g., liking, "spicy," "gourmet") and integrated terms (e.g., "fresh," "moisturizing," "rich"). These applications have been discussed and exemplified by several researchers (Greenhoff et al. 1994; McEwan 1996; Muñoz et al. 1996; Helgensen et al. 1997; Kermadec et al. 1997; Pagliarini et al. 1997; Popper et al. 1997; Muñoz 1998; Elmore et al. 1999; Guinard et al. 2001; Malundo et al. 2001).
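As a minimal sketch of the prediction application, partial least squares could be run in SAS on a hypothetical merged data set PANEL containing per-product descriptive means d1-d12 and the mean consumer LIKING; all names here are illustrative:

* Predict consumer liking from descriptive attributes via PLS;
* (leave-one-out cross-validation selects the number of factors);
proc pls data=panel method=pls cv=one;
   model liking = d1-d12;
run;

PLS is a reasonable choice here precisely because descriptive attributes are usually highly collinear (see Kermadec et al. 1997).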
Potential Problems with the Exclusive Use of Consumer Data

As discussed above, the relation between liking and perceived product attributes can be obtained using consumer data exclusively. In this case, the attribute information used is the diagnostic/attribute data obtained from consumers. This is the standard practice followed by market researchers and by sensory professionals who do not have access to a descriptive panel or do not believe in the benefits of descriptive data. Even though this is an accepted and common practice, researchers need to be aware of the characteristics of consumer attribute information and their implications. Specifically, some diagnostic/attribute information obtained from consumers may be insufficiently actionable and/or misleading. Used by itself, consumer data may be (Muñoz 1997):
(1) Not Specific and Actionable Enough. Since consumers are not (and should not be) trained, the consumer terms need to be simple (e.g., bland) or
they have to be “integrated” (e.g., “rich,” “moisturizing,” “homemade,” “creamy”), if those are the terms consumers use to describe certain product attributes. This simple and integrated lexicon is not actionable or specific enough for guidance. What does the product developer do to make a product “less bland” or “more creamy”?
(2) Potentially Misleading. Misleading direction may be obtained from consumers if they do not understand the term(s) in the questionnaire. Consumers will answer all questions, even those that are complex, too technical, or misunderstood. Therefore, these consumer results may point reformulation in the wrong direction.

(3) Affected by Psychological Factors. Consumer responses may be affected by psychological errors, such as halo effects (i.e., the response to one attribute affects the responses to others in the questionnaire) and stimulus errors (e.g., the effect of color or package). Therefore, the results for these affected attributes do not represent the true consumer response, and the resulting guidance may be misleading.
(4) Affected by the Connotation of Certain Attributes (e.g., very positive or negative attributes). There are some attributes that consumers rate based on their belief about or attitude towards that attribute, and not based on their perception. Attributes that fall in this category are very positive ("moisturizing," "chocolaty") or very negative (salty, fatty). The responses to such attributes may mislead, since they do not reflect the true consumer opinion of that attribute. Consumers' intensity responses are usually high for very negative attributes and low for very positive attributes. In these cases, consumers are only conveying their attitude and not their perception; i.e., that negative attributes are bad (therefore their intensity is always "too high," thus "bad"), and positive attributes are good (therefore their intensity is always "too low" in a product). Consumer-descriptive data relationships can be used to unmask misleading responses and explain them, e.g., which attributes are unclear to consumers, which are affected by psychological factors, and which are affected by the connotation of the attribute/term. For example, Muñoz and Chambers (1993) discussed a case of misleading consumer attribute information and how descriptive-consumer data relationships can help explain the results. They showed that physical spiciness and consumer spice perception are not related in a product such as hot dogs. Therefore, if consumers indicate that they want a "spicier" product, simply increasing the spice intensity would be misguided, since this change would not result in an increase in "consumer spiciness."
Requirements

A brief discussion follows of the elements needed to properly build and use consumer-descriptive data relationships and models, and to ensure valid relationships (Peryam and Muñoz 1997). A more detailed discussion of the requirements is presented by Muñoz et al. (1996) and Irving et al. (1997).
(1) Products That Span the Product Category or the Variables to be Studied. Enough, and sufficiently different, products are needed in these studies. The characteristics of these products (specifically the variables/attributes and their ranges) will determine the "sensory" space to be studied and its boundaries. Conclusions can only be drawn within that sensory space. One of the most important decisions in establishing the boundaries of the sensory space for a complex product category is which segments to investigate. Specifically, a decision has to be made regarding the inclusion of all category segments (e.g., all salad dressings) or one segment only (e.g., Italian salad dressings).

(2) Sound Consumer Data About the Products or Category of Interest. The consumer data set should be sound and contain those responses to be interpreted or predicted.
(3) Sound Descriptive Data. The descriptive data should be generated by a very well-calibrated and experienced panel. Since these data are used to understand and/or predict consumer responses, they are assumed to be the most precise and valid data set. If the descriptive panel is only marginal, then data relationships should be built with consumer data only.

(4) Adequate Statistical Support. This support is required to complete all the statistical analyses needed to build models and study data relationships.
REFERENCES

ELMORE, J.R., HEYMANN, H., JOHNSON, J. and HEWETT, J.E. 1999. Preference mapping: Relating acceptance of "creaminess" to a descriptive sensory map of a semi-solid. Food Quality and Preference 10(6), 465-476.
GREENHOFF, K. and MACFIE, H.J.H. 1994. Preference mapping in practice. In: Measurement of Food Preferences, H.J.H. MacFie and D.M.H. Thomson, eds. Blackie Academic, London.
GUINARD, J.X., UOTANI, B. and SCHLICH, P. 2001. Internal and external mapping of preferences for commercial lager beers: comparison of hedonic ratings by consumers blind versus with knowledge of brand and price. Food Quality and Preference 12(4), 243-255.
HELGENSEN, H., SOLHEIM, R. and NAES, T. 1997. Consumer preference mapping of dry fermented lamb sausages. Food Quality and Preference 8, 97-109.
IRVING, D., CHINN, J., HERSKOVIC, J., KING, C.C. and STOUFFER, J. 1997. Requirements and special considerations for consumer data relationships. In: ASTM Manual 30, Relating Consumer, Descriptive and Laboratory Data to Better Understand Consumer Responses, A.M. Muñoz, ed. ASTM, West Conshohocken, Penn.
KERMADEC, F.H.D., DURAND, J.F. and SABATIER, R. 1997. Comparison between linear and nonlinear PLS methods to explain overall liking from sensory characteristics. Food Quality and Preference 8(5-6), 395-402.
MALUNDO, T.M.M., SHEWFELT, R.L., WARE, G.O. and BALDWIN, E.A. 2001. An alternative method for relating consumer and descriptive data used to identify critical flavor properties of mango (Mangifera indica L.). J. Sensory Studies 16, 199-214.
MCEWAN, J.A. 1996. Preference mapping for product optimization. In: Multivariate Analysis of Data in Sensory Science, T. Naes and E. Risvik, eds. Elsevier Applied Science, New York.
MUÑOZ, A.M. 1997. Importance, types and applications of consumer data relationships. In: ASTM Manual 30, Relating Consumer, Descriptive and Laboratory Data to Better Understand Consumer Responses, A.M. Muñoz, ed. ASTM, West Conshohocken, Penn.
MUÑOZ, A.M. 1998. Consumer perceptions of meat. Understanding these results through descriptive analysis. Meat Science 40, 287-295.
MUÑOZ, A.M. and CHAMBERS, E. IV. 1993. Relating sensory measurements to consumer acceptance of meat products. Food Technol. 47(11), 118-131, 134.
MUÑOZ, A.M., CHAMBERS, E. IV and HUMMER, S. 1996. A multifaceted category research study: How to understand a product category and its consumer responses. J. Sensory Studies 11, 261-294.
NETER, J., KUTNER, M.H., NACHTSHEIM, C.J. and WASSERMAN, W. 1996. Applied Linear Statistical Models. Irwin, Chicago.
PAGLIARINI, E., MONTELEONE, E. and WAKELING, I. 1997. Sensory profile description of Mozzarella cheese and its relationship with consumer preference. J. Sensory Studies 12(4), 285-301.
PERYAM, D. and MUÑOZ, A.M. 1997. Validity. In: ASTM Manual 30, Relating Consumer, Descriptive and Laboratory Data to Better Understand Consumer Responses, A.M. Muñoz, ed. ASTM, West Conshohocken, Penn.
POPPER, R., HEYMANN, H. and ROSSI, F. 1997. Three multivariate approaches to relating consumer to descriptive data. In: ASTM Manual 30, Relating Consumer, Descriptive and Laboratory Data to Better Understand Consumer Responses, A.M. Muñoz, ed. ASTM, West Conshohocken, Penn.
MAXIMO C. GACULA, JR.

With the advent of computer technology, creating models using sensory and consumer data, as well as relating them to chemical and instrumental measures, has become a common activity in both academia and industry. Using language from experimental psychology, Moskowitz adapted the terms S-R model (Stimulus-Response model) and R-R model (Response-Response model) for interrelating data sets. The R-R model, which involves relating consumer and trained panel data, poses the most difficult challenge in the statistical analysis because of so-called multicollinearity. Muñoz discussed the problems of gathering and relating descriptive and consumer data, particularly the problem of inconsistent understanding of sensory attributes between the consumer and the descriptive panel. As a result, creating consumer-descriptive models requires extremely careful planning and statistical analysis. In the following discussion, the author expresses his perspectives on model building and on the semantics of sensory attributes as perceived by the trained/expert panel and the consumer. In addition, the statistical aspects of model building are reviewed.

Exploratory Data Analysis
Exploratory Data Analysis, or EDA, has been an exciting area in statistics for the last 20 years or so. The activities encompassed by EDA include the analysis of observational data, the fitting of equations to data, and, most importantly, the problem of data collinearity. These are the major activities encountered in the exploratory analysis of sensory and consumer data. Relating consumer and descriptive data is an example of observational data analysis, in the sense that an equation is fitted to the data without knowing the underlying relation between consumer responses and those of the descriptive panel. Typically, old consumer-descriptive data are collected, merged and explored, and various types of models are fitted. The fact that the resulting data come from human responses suggests that the variables are often correlated. The issue
of multicollinearity comes into play, which then requires particular statistical techniques in order to achieve data independence and ease of interpretation.
Multicollinearity

In regression analysis, multicollinearity occurs when there are correlations or near-linear relationships among the independent variables (the predictor variables X) used in the study. Correlated predictor variables are commonly encountered in sensory and consumer testing, and are often beyond the control of the sensory scientist. As a result, special statistical techniques are used in the analysis to control or minimize the effects of multicollinearity. These effects are known in the statistical literature (see Neter et al. 1996) and are summarized below:

(1) Unstable estimates of the regression coefficients, with high standard errors and unreliable tests of significance, resulting from large sampling variability. The regression coefficients no longer reflect the inherent effects of a particular predictor variable. Thus, studies dealing with the effects of a predictor variable on the response become difficult to interpret validly. Useful statistical methods to deal with this problem will be briefly reviewed below.
(2) In spite of multicollinearity, it is still possible to obtain inferences on mean responses and predictions of new observations, provided that the predictions are not unduly extrapolated. See, for example, Ward and Gacula (2000) on an in-vitro assay study. There the interest is the prediction of a response at a point of interest on the log concentration X, such as those shown in Figs. 21.1-21.3.

Polynomial Regression

Perhaps this is one of the most commonly used models for relating two data sets, because of its simplicity. For a quadratic model (2nd degree polynomial regression), the equation is

Yᵢ = B₀ + B₁X + B₂X² + Eᵢ

One can continually add X terms of increasing exponent, i.e., X³, X⁴, until the R² statistic approaches 1.0. The terms in the equation are defined as follows: Y = consumer data (response variable), X = descriptive data (predictor variable), B₀ = intercept, B₁ = linear regression coefficient due to X, B₂ = second degree regression coefficient due to X², and Eᵢ = random error. A study using the quadratic model relating hedonics and sensory attributes was reported by
Moskowitz (2000), together with four discussants of the paper. Polynomial regression fits most data regardless of how the curves look. Thus, curves may be visually different, yet each is describable by an nth degree polynomial. The graphs in Figs. 21.1, 21.2 and 21.3 are all polynomials of the 3rd degree (cubic polynomial regression), where X = Logcon. The curves are visually different; hence the interpretation of the results differs. There should be experimental or biological reasons for interpreting these curves. The dashed lines in these figures are the lower and upper 95% confidence limits of the cubic regression line.
[FIG. 21.1. Polynomial cubic regression of Y1 on logcon. EQUATION: Y1 = 88.62 − 45.25X − 20.03X² + 14.31X³. ADJUSTED R² = 0.97]
These three graphs illustrate the danger of extrapolating the response beyond the values used in the experiment, which range from -1.0 to 2.4 (log10 scale). As expected, there will be multicollinearity among the X variables, since they are not independent, being the square and cube of the original variable X. Thus, their respective regression coefficients, as descriptions of their effects on the response variable Y, must be interpreted with caution, and in some cases may be meaningless. However, as indicated earlier, the regression line can still be used to predict a point of interest on the regression line.
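A minimal SAS sketch of such a cubic fit, assuming a hypothetical data set ASSAY with response Y and log concentration LOGCON (mirroring, not reproducing, the analysis behind Figs. 21.1-21.3):

data assay2;
   set assay;
   logcon2 = logcon**2;   * square term;
   logcon3 = logcon**3;   * cube term;
run;

proc reg data=assay2;
   * CLM requests the 95% confidence limits for the mean response,;
   * corresponding to the dashed lines in the figures;
   model y = logcon logcon2 logcon3 / clm;
run;
quit;

Predictions from this fit should be read only within the observed LOGCON range, for the extrapolation reasons just noted.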
Remedial Measures

The popular statistical methods to deal with multicollinearity in the data are factor analysis, principal component analysis, partial least-squares regression,
[FIG. 21.2. Polynomial cubic regression of Y4 on logcon. EQUATION: Y4 = 93.59 − 35.57X − 12.21X² + 8.10X³. ADJUSTED R² = 0.93]

[FIG. 21.3. Polynomial cubic regression (caption lost in extraction).]
and ridge regression. Except for ridge regression, these statistical methods are widely used in Sensory Science; they are discussed in an ASTM publication edited by Muñoz (1997), and by Moskowitz in this chapter on the use of factor scores derived from factor and principal component analyses. See also Martens and Martens (1986) on partial least-squares regression, given as Chap. 9 in Piggott (1986). What is needed at the present time is an extensive comparison of these statistical methods, using actual data or a computer-simulated population, that would answer the following questions: (1) What method should the researcher use for medium to highly correlated data, or does it matter? (2) Would the same results emerge using different statistical methods? (3) What are the limitations of the various methods? (4) Which methods are easy to use and give results that are simple to interpret?
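As a small illustration of one of these remedies, a ridge-regression sketch in SAS, assuming a hypothetical data set SENSORY with consumer LIKING and correlated descriptive attributes a1-a5:

* Ridge trace over a grid of ridge constants;
proc reg data=sensory outest=ridgeest ridge=0 to 0.5 by 0.05;
   model liking = a1-a5;
run;
quit;

* Inspect how the coefficients change along the ridge trace;
proc print data=ridgeest;
run;

The ridge constant at which the coefficient estimates stabilize is typically chosen by inspecting this trace.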
Computer software is widely available for conducting studies comparing these statistical methods. Among the packages are NCSS 2000 (Number Cruncher Statistical Systems), SAS/STAT (SAS Institute Inc.) and The Unscrambler (CAMO, Inc.). We hope that such statistical comparisons will be seen within the next decade.
REFERENCES

MARTENS, M. and MARTENS, H. 1986. Partial least-squares regression. Chap. 9. In: Statistical Procedures in Food Research, J.R. Piggott, ed. Elsevier Applied Science, London and New York.
MOSKOWITZ, H.R. 2000. On fitting equations to sensory data: A point of view, and a paradox in modeling and optimization. J. Sensory Studies 15, 1-30.
MUÑOZ, A.M. ed. 1997. Relating Consumer, Descriptive, and Laboratory Data. ASTM Manual 30, ASTM, West Conshohocken, Penn.
NCSS 2000. 1998. NCSS 2000 User's Guide - II. Number Cruncher Statistical Systems, Kaysville, UT.
NETER, J., KUTNER, M.H., NACHTSHEIM, C.J. and WASSERMAN, W. 1996. Applied Linear Statistical Models. Irwin, Chicago.
SAS INSTITUTE. 1999. SAS/STAT User's Guide, Version 8. SAS Institute, Cary, NC.
THE UNSCRAMBLER. CAMO, Corvallis, OR.
WARD, S.L. and GACULA, JR., M.C. 2000. Performance of the HCE-T TEP human corneal epithelial transepithelial fluorescein permeability assay. Presented at Alternative Toxicological Methods for the New Millennium, Nov. 28-Dec. 1, 2000, Bethesda, Maryland.
CHAPTER 22

PRODUCT AND PANELIST VARIABILITY IN SENSORY TESTING

HOWARD R. MOSKOWITZ

Sensory Science prides itself on the precise measurement of sensory experience, e.g., through expert panels or by correlation with physical measures. Often variability is considered a secondary factor, an annoyance to be eliminated, rather than a fact of life that must be dealt with in a constructive, and even instructive, manner. Thus variability in test conditions has been reduced through the use of laboratory-grade test facilities (e.g., white booths that isolate the panelist and eliminate distractions). This attempt at noise suppression, discussed in previous sections, reflects the world-view of the sensory scientist.
Product Variability

The reduction in variability cannot always be accomplished. Products by their very nature vary. Although it may be commendable to maintain product constancy in laboratory situations, the practical applications of Sensory Science must deal with the random variation across products of the same type. Some of this is inevitable; some is the result of unexpected, unwanted chance. Even within product batches there is variation from product to product, caused by the natural variation of the food item. We need only think about pizza to recognize that the product will never be the same on two occasions. This variability is a fact of life. The real key, however, is how to integrate the variability into the analysis, so that the regular or expected variability in the product, i.e., the noise in the system, does not affect the analysis. What is surprising, however, is that we form stable concepts of many foods. Even though the precise experience of one sample differs from that of another, we can state with reasonable accuracy in the real world that a product either conforms to what we expect or does not. Problems arise when we begin to deal with product variability at the intellectual level required by product tests. Most product tests rely upon attribute scales, requiring the panelist first to think about the attribute and then to profile its magnitude. Other product tests instruct the panelist to rate the degree to which the product sample resembles the "gold standard." If there is product variability, then quite often the analyst finds himself in a quandary. Product variability, as distinct from panelist variability, is difficult to deal with on a statistical level because one does not know whether the variation is due to the product itself, the panelist, or even the order of the evaluations.
What causes the variability in the ratings? What should the researcher conclude? The astute, but perhaps not particularly inquisitive, researcher can lay blame on one or another methodological issue. The more inquiring researcher will be faced with the problem of what to conclude. If there are differences in the ratings, should the researcher conclude that the products differ from each other? How much of the difference should be attributed to normal product variation, and how much to true product differences from the gold standard? The issue of product variability reduces to the issue of mapping a continuous variable (or set of variables, such as sensory attribute profiles) to a binary response (same/different). The different "natural variations" of a specific product generate correspondingly different descriptive profiles. The question then becomes one of classifying each of these profiles as representative of the product or not. The classification is subjective. It cannot initially be accomplished using statistical programs; it depends upon the panelist. Afterwards, however, statistical programs can be developed that classify a given profile as representative of the target product, albeit quantitatively different from the target profile because of natural variation. Dealing with products that vary naturally from instance to instance brings Sensory Science into the realm of cognitive psychology. Cognitive psychology deals with the ways in which we process external information to arrive at a decision. To a cognitive psychologist, the problem faced by the sensory scientist falls into the category of "concept formation." The concept is the panelist's idea or conception of the product. Each specific variation tested may either be an example of that idea/concept or not. The cognitive psychologist tries to understand the rules used by the panelist to classify a product as representative or not representative of a target product. On a more practical level, variation in a product is critical to understand from the point of view of quality assurance, i.e., does the product lie within specification or outside of specification? Cognitive psychology can help here. O'Mahony and his colleagues (O'Mahony et al. 1990) have dealt with the issue of concept alignment in Sensory Science - viz., do all the panelists have the same idea of what the product is? O'Mahony's point of view can be extended to product variation/product integrity, the other side of the same coin. The key issue is to create an approach that allows the researcher to conclude that natural product variability does not produce a product that is out of specification.
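One way such a classification program could be sketched, once panelists have labeled a set of training profiles as representative or not of the target product, is a discriminant rule; the data sets TRAIN and NEWPROF and the attributes a1-a8 below are hypothetical:

* Build a discriminant rule from panelist-labeled profiles and;
* apply it to new descriptive profiles;
proc discrim data=train testdata=newprof testout=classed;
   class representative;   * yes/no label supplied by panelists;
   var a1-a8;
run;

The subjective, panelist-driven step supplies the labels; the statistical step merely reproduces that classification for new profiles.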
Panelist Variability - What Are the Sources and the Remedies?

Inter-individual differences are pervasive. Much of the effort expended by sensory scientists tries to reduce this variability, either by training the panelists or by averaging together the data from many panelists. There is also a well-entrenched effort to eliminate panelist variability by locating the test venue in some type of sterile environment, such as the white booth, so that the panelists will not be distracted. It is presumed that people differ in their reactions to the external environment, and that at least some of the panelist-to-panelist variability will disappear in this well-controlled situation. We are still left, however, with differences among people. Can we ever hope to eliminate those differences? The answer is probably not, despite heroic efforts to train panelists to become instruments. One can cope with panelist-to-panelist variability by assuming that there exists a common underlying scale of magnitude for liking or sensory intensity, or a common underlying geometric space of perceptual differences between products. Inter-individual differences in such a world simply become parameters to be adjusted, with the parameters representing how the individual modifies the presumably true underlying scale. This type of approach assumes a common world, and individual variations of that common world. A different approach is to recognize that people do differ, and that one can use common scaling techniques, but the results will differ from person to person (Moskowitz et al. 1985). The subtle distinction between this position and the one just described is the notion that inter-individual differences cannot simply be accounted for by a single parameter that somehow "adjusts" a basic, common underlying system. There is no single bias parameter. An example of this second approach is sensory segmentation, which uses the relation between sensory magnitude and liking to show the dynamics of different groups of people, but does not assume that one group can be "transformed" into another by some simple statistical operation. Sensory segmentation posits that there are common general relations between sensory magnitude and liking, but that there are radically different groups, differing not by a single bias parameter but fundamentally in their response patterns.
REFERENCES MOSKOWITZ, H.R., JACOBS, B.E. and LAZAR, N. 1985. Product response segmentation and the analysis of individual differences in liking. J. Food Quality 8, 168-191.
O'MAHONY, M., ROTHMAN, L., ELLISON, T., SHAW, D. and BUTEAU, L. 1990. Taste descriptive analysis: concept formation, alignment and appropriateness. J. Sensory Studies 5, 71-104.
ALEJANDRA M. MUÑOZ

In Sensory Science, the topic of variability is of paramount importance. In most cases, sensory professionals are "introduced" to variability when they start working with panels. In this capacity, they become acquainted with the nature of this "variable sensory instrument" (i.e., the panel) and apply panel training techniques to reduce the panel variability. However, sensory professionals must be aware of:

(1) all the types of variability involved in consumer and analytical sensory tests, besides the panel variability
(2) the sources of variability in a study
(3) the effect of this variability on the data
(4) the non-statistical procedures used to minimize the existing variability
(5) the statistical and experimental design tools that can be used to identify, measure and/or account for that variability.
A brief discussion of some of these issues follows.

The Need To Identify All Sources of Variability

Sensory professionals recognize panel "noise" as an important source of variability in sensory evaluations. This author emphatically recommends that sensory professionals ask these questions to understand and know how to deal with the encountered variability:

• is the observed variability due to panelists?
• what are the non-statistical procedures that can be applied to minimize this variability?
• how can statistics and experimental design be used to identify, measure and account for this variability?

The answers to these questions will guide the sensory scientist in understanding, minimizing and accounting for the observed variability.
Separating Panelist Variability from Other Sources of Variability

A common mistake made by sensory professionals is to ascribe the observed variability exclusively to panelists. Since in Sensory Science we work with a variable instrument (i.e., the panel), it is believed that the observed variability must be due to panelists! Often this may be true; however, there can be many other sources of variability responsible for the outcome, specifically product variability and other sources of variability introduced into the test. The test conditions and the products must be assessed before we can attribute the variability to panelists. Test conditions that may be responsible for the observed variability include temperature and humidity differences, and unusual environmental effects or differences occurring across evaluations and sessions (e.g., condition of the room, carpets), etc. Product variability may be either inherent or created during sample preparation and presentation, or both, as explained below. When using instruments, product variability can be easily identified, because of the inherently low variability of instrumental measures: the variability measured from sample to sample can be attributed to product variability. However, the total variability observed in sensory studies may result from panel variability, product variability, or both. Therefore, the observed variability should not be exclusively attributed to the panel. Other sources of variability may exist and need to be ruled out before any action is taken with the panel. The following steps are recommended to separate the sources of variability when unexpectedly large variability or inconsistent results are obtained in sensory tests:

(1) First, check for product variability or lack of test controls, as follows:
a. Inspect product samples (e.g., different batches, packages) to determine if there is a considerable degree of variability in the product (ideally, remaining test products should be available for this and other assessments). b. Assess the test conditions and execution of the test to unveil possible sources of variability introduced to the test (e.g., differences in room temperature, sample serving conditions, preparation, etc.). (2) Conclude that the panel is the source of variability, if the assessment of
products and test conditions shows that product variability is not the cause of the large observed variability.
The Different Sources of Variability Encountered: Product, Panelist and Consumer Variability

1. Product variability
Product variability is discussed first, since it should be assessed before panel variability is considered. As discussed above, product variability may be either inherent or created during sample preparation and presentation, or both. Inherent variability is the product variation that exists from batch to batch, lot to lot, package to package, and season to season. In addition, product variability reflected in differences from sample to sample, or session to session, may be created during sample preparation or through environmental factors. Product variability may exist when:
(1) introduced by improperly cooking different food batches (e.g., meats, sauces),
(2) created through serving samples at different temperatures, or serving nonuniform portions (e.g., different proportions of cereal, fruit and nut pieces across servings),
(3) created through nonuniform environmental temperature and humidity conditions across evaluation sessions, etc.

In addressing product variability, sensory professionals should:

(1) measure the inherent product variability
(2) carefully control all test conditions to avoid introducing additional product variability (e.g., provide homogeneous samples, such as uniform servings of cereal/fruit pieces/nuts, or soup/vegetables/pasta, etc.)
(3) account for this variability in the experimental design used (Gacula and Singh 1984; O'Mahony 1986)
(4) use product variability information in the data interpretation.
a. Determining inherent product variability

There are two ways to obtain information on inherent product variability, which is due mainly to production or distribution variability:

• informally assess different product batches, lots or packages prior to a test, to gather some information on the degree of product variability
• conduct planned and formal studies geared to study production or distribution variability.
Production or distribution variability studies should be completed to obtain reliable product variability information. These studies are time consuming and expensive; therefore, if they cannot be completed for every product category, they should at least be conducted for the company's main product categories. The procedure to design and complete variability studies is explained by Muñoz et al. (1992). The technique is explained for a quality control application studying production variability, but the approach can be used to study any product variability: retail, distribution, cross plant, cross country, etc. The assessment of product variability requires the sampling/collection, organization and storage of products, their assessment or evaluation, and the collection and analysis of this variability information (Muñoz et al. 1992).

b. The value of product variability information

Variability studies are time consuming and expensive because of the large number of samples involved. This translates into a considerable amount of technician and professional time needed to acquire, organize and evaluate products. However, there is a wealth of information gathered, such as:

i. across categories:
• the product categories with the largest variability

ii. within a category:
• the product attributes with no appreciable variability
• the degree of variability of nonuniform/variable product attributes: small, medium, large
• the ranges of variability for each variable attribute
• the frequency or percentages of products distributed across the variability ranges

This information can be used for QC and R&D applications. For a QC application, this information is used to:

• provide guidance to decrease the production variability
• investigate consumer acceptance of production variability
• establish QC specifications

Details of these procedures are reported by Muñoz et al. (1992).
For R&D uses, the variability study results can aid the sensory professional in separating product from panel variability in the data interpretation. Product developers can also use these variability results for reformulation guidance. For example, in improvement projects, these results can be used to determine the degree to which a product attribute needs to change to yield a difference larger than the product differences observed in production. This application is explained by Muñoz (2002).

2. Panel variability

Only after the non-panel effects on variability have been studied can the panel variability be addressed. To assess panel variability, several questions should be asked:

a. Are there only a few attributes that show variability, or is the variability encountered in many/most attributes?
b. Which are the attributes that frequently show variability?
c. Is the variability due to qualitative/attribute difficulties?
d. Is the variability due to quantitative issues (e.g., type of scale, problems with quantitative references)?
e. Is the variability due to one, a few or all of the panelists?
f. Who are the outliers?

Remedial approaches for these cases are discussed below.
3. Consumer variability
Sensory professionals are not concerned with consumer variability, because it is recognized and accepted as a source of variation that can only be addressed in the design of consumer tests and in the data analysis. This variability is handled differently than panelist variability, since consumer data variability cannot and should not be decreased through training. Consumer variability information is used when determining the required sample size/consumer pool and when choosing a test design. In addition, this variability is accounted for by blocking and by applying the appropriate data analysis, as sketched below.
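A minimal sketch of such a blocked analysis, assuming a hypothetical consumer data set CDATA in which each consumer rates every product (so consumers act as blocks):

proc mixed data=cdata;
   class consumer product;
   model liking = product;      * fixed product effect;
   random consumer;             * consumer-to-consumer variability;
   lsmeans product / pdiff;     * product means and pairwise tests;
run;

Treating consumers as random blocks removes their between-person variability from the product comparison instead of leaving it in the error term.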
The Non-statistical Procedures That Can Be Applied To Minimize Variability

The discussion above dealt with the assessment and identification of the sources of variability before remedial procedures can be applied. The next
course of action is the reduction of this variability. Statistics should not be considered at this point, other than those procedures that can assist the sensory professional in identifying or measuring the degree of variability. The chosen remedial procedures depend on the source of variability.

1. If variability is introduced by lack of test controls, sensory professionals need to develop new protocols to avoid this variability/noise.

2. If the variability is due to panel issues, methodological changes need to be implemented in the descriptive program, or the calibration/retraining of panelists is needed. This chapter is not meant to provide an in-depth discussion of techniques to address panel issues and calibration; therefore, only brief recommendations for the above situations are given below:

a. Are there only a few attributes that show variability, or is the variability encountered in many/most attributes? When the variability is encountered in many attributes, the complete panel and ballot need to be assessed. When the variability is encountered in only a few attributes, those attributes should be reviewed with the panel to unveil problems in the definition, procedure, etc. Alternatively, they need to be identified as inherently variable, as discussed below.

b. Which are the attributes that frequently show variability?
Attributes that frequently show variability require special attention, particularly if retraining sessions have already been conducted. There are indeed attributes that always present high variability because of actual physiological reasons. Attributes in this category include: in skinfeel evaluations, absorbency and oily/greasy/waxy residues; in flavor, bitterness, chemical feeling factors (e.g., chemical heat, coolness) and caramelized; in oral texture, moisture absorption, moistness, etc. Because of the inherent variability in these attributes, significant differences between products will be found only when the differences are large.

c. Is the variability due to qualitative/attribute difficulties? If the problems have been identified as qualitative, the attribute or the complete ballot must be reviewed, clarified and/or modified (refer to Chap. 18).
d. Is the variability due to quantitative issues (e.g., type of scale, problems with quantitative references)? A new scale or new quantitative references should be considered. If quantitative references are already used, they need to be reviewed or modified.

e. Is the variability due to one, a few, or all of the panelists? If only a few panelists are the cause of the variability, the remedial procedures should be addressed only with those panelists.

f. Who are the outliers? Panelists who are outliers must be identified. If a panelist is an outlier across attributes or products, his dismissal from the panel should be considered. Equally important is the case where a panelist is a "consistent" outlier in one attribute. This author considers that, if the assessment was complete and unbiased, the panelist's scores for that attribute can be disregarded without eliminating his scores for other attributes. If this practice is followed, however, the panelist's scores for the problem attribute should always be eliminated.

3. If the variability has been identified as inherent product variability and cannot be minimized, this source of variability should be considered in the experimental design used and in the data analysis.
How Can Statistics and Experimental Design Be Used To Identify, Measure and/or Account for Variability?

Experimental design and other statistical tools are critical in identifying, measuring and/or accounting for the different sources of variability encountered in sensory and consumer studies. This discussion is presented last not to undermine its value; this author wanted to emphasize the importance of first assessing the different sources of variability and applying non-statistical procedures to reduce variability. Statistics help us identify, measure and account for variability in the analysis, not reduce it; the methodological approaches above do, and thus should be considered first. Sensory professionals and statisticians have worked for years on the application of statistical and experimental design procedures to measure and account for panelist variability (Gacula and Singh 1984; O'Mahony 1986; Naes and Solheim 1991; Gatchalian et al. 1991; Schlich 1993; King and Arents 1994; Dijksterhuis 1997). One such check is sketched below.
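One simple check of this kind, sketched in SAS for a hypothetical descriptive data set DESC with PANELIST, PRODUCT and a replicated attribute rating Y:

proc glm data=desc;
   class panelist product;
   model y = panelist product panelist*product;
run;
quit;

A large panelist*product interaction, relative to the replication error, signals that panelists disagree about the product ordering and is a common trigger for retraining.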
The discussion in this chapter has emphasized the importance of reducing and/or accounting for the different sources of variability, since variability affects the power of tests. Since high variability reduces the ability to detect product differences, every sensory professional should strive to minimize panelist variability. This topic will always be of great importance to professionals in the area of Sensory Science. The Sensometrics meetings have addressed this issue for several years. Sensometrics, the forum dedicated to applying mathematical and statistical techniques to sensory and consumer data, deals with one or another topic of panelist variability at each meeting (The Sensometrics Society 2003). Recently, within ASTM Committee E18 on Sensory Evaluation, task group E18.03.07 on Panel Performance and Tracking was formed to develop a guide that discusses methods to track panel performance and deal with panel variability, among other important performance measures (ASTM 2003).
REFERENCES

ASTM. 2003. Standard guide for measuring and tracking sensory descriptive panelist performance. ASTM, West Conshohocken, Penn. (in preparation).
DIJKSTERHUIS, G.B. 1997. Multivariate Data Analysis in Sensory and Consumer Science. Food & Nutrition Press, Trumbull, Conn.
GACULA, JR., M.C. and SINGH, J. 1984. Statistical Methods in Food and Consumer Research. Academic Press, San Diego.
GATCHALIAN, M.M., DE LEON, S.Y. and YANO, T. 1991. Control chart technique: A feasible approach to measurement of panelist performance in product profile development. J. Sensory Studies 3, 239-254.
KING, B.M. and ARENTS, P. 1994. Measuring sources of error in sensory texture profiling of ice cream. J. Sensory Studies 9, 69-86.
MUÑOZ, A.M., CIVILLE, G.V. and CARR, B.T. 1992. Sensory Evaluation in Quality Control. Chapman & Hall, New York.
MUÑOZ, A.M. 2002. Sensory evaluation in quality control: An overview, new developments and future opportunities. Food Quality and Preference 13, 329-339.
NAES, T. and SOLHEIM, R. 1991. Detection and interpretation of variation within and between assessors in sensory profiling. J. Sensory Studies 6, 159-177.
O'MAHONY, M. 1986. Sensory Evaluation of Food: Statistical Methods and Procedures. Marcel Dekker, New York.
SCHLICH, P. 1993. GRAPES: A method and a SAS program for graphical representation of assessor performances. J. Sensory Studies 9, 157-170.
THE SENSOMETRICS SOCIETY. 2003. www.Sensometric.org.
MAXIMO C. GACULA, JR.

As Moskowitz stated, "variability is a fact of life." In statistics, the basic foundation of the various statistical methods is the estimation of random variability; if random variation cannot be estimated, a statistical test does not exist. Thus, in statistical analysis, each observation is defined by a so-called statistical model, which describes the composition of that observation. The statistical model facilitates the isolation of random variability. The application of statistics to the assessment of sensory response variability thrives on chaos and innovation. Muñoz and Moskowitz emphasized that product variability and panelist variability together constitute the error that should be reduced in the data. Procedures for the experimental reduction of these variabilities are widely known in practice. Statistical reduction of these variabilities is also well-known, through the use of the appropriate experimental design, randomization, blocking, random sampling of product samples, and sampling of target consumers, among others. In this chapter, Moskowitz and Muñoz discussed the origins of product and panelist variability and provided possible guides for their reduction. Except for the mapping stated by Moskowitz for product variability, the author concurs with their discussions. In this section, the statistical aspect is presented, and an avenue is opened for studying sensory variability by computer simulation and bootstrapping. Current applied books on this subject have been written by Manly (1997) and Good (1999).

Partitioning of Variabilities

How much product and panelist variation exists in the data? In practice, the F-test in the analysis of variance is used to detect whether these variations are real, but this does not answer the question of relative magnitude. The amounts of variation can be estimated using the variance component method, assuming that the appropriate experimental design has been used. To estimate product variation, one should include sampling time (batch to batch), manufacturing plants, and other such factors in the design. These are the extraneous variations that should be separated from product variability. For a research guidance consumer test that involves company products, the foregoing extraneous sources of variation can be easily identified and separated in the collection of the experimental samples. An observation can be described by a statistical model written as

Yijk = M + Ai + Bj + Ck + (AB)ij + (AC)ik + (BC)jk + Eijk

where

Yijk = an observation,
M = overall mean,
Ai = effect of the ith product,
Bj = effect of the jth batch,
Ck = effect of the kth plant,
(AB)ij = interaction effect between products and batches,
(AC)ik = interaction effect between products and plants,
(BC)jk = interaction effect between batches and plants,
Eijk = random error.
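A minimal variance-components sketch of this model, assuming a hypothetical consumer data set CTEST with variables PRODUCT, BATCH, PLANT and a rating Y (by default PROC VARCOMP treats the listed effects as random):

proc varcomp data=ctest method=reml;
   class product batch plant;
   model y = product batch plant product*batch product*plant batch*plant;
run;

The REML estimates can then be converted to the percentage of variance contributed by each effect, as discussed next.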
The test of significance for differences among products is free of the extraneous variation, since these sources are partitioned in the statistical model. Notice that panelist variation is part of the random error, since replication is generally not done in a consumer test. There is nothing wrong with this, as panelist variability is an inherent component of consumer data. What is needed is a good sampling procedure, so that each panelist has an equal chance of being selected. The next step is to estimate the variance component of each effect in the statistical model. This provides the percentage of variance contributed by each effect, which is simpler for the sensory scientist to evaluate. An effect in the model may be statistically significant, particularly due to high degrees of freedom, yet its contribution to the total variability may be so small that it has no practical implication. For a large-scale consumer test, where we do not have control of the product samples, product sampling in the marketplace is critical. This can be accomplished by obtaining products with different code dates, but the same "sell by" date, and from different locations. The statistical model for an observation in this test is
Yij = M + Ai + Bj + Eij

where

Yij = an observation,
M = overall mean,
Ai = effect of the ith product,
Bj = effect of the jth location,
Eij = random error.
In this model a random sample of the respondents from each location is vital in order to obtain a reasonable estimate of panelist variation. Specifically, the composition of the random errors in this model is
Eij = random/chance variation + panelist variation.
Furthermore, random or chance variation may come from products and locations that are not controllable. Notice that we did not suppress panelist variation, but instead obtained a realistic estimate of the panelists' natural variation by random sampling. In so doing, the estimates of the various effects in the model are robust; that is, the results of a study would have a high degree of repeatability or confirmability. To confirm this, simulation or bootstrapping can be used.
Simulation and Bootstrapping

The effect of variability on the observations from a consumer test is difficult to assess experimentally, because one must repeat the experiment. However, with the availability of computer technology, computer simulation can be used to study variation and its effect on the data. The parameters used in the simulation would come from the results of the study to be simulated. As expected, this is considerably cheaper than conducting another consumer test. The success of computer simulation in environments where data are difficult to obtain and very expensive can be exemplified by risk analysis research. In the bootstrap method, the consumer data from the study are re-sampled repeatedly. The bootstrap data are statistically analyzed in the same way as the original consumer data, and the results are compared. The result can be interpreted as if we had repeated the study using the same population; hence it serves as a cross-validation of the original study. This author (MCG) feels that simulation and bootstrap methods for evaluating product and panelist variability can look forward to a bright future in the next decade. Because of their considerably low cost, the author welcomes case studies. One is presented below; most studies can furnish the necessary raw materials for a bootstrap exercise.
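Before the case study, a minimal bootstrap sketch in SAS, assuming a hypothetical consumer data set CTEST with a LIKING rating; 1000 resamples of the original size are drawn with replacement:

* Draw 1000 bootstrap resamples (URS = sampling with replacement);
proc surveyselect data=ctest out=boot seed=12345
                  method=urs samprate=1 reps=1000 outhits;
run;

* Analyze each resample; here, simply the mean liking;
proc means data=boot noprint;
   by replicate;
   var liking;
   output out=bootmeans mean=mean_liking;
run;

* The spread of the 1000 means estimates the sampling variability;
proc univariate data=bootmeans;
   var mean_liking;
run;

In practice the per-replicate analysis would repeat whatever analysis was run on the original data, so that the bootstrap results can be compared with the original as a cross-validation.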
Case Study on Antiperspirant

Descriptive analysis data for five products using six judges in two replications were simulated using SAS (SAS Institute 1999). Table 22.1 shows the descriptive statistics for the five products for the "application drag" attribute. In order to cover 95% of the sample data, two standard deviations were used in the simulation. The SAS program code is shown in Table 22.2. In this code, notice the role of the mean and standard deviation in simulating the sample data for products x1 to x5. Table 22.3 shows the results, which indicate the closeness of the mean estimates using n = 20. Of course, one can increase n in the program code, say to 150 (do i = 1 to 150), but for illustration purposes this will suffice. A value of zero should be used for negative estimates, which correspond to the lowest value
on the 0-15 rating scale. One can proceed by applying the analysis of variance to the simulated data in Table 22.3 for comparison with the original data.

TABLE 22.1. DESCRIPTIVE STATISTICS FOR FIVE SAMPLES USED IN THE COMPUTER SIMULATION
Sample   Mean   2 STD
x1       4.36   0.562
x2       3.99   0.696
x3       3.39   0.838
x4       2.74   2.712
x5       2.60   1.134

Note: 2 STD indicates a 2-standard deviation value. Means with the same letter are not significantly different at the 5% level by Duncan's multiple comparison test.
TABLE 22.2. COMPUTER SIMULATION SAS PROGRAM CODE
    * prog simula3.sas;
    data normal;
      retain seed1 seed2 seed3 seed4 seed5 1000;
      do i = 1 to 20;
        x1 = 4.36 + sqrt(0.316)*rannor(seed1);
        x2 = 3.99 + sqrt(0.484)*rannor(seed2);
        x3 = 3.39 + sqrt(0.702)*rannor(seed3);
        x4 = 2.74 + sqrt(7.355)*rannor(seed4);
        x5 = 2.60 + sqrt(1.286)*rannor(seed5);
        if i = 1 then do;
          seed2 = 1000; seed3 = 1000; seed4 = 1000; seed5 = 1000;
        end;
        output;
      end;
    run;
    proc print; id i;
      var seed1 seed2 seed3 seed4 seed5 x1 x2 x3 x4 x5;
      title "Simulating products x1 x2 x3 x4 x5";
    run;
    proc means mean n std min max maxdec=3;
      var x1 x2 x3 x4 x5;
      title "Descriptive statistics";
    run;
TABLE 22.3. RESULTS OF COMPUTER SIMULATION FOR FIVE PRODUCTS USED IN DESCRIPTIVE ANALYSIS
(All five seed columns print as 1000 throughout, since the function form of RANNOR leaves the seed variables unchanged.)

 i     x1       x2       x3        x4       x5
 1   4.49177  3.64230  3.24579   2.54078  1.89616
 2   4.94407  4.36926  4.17977   5.16996  1.77060
 3   4.44306  4.10539  3.96443   0.70617  3.77663
 4   3.72191  4.88084  3.51960  -3.41371  3.69283
 5   4.86369  3.68924  3.60466   0.20228  3.08352
 6   5.52118  3.57041  2.44010   1.14471  2.10928
 7   3.84324  4.42662  5.34933   3.52530  3.45995
 8   4.51509  3.94331  2.47687   4.25229  3.12735
 9   4.79939  4.16391  3.97971   3.02602  6.15007
10   4.95051  3.61380  3.57778   6.51384  2.52495
11   3.21328  4.04659  5.56595   8.79230  1.22805
12   4.56113  3.75751  4.89286   3.34683  1.23932
13   4.54799  3.32467  3.19575   5.53872  0.95717
14   3.46121  3.91420  3.93845   4.46158  0.14944
15   5.27838  3.15276  4.64938   1.14793  2.90083
16   3.60726  4.03481  2.36248  -2.52184  1.53969
17   5.14711  3.50377  3.23362  -1.22396  2.48612
18   4.41057  3.41206  2.66389   5.29521  2.54949
19   4.15179  3.50424  3.67904   2.40498  4.04045
20   5.07266  5.76685  3.15821   0.38685  3.57045

The MEANS Procedure

Variable   Mean    N   Std Dev   Minimum   Maximum
x1        4.477   20    0.637     3.213     5.521
x2        3.941   20    0.599     3.153     5.767
x3        3.684   20    0.916     2.362     5.566
x4        2.565   20    3.073    -3.414     8.792
x5        2.613   20    1.348     0.149     6.150
REFERENCES

GOOD, P.I. 1999. Resampling Methods: A Practical Guide to Data Analysis. Birkhäuser, Boston.
MANLY, B.F.J. 1997. Randomization, Bootstrap and Monte Carlo Methods in Biology. Chapman & Hall, New York.
SAS INSTITUTE INC. 1999. SAS/STAT User's Guide, Version 8. Cary, NC.
CHAPTER 23

FOUNDATIONS OF SENSORY SCIENCE

DANIEL M. ENNIS

Introduction

The previous pages contain several well-written essays by the other contributors, based on their business and consulting experiences, on a wide range of topics related to product testing. Many of these essays deal with nonscientific issues in politics, management and history. Examples include discussions on turf battles between market research and sensory departments, conflicts among academics and consultants about rating methods, and opinions on whether experts or consumers should be sources of descriptive data. These commentaries are informative and interesting but do not address the requirements of Sensory Science as a scientific discipline. In this chapter an attempt is made to describe and illustrate a set of scientific principles that can be used as a foundation for the field. There may be several such sets of principles and foundations, but here, at least, is one of them. In particular the discussion will focus on the application of these principles to consumer product and concept testing. Science is fundamentally concerned with building and testing models of what we observe. A model is a way of thinking about something. There are management models, animal models, molecular models, mathematical models, ... and all of them purport to represent observation. Albert Einstein said, "Science is the posterior reconstruction of existence through the process of conceptualization." In other words, through Science we construct theories that give us a way of representing our existence. We cannot directly measure anything; we only infer an understanding of the world through a theory about how our instrument relays information to us. Scientific "knowledge" is really an accepted body of theory that represents what we observe, but has no claim on the truth. In fact it is not even desirable to believe in scientific theories because this would reduce flexibility when new theories are proposed. This may seem somewhat unsettling to people who believe that scientific discoveries have been made. New advances in Science involve better, more reliable or more extensive models. These better models allow us to represent our existence more fully. The field of Statistics is a particular area of applied modeling and is fundamentally concerned with distributions. Many types of univariate and multivariate statistical tools have become so commonly used in product testing
that they are often used without considering the models on which they are based. If the parameters estimated using a statistical model are not interpretable or do not help to advance understanding of what is observed, they have no value. The most successful models in any scientific field are those constructed from the processes involved, with a minimum of arbitrary parameters.

Basic Principles
Almost all responses made by consumers or experts who evaluate products or concepts are categorical. For instance, if I ask a consumer which of two alternatives is sweeter, I will get one of two responses if the task is forced [the 2-Alternative Forced Choice (2-AFC) method]. Similarly, responses from the triangular method, the duo-trio method, the same-different method, the dual pair method and preferential choice all involve categorical responses. Now consider ratings. A person who responds "5" on a 7-point liking or intensity scale has given a categorical response by choosing one of 7 categories with which to identify the response. The same can be said about degree of difference (a generalization of the same-different method), just-about-right ratings, relative scales, agreement with statements, and so forth. Instead of assuming that a person who gives a "5" rating has somehow given direct intensity information, we might ask: Is there a process by which these categorical responses arise? There is always uncertainty associated with the measurement of any state. Assume that the information used in any of the above mentioned tasks is probabilistic. It is unlikely that we will get the same percept for a product twice with the same consumer, for at least three reasons. First, products vary. Second, the conditions in the mouth and nasal cavity do not remain constant from moment to moment (for example, peri-receptor variability due to changes in salivary composition). Third, the neural system that transmits the signal does not behave exactly the same from moment to moment. So, when we build models for any of the categorical methods just described, we must allow for the fact that variability is present and consider the consequences for our results. In a product testing task with instructions that are followed by subjects, it seems reasonable to assume that people make decisions for reasons. In other words, for each method that we model there is a corresponding decision rule. This decision rule depends on the instructions given to subjects and how they interpret those instructions. In product testing the most useful methods are usually those that involve the same decision rule for each subject. If two methods, such as the triangular method and the duo-trio method, had well understood decision rules, we could predict the results of one method from the other. This would free us from method-specific results and allow us to compare the power of different methods on a common basis. Irrespective of the method, we assume that there is some underlying information that we want to estimate
(how different the products are, how far they are from an ideal, how strong on some attribute) and that by understanding the decision rule we can get at this information without the baggage associated with the method. There must be many ways, for instance, to measure molecular weight, but in the end all useful methods converge on the same value within experimental error if the theory underlying the measuring techniques is understood. If we want to know how sweet something is, it should not matter whether we use a 5-point or a 15-point scale, as long as we understand how the category responses for each scale are generated from the common underlying probabilistic information. In the last two paragraphs I have described the two basic ideas behind probabilistic (Thurstonian) modeling - that information is probabilistic and that people use decision rules. Thurstone developed models for very simple tasks and stimuli (ratings and 2-AFC). These ideas have been incorporated now into models for just about every conceivable type of categorical measuring device in product testing with univariate and multivariate assumptions about the probabilistic information. This structure is not only comprehensive and beautiful, but holds the promise of providing a unifying theoretical basis for understanding human perception. From this foundation we can answer many questions that have been discussed in the previous chapters of this book. The mathematical models associated with this theory will not be given in this chapter, but there is a rich source of references for the theory provided in the discussions that follow. The following sections describe practical problems and solutions involving different product testing methods. In each case, a problem in product testing is posed and resolved using an appropriate probabilistic model. Scenario 1: Resolving Inconsistent Difference Testing Results Background: Suppose that an ingredient supplier has changed a key ingredient used in the manufacture of your company’s chocolate chip cookies. Based on the baking chemistry of this new ingredient, there is reason to suspect that the new ingredient will make your cookies harder. Additionally, recent market research data have connected cookie hardness with consumer liking. For these reasons, you decide to conduct a test to determine what effect the change in ingredient will have on your cookies. After some deliberation, you decide to use the duo-trio method to determine whether cookies made using the new ingredient are perceptibly different from cookies made using the current ingredient. One hundred experienced panelists are assembled, and these panelists are divided into two groups. Panelists in the first group each receive a sample from the current production to use as a reference, while panelists in the second group each receive a sample from the new production for the same purpose. Panelists
are then presented with samples from the current and new productions, and are instructed to choose the sample most similar to the reference. From this test you are unable to confirm a difference between the cookies from the new and the current productions. Given your knowledge of the new ingredient's baking chemistry, you are perplexed by this result, and decide to conduct a second, smaller test. You suspect that by specifying the attribute in question, namely hardness, a difference between the new and current production cookies will be detected. For this reason you decide to use the 2-alternative forced choice (2-AFC) method. Thirty of the previous 100 panelists participate in the test, and in a counterbalanced design each panelist is presented with a sample from the new production and a sample from the current production. The panelists are asked to identify which sample is harder. When the test is analyzed, you find that the new ingredient makes your cookies significantly harder.

TABLE 23.1. RESULTS OF DIFFERENCE TESTS USING THE DUO-TRIO AND THE 2-AFC METHODS

Method     Proportion of Correct Responses (Pc)   Sample Size   Test of the Guessing Model
Duo-Trio   0.55                                   100           NS
2-AFC      0.71                                   30            S
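The guessing-model tests in Table 23.1 amount to one-sided exact binomial tests of Pc against the chance level of 0.5. A minimal Python sketch follows; the 2-AFC count of 21 correct out of 30 is an assumption consistent with the reported proportion.

    from scipy.stats import binom

    def guessing_test(correct, n, p0=0.5):
        # One-sided exact binomial test of H0: P(correct) = p0.
        return binom.sf(correct - 1, n, p0)   # P(X >= correct under H0)

    print(guessing_test(55, 100))   # about 0.18 -> NS, as in Table 23.1
    print(guessing_test(21, 30))    # about 0.02 -> S, as in Table 23.1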
Gridgeman's Paradox. These experiments and results, typical of routine tests conducted in consumer product testing, contain a profound message for the interpretation of product testing results. The two methods employed have exactly the same Null hypothesis, or guessing model, but show large differences in sensitivity to the products tested. This type of result, referred to as Gridgeman's Paradox (Frijters 1979; Ennis 1993), has been extensively discussed over the last 45 years. A common misconception is that the difference between the 2-AFC and the duo-trio methods occurs because an attribute has been specified in the 2-AFC test. Experience demonstrates, however, that Gridgeman's Paradox would arise even if an attribute were specified in the duo-trio test. For this reason other resolutions of the paradox must be sought.

Comparison of d' Values from Different Methods. From psychometric functions or tables, we can convert Pc values into estimates of δ values, or d's.
These estimates have variances that can be obtained either from tables (Bi et al. 1997) or by direct computation using the method of maximum likelihood. Table 23.2 shows the results of the cookie example in terms of d' values.
Perceptual Variability and Decision Rules. The duo-trio and the 2-AFC methods share a common guessing model with a probability of correct response = 0.5. The difference between the two methods lies in the decision rules that are used to produce responses. Returning to the cookie example, a 2-AFC correct response occurs when a sample from the new production is harder than a sample from the current production. How often this occurs determines the proportion of correct responses, Pc. Contrastingly, a duo-trio test with a sample from the new production as reference yields a correct response when the new production sample is more similar to the reference than the current production sample. How do we measure "more similar?" One approach is to base the decision on perceptual distances so that a correct response occurs when the distance between the new production sample and the reference is less than the distance between the current production sample and the reference. How do incorrect responses arise? We suppose that the perceptual magnitudes (hardness in this case) for the two products follow normal distributions with different means but unit variances. The difference between the means of these distributions is called δ and its estimate, d'. The units of δ are perceptual standard deviations. Variances in perceptual magnitudes explain why sometimes the current production sample may appear harder than the new production, even when the new production hardness mean is higher.

Thurstonian Models and Psychometric Functions. Thurstonian models require the two assumptions we have just made: (1) perceptual variability exists and can be modeled using the normality assumption, and (2) methods have associated decision rules. Figure 23.1 shows the psychometric functions for the 2-AFC and the duo-trio methods. Psychometric functions link the probability of a correct response to δ. For instance, if δ is 1, the probability of a correct response for the 2-AFC is 0.76. Assuming that one perceptual dimension is used in the duo-trio decision, we can use Fig. 23.1 to see that for δ = 1, the probability of a correct response for the duo-trio method is 0.58. Marked on Fig. 23.1 are the results from Table 23.1 for each of the methods. Although the Pc values for these methods are dissimilar, the estimated δ values are almost identical. The difference between the Pc values for the duo-trio and the 2-AFC methods for similar or the same δ values is due to the difference between the two methods' decision rules. Note that consideration of the decision rules has yielded this result; specification of attributes has not been mentioned.
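Under these two assumptions the psychometric functions can be written down and inverted numerically. The Python sketch below uses the standard unidimensional equal-variance forms; note that inverting the rounded proportion 0.71 gives a d' near 0.78, while the Table 23.2 value of 0.74 corresponds to 21 correct out of 30 (a proportion of 0.70).

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import brentq

    def pc_2afc(delta):
        # 2-AFC: correct when the "new" percept exceeds the "current" one.
        return norm.cdf(delta / np.sqrt(2))

    def pc_duotrio(delta):
        # Duo-trio under the unidimensional distance decision rule.
        a = norm.cdf(delta / np.sqrt(2))
        b = norm.cdf(delta / np.sqrt(6))
        return 1 - a - b + 2 * a * b

    def dprime(pc, psychometric):
        # Numerically invert a psychometric function above chance level.
        return brentq(lambda d: psychometric(d) - pc, 1e-6, 10.0)

    print(dprime(0.55, pc_duotrio))   # about 0.76
    print(dprime(21 / 30, pc_2afc))   # about 0.74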
FIG. 23.1. PSYCHOMETRIC FUNCTIONS FOR THE 2-AFC AND DUO-TRIO METHODS (proportion of correct responses plotted against δ, from 0.0 to 3.0)
TABLE 23.2. d' VALUES ESTIMATED FROM THE DUO-TRIO AND 2-AFC RESULTS

Method     Proportion of Correct Responses (Pc)   d'     Variance of d'
Duo-Trio   0.55                                   0.76   0.16
2-AFC      0.71                                   0.74   0.12
From these results we can determine whether there is a significant difference between the d' values obtained using the two methods. This table shows no significant difference between the two d' values. Combining the results of the two tests, we find that the new ingredient imparts a difference, probably in hardness, and the perceptual difference is 0.75 ± 0.51 using a 95% confidence interval.
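The combined estimate quoted above can be reproduced by inverse-variance weighting of the two d' values, a standard assumption for pooling independent estimates; a sketch:

    import numpy as np

    d = np.array([0.76, 0.74])    # duo-trio and 2-AFC d' values
    v = np.array([0.16, 0.12])    # their variances (Table 23.2)

    w = 1.0 / v                                    # inverse-variance weights
    d_combined = np.sum(w * d) / np.sum(w)         # about 0.75
    half_width = 1.96 * np.sqrt(1.0 / np.sum(w))   # about 0.51
    print(d_combined, half_width)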
Power. Given that there was a difference between the cookies from the new production and the cookies from the current production, why did the duo-trio method fail to detect the difference? To answer this question, we consider Table 23.2. Although the δ values for the duo-trio and the 2-AFC methods are similar, the Pc values are 0.55 and 0.71, respectively. As the two methods share a
common guessing model, the Pc values for each method are compared to the same value to decide significance. Given this information, we expect the 2-AFC method to declare a significant difference in this situation more often than the duo-trio method. This result holds in general, and we are able to say that the 2-AFC method is more powerful, that is, it will declare a given d' difference significant more often than the duo-trio method. Figure 23.1 verifies this result. For any δ value greater than 0, the probability of a correct response for the 2-AFC method is greater than that for the duo-trio method.
Sample Size Requirements for a Given Power. Figure 23.2 shows the sample sizes required to be 80% certain of detecting a δ of 0.5 at α = 0.05 for four different methods. In addition to the 2-AFC and duo-trio methods, Fig. 23.2 shows the 3-AFC and the triangular methods. The 3-AFC method is similar in instructions to the 2-AFC, but presents the subject with two products that are the same and one that is different. In the triangular method, there are two products that are the same and one different and the instruction is to choose the odd sample.
FIG. 23.2. SAMPLE SIZE NEEDED FOR 80% CHANCE OF DETECTING A δ OF 0.5 (64:36 SPLIT) AT AN α OF 0.05 (bars for the 2-AFC, duo-trio, 3-AFC and triangular methods)
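A sketch of the sample-size search behind Fig. 23.2, for the 2-AFC at δ = 0.5 (Φ(0.5/√2) ≈ 0.64 gives the 64:36 split of the caption). The discreteness of the exact critical region makes the answer wobble slightly with n; the search lands on the order of 80 judgments.

    import numpy as np
    from scipy.stats import binom, norm

    alpha, target = 0.05, 0.80
    p1 = norm.cdf(0.5 / np.sqrt(2))   # P(correct) for 2-AFC at delta = 0.5

    for n in range(10, 500):
        c = int(binom.isf(alpha, n, 0.5)) + 1   # one-sided critical count
        if binom.sf(c - 1, n, p1) >= target:
            print(n, c)   # smallest n reaching 80% power
            break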
Conclusion. The ideas discussed here apply to a situation in which there is a single attribute that subjects attend to in making decisions. Neither the duo-trio nor the triangular method is restricted in its applications to unidimensional perceptions. For this reason, the excellent agreement among the methods shown here may not always occur. A lack of agreement among the
methods may, in fact, suggest multidimensionality. Sequence effects may also cause disagreement among methods. To deal with these issues, multidimensional Thurstonian models (Ennis 1998a) and models for sequence effects (Ennis et al. 1995) have been developed. These more sophisticated models are often unnecessary, however. Numerous practical applications of Gridgeman's Paradox have been observed, and the resolution of the paradox using Thurstonian models has underscored the importance of decision rules and variability in modeling choice experiments.
Scenario 2: Relating Rating Methodologies

Background. Products and concepts are often evaluated on rating scales to quantify degree of liking, level of purchase interest, intensity of an attribute, degree of difference or level of agreement with a statement. Rating scales are constructed in many different ways, but usually involve options labeled with numbers, words or symbols. In some cases a particular option has a special label, such as "just right" or "same as reference." Scales may involve labels for every option, such as the words on the 9-point acceptability scale; other scales have labels only at the end points. Rating responses are encoded as numbers. These numbers correspond to perceived intensities and express the order of these intensities, but ratings data may not possess the properties of the numbers used to encode them. For instance, if we encode the categories on the 9-point acceptability scale as integers from 1 to 9, we may assume that "3" corresponds to a higher liking value than "2" and that "9" corresponds to a higher liking value than "8." Nevertheless we cannot assume that the difference between a "3" response and a "2" response is the same as the difference between a "9" response and an "8" response. We characterize this deficiency by saying that ratings data do not exhibit equal interval properties. One reason that the equal interval assumption cannot be applied to ratings data is that certain rating options may be avoided or favored. For example, on the 9-point acceptability scale the 3rd and 7th categories, corresponding to "dislike moderately" and "like moderately," are favored and the 5th category, "neither like nor dislike," is avoided. This means that there is a smaller intensity range corresponding to a "5" response than to a "3" or to a "7." Consequently, the difference between an average "7" intensity and an average "6" intensity will be larger than the difference between an average "6" intensity and an average "5" intensity. Moreover, since we are only interested in differences between values, adding a fixed number to all values on an equal interval scale produces an equivalent scale. It follows that the zero point of an equal interval scale is arbitrary.
Scenario. You are interested in the relative sweetness intensities of two beverage products; one of these products is your regular brand and the other is a modification. Two experiments have been conducted. In one experiment a numerical 7-point rating scale was used, and in the second experiment a 5-point word-category scale was used. Both experiments were conducted with 100 heavy users of your brand. The numerical scale consisted of the numbers “1” to “7” with “1” labeled as “not sweet” and “7” labeled as “extremely sweet.” The remaining numbers were not labeled. The word-category scale consisted of 5 options: “not sweet,” “slightly sweet,” “moderately sweet,” “very sweet” and “extremely sweet.” The results of these two experiments are shown in Tables 23.3a and 23.3b. Your question is: Do these two experiments reflect the same sweetness intensity difference?
TABLE 23.3a. RATING FREQUENCIES FOR TWO BEVERAGE PRODUCTS ON A 7-POINT NUMERICAL SCALE

Category Label   "1"  "2"  "3"  "4"  "5"  "6"  "7"
Current Brand     4   10   40   34    9    2    1
Modification      0    2   21   41   23    9    2
TABLE 23.3b. RATING FREQUENCIES FOR TWO BEVERAGE PRODUCTS ON A 5-POINT WORD CATEGORY SCALE

Category Label   "Not Sweet"  "Slightly Sweet"  "Moderately Sweet"  "Very Sweet"  "Extremely Sweet"
Current Brand         7             24                 62                 4               3
Modification          1              9                 66                13              11
The Problem. To answer your question it is important to realize the distinction between perceived sweetness values and rated sweetness values. Perceived intensity can be placed on an equal interval scale while, as previously noted, rated intensities may not have equal interval properties. A problem arises: How can we measure sweetness intensities on an equal interval scale when all we have are non-equal interval ratings?
A Solution. Represent the sweetness values that the consumer experiences as values from a normal distribution with mean 0 and standard deviation 1. The zero mean is consistent with the arbitrary zero point of an interval scale, and the perceptual standard deviation is the unit of measurement. This representation leads to an equal interval intensity scale for sweetness. Figure 23.3a illustrates how consumers would produce the ratings given in Table 23.3a. For a 7-point scale, the consumer uses 6 points on the sweetness axis to decide how to rate a particular product. These points are marked by vertical lines and are called decision boundaries. If a sweetness value falls between two of these decision boundaries, the consumer responds with a rating appropriate to that interval. For example, the area between the first decision boundary and the second illustrates the probability of a "2" response.
Rating Means and Scale Means. Rating means are the means of the rating values, while scale means are the means of the intensity distributions, like sweetness, on an equal interval scale. Scale means are expressed relative to the scale mean of a particular product. Accordingly, this product will have a scale mean of zero. Since the zero of an interval scale is arbitrary, it is immaterial which product we choose as reference. Differences between scale means are called δ (delta) values. Rating variances depend on perceptual variances, but are also affected by the location of decision boundaries. For instance, it is well-known that rating variances are generally lower for large rating means than at more intermediate values. This occurs because the number of rating options, determined by the decision boundaries, becomes reduced as the mean increases above intermediate values. The size of the rating variance may have little or no relationship to the size of the perceptual variance.

Estimating Scale Means and Variances. Scale means do not depend on the location of decision boundaries, and scale means are not affected by the method used to obtain them. In fact, the same scale means can be obtained from a 5-, 7- or 9-point scale; words, symbols, or numbers; or even ordinal measurement, such as "choose the sweetest." The key to obtaining scaling information is to use a model that relates the rating results to the scale means and decision boundaries. Estimates of these parameters and their variances can be obtained using the method of maximum likelihood (Dorfman et al. 1969). Using this procedure, we find values for the scale means and decision boundaries that best correspond to the results given in Tables 23.3a and 23.3b. It should be noted that this method can be applied to any number of products, not just to two as discussed here.
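A minimal maximum-likelihood sketch (Python/SciPy) of this decision-boundary model, fit to the 7-point frequencies of Table 23.3a. The current brand is fixed at mean 0 and unit variance; the starting values and optimizer settings are arbitrary choices of this sketch.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import minimize

    current      = np.array([4, 10, 40, 34, 9, 2, 1])
    modification = np.array([0, 2, 21, 41, 23, 9, 2])

    def negloglik(params):
        mu = params[0]                 # scale mean of the modification
        cuts = np.sort(params[1:])     # six ordered decision boundaries
        edges = np.concatenate(([-np.inf], cuts, [np.inf]))
        ll = 0.0
        for freqs, mean in ((current, 0.0), (modification, mu)):
            # Category probability = area between adjacent boundaries.
            p = norm.cdf(edges[1:] - mean) - norm.cdf(edges[:-1] - mean)
            ll += np.sum(freqs * np.log(np.clip(p, 1e-12, None)))
        return -ll

    start = np.concatenate(([0.5], np.linspace(-2.0, 3.0, 6)))
    fit = minimize(negloglik, start, method="Nelder-Mead",
                   options={"maxiter": 20000, "fatol": 1e-8, "xatol": 1e-8})
    print(fit.x[0])             # about 0.81, as reported in the text
    print(np.sort(fit.x[1:]))   # boundary estimates, cf. Table 23.4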
FIG. 23.3a. PRODUCT DISTRIBUTIONS AND DECISION BOUNDARIES ON AN INTERVAL SCALE MODELED FROM 7-POINT NUMERICAL RATINGS (Current Brand, mean = 0.0; Modification, mean = 0.81)

FIG. 23.3b. PRODUCT DISTRIBUTIONS AND DECISION BOUNDARIES ON AN INTERVAL SCALE MODELED FROM 5-POINT WORD-CATEGORY RATINGS (Current Brand, mean = 0.0; Modification, mean = 0.77)
The means of the two distributions in Fig. 23.3a and 23.3b show where the two products fall on the equal interval sweetness intensity scale. For the numerical ratings, the means are 0.0 and 0.81. For the word-category ratings, the means are 0.0 and 0.77. The units for each of these scales are the same: perceptual standard deviations. The fact that the intensity units for both scales are the same allows us to compare directly the results of the two experiments.
Relating Methodologies. Fundamental measures like scale means are not method specific. Consequently, these measures can be used to predict the results of other methodologies. Table 23.4 gives the results of modeling the data in Tables 23.3a and 23.3b.
TABLE 23.4. RESULTS OF MODELING THE DATA IN TABLES 23.3a AND 23.3b

                      7-Point Scale                      5-Point Scale
Mean (δ)              0.81                               0.77
Variance              0.026                              0.029
Boundary Estimates    -1.8, -1.1, 0.1, 1.2, 2.0, 2.8     -1.5, -0.5, 1.5, 2.0

t-test on the equality of the δ values: not significant at α = 0.05
In this table, the results from the two rating experiments are comparable because they have been modeled using a measure that is independent of the rating method used. From this analysis you conclude that the 7-point scale and the 5-point scale produce similar estimates of the sweetness intensity difference between the current brand and the modification. Averaging these results, you find that the overall d' value is 0.79, and in a separate test you could determine whether this result is significantly larger than zero. A valuable feature of Thurstonian models (Thurstone 1927a, b, c) is the ability to interlink the results of categorical methods, such as rating methods. There are many benefits to this aspect of Thurstonian modeling, one of which is the capability to make power comparisons among methods to determine optimum sample sizes. This point was made in Scenario 1 (Ennis 1998a). The ability to predict the results of other methodologies from scale means is quite general and applies to many types of categorical data.
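For example, the scale mean estimated from the 7-point experiment, combined with the 5-point boundary estimates of Table 23.4, predicts the 5-point rating frequencies quite closely; a sketch:

    import numpy as np
    from scipy.stats import norm

    mu_mod = 0.81                             # from the 7-point fit
    cuts5 = np.array([-1.5, -0.5, 1.5, 2.0])  # 5-point boundaries (Table 23.4)
    edges = np.concatenate(([-np.inf], cuts5, [np.inf]))

    for label, mean in (("Current", 0.0), ("Modification", mu_mod)):
        p = norm.cdf(edges[1:] - mean) - norm.cdf(edges[:-1] - mean)
        # Expected counts per 100 ratings; compare with Table 23.3b.
        print(label, np.round(100 * p, 1))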
Conclusion. Although rating methods may not produce interval scale data, a decision boundary Thurstonian model can extract interval scale information
from ratings data. Hypothesis tests can be conducted and the results of different methods can be compared and combined.

Scenario 3: How Retasting Can Improve the Power of Product Testing

Background. The cost and timing of product testing depend on the size of the experiment. The power of a difference test depends on the number of judgments, a component of the experiment's size. Power is the probability of correctly concluding that a difference exists given a significance level, sample size and specified difference. Efforts to improve the power of difference tests are important because they are rewarded with the development of more resource-efficient methods. Theoretical and applied research on difference testing has shown that some methods may require as much as 100 times the sampling of other methods to detect the same difference with the same power (Ennis 1993). Some recent experimental work in taste research (Rousseau et al. 2000) has shown that retasting during an experiment may improve test power. In order to exploit this finding, it is useful to understand why this might occur. In this report, we develop a possible explanation of this effect and from this model we can predict a broad range of experimental outcomes.

Scenario. One of your main responsibilities is to investigate the sensory characteristics of yogurt products following a process or formulation change. You use the 2-AFC (2-Alternative Forced Choice) method routinely in discrimination testing to evaluate such product modifications. A sensory attribute, such as smoothness, is first chosen prior to the test. Sets of the two alternatives are presented to each member of your panel, who is instructed to select the more intense sample on this attribute. You are interested in investigating the effect of retasting samples in the 2-AFC task. On the one hand, performance might improve by providing more information on the product attribute before generating a response. On the other hand, performance might decrease due to sensory fatigue and confusion. You conduct an experiment in which panelists are tested in two conditions. In the first condition, no retasting is allowed. In the second condition, the panelists are required to retaste each sample once before responding. Table 23.5 gives a summary of the results. You observe that retasting appears to increase the discrimination ability of your panelists. These results confirm published work on retasting (Rousseau et al. 2000).
TABLE 23.5. 2-AFC RESULTS WITH AND WITHOUT RETASTING

Condition       No. of Correct Responses   Total No. of Responses   d'     Variance of d'
No Retasting    --                         --                       --     0.017
One Retasting   156                        200                      1.09   0.02

Note: d' values are significantly different at P < 0.05. (Dashes mark entries not recoverable from the source.)
Variance in Models for Difference Tests. Many Thurstonian models for discrimination testing assume an equal variance normal distribution of percepts for each product. Models using this simple assumption have proven to be very useful in explaining many experimental findings, such as Gridgeman's paradox (Ennis 1998b). The difference between the products is called δ and its experimental estimate is d'. The units of δ are perceptual standard deviations (Fig. 23.4). If the perceptual variance of a product increases, judgments made about that product will be more uncertain, and since the units of δ are perceptual standard deviations, the size of δ and the precision of the experiment will decrease. Alternatively, if the perceptual variance decreases, the size of δ will increase, thus improving the statistical power of the experiment. There are several factors that may affect perceptual variance. Some of these factors might induce its decrease [familiarity with the experimental protocol, degree of training provided to a subject, inherent sensitivity of individuals, and extent of retasting allowed (Rousseau 2000)], while some of these factors might trigger its increase [influence of sensory adaptation and irritation (Rousseau et al. 1997, 1998, 1999), extent to which the memory trace for a product changes during a test and number of memory traces interfering in memory (Rousseau et al. 2002)]. As was just mentioned, the effect of retasting can be seen as reducing the variance of the perceptual distributions used to make a product testing decision. Table 23.5 gives d' values for the two experiments previously described and also shows that these values are significantly different from each other; retasting increases the size of d'. The d' values, their variances and the test conducted on them were obtained using the IFPrograms software.

Effect of Retasting on Perceptual Variance. Figure 23.5 illustrates how the variance might decrease as a result of retasting. Consider two products X and Y with means μX and μY and variance of 1. Since the variance is 1, δ = μY − μX. When sampling each product once, the panelist will experience one sensation for each product, x and y. In order to make a decision, y will be compared to x and the product perceived as greatest in intensity will be selected.
This decision rule can be described as selecting y when y − x is positive, and x when y − x is negative. The probability of this choice can be seen clearly in terms of the distribution of differences between the sensations as shown in Fig. 23.5. This distribution has a variance of 2 (the sum of the individual distribution variances). The subject will be correct when y − x is positive; the proportion of correct answers is the dark area under the curve.
FIG. 23.4. PROBABILISTIC REPRESENTATION OF THE DEGREE OF DIFFERENCE δ BETWEEN TWO PRODUCTS X AND Y
FIG. 23.5. PERCEPTUAL VARIANCE: EFFECT OF RESAMPLING IN THE 2-AFC METHOD
As a panelist retastes the samples, the variance of the difference distribution will decrease (Juslin et al. 1997). If the samples are tasted r times, the variance will decrease by a factor of r if the panelist uses an average value of the sensations from the r tastings. From the third distribution in Fig. 23.5, we see that the proportion of times y − x is positive increases, thus yielding a higher proportion of correct answers. In the absence of other factors that may interfere with performance, this model predicts that performance will improve upon retasting and that the value of δ with r tastings will equal √r times δ with no retasting. The extent of improvement depends on how much information can be remembered and used in the decision-making process.

Effect of Retasting on Power. Figure 23.6 shows the power of the 2-AFC method using a sample size of 50 as a function of the sensory difference, μY − μX, and the number of tastings, r. When r is 1, δ = μY − μX. As r increases, the perceptual variance decreases, leading to greater power at a given value of μY − μX. For instance, when μY − μX is 0.5, Fig. 23.6 shows that the probability of declaring a significant difference at α = 0.05 is almost 1 when the samples are tasted five times. Contrast this with the power of about 62% when the samples are tasted once. Although these results are specific to the 2-AFC, improvements due to retasting in the power of other methods, such as the triangular and duo-trio methods, can also be predicted and tested.
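A sketch of the calculation behind Fig. 23.6, combining the √r scaling of δ with the exact binomial test of the 2-AFC guessing model (the exact critical region gives a slightly different r = 1 power than the roughly 62% read off the figure):

    import numpy as np
    from scipy.stats import binom, norm

    n, alpha = 50, 0.05
    c = int(binom.isf(alpha, n, 0.5)) + 1   # critical count for the 2-AFC

    def power(mu_diff, r):
        delta = np.sqrt(r) * mu_diff        # averaging r tastings
        pc = norm.cdf(delta / np.sqrt(2))   # 2-AFC psychometric function
        return binom.sf(c - 1, n, pc)       # P(reject H0 | true delta)

    for r in (1, 2, 5):
        print(r, power(0.5, r))   # rises from roughly 0.55 toward 1.0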
FIG. 23.6. POWER AS A FUNCTION OF SENSORY DIFFERENCE AND NUMBER OF TASTINGS (r)
Conclusion. Retasting may increase the power of a difference testing experiment by decreasing the size of the variance of the information used to make a decision. Since sensory differences are expressed in standard units (δ), which are the perceptual standard deviations, a consequence of retasting may be to increase the size of δ. This effect is very general since it applies to many sensory evaluation methods (Rousseau et al. 2000) and suggests that retasting will improve the discrimination power of a sensory protocol provided that sensory fatigue does not reduce performance. When conducting methodological research in order to optimize the reliability of experimental outcomes, it is essential to consider any variable that can hinder or improve the measurement of products' sensory characteristics. Comparing the performance of various protocols using models of the discrimination process will provide valuable information concerning those variables. Probabilistic or Thurstonian models identify and quantify these effects and provide a basis for developing a general foundation to interpret product-testing measurements.

Scenario 4: Discrimination Testing with Multiple Samples

Background. The fact that variation occurs in the manufacture of consumer products leads to a challenging problem in difference testing. A product with a particular identity may be composed of different variants due to the fact that it is produced at different locations or on different machines. Difference testing designed to evaluate modifications of this product should consider the fact that the product is composed of variants. Classical difference tests such as the triangular, duo-trio or m-alternative forced choice methods are not capable of accounting for the possibility that a product may exist as one of several variants. In this report, an extension of the duo-trio method is used to quantify additional variation such as that caused by different production facilities.

Scenario. Your supplier has proposed an ingredient change for your vanilla flavored yogurt and you would like to know what effect this change has on your product. Your inquiry is complicated by the fact that your main factory has two production lines that might produce slightly different products. The current product is labeled "A", while the reformulated product is labeled "B". Subscripts on A and B refer to different production lines. In a trial a subject is given three products, for instance A1A2B1, in which the first sample (A1) is a reference. The task of the subject is to select the one of the two alternatives (A2 or B1) most similar to the reference. Each subject evaluates one of each of the 12 possible triads. This method is referred to as Torgerson's method of triads. The results are presented in Table 23.6.
TABLE 23.6. NUMBER OF SUBJECTS CHOOSING THE FIRST SAMPLE OF THE ALTERNATIVE PAIR AS MOST SIMILAR TO THE REFERENCE
Torgerson's Method of Triads. It can be seen from the description of a trial given above that Torgerson's method of triads is an extension of the duo-trio method. Unlike the duo-trio method, all three samples may be of different products. This paradigm was first proposed by Torgerson (Torgerson 1958). A Thurstonian model for this method is the same as that used for the ABX method, preferential choice (Ennis 1999) (the paired preference test) and two-stimulus identification, in which two alternatives are compared to a reference point (Ennis 1993). Figure 23.7 illustrates the similarity of these different protocols.
FIG. 23.7. SIMILARITIES OF THE DUO-TRIO METHOD, TORGERSON'S METHOD, THE ABX METHOD, PREFERENTIAL CHOICE AND TWO-STIMULUS IDENTIFICATION (in each protocol, two samples are compared to a reference point: a reference sample, two references, or an ideal sample)
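A Monte Carlo sketch of the triad decision rule under the unidimensional Thurstonian model; the product means below are hypothetical, chosen only to be consistent with the weighted-average d' values quoted later in this scenario.

    import numpy as np

    rng = np.random.default_rng(2003)
    # Hypothetical means on one perceptual dimension (unit variances).
    means = {"A1": 0.0, "B1": 0.5, "A2": 1.47, "B2": 1.97}

    def p_first_closer(ref, alt1, alt2, n=200_000):
        x0 = rng.normal(means[ref],  1.0, n)   # reference percept
        x1 = rng.normal(means[alt1], 1.0, n)
        x2 = rng.normal(means[alt2], 1.0, n)
        # Decision rule: choose the alternative closest to the reference.
        return np.mean(np.abs(x1 - x0) < np.abs(x2 - x0))

    # Triad A1 A2 B1: A1 is the reference, A2 and B1 the alternatives.
    print(p_first_closer("A1", "A2", "B1"))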
Generalizations of other protocols also exist. These include Richardson's method of triads (Ennis et al. 1988), which is a generalization of the triangular method, and the multiple dual-pair (Rousseau et al. 2002) method, which is a generalization of the dual-pair or 4IAX procedure. A unidimensional Thurstonian model for Torgerson's method has been published (Ennis et al. 1988). In a manner similar to that used for traditional discrimination tests (Ennis 1998; Rousseau et al. 2001), analysis of data from Torgerson's method provides scaled product similarities in units of the perceptual standard deviation. These scaled units separating products are commonly referred to as d' values. If four products are compared, analysis of data from Torgerson's method will provide the six possible d' values, as well as their variances.
Fitting Hierarchical Thurstonian Models. In the scenario presented above, it is possible that all four products (2 products x 2 production lines) may be different. It is also possible that there are no differences due to the production lines and that the only difference is that due to the product modification. It is even conceivable that the modification has no effect. Which of these or other alternatives represents the best account of the data? Using the method of maximum likelihood, various models can be fit to the data. These models are presented in a hierarchical framework in Fig. 23.8. Each node in this figure represents a model. Between the nodes there are probabilities obtained from statistical tests on differences between the models. The most complex model is presented at the top in which the number of estimated parameters equals the number of different triads. The simplest model, which is the guessing model, has no estimated parameters and is presented at the bottom. Between these extremes there are other models that require fewer parameters than the saturated model, but more parameters than the guessing model. In order to find the appropriate model for your experiment, imagine that you are trying to find your way from the top of the figure to the bottom through gates that are only open if the inter-node test probability is greater than 0.05. The goal is to find the model with the fewest number of parameters that still explains the data. If there is a significant difference between the saturated model and the model immediately below it, then unidimensional Thurstonian modeling may not be adequate. This can occur, for instance, if the percepts are multidimensional. Calculating d’ Values with Torgerson’s Method. Based on the results presented in Table 23.6, you use the unidimensional Thurstonian model to analyze the results for the vanilla yogurt study. The model comparisons are given in Fig. 23.8, while the d‘ values are given in Table 23.7 and illustrated in Figure 23.9.
FIG. 23.8. HIERARCHICAL MODEL STRUCTURE FOR THE VANILLA YOGURT STUDY (nodes from top to bottom: saturated model; 4-distribution model; 3-distribution models; guessing model/1-distribution model; the distributions given are for illustration purposes)
The model that assumes four distinct distributions is found to be not significantly different from the saturated model (P = 0.64), while both models with only 3 distributions are significantly different from the model with 4 distributions. This shows that the unidimensional model is sufficient to describe the product similarities and that a model with fewer parameters is not justified. In our imaginary trip down Fig. 23.8, the gates below the 4-distribution model are closed. Taking into account the production line variation, the overall weighted average d’ value between A and B is 0.50, which is significantly different from 0. The d’ value induced by the production lines is a weighted
average of 1.47. This means that, while the new ingredient significantly altered your vanilla yogurt, the extent of this change is smaller than the existing difference between your two production lines. If the difference between the two lines is considered acceptable, the product with the new ingredient may be an adequate alternative.
TABLE 23.7. PAIRWISE d' VALUES AND THEIR VARIANCES FOR PRODUCTS A1, A2, B1 AND B2

FIG. 23.9. VANILLA FLAVORED YOGURTS: REPRESENTATION OF PRODUCT SIMILARITIES USING THE UNIDIMENSIONAL THURSTONIAN MODEL
Conclusion. Torgerson’s method of triads is a practical way to provide simultaneous estimation of multiple product similarities. Applications include the comparison of multiple reformulations to a current standard product, or the consideration of batch-to-batch variability when investigating any kind of product change. The method has potential to be useful in product testing when simultaneous comparisons of more than two products are required.
Scenario 5: Replicated Difference and Preference Testing [This section contains a description of a general problem that occurs when there are individual differences. The model discussed is not a probabilistic model of the type so far discussed but provides results that can be linked easily to the probabilistic structure being described.]
Background. As part of an effort to compare a new deodorant with a competitor's product, 10 experienced judges evaluate the two products on the left and right arms of 30 subjects. In a counterbalanced design in which the two products are alternately placed on the left or right arms within a subject, each judge reports the least malodorous arm. In this example, there is the possibility that there may be trial-to-trial differences in malodor. The chemical reactions of subjects to the deodorants may differ so that one product may be less effective on some subjects than others. This type of result leads to the need to account for inter-trial variability, either because of the need to provide defensible claims in the face of inter-trial variation or because this type of variation is of fundamental interest itself, as may occur when one is identifying preference segments. In a recent paper (Ennis et al. 1998a), the use of the Beta-Binomial (BB) model was discussed for replicated difference and preference tests to deal with exactly the type of problem just discussed. The BB model is a natural extension and generalization of the binomial model. In the BB model, two sources of variation (inter-trial and intra-trial variation) are accounted for and two parameters are used to fit the data collected from a replicated difference or preference test. The BB model provides a much better fit to many data sets than the binomial model. This model should improve the validity of sensory difference and preference tests where inter-trial variation occurs.

Inapplicability of the Binomial Model. Sensory difference and consumer preference tests are widely used in sensory and consumer studies. The traditional model for these experiments is the binomial distribution. The binomial model is valid under the assumption that there is only one source of variation in the data because the choice probability (preference or probability of a correct response, for instance) is assumed to be constant from trial-to-trial. However, this assumption is almost always violated in practice. If variation due to inter-trial differences cannot be ignored, then there are two sources of variation: variation due to judgments and variation due to trials. Returning to the earlier example about the treatment of malodor, if the judges are homogeneous, but subject reactions to deodorants are not, then variation within a trial (the assessment of one subject) may be very different from variation across subjects. This extra
variation is called overdispersion and the BB model is designed to account for this extra variation.

Product Superiority. When there is more than one source of variance in a difference or preference test, the variability in the data may exceed binomial variability. If we still use the binomial model, an underestimate of the standard error can be obtained and so a seriously misleading conclusion may be drawn about the superiority of one product over another. A simulation study based on actual product testing experience was reported by Ennis and Bi (1998b). The results of the study showed that, for overdispersed binomial data, the Type I error may be 0.44 when the experimenter thinks it is 0.05. This means that in a test to demonstrate product superiority, a product may be declared significantly better at the 5% level, when its superiority can only be demonstrated at the 44% level! It is clear that the traditional binomial model should not be applied to overdispersed binomial data and the current approach to analyzing difference and preference tests needs to be revised.

Binomial, Beta and Beta-Binomial Models. The binomial distribution is a discrete distribution and, when applied to choice data, models the probability that a particular choice outcome will occur. For instance in the malodor example, the binomial distribution models the probability that of the 10 judges evaluating a particular subject, 0, 1, 2, ..., 10 of them will choose your competitor's brand as least malodorous. If we pool data across subjects, we assume that each subject reacts to both products in the same way. But suppose that when your competitor's product is used on subject 1 there is a 60% chance of being less malodorous but that on subject 2, there is only a 20% chance. By combining data we are mixing data binomially distributed in different ways and cannot assume that the combined data follow a binomial distribution. Figure 23.10 shows what the binomial distribution looks like for n = 10 and P = 0.6 and shows the probability of 0 out of 10 to 10 out of 10 correct responses. As we consider the possibility that P may change from trial (subject) to trial, how is P distributed? One very general possibility is to consider that P follows a beta distribution. The beta distribution allows a broad variety of shapes for the distribution of P, four of which are shown in Figure 23.11. In one case (the 4th line), the most likely values of P are close to 0 or 1. Applied to the deodorant problem, this means that your product is more effective on some subjects and worse on others than your competitor's product. Combining the binomial and beta distributions produces the beta-binomial distribution; one example is in Fig. 23.12. In this distribution, we assume that a binomial distribution with P = 0.6 applies within a trial and that as we move from trial to trial, the P's follow a beta distribution, like the 4th line in Fig. 23.11. In order to estimate the P for a particular trial, we need replications. In
order to estimate the parameters of the beta distribution, we need more than one replicated trial. Data fit by a beta-binomial are always in the form of replications and trials. The meaning of these two terms varies depending on the problem.
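The Type I error inflation described above is easy to demonstrate by simulation: generate choice data whose P varies from trial to trial around 0.5, then analyze the pooled counts with a plain binomial test. A sketch, with hypothetical beta parameters and the trial/replication structure of the malodor example:

    import numpy as np
    from scipy.stats import binom

    rng = np.random.default_rng(42)
    k, n = 30, 10            # trials (subjects) and replications (judges)
    n_sim, total = 5000, k * n

    rejections = 0
    for _ in range(n_sim):
        p_trial = rng.beta(0.3, 0.3, size=k)    # P varies around 0.5
        x = rng.binomial(n, p_trial).sum()      # pooled correct choices
        # Two-sided binomial test of H0: P = 0.5 on the pooled data.
        pval = 2 * min(binom.cdf(x, total, 0.5),
                       binom.sf(x - 1, total, 0.5))
        rejections += (pval < 0.05)

    print(rejections / n_sim)   # far above the nominal 0.05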
FIG. 23.10. BINOMIAL: n = 10, P = 0.6
FIG. 23.11. SHAPES OF THE BETA DISTRIBUTION
FIG. 23.12. BETA-BINOMIAL: n = 10
Fitting the BB Model to Data. Two parameters are needed to fit the BB model. These parameters are μ and θ and they measure the mean and spread of the distribution of the choice probability, P. If θ is zero, the BB reduces to the binomial distribution with a single parameter P. The BB parameters are fitted to replicated difference and preference tests using the method of maximum likelihood. In the deodorant claim example, the estimate of μ was 0.60 and the estimate of θ was 0.47. A comparison of the binomial and BB models showed that the BB model fits the data significantly better. This means that θ cannot be assumed to be zero. When the binomial was used to compare products on the combined data from subjects and judges, your competitor's product was significantly better (P < 0.001). However, when we used the BB model and accounted for inter-trial variability we found that the products were not significantly different at α = 0.05. The apparent superiority of your competitor's brand may have been due to overdispersion.
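A maximum-likelihood sketch for the beta-binomial, using the (μ, θ) labels of the text with one common convention for the beta parameters, a = μ/θ and b = (1 − μ)/θ; the counts are hypothetical.

    import numpy as np
    from scipy.special import gammaln
    from scipy.optimize import minimize

    n = 10
    # Hypothetical correct-choice counts, one per trial (out of n).
    data = np.array([2, 9, 10, 1, 8, 0, 10, 7, 9, 1, 6, 10, 2, 8, 9,
                     0, 10, 3, 7, 10, 1, 9, 8, 2, 10, 0, 6, 9, 1, 10])

    def negloglik(params):
        mu, theta = params
        a, b = mu / theta, (1.0 - mu) / theta
        # Beta-binomial log-likelihood; the binomial coefficient is
        # omitted since it does not depend on mu or theta.
        ll = (gammaln(a + data) + gammaln(b + n - data)
              - gammaln(a + b + n) + gammaln(a + b)
              - gammaln(a) - gammaln(b))
        return -np.sum(ll)

    fit = minimize(negloglik, x0=[0.6, 0.5], method="L-BFGS-B",
                   bounds=[(0.01, 0.99), (1e-3, 10.0)])
    print(fit.x)   # maximum-likelihood estimates of (mu, theta)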
How Many Trials and Replications? In the malodor example, 30 trials with 10 replications per trial were used. Analysis showed that the two products did not differ significantly at the 5% level using the BB model. How many replications and trials would be needed to be 80% sure that if the real μ and θ were 0.6 and 0.47 the result would be significant? Figure 23.13 shows contours of equal power for values of n (replications) and k (trials) when μ = 0.6 and θ = 0.47. It can be seen that if 10 judges were used, the number of subjects would need to be increased to 74. Other combinations of n and k can be used to achieve 80% power. The choice would depend on the relative cost of n and
k.
FIG. 23.13. POWER VS. n AND k (the 80% power contour passes through n = 10, k = 74)
Other Applications. In a separate project, you have an interest in determining whether a new deodorant has a broader appeal to young males than
your current product or whether there are latent preference segments. These segments may differ in their preferences for the new or current deodorant. A sample of 300 young male consumers provides triplicate forced choice preference responses to pairs of the current and new deodorant. In this preference test, some consumers may prefer one deodorant over the other and other consumers may have the opposite preference. Some may be indifferent. We can use the BB model to test for the existence of latent preference groups in much the same way that we used it to study subject effects in the malodor example. In the preference test, the trials are consumers and replications occur within a consumer. In this case if the BB is significantly different from the binomial, then we would conclude that there are latent preference groups. Another way of expressing this is that θ would be greater than zero. There are many sources of inter-trial variance. Differences in experimental material (such as the subjects in the malodor example), individual preferences, individual sensory acuity, manufacturing locations and time are all sources of inter-trial variance. In some cases, we may use the BB to find interesting latent groups. In other cases, we use the BB to deal with extra nuisance variation that we would like to work toward eliminating. If inter-trial variation could be eliminated, we would be justified in using the traditional binomial model.

Conclusion. Product superiority claims in terms of preference or the intensity of some attribute (product A is less malodorous than product B, for instance) are often based on binomial tests of choice proportions. If overdispersion exists, then a superiority claim could be refuted on the basis of an erroneous Type I error. Conversely, a superiority claim based on the BB model can be defended.

Scenario 6: Probabilistic Multidimensional Scaling of Similarity

Background. Solutions to many market research, product development and quality assurance problems require the use of various types of product maps (Ennis 1999; Ennis et al. 1998; Ennis 2001). Sometimes it is of interest to know whether products differ without regard to liking and preference. For instance, product maps are used to guide quality assurance, cost reduction formulations, and "me-too" product development. It is natural to think of products as points, and to suppose that when two products are less similar than two other products, they are further apart in some product space. In this report it will be shown how this idea has significant limitations, and that a more meaningful interpretation of similarity data can be made when products are treated as distributions.
Scenario 6: Probabilistic Multidimensional Scaling of Similarity

Background. Solutions to many market research, product development and quality assurance problems require the use of various types of product maps (Ennis 1999; Ennis et al. 1998; Ennis 2001). Sometimes it is of interest to know whether products differ without regard to liking and preference. For instance, product maps are used to guide quality assurance, cost-reduction formulations, and "me-too" product development. It is natural to think of products as points, and to assume that when two products are less similar than two other products they are further apart in some product space. In this report it will be shown how this idea has significant limitations, and that a more meaningful interpretation of similarity data can be made when products are treated as distributions.
Scenario. An unusually high number of consumer complaints concerning product consistency has been received regarding your orange juice brand. An investigation of this problem leads to the conclusion that five of your ten plants are largely responsible for the offending product. In order to understand differences among the products produced at your ten plants, you conduct a consumer study of paired product similarities. One hundred consumers evaluate ten variants (one from each plant) of your current orange juice brand. Table 23.8 presents the proportion of times (out of 100) that pairs were declared to be the same. Notice that although pairs of identical product are on the diagonal, the diagonal elements do not appear to be identical. Based on the data in Table 23.8, your goal is to construct a map of the products from the ten plants.

TABLE 23.8. NUMBER OF TIMES (OUT OF 100) THAT PAIRS OF PRODUCTS WERE DECLARED TO BE THE SAME. PLANT NUMBERS ARE GIVEN IN THE FIRST ROW AND THE FIRST COLUMN
Scaling Multiple Attributes. The first significant contributions to the scaling of similarities occurred in the late 1950s. In this form of scaling, each product was represented as a discrete point rather than a distribution. We will refer to this type of multidimensional scaling (MDS) as deterministic. Probabilistic MDS treats each product as a distribution, and the scaling units are perceptual standard deviations. This approach can be viewed as a multivariate extension of Thurstonian scaling. Deterministic MDS suffers from serious flaws that are absent in its probabilistic counterpart.

When we use deterministic MDS, we assume that each entry in Table 23.8 corresponds to a unique distance in a perceptual space and that, as the similarity increases, the distance decreases. Specifically, this model requires that pairs of identical products correspond to identical similarities. In practice, pairs of identical products do not produce identical "same" response probabilities, as illustrated in Table 23.8. Hence, the deterministic MDS model is handicapped. Figure 23.14 shows products as distributions. Notice that the distributions labeled 1, 2 and 3 all differ in variance. If an individual is presented with pairs of samples from a single one of these distributions and asked to decide whether the pairs are the same or different, it is very clear that the product with the smallest variance will be declared "same" more often than the other two. A probabilistic account of the product information provides an intuitive understanding of why pairs of identical products do not produce identical "same" probabilities.
FIG. 23.14. MULTIVARIATE DISTRIBUTIONS CORRESPONDING TO PRODUCTS THAT DIFFER IN VARIANCE
A second idea from deterministic MDS is that the similarity measure is monotonically related to inter-product distance. This assumption implies that when a pair of products is less similar than a second pair, the first pair is further apart than the second pair. This assumption is also violated in practice; probabilistic MDS, however, has no problem with such violations. Consider three products A, B and C whose distributions differ in variance. The distance between the means of A and B is equal to the distance between the means of B and C, but their similarity measures are not the same: A is more similar to B than B is to C. A deterministic MDS model would push B and C further apart to account for their greater dissimilarity. In other words, this type of model confounds variance with distance, since distance is all it has to work with. Most consumer products are variable, either because they contain natural ingredients or because manufacturing precision is not absolute. In addition, consumer perception of products varies from time to time. In view of these facts about products and people, we need to consider variance when mapping consumer product perceptions.
How Same-Different Judgments Arise. Multivariate probabilistic models for same-different judgments have been published (Ennis et al. 1988, 1993). One of these models has a very simple form, and we will use it to interpret Table 23.8. In this model we assume that the probability of a "same" response is a decreasing function of the distance between momentary values; when the distance is zero, the probability is 1. Unlike a deterministic model, however, this function applies only to momentary values from the product distributions. Products with large variances will yield lower "same" probabilities than those with smaller variances, explaining differences in the "same" probability between pairs of identical products. In this model, the "same" probability is not monotonically related to inter-product distances, as it also depends on the variance and the relative orientation of products to one another.
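As a concrete illustration of such a model (the exponential form below belongs to the Thurstone-Shepard family cited above and is supplied here as an assumption, not quoted from this chapter), let X and Y denote the momentary multivariate values of the two samples and τ a scale parameter. Then

\[ P(\text{"same"}) \;=\; E\!\left[\exp\!\left(-\lVert X - Y\rVert^{2}/\tau^{2}\right)\right], \]

with the expectation taken over the two product distributions. Identical momentary values give probability 1, larger momentary distances give smaller probabilities, and a product with larger variance generates larger momentary self-distances - hence the lower self-similarity values on the diagonal of Table 23.8.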
Mapping Table 23.8. Figure 23.15 shows a fit to Table 23.8 as a map constructed from the probabilistic similarity model. Products are shown with confidence limit ellipses, and it can be seen that as intensity on the first dimension increases, so does its variance. Variances of the products on the second dimension remain constant. Differences in total variance for the products explain the differences seen in the diagonal of Table 23.8. For instance, the variance for product 3 is smaller than that for product 8, and the self-similarity values of 0.94 and 0.68, respectively, are consistent with this. Once the map of similarities has been constructed, descriptive information is used to explain the dimensions identified. The product means on the first dimension correspond mainly to perceived pulpiness, while sweetness primarily explains the second dimension. The fact that the variance increases as products become pulpier suggests that control of pulpiness is less easily achieved than control of sweetness. The problem appears to be that the five plants, 2, 4, 5, 7 and 8, produce excessively pulpy product, and that this product is more inconsistent because of the difficulty in maintaining high pulp levels from one container to another. A deterministic map of Table 23.8, Fig. 23.16, suggests that products from the plants producing pulpier product differ more among themselves, relative to products from the remaining plants, than is shown in Fig. 23.15. It can be seen in this figure how products with greater variability are pushed away from each other to accommodate the variance effect using distance. This model cannot explain the lower self-similarity of the pulpy products and provides no insight into the variability of product produced at specific plants.
Conclusion. Probabilistic MDS provides compelling solutions to problems in which product and perceptual variability arise. It also provides interesting diagnostic information for researchers interested in the dimensionality of product differences and the variances of products. Insights from these analyses provide useful tools to guide product quality management.

Scenario 7: Probabilistic Multivariate Preference Mapping

Background. In a previous report (Ennis et al. 1998), we discussed a method for the analysis of liking data. The goal of this method was to discover attributes that drive liking. The method, called probabilistic unfolding, displays products and ideals as distributions. Preference data can also be unfolded to provide insights into ideal product characteristics, and can be used to improve products. In fact, preference data are more valuable than liking data for determining the basis of consumer hedonics using unfolding models. Unfortunately, preference experiments are often more expensive to conduct than liking experiments. In this report, an overview of the benefits of multivariate preference unfolding is given, using new techniques that specify products and ideals as distributions rather than discrete points. The value of this approach in modeling preference data will be illustrated.

Scenario. In preference tests among category users, your chocolate chip cookie product usually places 3rd or 4th. You would like to diagnose the basis for this preference ordering and improve your product's performance in these tests. In a recent large-scale preference test among heavy users of the product category, your current product and a new product prototype were compared with products of three of your major competitors. The results of this test are shown in Table 23.9.
FIG. 23.15. PROBABILISTIC MULTIDIMENSIONAL SCALING OF SAME-DIFFERENT JUDGMENTS FOR ORANGE JUICE PRODUCTS FROM TEN PLANTS. NUMBERS CORRESPOND TO THE PLANTS

FIG. 23.16. DETERMINISTIC MULTIDIMENSIONAL SCALING OF SAME-DIFFERENT JUDGMENTS FOR ORANGE JUICE PRODUCTS FROM TEN PLANTS
TABLE 23.9. PREFERENCE PROPORTIONS FOR FIVE PRODUCTS BASED ON 300 CONSUMER PREFERENCES PER CELL
Stochastic Transitivity. When A > B and B > C, transitivity implies that A > C; and since A > B, we know that A - C > B - C. If we think of product hedonic values as points on a line, such as A, B, and C, then preference proportions might be thought of as monotonically related to differences on this line, i.e., A - B, A - C and B - C. We expect that if A is preferred to B and B is preferred to C, then A should be preferred to C by at least as much as B is preferred to C. This is a form of transitivity known as strong stochastic transitivity. In Table 23.9, C2 is preferred to C1 by a small margin (54:46). However, C1 beats CP by 8 percentage points more than C2 does (88% vs 80%). A similar, though smaller, effect occurs with CP and C3: these two products appear to be equally preferred, but C3 appears to perform better against C1 than CP does. Thus, an assumption of strong stochastic transitivity disagrees with the data in Table 23.9. Are these results due to experimental error? Variance in the preference proportions will not explain these observations. There is, however, a compelling model of preferential choice that can explain these results, and it does not require strong stochastic transitivity. From this model we will learn what preference results can tell us about how products are located in a perceptual space.
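Written out (this is the standard definition, supplied here for completeness), strong stochastic transitivity requires that, for pairwise choice probabilities P(·,·),

\[ P(A,B) \ge \tfrac12 \;\text{and}\; P(B,C) \ge \tfrac12 \;\;\Longrightarrow\;\; P(A,C) \ge \max\{P(A,B),\, P(B,C)\}. \]

The proportions in Table 23.9 violate this requirement, and that violation is the puzzle the probabilistic model below resolves.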
Two Assumptions. Preferential choices are not consistent among a market segment of consumers, or even within one consumer. One way of thinking about how preferential choices are generated is to assume that consumers base their choices on information obtained from the products tested and on their opinion at that moment about what they ideally prefer. It is assumed that the consumer considers the perceptual variables on which her/his choice depends and selects the product from the pair that is closest to her/his imagined ideal product at that time. Figure 23.17 illustrates how P1 would be preferred to P2 because it is closer to an ideal value, I, in a relevant perceptual space.

FIG. 23.17. IDEAL AND PRODUCT VALUES IN A PERCEPTUAL SPACE RELEVANT TO THE PREFERENCE DECISION
Two assumptions that we will use to explain the preferential choice results of Table 23.9 are: (a) that product and ideal perceptions can be represented as distributions rather than discrete points, and (b) that a consumer chooses the product that has the least distance to an ideal value on variables that are pertinent to preference. There is a parallel between Thurstonian models for difference tests and the model described here. In each case we have the same kinds of assumptions - a distribution assumption about product percepts and an assumption about the decision rule used by consumers to make choices.
Unfolding. Preferential choice proportions are unidimensional variables, but Fig. 23.17 shows how choice decisions may be based on distance comparisons in a multidimensional space. Preferential choice unfolding is a process through which multivariate ideal and product positions are estimated based only on preferential choice proportions. Later, we add attribute variables to the unfolded preference map to describe the dimensions of the perceptual space. Figure 23.17 shows one subject making a choice at one moment in time, but it cannot explain inconsistent choice behavior. In Fig. 23.18, I, P1 and P2 are particular values drawn at one moment from three distributions. The use of distributions for products and ideals not only explains inconsistent choice behavior, but is motivated by the idea that both products and consumer perceptions of products vary.
FIG. 23.18. DISTRIBUTIONS OF MOMENTARY PERCEPTS AND THEIR 95% CONFIDENCE LIMITS
A Multivariate Preference Map. Multivariate unfolding models for preferential choice (Mullen et al. 1991; Ennis et al. 1994) and preference ratios (Mackay et al. 1995) have been published. These models provide the mathematical basis for estimating the location of products and ideals (their means) as well as the size and shape of the distributions (their variance-covariance matrices). Using the preferential choice model, the map in Fig. 23.19 was estimated as the best fit to the data in Table 23.9; the method of maximum likelihood was used to obtain the fit. Figure 23.19 shows that the product and ideal distributions share a common feature - the variance on one of the dimensions is larger than on the other, and within each dimension the variances are equal. From preference unfolding we determine the location of the product and ideal distributions in a relevant attribute space. Although the unfolding solution displayed in Fig. 23.19 is two-dimensional, the technique is not limited to two dimensions; tests can be conducted, in fact, to determine the most parsimonious dimensionality. Once unfolding has been accomplished, we may describe the space by finding the best-fitting scales that match product projections onto these scales with product rating means. This type of analysis led to the identification of the hardness and chocolate flavor dimensions shown in Fig. 23.19.
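In symbols (a sketch consistent with the two assumptions stated earlier; the multivariate normal form is supplied here for concreteness), let X1 ~ N(μ1, Σ1) and X2 ~ N(μ2, Σ2) be momentary percepts of the two products and Y ~ N(μI, ΣI) a momentary ideal, drawn independently. The unfolding model's choice probability is

\[ P(P_1 \succ P_2) \;=\; \Pr\bigl(\lVert X_1 - Y\rVert \;<\; \lVert X_2 - Y\rVert\bigr), \]

which depends on the means, variances and orientations of all three distributions - not only on the distances between means. This is why strong stochastic transitivity can fail even though every consumer follows the same nearest-to-ideal rule.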
FIG. 23.19. THE PREFERENCE DATA OF TABLE 23.9 UNFOLDED TO SHOW THE RELATIVE POSITIONS OF PRODUCTS IN HARDNESS/CHOCOLATE FLAVOR SPACE
Interpretation of the Preference Scenario. For the ideal product, the variance for hardness is greater than the variance for chocolate flavor. This means that consumers are more sensitive to changes in chocolate flavor than to changes in hardness. Although the means of C2 and C1 are equidistant from the ideal mean, C2 is preferred because it is closer to ideal on the most relevant attribute. We can now see why strong stochastic transitivity does not apply to the preference proportions. C1 is preferred to CP by a greater margin than C2 is because: (a) C1 and CP share the same chocolate flavor mean, but C1 is better positioned on hardness, and (b) C2 loses to the current product when hardness ideal values are low - cases on which C1 wins. The new product appears to be a slight improvement over the current product because its hardness is closer to ideal; more attention should be paid to increasing the chocolate flavor of the current product. It is also evident why CP and C3 are equally preferred: they are symmetrically equidistant from the ideal in different directions on chocolate flavor. Finally, this new model also explains why C1 is preferred to CP by a larger margin (88% vs. 81%) than C3. CP exhibits the same low chocolate flavor as C1, but is also softer than the ideal. On those occasions when the ideal product is perceived to be high in chocolate flavor, C3 is preferred to C1.

Scenario 8: Locating Individual Ideals and Products in a Joint Space

Background. When considering product differences that drive consumer liking, it is helpful to imagine a space described by sensory characteristics in which each product has a location. In this space a person's ideal product may be found, so that if we understood this space we could predict the person's degree of liking for each product. By looking into this space we would also notice that each product and each ideal does not exist as a point, but as a cluster of variously similar points, as illustrated in Fig. 23.20. Some people clearly know what they like - their ideals are tightly clustered together; others are more uncertain - their ideal points form a larger cloud of points. Individuals do not have absolute ideal points; the ideal points vary momentarily depending on variables such as mood, time of day, and recent consumption experience. Similarly, products do not have exactly determined positions; products are represented by clusters of varying size due to differences in momentary perception. Since some people like similar things, collections of individual ideal clusters may form what we generally describe as market segments. These segments may have simple demographic markers, such as age or gender, or the markers may be more complex and derive from sensory experience, such as identification based on liking for sweet products. From the size of these collections of ideals, we could assess the potential of products that appeal to particular segments. If we could describe this space using reliable information about product characteristics, the result would be of immense value in product development and marketing. The vision of creating and exploring this space has stimulated considerable research (Ennis et al. 1988; Mullen et al. 1991; Ennis et al. 1994; Mackay et al. 1995; De Soete et al. 1986). This report is an introduction to mapping individual ideals in relation to product positions in a sensory space. The resulting maps can be used to guide product development and marketing to provide products that satisfy clearly defined groups of consumers.

FIG. 23.20. A PRODUCT AND INDIVIDUAL IDEAL POINT SPACE
Scenario. You work for a company that produces orange juice, and you would like to apply the above ideas to a selection of ten orange juice brands and prototypes. From descriptive analysis of these products, you know that the products differ in a number of respects; some attributes on which they differ include sourness, pulpiness, sweetness, peely flavor, burnt flavor, and bitterness. You would like to know whether there is a market for a relatively bitter orange juice product. In a recent consumer test, 200 consumers (composed of a demographically balanced set of 100 regular coffee drinkers and 100 non-coffee drinkers) evaluated the ten orange juice products on a 9-point liking scale.

Group Versus Individual Ideals. In the previous scenario, I discussed the development of maps of a sensory space using preference data and a probabilistic ideal point model. The model was probabilistic because products and ideals were treated as distributions. In that scenario, I explained how preferences for products among a group of consumers could be displayed using a single ideal point distribution for the group. In an earlier report (Ennis et al. 1998), I demonstrated how liking ratings from a group of consumers could be used to find the drivers of consumer liking for the group. In both of these cases, it was assumed that the consumers belong to a homogeneous group represented by a single cluster or distribution. This assumption is justified if there is little evidence for segmentation. In this report we take the level of analysis deeper. Using liking ratings, our goal is to produce a map of products and individual ideals in which each consumer is represented by his or her own distribution. From this map, it is possible to identify collections of consumers that share similar ideal locations. The technique used to achieve this goal does not use any sensory information from the products or any descriptive information about the consumers; the analysis relies entirely on liking ratings. Sensory and consumer descriptive information is used afterwards to describe the dimensions and the segments uncovered in the analysis.

How Liking Ratings Arise. When a consumer rates a product "7" on a 9-point liking scale, one can interpret this rating as a measure of distance between the ideal value generated by the consumer and his or her perception of the product. The higher the rating, the smaller the distance. Each consumer's ideal varies from moment to moment, as does the perception of each product. A consumer who gives a "9" rating to a product on one occasion may not do so if presented with the same product again, because the product or the ideal may not remain constant. When the same consumer rates multiple products, each rating gives information about the location of that person's ideal point distribution. When data from multiple consumers are used, it is possible to locate the positions of products and individual ideal point means.
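One convenient formalization of this idea (an illustrative assumption, not the authors' stated function) maps a momentary product percept X and momentary ideal I into a rating through a decreasing function of their distance, for example

\[ R \;=\; 1 + 8\,\exp\!\left(-\lVert X - I\rVert^{2}/\tau^{2}\right), \]

which assigns a 9 when the percept coincides with the momentary ideal and drifts toward 1 as the distance grows; averaging over both distributions gives the expected liking rating for that consumer and product.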
Individual Ideal Point Maps. Individual ideal point analysis of the orange juice products is shown in Fig. 23.21. This figure is a contour plot with a scatter plot of the ideal points superimposed on the contours. The plots were obtained from an ideal point model of the liking ratings. Although the individual distributions or clusters are not shown, each point in Fig. 23.21 is the mean of a distribution for one consumer. Consumer ideal points are identified by the symbols. From the location of all the ideal point means, a measure of density or congestion of each ideal point mean can be obtained. The contours correspond to the height of the density, which measures how close each consumer's ideal is to the others in the map. Figure 23.22 is a contour plot of the individual ideals along with a scatter plot of the products. The direction of descriptive attribute information is also included in Fig. 23.22, so that the dimensions of the space can be identified. Projections of the product points onto the scales shown provide a measure of the sensory intensity of each product on each attribute. The attributes displayed drive consumer liking. It can be seen from the contour plot in Fig. 23.22 that there are two segments of consumers, one of which is concentrated around product 8, the most bitter product. The less bitter products 1, 4 and 10 are placed near the peak of the second segment. You conclude that the ideal points for coffee drinkers are closer to a relatively bitter orange juice. In addition to the difference in preference for bitterness, you also conclude that consumers differ in their preferences for sourness, peely flavor and sweetness.

FIG. 23.21. CONTOUR PLOT OF INDIVIDUAL IDEAL POINTS FOR THE ORANGE JUICE PRODUCTS

FIG. 23.22. SENSORY DIRECTIONS ON A CONTOUR PLOT WITH PRODUCTS

Conclusion. Liking ratings, treated as a measure of similarity between products and ideals, can be used to locate product and individual ideal points on a sensory map. Once this map has been constructed, sensory descriptive and analytical data may be added to the map to interpret the dimensions driving individual liking decisions. By constructing an account of liking data at the individual level, this method reveals the existence of latent ideal point segments. Since it relies entirely on liking data to construct the ideal point map, it uncovers the underlying sensory dimensions that drive individual liking. The method has
value to researchers interested in identifying ideal point segments, in determining drivers of consumer liking, and in locating products on these drivers to maximize consumer satisfaction.

REFERENCES

[The IFPress® references can be obtained at the website www.ifpress.com]

BI, J., ENNIS, D.M. and O'MAHONY, M. 1997. How to estimate and use the variance of d' from difference tests. J. Sensory Studies 12, 87-104.
DE SOETE, G., CARROLL, J.D. and DeSARBO, W.S. 1986. The wandering ideal point model: A probabilistic multidimensional unfolding model for paired comparisons data. J. Mathematical Psychol. 30, 28-41.
DORFMAN, D.D. and ALF JR., E. 1969. Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals - Rating method data. J. Mathematical Psychol. 6, 487-496.
ENNIS, D.M. 1993a. The power of sensory discrimination methods. J. Sensory Studies 8, 353-370.
ENNIS, D.M. 1993b. A single multidimensional model for discrimination, identification and preferential choice. Acta Psychologica 84, 17-27.
ENNIS, D.M. 1998a. Thurstonian scaling for difference tests. IFPress 1(3), 2-3.
ENNIS, D.M. 1998b. Foundations of sensory science and a vision for the future. Food Technol. 52, 78-89.
ENNIS, D.M. 1999. Multivariate preference mapping. IFPress 2(2), 2-3.
ENNIS, D.M. 2001. Drivers of Liking® for multiple segments. IFPress 4(1), 2-3.
ENNIS, D.M. and BI, J. 1998a. The Beta-Binomial model: Accounting for inter-trial variation in replicated difference and preference tests. J. Sensory Studies 13, 389-412.
ENNIS, D.M. and BI, J. 1998b. Drivers of Liking. IFPress 1(1), 2-3.
ENNIS, D.M. and JOHNSON, N.L. 1993. Thurstone-Shepard similarity models as special cases of moment-generating functions. J. Mathematical Psychol. 37, 104-110.
ENNIS, D.M. and JOHNSON, N.L. 1994. A general model for preferential and triadic choice in terms of central F distribution functions. Psychometrika 59, 91-96.
ENNIS, D.M., MULLEN, K. and FRIJTERS, J.E.R. 1988. Variants of the method of triads: Unidimensional Thurstonian models. British J. Mathematical and Statistical Psychol. 41, 25-36.
ENNIS, D.M. and O'MAHONY, M. 1995. Probabilistic models for sequential taste effects in triadic choice. J. Experimental Psychol. Human Perception and Performance 21, 1-10.
ENNIS, D.M., PALEN, J. and MULLEN, K. 1988. A multidimensional stochastic theory of similarity. J. Mathematical Psychol. 32, 449-465.
FRIJTERS, J.E.R. 1979. The paradox of the discriminatory non-discriminators resolved. Chemical Senses and Flavour 4, 355.
JUSLIN, P. and OLSSON, H. 1997. Thurstonian and Brunswikian origins of uncertainty in judgment: A sampling model of confidence in sensory discrimination. Psychological Rev. 104, 344-366.
MACKAY, D.B., EASLEY, R.F. and ZINNES, J.L. 1995. A single ideal point model for market structure analysis. J. Marketing Res. 32, 433-443.
MULLEN, K. and ENNIS, D.M. 1991. A simple multivariate probabilistic model for preferential and triadic choices. Psychometrika 56, 69-75.
ROUSSEAU, B. and ENNIS, D.M. 2001. How retasting can improve the power of product testing. IFPress 4(2), 2-3.
ROUSSEAU, B. and ENNIS, D.M. 2002. The multiple dual-pair. Perception & Psychophysics (In Press).
ROUSSEAU, B., MEYER, A. and O'MAHONY, M. 1998. Power and sensitivity of the same-different test: Comparison with triangle and duo-trio methods. J. Sensory Studies 13, 149-173.
ROUSSEAU, B. and O'MAHONY, M. 1997. Sensory difference tests: Thurstonian and SSA predictions for vanilla flavored yogurts. J. Sensory Studies 12, 127-146.
ROUSSEAU, B. and O'MAHONY, M. 2000. Investigation of the effect of within-trial retasting and comparison of the dual-pair, same-different and triangle paradigms. Food Quality and Preference 11, 457-464.
ROUSSEAU, B., ROGEAUX, M. and O'MAHONY, M. 1999. Mustard discrimination by same-different and triangle tests: Aspects of irritation, memory and τ criteria. Food Quality and Preference 10, 173-184.
ROUSSEAU, B., STROH, S. and O'MAHONY, M. 2002. Investigating more powerful discrimination tests with consumers: Effects of memory and response bias. Food Quality and Preference 13, 39-45.
THURSTONE, L.L. 1927a. A law of comparative judgement. Psychological Rev. 34, 273-286.
THURSTONE, L.L. 1927b. Psychophysical analysis. Amer. J. Psychol. 38, 368-389.
THURSTONE, L.L. 1927c. Three psychophysical laws. Psychological Rev. 34, 424-432.
TORGERSON, W.S. 1958. Theory and Methods of Scaling. John Wiley & Sons, New York.
CHAPTER 24

APPLICATIONS OF SAS PROGRAMMING LANGUAGE IN SENSORY SCIENCE

MAXIMO C. GACULA, JR.

In the past two decades, the scientific and industrial communities have been flooded with computer software facilitating a rapid turn-around time in the evaluation of experimental data. There are essentially two types of software. (1) One is the so-called "turn-key" or menu-driven system. This type of system is popular among professionals who are busy with their research activities and involved in gathering experimental data. It is simple to use in the sense that a "point and click" on a particular statistical method provides the result of the analysis, including graphical and mapping analyses. Many statisticians also use this type of software for simplicity, particularly for its excellent graphing capabilities. Examples of turn-key systems that the author has used are Design-Ease (Stat-Ease 1994), Design-Expert (Stat-Ease 1999), Statistix (Analytical Software 1996), and NCSS (NCSS 1998). Software packages developed for use in sensory evaluation include, among others, those by Compusense Inc., CAMO Inc., and Biosystèmes. An example of the sensory evaluation use of Statistix and Design-Expert is given in Gacula (1997). (2) The other type is software that requires programming or program coding for its use (a command-driven system). This type requires knowledge and training on the part of the users. Briefly, program commands or codes are used as instructions for the computer to perform the required tasks. Examples of this type of software that the author has used are SYSTAT and SAS (Statistical Analysis System). In this chapter, the author shares his experiences with the use of SAS in sensory evaluation and in other types of applications useful in the biological areas. In the last decade, several software packages written for specific areas of application have emerged (Harrell and Goldstein 1997), e.g., EPILOG PLUS 3 for clinical trials and epidemiology, and TRUE EPISTAT for health and medical research. Some comprise specialized statistical programs, whereas others are general purpose software.
Software Packages for General Use

Morgan (1998) reviewed eight software packages for general use. Three of these packages are given in Table 24.1. In addition, SAS (SAS Institute 1999) and SYSTAT (SPSS, Inc. 1998) are listed, which are software for statisticians and trained professionals. The content of this table is intended to show only the statistical capabilities of the software. Where it applies, the statistical method is not specified, i.e., Xbar chart for quality control, various models for nonlinear regression, and partial least-squares and ridge regression for multivariate methods. In Morgan's review, the results of analysis of 20 test problems using Gprism, NCSS, and Statistix were compared to SAS for accuracy. Using the range of analyses as the only criterion, Morgan recommended NCSS. In particular, NCSS has extensive capabilities for nonlinear regression analysis, which involves iterative fitting of a specified nonlinear model form. For users who do not need all the features of NCSS, Statistix is the next choice. GraphPad Prism is unique, targeting users in the biological sciences and biologically oriented laboratories. Recognizing that these software packages are continually being improved and upgraded, it is expected that new capabilities have been added since their last versions. Based on the experience of the author, NCSS and Statistix are recommended for sensory professionals, and SAS for statisticians and other trained professionals. If cost is an important consideration, then SYSTAT and NCSS would be the choices, assuming that these packages meet the statistical needs. For good graphing capabilities, SAS/STAT must be combined with SAS/GRAPH.
Common SAS Input Format in Sensory Evaluation

The basic question that sensory scientists are faced with is how the data will be entered in creating a database. The data should be entered according to the questionnaire design. Each panelist evaluates the product or stimuli according to the questionnaire. The most common input format is: panelist, sex, order, product, attributes. Other columns that define the observations, such as replication, can be added. The number of attributes can vary, and each should be represented by a column. Almost all data collection software follows this format. For example, for 5 attributes, 3 products, and 100 panelists (each panelist evaluating each product), we will have 9 columns and 300 rows in the database. SAS can read the data with two statements in the program code:

infile 'c:\consumer.dat' lrecl=1000 dsd;
input panelist sex order product x1-x5;
As expected, x1-x5 refer to attributes 1, 2, 3, 4, 5. For missing information or observations, a dot (.) should be typed in. An advantage of this method is that it is easy to spot-check the data. If the data are on a floppy disk, replace the "c drive" by the "a drive" in the infile statement:

infile 'a:\consumer.dat' lrecl=1000 dsd;
input panelist sex order product x1-x5;

In case the data are in an Excel format, the file should be saved as comma separated values (".csv"):

infile 'c:\consumer.csv' delimiter=',';
input panelist sex order product x1-x5;
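On more recent SAS releases, the same comma-separated file can also be read without an explicit input statement (an added sketch, not part of the original text; the path and dataset name are illustrative):

proc import datafile='c:\consumer.csv' out=consumer
     dbms=csv replace;
   getnames=no;   /* the example file carries no header row */
run;

The infile/input approach above remains preferable when the columns must be read in a controlled order and spot-checked against the questionnaire.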
If the columns are specified by column numbers, then these numbers must appear after each variable in the input statement: panelist 2-4 sex 5 order 6-12 product 13-14 x1 20-21 x2 22-23, etc. This indicates that the panelist number occurs in columns 2 to 4, sex in column 5, and so forth. In this situation, it is critical that the sensory scientist consult with the programmer or statistician to check that the columns are properly read by the SAS code.

SAS Code for Paired Comparison

Consider the descriptive data in Table 24.2, where an attribute score is based on a 0-15 rating scale. If the data are typed directly from the score sheet, this can be done directly in the SAS system as shown below. For this example, the data are saved to a floppy disk in a file named descriptive.dat (a:\descriptive.dat):

1 1 a 6 5
2 1 b 8 8
3 1 a 7 8
4 1 b 8 7
5 1 a 9 8
1 2 b 8 6
2 2 a 8 9
3 2 b 8 8
4 2 a 9 9
5 2 b 9 9

As stated earlier, a dot (.) should be typed for any missing observation. Table 24.3 shows the program code to read the data file "descriptive.dat" and the consequent statistical analysis.
TABLE 24.1. STATISTICAL CAPABILITIES OF THE SOFTWARE PACKAGES

Statistics                      SAS    SYSTAT
Regression:
  All possible                  X      X
  Logistic                      X      X
  Nonlinear                     X      X
Multivariate:
  MANOVA                        X      X
  Cluster analysis              X      X
  Factor analysis               X      X
  Principal components          X      X
  Discriminant analysis         X      X
  Correspondence analysis       X      X
Time Series:
  ARIMA models                  X      X
  Exponential smoothing         X      X
Quality control:
  Control charts                X
  Pareto charts                 X
Other:
  Experimental design, survival analysis, tests of normality,
  sample size calculation, probability, bootstrap

TABLE 24.2. SAMPLE DESCRIPTIVE DATA

Judge    Rep    Product    Attribute 1    Attribute 2
1        1      a          6              5
2        1      b          8              8
3        1      a          7              8
4        1      b          8              7
5        1      a          9              8
1        2      b          8              6
2        2      a          8              9
3        2      b          8              8
4        2      a          9              9
5        2      b          9              9
TABLE 24.3. SAS PROGRAM CODE FOR PAIRED COMPARISON ANALYSIS
*prog chapt24a.sas;
data a(keep=judge rep product a1-a2)
     b(keep=judge rep product b1-b2);
%let title=Chapter 24;
infile 'a:\descriptive.dat';
input judge rep product $ @@;
if product='a' then do;
   input a1-a2 @@;
   output a;
end;
if product='b' then do;
   input b1-b2 @@;
   output b;
end;
run;
proc sort data=a; by judge; run;
proc sort data=b; by judge; run;
data c;
   merge a b; by judge;
   diff1=a1-b1;
   diff2=a2-b2;
run;
proc print data=c;
   var diff1 diff2;
   title1 "Paired differences by judges for attributes 1 and 2";
run;
proc means mean n std t prt maxdec=3 data=c;
   var diff1 diff2;
   title1 "Paired t test";
   title2 "diff1=difference between products for attribute 1";
   title3 "diff2=difference between products for attribute 2";
run;
An important command in this code is the creation of two output data sets so that differences between products can be obtained: "output a" and "output b". These outputs are then merged, and the resultant merged file is called "data c". The paired analysis can then proceed; the results are shown in Table 24.4. The code in Table 24.3 can also be used to analyze the evaluation of two treatments over time, i.e., day1 = a1 - b1, day2 = a2 - b2, day3 = a3 - b3, etc. This situation occurs in irritation studies where treatments are applied to human armpits, forearms, and other sites.

TABLE 24.4. SAS OUTPUT FOR PROGRAM CODE GIVEN IN TABLE 24.3

Paired differences by judges for attributes 1 and 2

Obs    diff1    diff2
1      -2       -1
2       0        1
3      -1        0
4       1        2
5       0       -1

Paired t test
diff1=difference between products for attribute 1
diff2=difference between products for attribute 2

The MEANS Procedure
Variable    Mean      N    Std Dev    t Value    Pr > |t|
diff1      -0.400     5    1.140      -0.78      0.4766
diff2       0.200     5    1.304       0.34      0.7489
SAS Code for Group Comparison

For the data format in Table 24.2, the SAS program code is simpler, as shown in Table 24.5. In this program, one can increase the number of products evaluated, as well as the number of attributes. Once the statistician has gone over the code line by line with the sensory scientist, the scientist should have a better understanding of the code and can even modify it to accommodate an increased number of attributes. Table 24.6 shows the output of the program: the analysis of variance and a multiple comparison using Duncan's test. For simplicity, the descriptive statistics are not shown. Those not familiar with the output should consult a statistician.
SAS Program Code for Nonlinear Regression

A common type of experimental data encountered in practice involves prediction of responses over time, or the relationship between two variables, i.e., dependent Y and independent X, in which the relationship is nonlinear in the parameters and is difficult to model because fitting involves an iterative process. In Sensory Science, nonlinear regression is used in time-intensity studies, such as evaluating the rate of malodor disappearance for antiperspirant and deodorant products. In this section, the amount of vitamin C absorbed in the skin will be used as an example. In addition, the author's viewpoint on model building is given by modifying the application of the jackknife method.

TABLE 24.5. PROGRAM CODE FOR GROUP COMPARISON USING THE DATA FORMAT IN TABLE 24.2

*prog chapt24b.sas;
data a;
infile "a:\descriptive.dat";
input judge rep product $ x1 x2;
proc sort data=a; by product judge; run;
proc glm data=a;
   class product judge;
   model x1-x2 = judge product;
   means product/Duncan;
   title1 "Analysis of variance";
run;
proc sort data=a; by product; run;
proc means mean n std maxdec=3 data=a;
   var x1 x2;
   by product;
run;
Data from a skin penetration study are used to develop a penetration model so that the amount of vitamin C absorbed can be measured. As with sensory evaluation data, the expected variability in this type of experiment is large, due to response variability among human subjects. In this example, the response measured is micrograms of vitamin C absorbed, from eight healthy subjects.
TABLE 24.6. SAS OUTPUT FOR ATTRIBUTE X1

Analysis of variance
The GLM Procedure

Class Level Information
Class      Levels    Values
product    2         a b
judge      5         1 2 3 4 5

Number of observations 10

Dependent Variable: x1
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               5    5.40000000        1.08000000     1.66       0.3215
Error               4    2.60000000        0.65000000
Corrected Total     9    8.00000000

R-Square    Coeff Var    Root MSE    x1 Mean
0.675000    10.07782     0.806226    8.000000

Source     DF    Type I SS     Mean Square    F Value    Pr > F
judge       4    5.00000000    1.25000000     1.92       0.2710
product     1    0.40000000    0.40000000     0.62       0.4766

Source     DF    Type III SS   Mean Square    F Value    Pr > F
judge       4    5.00000000    1.25000000     1.92       0.2710
product     1    0.40000000    0.40000000     0.62       0.4766

Duncan's Multiple Range Test for x1
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise error rate.
Alpha 0.05
Error Degrees of Freedom 4
Error Mean Square 0.65
Number of Means 2
Critical Range 1.416

Means with the same letter are not significantly different.
Duncan Grouping    Mean      N    product
A                  8.2000    5    b
A                  7.8000    5    a
Several statistical models were explored using known biological information on what the penetration curve should theoretically look like. We will use the skin penetration data for subject 1, which fit the model with no jackknifing, and for subject 7, which required extensive jackknifing. Table 24.7 shows the amount Y (micrograms) of vitamin C measured on the skin from 0 to 22 hours (X) for the 18 observations (Obs) on subject 1. An exploratory analysis using various forms of nonlinear curve (nonlinear in at least one of the parameters) showed that a 4-parameter model is the best-fitting model to describe the relationship between Y and X:

Y = (a + b*X)exp(-c*X) + d

with parameters a, b, c, and d. In this model, the relationship between Y and X is said to be nonlinear in parameter c. Using the data in Table 24.7, the estimated parameters are: a = -2.0857, b = 7.5432, c = 0.8662, d = 5.5688.

TABLE 24.7. SKIN PENETRATION DATA FOR SUBJECT 1

Obs    Y        hour (X)    group
1      3.74     0.00        1
2      5.81     22.00       2
3      7.60     4.00        2
4      6.66     8.00        2
5      7.83     6.00        2
6      3.85     22.00       2
7      5.26     8.00        2
8      7.56     4.00        2
9      3.62     6.00        2
10     3.12     0.00        1
11     10.75    1.00        1
12     7.83     0.25        1
13     6.19     2.00        1
14     8.10     0.50        1
15     3.27     0.25        1
16     6.70     2.00        1
17     4.09     0.50        1
18     7.40     1.00        1
Figure 24.1 shows the plot of the nonlinear regression model, which supports the biological curve. The vertical dashed line at X = 2 hours theoretically divides the curve into two areas: the left side of the line is known as "uptake/penetration" and the right side as "absorption/elimination." Table 24.8 contains the program code for calculating the parameters of the model. Note that the data shown in Table 24.7 are also plotted in this figure. It is important at this point that the program code be reviewed with the sensory analyst. It is the experience of the author that once this is done, the analyst will ultimately run the program independently. The 4-parameter regression model was applied to the remaining 7 subjects used in the study. Because of the presence of large response variability, not all data from the subjects had solutions (no statistical convergence). Hence, a modified jackknife procedure was applied. This was accomplished by eliminating some observations that visually appear to lie off the theoretical/biological curve, then estimating the parameters without those observations. These observations were then put back into the subject's database and overlaid on the 4-parameter nonlinear regression curve. The modified jackknife technique was applied to subject 7 (n=6), which resulted in statistical convergence. The plot of the resulting regression curve is shown in Fig. 24.2, with parameter estimates of a = -1.2886, b = 2.6068, c = 0.3198, d = 4.8979; substituting into the model,
Y = (-1.2886 + 2.6068*X)exp(-0.3198*X) + 4.8979
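To make the fitted equation concrete, a short data step (added here as an illustration, not part of the original program) evaluates the subject 7 model at the sampling times used in the study:

data pred7;
   do x = 0, 0.25, 0.50, 1, 2, 4, 8, 22;
      y = (-1.2886 + 2.6068*x)*exp(-0.3198*x) + 4.8979;
      output;
   end;
run;
proc print data=pred7; run;

At x = 0 this returns the baseline value -1.2886 + 4.8979 = 3.61 micrograms; the predicted curve then rises to a peak before decaying toward d = 4.8979.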
FIG. 24.1. NONLINEAR REGRESSION CURVE FOR SUBJECT 1 WHERE ALL DATA WERE USED
TABLE 24.8. SAS PROGRAM CODE FOR SUBJECT 1 DATA IN TABLE 24.7

*prog SkinSegY1.sas;
%let title=Skin Penetration Study Using Nonlinear Model: Subject 1;
data a;
input y x group;
cards;
3.74 0 1
5.81 22 2
7.60 4 2
6.66 8 2
7.83 6 2
3.85 22 2
5.26 8 2
7.56 4 2
3.62 6 2
3.12 0 1
10.75 1 1
7.83 0.25 1
6.19 2 1
8.10 0.50 1
3.27 0.25 1
6.70 2 1
4.09 0.50 1
7.40 1 1
;
data b;
do x=0 to 22 by .50;
   output;
end;
run;
data a;
set a b;
run;
proc nlin data=a best=4;
   parms a=1 to 5 by 1 b=1 to 5 by 1 c=.1 to .5 by .1 d=1 to 5 by 1;
   temp=exp(-c*x);
   model y=(a + b*x)*temp + d;
   output out=result p=predict l95=lower u95=upper;
   title"&title";
run;
proc sort data=result; by x; run;
proc print data=result;
   var x y predict;
   title"&title";
run;
GOPTIONS RESET=ALL; *DEVICE=WINPRTM *NODISPLAY;
symbol1 color=black value=star height=1.2; run;
symbol2 color=black interpol=join value=circle height=1.2; run;
symbol3 color=black interpol=join value=dot height=.01 width=3 line=2; run;
symbol4 color=black interpol=join value=dot height=.01 width=3 line=2; run;
axis1 order=(0 to 22 by 1) major=(height=0.4) minor=(height=0.2 number=5)
   value=(FONT=ZAPF height=.8)
   label=(FONT=ZAPF height=1 justify=center 'Hour');
axis2 order=(0 to 17 by 1) major=(height=1)
   value=(FONT=ZAPF height=.8)
   label=(FONT=ZAPF height=1 angle=90 justify=center 'ug AA/sam');
PROC GPLOT data=result;
   PLOT y*x predict*x / overlay haxis=axis1 vaxis=axis2 href=2 lhref=2;
   TITLE1 FONT=ZAPF H=1 J=L 'Skin Penetration Study: Subject 1';
run;
quit;
Note: Skin penetration data courtesy of Best Skin Care (Norwalk, CT), Zila Nutraceuticals (Prescott, AZ), and Technikos Research Associates (Scottsdale, AZ).
Figure 24.3 shows the result when all data are overlaid on the model. Notice the good fit of the 10 observations that were excluded from the computation of the nonlinear regression curve. It can be stated that once we know the theoretical curve, a solution can be found by using the modified jackknife technique. In the vitamin C study, the data for 5 of the 8 subjects did not converge without using the modified jackknife technique. The author welcomes viewpoints on, and confirmation of, the technique.
FIG. 24.2. NONLINEAR REGRESSION CURVE USING ONLY THE JACKKNIFE DATA OF n=6
FIG. 24.3. NONLINEAR REGRESSION CURVE WITH ALL DATA (n=16) SUPERIMPOSED
Polynomial Regression

The previous section discussed nonlinear regression. The curves for polynomial and nonlinear regressions can look very similar; they differ in the parameter estimates. In polynomial regression the parameters are linear, as contrasted with nonlinear regression, where at least one of the parameters is nonlinear, as defined earlier in this section. The polynomial regression equation is given by

Y = b0 + b1X + b2X2 + b3X3 + ...

Notice the difference between the above model and that given earlier for nonlinear regression. If the model stops at X3, then it is known as a cubic or 3rd-degree polynomial regression. This is the most common degree seen in statistical analysis of data. It is primarily used to obtain a point estimate within the polynomial regression line. For definitions of the terms in the equation, see Chap. 21. In SAS, the code is simple, consisting of one statement that provides all that is needed in the analysis: the regression plot with its 95% confidence limits and the parameter estimates of the polynomial regression. In addition, the statement automatically fits the data to the appropriate degree. The statement is "interpol=rcclm95" for 95% confidence limits and "interpol=rcclm99" for 99%. The complete program code is given in Table 24.9. Note that more columns can be added in the input statement, for example to represent replications or analysts. In this example, retention was measured at various concentrations to determine an endpoint of interest, which is 85% of the control (Ward and Gacula 2000). Figure 24.4 shows the result: the polynomial regression line with its 95% confidence limits and the endpoint indicated by a horizontal line at retention = 85. The fitted cubic polynomial regression line is

Retention Y = 85.49 - 43.96X - 18.76X2 + 11.25X3

where X = log concentration. Unlike other SAS codes, the code in Table 24.9 is unique in the sense that only the plot is given in the output; the regression equation is obtained by clicking "Log" in the SAS View menu.
TABLE 24.9. SAS PROGRAM CODE FOR A POLYNOMIAL REGRESSION ANALYSIS
*prog Gassay11.sas;
data a;
input concen Retention;
%let title=HCE TEP Assay;
Concentration=log10(concen);
goptions reset=global gunit=pct ftext=swissb htitle=4 htext=3;
cards;
0.1 99.4
0.1 99.6
0.1 99.4
1.0 87.9
1.0 82.0
1.0 85.5
2.0 71.8
2.0 72.8
2.0 69.4
10.0 40.3
10.0 34.9
10.0 26.4
100.0 20.2
100.0 5.0
100.0 12.5
;
title 'Polynomial Cubic Regression and its 95% CL';
symbol interpol=rcclm95 value=star height=3 width=2;
proc gplot data=a;
   plot Retention*Concentration / haxis=-1 to 2 by .2
        vaxis=0 to 100 by 5 hminor=1 vref=85;
run;
FIG. 24.4. RESULTING GRAPH FROM THE SAS CODE GIVEN IN TABLE 24.9: POLYNOMIAL CUBIC REGRESSION AND ITS 95% CL
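Because the purpose of the fit is to read off the 85% endpoint, a small data step (an added sketch, not in the original; the search interval is an assumption based on the plotted axis range) can solve the fitted cubic for the log concentration at which predicted retention crosses 85:

data endpoint;
   lo = -1;  hi = 2;                /* log10 concentration search range   */
   do iter = 1 to 40;               /* bisection on the fitted cubic      */
      mid = (lo + hi)/2;
      y = 85.49 - 43.96*mid - 18.76*mid**2 + 11.25*mid**3;
      if y > 85 then lo = mid;      /* retention still above the endpoint */
      else hi = mid;
   end;
   conc = 10**mid;
   put mid= conc=;                  /* near 0.011 log units, conc about 1 */
run;

The answer agrees with the raw data, where retention passes through roughly 85% near a concentration of 1.0 (log concentration 0).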
Other SAS Program Codes

Statistical methods commonly used in Sensory Science include stepwise regression, factor analysis, and principal component analysis. The program codes for these methods are given in Gacula (1997), together with examples that illustrate their simplicity. The "macros" and "proc format" facilities will not be covered in this chapter, again for reasons of simplicity. Another common input format, used in deodorant and skinfeel studies, is one in which the columns are panelist number, right axilla/forearm, left axilla/forearm, and attributes. Assuming five attributes, the input SAS code is:

input panelist right left x1-x5;

In this example, we will again illustrate a program code for a lotion study where the data and the program commands are combined by the "cards" statement:

data lotion;
input panelist right left x1-x5;
cards;

Table 24.10 shows the SAS program code used to analyze the difference between products A and B, assigned in balanced order to the right and left forearms. We designate the assignment of the two products on the forearms as "order1" and "order2" in the input statement. The "if" statement appropriately assigns the ratings/scores to the corresponding products and attributes. In the absence of SAS/GRAPH, the "proc chart" command is used.
TABLE 24.10. PROGRAM CODE AND SAMPLE DATA FOR THE LOTION STUDY

*prog paired;
%let title=Chapter 24;
data lotion;
input panelist order1 $ order2 $ @@;
label x1a='overall liking'
      x2a='fragrance appropriateness'
      x3a='skin softness after drying'
      x4a='moisturizes skin'
      x1b='overall liking'
      x2b='fragrance appropriateness'
      x3b='skin softness after drying'
      x4b='moisturizes skin';
if order1='a' then do;
   order2='b';
   input x1a x1b x2a x2b x3a x3b x4a x4b;
end;
if order1='b' then do;
   order2='a';
   input x1b x1a x2b x2a x3b x3a x4b x4a;
end;
cards;
1 a b 7 6 6 6 6 5 5 6
2 b a 6 6 5 6 6 5 7 7
3 a b 7 6 5 6 7 5 7 6
4 b a 5 6 5 6 6 5 5 6
5 a b 5 5 5 6 5 5 7 5
6 b a 5 7 5 7 6 6 6 7
7 a b 5 6 7 6 6 6 6 6
8 b a 6 8 5 6 6 7 6 7
9 a b 7 5 6 6 5 6 7 6
10 b a 6 7 7 7 7 6 6 7
;
data lotion;
set lotion;
diff1=x1a-x1b;
diff2=x2a-x2b;
diff3=x3a-x3b;
diff4=x4a-x4b;
run;
proc means mean n std maxdec=2;
   var x1a x2a x3a x4a;
   title1 "&title";
   title2 "Descriptive statistics for product A";
run;
proc means mean n std maxdec=2;
   var x1b x2b x3b x4b;
   title1 "&title";
   title2 "Descriptive statistics for product B";
run;
proc means mean n std t prt maxdec=2;
   var diff1-diff4;
   title1 "&title";
   title2 "Descriptive statistics for A - B and test of significance";
   title3 "Diff1=Overall liking Diff2=Fragrance appropriateness";
   title4 "Diff3=Skin softness after drying Diff4=Moisturizes skin";
run;
proc chart;
   hbar x1a / discrete;
   title1 "&title";
   title2 "Overall liking chart for product A";
run;
proc chart;
   hbar x1b / discrete;
   title1 "&title";
   title2 "Overall liking chart for product B";
run;
Table 24.11 shows the output of the program given in Table 24.10. The output is self-explanatory. By the paired t-test, product A was significantly higher in mean "overall liking" (P=0.0187) and "moisturizes skin" (P=0.0248). The charts of overall liking for products A and B further clarify the significant results (Fig. 24.5).

TABLE 24.11. THE SAS OUTPUT FOR THE LOTION STUDY

Descriptive statistics for product A
The MEANS Procedure
Variable    Label                         Mean    N     Std Dev
x1a         overall liking                6.50    10    0.97
x2a         fragrance appropriateness     6.10    10    0.74
x3a         skin softness after drying    5.80    10    0.79
x4a         moisturizes skin              6.60    10    0.70

Descriptive statistics for product B
The MEANS Procedure
Variable    Label                         Mean    N     Std Dev
x1b         overall liking                5.60    10    0.52
x2b         fragrance appropriateness     5.70    10    0.67
x3b         skin softness after drying    5.80    10    0.63
x4b         moisturizes skin              5.90    10    0.57

Descriptive statistics for A-B and test of significance
Diff1=Overall liking  Diff2=Fragrance appropriateness
Diff3=Skin softness after drying  Diff4=Moisturizes skin
The MEANS Procedure
Variable    Mean    N     Std Dev    t Value    Pr > |t|
diff1       0.90    10    0.99       2.86       0.0187
diff2       0.40    10    0.97       1.31       0.2229
diff3       0.00    10    1.05       0.00       1.0000
diff4       0.70    10    0.82       2.69       0.0248

Overall liking chart for product A
overall liking    Freq    Cum. Freq    Percent    Cum. Percent
5                 2       2            20.00      20.00
6                 2       4            20.00      40.00
7                 5       9            50.00      90.00
8                 1       10           10.00      100.00

Overall liking chart for product B
overall liking    Freq    Cum. Freq    Percent    Cum. Percent
5                 4       4            40.00      40.00
6                 6       10           60.00      100.00

FIG. 24.5. CHARTS OF OVERALL LIKING FOR PRODUCTS A AND B
As emphasized earlier, it is important for the statistician to review the program code and the corresponding output line by line with sensory scientists, to facilitate the learning process.
Computer Simulation

Sensory Science departments generate large amounts of Research Guidance Panel (RGP) and Descriptive Analysis (DA) data. The relationship between these two data sets is usually calculated to guide various aspects of product development. For example, the relationship between "overall liking" from the RGP and, say, "moisturizes skin" from DA is of great interest in lotion studies. Consider the data in Table 24.12, involving five prototypes and an existing product already on the market. In this table the variance of overall liking based on 2 standard deviations is given, which is used in the computer simulation; recall that ±2 standard deviations cover approximately 95% of a normal population. A total of 125 panelists were recruited for the RGP and 10 trained panelists were used for DA.
TABLE 24.12. RESEARCH GUIDANCE PANEL AND DESCRIPTIVE ANALYSIS DATA FOR SIX PRODUCTS
Product Code    Overall liking ± SD (1-9 hedonic scale)    Variance using 2 SD    Moisturizes Skin (0-15 rating scale)
1               7.0 ± 1.01                                 4.080                  12.0
2               6.6 ± 1.20                                 5.760                  11.5
3               7.1 ± 1.00                                 4.000                  12.4
4               6.4 ± 0.97                                 3.761                  8.0
5               6.8 ± 1.25                                 6.250                  11.4
6               6.0 ± 1.00                                 4.000                  7.1
Note: SD = standard deviation

The first step in building the relationship model is to conduct a regression analysis between overall liking and moisturizes skin. The SAS program code for performing the analysis is given in Table 24.13 and the output is shown in Table 24.14. The regression plot is given in Fig. 24.6, along with its 95% prediction limits. As given in the output, the adjusted R-squared is 0.84, a good indication of the relationship between overall liking and moisturizes skin. In order to cross-validate the results of the linear regression analysis, it is not necessary to repeat the experiment, which would involve additional expense. A computer simulation can be done using the means and variances of the concluded study. Table 24.15 shows the SAS code for the simulation. Except for the means and variances, this code is similar to that given in Chapter 14 (Table 14.7). The results of the simulation are graphed in Fig. 24.7 for N=25 and in Fig. 24.8 for N=100. Notice that the simulated overall liking mean scores are mostly below the regression line for N=25 and above it for N=100; this result is purely random, reflecting variability in overall liking. All of the plotted mean scores lie inside the prediction limits, suggesting cross-validation of the original data without repetition of the experiment. Computer simulation is a valuable tool in sensory evaluation and consumer testing that should see wider application in the years to come.
TABLE 24.13. SAS PROGRAM CODE FOR LINEAR REGRESSION ANALYSIS

*prog chapt24c.sas;
data a;
input prod overall moisturize;
cards;
1 7.0 12.0
2 6.6 11.5
3 7.1 12.4
4 6.4 8.0
5 6.8 11.4
6 6.0 7.1
;
proc reg data=a;
   model overall = moisturize / r;
   output out=result p=predict r=residual l95=lower u95=upper;
   title1 "Chapter 24: Regression analysis";
run;
proc sort data=result; by moisturize; run;
GOPTIONS RESET=ALL; *DEVICE=WINPRTM *NODISPLAY;
symbol1 color=black value=star height=1.2; run;
symbol2 color=black interpol=join value=dot height=.01 width=3; run;
symbol3 color=black interpol=join value=dot height=.01 width=3 line=2; run;
symbol4 color=black interpol=join value=dot height=.01 width=3 line=2; run;
axis1 order=(6.5 to 13 by .5) minor=(height=0.2 number=5)
   value=(FONT=ZAPF height=.8)
   label=(FONT=ZAPF height=1 justify=center 'Moisturize Skin');
axis2 order=(5.5 to 7.5 by .5) major=(height=1)
   value=(FONT=ZAPF height=.8)
   label=(FONT=ZAPF height=1 angle=90 justify=center 'Overall Liking');
PROC GPLOT data=result;
   PLOT overall*moisturize predict*moisturize lower*moisturize upper*moisturize
        / overlay haxis=axis1 vaxis=axis2;
   TITLE1 FONT=ZAPF H=1 J=L '95% Prediction Limits and Scatter Plot';
run;
quit;
TABLE 24.14. SAS REGRESSION OUTPUT FOR THE LOTION STUDY

The REG Procedure
Model: MODEL1
Dependent Variable: overall

Analysis of Variance

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1          0.72738       0.72738     27.04   0.0065
Error               4          0.10762       0.02690
Corrected Total     5          0.83500

Root MSE          0.16403     R-Square    0.8711
Dependent Mean    6.65000     Adj R-Sq    0.8389
Coeff Var         2.46658

Parameter Estimates

                     Parameter    Standard
Variable       DF     Estimate       Error    t Value    Pr > |t|
Intercept       1      4.89076     0.34491      14.18      0.0001
moisturize      1      0.16916     0.03253       5.20      0.0065

Output Statistics

                 Dep Var   Predicted      Std Error              Std Error
Obs   prod       overall       Value   Mean Predict   Residual    Residual
1     1           7.0000      6.9207         0.0848     0.0793       0.140
2     2           6.6000      6.8361         0.0759    -0.2361       0.145
3     3           7.1000      6.9883         0.0934     0.1117       0.135
4     4           6.4000      6.2440         0.1029     0.1560       0.128
5     5           6.8000      6.8192         0.0744    -0.0192       0.146
6     6           6.0000      6.0918         0.1265    -0.0918       0.104

                  Student
Obs   prod       Residual   Cook's D
1     1             0.565      0.058
2     2            -1.624      0.359
3     3             0.828      0.164
4     4             1.221      0.483
5     5            -0.131      0.002
6     6            -0.879      0.568

Sum of Residuals                        0
Sum of Squared Residuals          0.10762
Predicted Residual SS (PRESS)     0.24734
[Scatter plot of the six products with the fitted regression line and its 95% prediction limits; horizontal axis: Moisturizes Skin; vertical axis: Overall Liking.]

FIG. 24.6. REGRESSION OF OVERALL LIKING ON MOISTURIZES SKIN AND ITS 95% PREDICTION LIMITS: OVERALL LIKING = 4.89 + 0.17(MOISTURIZES SKIN).

TABLE 24.15. SAS PROGRAM CODE FOR SIMULATING OVERALL LIKING WITH N = 100
*prog chap24cSim.sas;
data normal;
   * variances based on 2 SD (Table 24.12);
   retain seed1 seed2 seed3 seed4 seed5 seed6 100000;
   do i = 1 to 100;
      x1 = 7.0 + sqrt(4.080)*rannor(seed1);
      x2 = 6.6 + sqrt(5.760)*rannor(seed2);
      x3 = 7.1 + sqrt(4.000)*rannor(seed3);
      x4 = 6.4 + sqrt(3.761)*rannor(seed4);
      x5 = 6.8 + sqrt(6.250)*rannor(seed5);
      x6 = 6.0 + sqrt(4.000)*rannor(seed6);
      if i = 1 then do;
         seed2 = 100000;
         seed3 = 100000;
         seed4 = 100000;
         seed5 = 100000;
         seed6 = 100000;
      end;
      output;
   end;
run;

proc print;
   id i;
   var seed1 seed2 seed3 seed4 seed5 seed6 x1 x2 x3 x4 x5 x6;
   title "Simulating products 1,2,3,4,5,6";
run;
proc means mean n std maxdec=3;
   var x1 x2 x3 x4 x5 x6;
   title1 "Simulated overall liking statistics for 6 products";
   title2 "seed = 100000";
run;
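The chapter does not reprint the code that collapses these simulated scores into the product means overlaid in Fig. 24.7 and 24.8, so the following is only a minimal sketch of one way to do it in the same SAS style. The input data set normal and variables x1-x6 come from Table 24.15, and the moisturizes-skin scores from Table 24.13; the names simmeans, overlay and simliking are illustrative assumptions, not the authors' own.

* Collapse the simulated scores to one mean per product;
proc means data=normal noprint;
   var x1-x6;
   output out=simmeans mean=m1-m6;
run;

* Pair each simulated mean with its moisturizes-skin score;
data overlay;
   set simmeans;
   array m{6} m1-m6;
   * moisturizes-skin scores for products 1-6 (Table 24.13);
   array skin{6} _temporary_ (12.0 11.5 12.4 8.0 11.4 7.1);
   do prod = 1 to 6;
      simliking = m{prod};      * simulated mean overall liking;
      moisturize = skin{prod};  * descriptive analysis score;
      output;
   end;
   keep prod simliking moisturize;
run;

The resulting points can then be added as an extra plot request to the PROC GPLOT overlay of Table 24.13; for Fig. 24.7, the do-loop limit in Table 24.15 would be changed from 100 to 25.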
[Overlay plot; horizontal axis: Moisturizes Skin; vertical axis: Overall Liking.]

FIG. 24.7. OVERLAY OF SIMULATED MEAN SCORES (N=25) INTO THE PREDICTION MODEL
[Overlay plot; horizontal axis: Moisturizes Skin; vertical axis: Overall Liking.]

FIG. 24.8. OVERLAY OF SIMULATED MEAN SCORES (N=100) INTO THE PREDICTION MODEL
REFERENCES

ANALYTICAL SOFTWARE. 1996. Statistix for Windows. Tallahassee, FL.
BIOSYSTEMES. 9, rue des Mardors, F-21560 Couternon, France.
CAMO INC. PO Box 1628, Corvallis, OR 97339.
COMPUSENSE INC. Guelph, Ontario, Canada.
GACULA, JR., M.C. (ed.). 1997. Descriptive Sensory Analysis in Practice. Food & Nutrition Press, Trumbull, Conn.
GRAPHPAD SOFTWARE. 1995. Prism, Version 2.0. San Diego, CA 92121.
HARRELL, F.E. and GOLDSTEIN, R. 1997. A survey of microcomputer survival analysis software: The need for an integrated framework. The American Statistician 51, 360-373.
MORGAN, W.T. 1998. A review of eight statistics software packages for general use. The American Statistician 52, 70-82.
NUMBER CRUNCHER STATISTICAL SYSTEM. 1998. NCSS 2000 User's Guide I, II. Kaysville, Utah.
SAS INSTITUTE INC. 1999. SAS/STAT User's Guide, Version 8. Cary, NC.
SPSS. 1998. SYSTAT 8.0. SPSS Inc., Chicago, IL.
STAT-EASE, INC. 1994. Design-Ease, Version 3.0.7. Minneapolis, MN.
STAT-EASE, INC. 1999. Design-Expert, Version 5.0.9. Minneapolis, MN.
WARD, S.L. and GACULA, JR., M.C. 2000. Performance of the HCE-T TEP human corneal epithelial transepithelial fluorescein permeability assay. Presented at Alternative Toxicological Methods for the New Millennium, Nov. 28-Dec. 1, 2000, Bethesda, MD.
CHAPTER 25

ADVANCES AND THE FUTURE OF DATA COLLECTION SYSTEMS IN SENSORY SCIENCE

ANDRÉ ARBOGAST

The primary use of software packages in sensory analysis is to automate data collection. The goal is efficiency, but this does not automatically imply restrictions or a lack of flexibility. Integrated tools such as graphing, statistics, reporting, and database management allow faster processing and thus increase the efficiency of sensory work.
Data Collection

Sensory analysis and Marketing/Consumer Testing are often done in similar ways and for similar purposes (see discussion in Chap. 4). Consequently, most users of sensory software expect not only to conduct sensory tests within their sensory booths, but also to collect other data elsewhere with the same tool. Sensory tests are well-defined in their standard implementation, and thus the user should be able to set them up easily with the software. However, computerizing data collection should not only automate standard tests and make them faster and easier; it should also take advantage of technology to extend the possibilities. The design of the questionnaire (the type and combination of questions) should be the scientist's responsibility, and the software must allow him to collect all kinds of meaningful data in a variety of environments.
Location and Method of Data Collection

There are many data collection situations for sensory and consumer tests, depending on the kind of products to be evaluated and the goal of the study: local or remote sensory tests, mall tests, or home use tests, for example. To handle these different situations, the data collection tool must be both powerful and flexible. There are two main ways to collect the data from the respondent:
(1) Answer directly on a screen: computer, Penpad, PDA, or Internet.
(2) Answer on a paper form, and collect the data automatically using a scanner.
Data collection on a computer provides great advantages, and wherever possible it is the recommended approach. Computerization provides perfect control of the test procedure: order of questions, forced answers. Furthermore, the
results are available immediately when networked computers are used, as is often the case for networked PCs in sensory booths. Mobility can be provided within a given test site using a wireless network and portable devices like Tablet PCs. Mall tests are done using portable versions of the sensory package running on laptops or Tablet PCs, for example, with the results consolidated when the interviewer returns to the home site, or via e-mail. Tests and answers can be sent to and recovered from remote sites or data collection agencies via e-mail. Whereas most sensory software suppliers provide only such fully computerized solutions, we think that in a number of situations paper forms with subsequent scanning may be a very relevant, and perhaps even the best, solution, with clear advantages. Sometimes the test location does not allow computerization, perhaps because of a lack of available space. Paper forms can be used anywhere, by anyone. A paper-based solution implies lower costs, because the only specific hardware needed is a suitable scanner. And in several circumstances there is just no practical alternative to paper questionnaires, for example: (1) For consumer tests involving many respondents simultaneously in a single location, it might prove very difficult for cost and organizational reasons to provide a data input device for everybody. (2) For home use tests, of course, one could argue that tests can be organized over the Internet to ask general questions about a product or a concept. Even so, answering a questionnaire on the PC in another room while preparing a meal in the kitchen or having lunch with the family is just not practical, and paper forms are easier to use in these cases.
There are clearly some drawbacks to paper questionnaires, but proper management in the software makes them very efficient. Questionnaires need to be scanned before the results become available, but this is done automatically. Input errors, such as missing or multiple answers, can occur, but these need to be handled and reported appropriately by the software, with suitable tools to check the answers and correct the data, by rescanning for example. For tests in remote locations the questionnaires do not need to be shipped, as printable documents (the questionnaires) and the scanned images (the answers) can be sent by e-mail instead. The user should be able to choose one or several data collection alternatives according to his constraints, and still get compatible data and a uniform application interface. Data collection means will evolve as new technological solutions emerge, but we believe that in the foreseeable future paper-based solutions will coexist with purely computer-based solutions.
Types of Data

Sensory packages already collect various kinds of data. Traditional sensory tests deliver well-defined types of data: scores, ranks, sample chosen, right/wrong answers in difference tests, scores over time for time-intensity tests, multiple-choice questions, and free answers to open questions. Open-ended questions are part of many studies, on screen or on paper forms, and the software must allow their categorization before the computation. But comments may be impractical or even impossible to collect using a keyboard, mouse or pen in specific situations: when working with children, when applying a cream or a shampoo, or for tests within a car, for example. Voice recording by the software would improve data collection in such cases. Speech recognition, when mature and requiring less training, could allow completely voice-driven data input. All these data correspond to answers input consciously by the subject in one way or another. But reactions to a stimulus (the sample, the concept) are not limited to perceptions or reactions that can be directly described by the subject. Answer time and answer modifications, facial expressions, physiological reactions, or other parameters can be part of the data of interest when studying the impact of the stimulus on a subject. For example, with personal care products such as skin cream, various parameters like skin color and shine are measured after the product has been applied and incorporated with the perceptions collected from the subject (greasiness, moisture of skin) before their computation. These data are generally not collected directly by the software, but sensors could be interfaced to collect such data automatically. Sensors could also be used to measure product parameters at the time of evaluation: color, temperature, etc. Webcams could be used to collect facial expressions, and wearable computers to collect all kinds of data from a consumer in his normal consumption environment.
Test Samples
In sensory analysis, sample presentation orders are a key aspect, and the management of these designs is probably one of the most distinctive characteristics of sensory data collection tools compared with generic form-building packages. Presentation designs are an integral part of the data collection process. The data collection package should thus include tools to generate these designs in an appropriate fashion (a minimal sketch of such a generator follows this section). The system should also allow the use of externally generated designs for special or advanced cases. Blind code generation and label printing, when done properly, streamline sample preparation. Generally, the software simply displays the sample codes and the samples are given directly to the judge. The delivery of the samples to panelists can in some cases be controlled by the software itself: displaying the picture of the sample (packages or concepts, for example) or a video, playing the sound
corresponding to the sample, or delivering a product in a controlled amount or composition. The software can also provide feedback to the taster during training. Questions can also depend on previous answers. In the same way, the software could even trigger the modification of a sample composition depending on the judge's answer, or control the environment itself through virtual reality.
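As a concrete illustration of the design-generation tools discussed above, here is a minimal sketch, in the SAS language used in Chapter 24, of one way to produce a randomized serving order for each panelist. The data set and variable names (orders, panelist, position), the panel size, the sample count, and the seed are illustrative assumptions, not a description of any particular commercial package:

* One uniform random sort key per panelist-by-sample combination;
data orders;
   do panelist = 1 to 12;
      do sample = 1 to 3;
         r = ranuni(20031);   * arbitrary seed;
         output;
      end;
   end;
run;

* Shuffle the samples within each panelist;
proc sort data=orders;
   by panelist r;
run;

* Number the serving positions 1, 2, 3;
data orders;
   set orders;
   by panelist;
   if first.panelist then position = 0;
   position + 1;
   drop r;
run;

In practice a balanced design (for example, Williams Latin squares, which control first-order carry-over effects) is often preferred to simple randomization; the point here is only that such generation belongs inside the data collection tool.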
Data Computation

(1) Integrated Versus External. To make fast decisions, users need results and graphs immediately after data collection, or in some cases even during data collection. With integrated tools, the data structure is well-known and graphs and statistical computations can be done directly. When using external packages, time and effort are needed for the preliminary export, the import into the destination package, and the definition of the data. Moreover, external packages do not always easily allow the implementation of some very useful graphs or computations specific to sensory data. Therefore, for the sensory scientist's standard requirements, there should be no need for a separate statistical package, and the related costs (licensing, training, maintenance and upgrades) should be avoided. In fact, sensory software buyers often place a lot of emphasis on the statistical and graphical features available in the application, because they do not have equivalent tools available. Obviously, however, full statistical packages will always provide more possibilities. Therefore, for more advanced needs, there should be full automation capabilities to streamline the export process and get the data into the destination software at the click of a button.

(2) Reports. Statistical results and graphs generated must be directly usable in reports. This means that they need to be sufficiently customizable and easy to integrate into reports through copying and pasting, or through the use of universally recognized formats.
(3) Database. Sensory and consumer tests generate huge amounts of data, and these are often analyzed only at the test level. The software must also allow the integration of these data into a database from which the user can extract the data he needs through queries about judge participation, product testing, and answers collected. Here again, internal tools make these features available to all users. These data also need to be integrated with the other information available in the enterprise; therefore the data collection software must support the various data formats and tools that are in use. Advances in data mining, reporting, and
statistical tools will allow extracting more and more meaning from this large amount of interconnected information.

The Life of the Software Package

(1) Support and Feedback. Users need to be sure that they will get dedicated and efficient support to help them with questions and the more unusual aspects of their work. Feedback about new needs and desired evolutions must be taken into account in an efficient manner; user groups often foster such communication and feedback, as well as useful contacts among users.

(2) Evolutions and Technology. Users need and want evolutions in order to make their daily tasks easier, and to be able to apply new techniques as sensory science evolves. Users generally state categorically that only mature technologies should be used, for these are established and therefore reliable. However, the choice of hardware and operating system is generally out of the control of the software supplier. Computer technology progresses daily, and users of applications are sometimes confronted with new versions of Windows or new restrictions on current programs. This means that they must be sure to get the upgrades needed for these new environments. Users should not have to abandon a package after only a few years of use because it is no longer maintained and cannot be used beyond a given version of Windows. Software suppliers should offer maintenance contracts that include support and easily accessible software upgrades at reasonable and affordable fees.

Sensory and consumer work encompasses many different aspects. We believe that the integration of sufficient standard tools in the package has great benefits for most users. But at the same time users should be able to transition to features available elsewhere, because no package can realistically be exhaustive. All the tools should be designed for the right user level: standard features for daily tasks must be easy to use, while advanced possibilities may require more background knowledge. Users of software for sensory and consumer tests should be able to take advantage of technology to expand their reach into meaningful data. Flexibility, integration and openness are needed for efficient data collection software. Close collaboration between users, researchers, and software developers will drive the evolution of the tools.

CHRIS FINDLAY

Compusense has been developing computerized sensory systems since 1985. During that time I have had many opportunities to reflect on where we might go
in the future and what the main factors are that influence our progress. This brief essay is meant to stimulate debate in this field and promote some understanding of the complex subject of development of sensory software. There are three ways that computer technology can shape the future of sensory and consumer research. The first is the progress in software systems that provide more powerful methodology and user-friendly interfaces. The second is through the development of new hardware that increases the capacity and flexibility of the tools for data collection. The third is the most important; it is the unleashing of the imagination of researchers to use these tools to derive entirely novel insights into consumers, products and human senses.
Software

Commercial software systems have grown to become powerful and comprehensive research tools. Almost any test imaginable can be created and conducted. The standardization of many sensory tests and the growth of sensory science as an academic subject have helped to create a common vocabulary and a standard practice of sensory tests. Global corporations in particular have benefited from the improved communications that come from the adoption of a common computerized standard. Their operating units can all draw on the same central expertise or share methods and capabilities worldwide. Concurrently, there has been a general shortage of experienced sensory professionals. Sensory software has provided a framework for many novices to gain experience in good sensory practice. Organizations such as the Institute of Food Technologists Sensory Evaluation Division and the ASTM E-18 Committee have provided an invaluable forum for the introduction of new methods and a healthy debate of existing ones. Online forums have also provided excellent opportunities for debate on novel practices and challenges in sensory work. The boundaries between sensory analysis, consumer testing and marketing research are frequently being broken down in the field of data collection and analysis. The same software provides a continuum of tools that can be used across the entire product- and consumer-testing spectrum. This can result in much easier correlation of analytical sensory results with consumer preference data and market segmentation. Rapid fielding of research and equally fast analysis is providing timely answers for just-in-time development schedules. In a world of dynamic markets, fast response is a key to success. By eliminating paper data collection, the computerized system takes out the "middleman." Direct data entry by the respondent removes the ambiguity of marks and scribbles and eliminates missing data points. Scanning technology has always been fraught with difficulties, and there is always the demand to move and manage paper. Computerized systems permit complex and highly efficient experimental designs that would be almost impossible to keep track of manually. Sensory
software has also incorporated and distributed new methods quickly to hundreds of labs worldwide, keeping end-users abreast of changes in the science of sensory analysis. There are some tests, such as time-intensity measurements, that are so awkward to perform in any other way that computerized data collection is the only practical method. Elaborate conditional branching techniques have given researchers tools for consumer segmentation and for dynamic, responsive research methods. As new object-oriented and internet-based operating systems become available we will see unprecedented flexibility in test design and operation. The tools are still new, so it will be a while before we see them widely distributed. Progress in the development of new software for sensory research is influenced by the changes taking place in the computing industry. It is a fact that Microsoft® dominates the software world. Developers of programs, large and small, are all subject to the changes that Microsoft introduces in its Windows® operating systems and its major applications like
Office®. Most of the software changes that we have to make are to keep pace with the introduction of new Microsoft systems. Since January of 1986, Compusense has released 16 major software upgrades. In the last five years, the rate has increased to an average of two per year. That grace period has now dropped to less than 6 months. Keeping up with the operating systems is always a problem, and sensory analysis software is no different. If one purchases a new computer, it will arrive with the most recent Windows operating system installed. If one wants an older version of Windows, there is often a premium to pay for this older version to be installed. Consequently, any sensory software developer must make the products work on the newest systems. A great deal of effort must be expended on programming just to permit existing software to operate on the newest operating systems. Software users everywhere frequently have to upgrade their hardware just to accommodate the new systems. This provides little benefit to the sensory software technology provider or to the researcher who is the end-user.
Hardware

If we believe what we see in the movies or in television commercials, we have already arrived in a world of powerful and friendly computing. We live in a world where so much is possible, but at the same time it can be so complicated that it is beyond the reach of the average researcher and typical funding levels. Anyone who has attempted to create a comprehensive experiment of products and audiovisual materials for a multicultural study learns the limitations that we are frequently up against. The topic of limitations is a good point from which to start discussing the progress in hardware. A question that is frequently asked is
"What is an ideal device for data collection?" This has led me to the following list of the features that are most desirable (Table 25.1).

TABLE 25.1. FEATURES OF AN IDEAL DATA COLLECTION COMPUTER
1. Lightweight, durable and compact
2. Large pen-based or touch screen for good visibility and ease of use
3. Long-life battery operated and portable
4. Difficult to steal or lose
5. Wireless
6. Multimedia capable
7. Record audio and video responses
8. Using a widely distributed operating system
9. Internet accessible
10. Universally available and inexpensive
This list inevitably leads us to the conflicts and trade-offs between features. Today's inexpensive devices are limited in their screen size and operating systems. The machines that are durable are heavy and/or expensive. The very portability of some units makes them a target for theft or loss. Simplified operating systems, such as the software on PDAs, cannot handle a full range of test types. Security issues require careful planning for the implementation of wireless and Internet communication. Operating systems keep changing, limiting the practical lifetime of dedicated computers. The rate of change in computing devices is staggering. Regardless of the problems of our current supply, there will be tablet computers available in the near future that will meet all of these requirements and provide even more capability, such as GPS, video camera, and audio recording facilities to confirm the location and identity of respondents. Broadband digital communications will be able to make any computer part of a Virtual Private Network anywhere on the planet. Exactly when this will all come to pass is uncertain, but it will happen. When it does, any test may involve participants from around the planet working together in real time. In the meantime it is important to remember that the lifetime of any technology is limited. If you wait until the ideal technology emerges, you may never get started. The price of entry into this world of rapid data collection and analysis has dropped somewhat over the last two decades, but it still requires a significant investment to get going and maintenance to keep current. However, the benefits that have been derived by the practitioners who
have taken this step are considerable. The speed of data collection, analysis, and reporting has been a tangible benefit that brings sensory and consumer data into operational decision-making.
Imagination

The real opportunity for future sensory research is in using the capability of the machines and new software to capture information that is currently very difficult to obtain. Since we do not know what the future will hold, I will try to pose some questions that we may be able to answer through computer technology in the years to come.
a. Is there a relation between the amount of time a discrimination panelist takes to make a choice and their proficiency in detecting a true difference? Is decisiveness a useful measurable quantity?
b. We have the capacity to capture audio and video responses of consumers as they sample a product. Can we apply image analysis that gives us a true, direct measure of liking without relying on verbal or written response? Will it be feasible to conduct tests entirely with visual and audible questions and recorded responses?
c. Can we provide a user interface that is more intuitive and objective, improving the ease of use and the objectivity of the results?
d. Will we be able to measure multiple attributes as they change over time? The dynamic changes that we may be able to measure will provide new insights into novel products that evolve and transform as we interact with them.
e. Will speech recognition develop to a point where we can conduct meaningful interviews with consumers and analyze the results?
f. Will we be able to create sensory training kits that allow an interested consumer to sit in the comfort of their home and evaluate their own sensory acuity? Moreover, will we then be able to use this approach to select and train panelists for descriptive analysis work?
I believe that the answer to all of these questions will be "Yes!" However, none of these possibilities will be realized without the software and hardware designed to do the job. Research and innovation are required to make progress in these areas. This will be the challenge to which our colleges and universities must rise.
PAUL LICHTMAN

While automated computer systems have enabled a dramatic increase in the amount and ease of collection of sensory data since the days of paper and pencil, there is one word to describe the future: Internet. Just about any statistic shows the dramatic growth of Internet use. Its application to sensory analysis should be no exception. However, data collectors need to be prepared to deal with several issues to ensure that the potential gains are realized:
(1) Internet Communication and Device Configuration Standards
(2) Software Application Conversion Options
(3) Privacy on the Internet.
Let’s examine each of these issues in turn.
Why the Internet?

What are the advantages of using the Internet to collect sensory analysis data? For one, product evaluators need no special software other than an Internet browser to take a sensory test. No longer do technical specialists have to configure hardware and software before a program will operate properly. If a person can browse the Internet, that person can become a sensory panelist. Since the Internet is so accessible, sensory tests can incorporate respondents from any part of the world. No longer do panelists need to gather in one place in order to take a particular test. Although tests of that type will certainly continue to be given, it may still be advantageous to conduct them on the Internet so that the data are located centrally. Moreover, it may be easier than configuring hardware and software for the test stations. The cost of administering tests can often be much lower with the Internet. The researcher may not have to telephone or meet with people in order to ask them questions. One need not print questionnaires, nor pay postage for their return. Response data can be entered automatically into the database. One can notify participants via e-mail when and how to take the test. Or, if one accepts anonymous participation, then panelists can enter directly from the website.

Internet Standards

"OK, I see that the Internet is the way to go," one might say. "What else do I have to know? What other developments are on the horizon?" Well, for one thing, take a look at how people are connecting to the Internet. Whereas many people connect from home with a 56K modem, others use cable modems, DSL,
and other connections that allow much faster transmission. As connections get faster and faster, it becomes easier to send both video and audio feeds via the Internet. Moreover, Internet devices are getting smaller and smaller. Many of them are wireless and handheld. The consequence is that a researcher can take an Internet connection device anywhere to conduct a test. One new Internet access device is designed for the kitchen. It can play CDs, show TV, and provide Web access. It even has a washable keyboard. However, at present it appears that devices like this may not be quite ready for universal access, and therein lies the need for caution. This particular product requires a specific Internet Service Provider (ISP). Also, it may not integrate with an existing e-mail account. It may not be able to take advantage of popular browser plug-in programs that enable audio, video, and animation to be received over the Internet. Within a few years it is reasonable to expect that wireless, hand-held Internet connection devices will become fairly commonplace in many homes and offices. As we have seen in other areas of communication, certain technologies usually emerge as standards. It is important to ensure that the data collection system can adapt to communication and data storage methods as Internet technology evolves. Moreover, if the sensory analysis questionnaire involves audio and video, then panelists should have broadband access to the Internet with reliable interactivity.
Converting Existing Applications

How does the sensory professional manage the conversion of an existing sensory data collection application from an internal network application in order to enable Web connectivity? Here are some issues to consider:
(1) Ease of installation, configuration, and deployment
(2) Data security
(3) Ease of use
(4) Ease of application customization and maintenance
(5) Implementation time
(6) Technical support: availability, type, and cost
(7) Total cost of deployment.
A customized solution, i.e., a complete rewrite of the application for the Web, will usually provide the best and most complete conversion of application features. It will be able to tailor data access to the groups that need it and exclude all others. Unfortunately, the implementation time will be longer than for just about any other solution, as will the total cost of deployment.
An alternative to customization is the "Web-to-host" solution, supplied by a third party. Simply put, Web-to-host is a terminal emulation package made for an Internet browser. The application logic remains on the host server, while the Web is used to deliver the information to the user. Web-to-host solutions are usually easy to install, have low implementation time, and cost much less than customization. However, data security issues may require that additional business logic be added to the Web server so that access is limited to the appropriate audience. Also, the application will often appear new to the user and require some additional training. Even if the sensory professional recognizes that the Web-to-host solution will not fulfill all the business needs of the application, it may still be a good interim solution for host access. Expect Web-to-host solutions to be prominent for several more years.
Privacy on the Internet

Without doubt the collection of sensory data on the Internet will often include sensitive information. Research participants will often require assurance that their profiles and taste preferences will not be shared with any other organizations. Yet many other test panelists, if given the opportunity, would permit the sharing of their private data in return for incentives, monetary or otherwise. Targeted Internet advertising has become a big business. But privacy advocates are hunting for information disclosure violations, and pushing legislation that would take consumer involvement to a whole new level. Clearly this controversy is only going to heat up. Meanwhile, sensory researchers will need to establish policies on this issue. Whereas there may be significant profit possibilities in the sale or exchange of data, sensory testing firms will probably need to examine their missions and make some distinct choices. They should also remain flexible enough to adapt to changes in public perceptions and attitudes. Regardless of the outcome, sensory analysis data is and will continue to be valuable information. Its custodians will always need to nurture it and protect it closely.
INDEX

Absolute intensity scale, 156
ABX method, 408
Advanced statistics
   applications, 18, 361
   multicollinearity problems, 28-29, 361, 370
   parsimony, 361
Basic tastes, 337
Beta-binomial, 412
Binomial, 412
Biases due to changing market conditions
   benchmarking, 236, 239-240
   changes in consumer wants and needs, and in test methodology, 236-238
   changes in product characteristics, 234-236
   effect on sensory scientist, 231-232
   levels of changes, 233-234
BIBD with repeated control, 143, 348
Carry-over effects, 141-142
Categorical, 392
Central location tests, 16
Character impact compounds, 360
Characteristics of developed lexicon, 326-329
Choice of consumer population
   church panel, 210-211
   consumer pool, 223
   diverse recruitment criteria, 220-221
   geographical differences, 214
   in-house/employee panel, 209-210, 222, 229
   outside agency recruitment, 224
   random sample, 211-212, 229
   sensory preference segmentation, 216-219
   users vs. non-users, 213, 225
Church panel and bias, 78, 210-211
Claims substantiation, 290-291
Computer systems
   hardware configuration, 468-469
   internet, 468-470
Concept of preference segmentation, 34
Consumer-descriptive data relationships
   exploratory data analysis, 370
   factor scores in modeling, 361
   importance and goals, 363-366
   information from data relationships, 364-366
   multicollinearity, 361, 372
   problems in exclusive use of consumer data, 366-368
   polynomial regression, 371-372
   remedial measures and needs, 372-374
   R-R model, 360
   S-R model, 359
Consumer fatigue, 194
Consumer privacy, 470
Consumer rating product attributes
   asking overall liking, 185-187, 189, 192
   defining sensory attributes, 173-176
   how many attributes per session, 178
   proper scale, 178-181, 188
   quantifying attributes, 177
   standardized scale for intensity, 181-183
Consumer satisfaction, 430
Consumer test scales
   children's scales, 153
   distance between scale categories, 166-169
   hedonic scales, 145-146, 149-154, 162
   deviations from traditional hedonic scales, 149-153
   intensity scales, 154-159
   just-about-right scale, 147-149, 156-159, 170
   purchase intent scales, 145, 159, 170
   scale categories, 149-151, 159, 163-164
   word anchors, 151-152, 159
Contour plots, 428
Cross-cultural studies, 33, 39-41
d', 394, 396, 409, 410
Databases, 462
Data collection
   computerized, 464-466
   hand-held device, 469
   internet, 468-470
   location and method, 459-460
   mobile, 460
   off-site, 460
   on-screen, 459
   paper-based and scanning, 460
   security, 469-470
   sensory and consumer data, 459
   wireless, 469
Data mining, 218-219, 229, 462
Decision boundaries, 400
Decision rules, 395
Degree of difference, 392
δ values, 396, 404
Descriptive analysis
   descriptive-consumer data relationships, 359-374
   descriptive/expert panels compared to consumers, 109-123
   methods
      Flavor, Texture and Modified Profile II, 53, 62, 109, 255, 318-319, 322, 331, 343-345, 353
      Free Choice Profile, 317-318
      Quantitative Descriptive Analysis (QDA®), 11, 53, 62, 109, 318, 345, 352, 353
      Quantitative Flavor Profile (QFP), 318, 322
      Spectrum, 53, 109
   myths, 60-62
   panel variability, 379, 382
   references needed in descriptive analysis, 337-349
   training
      factors affecting training time, 351-358
      language development, 313-336
   value to decode consumer data, 113-114
Descriptive panels versus consumers
   agreement of both panels, 119-122
   expert and trained panelists defined, 114, 118
   experts more sensitive, 110-111, 173
   experts versus consumers, 173-174
   intellectual history, 109
   need for both panels, 112-114
   role of each panel, 115-116
   value of decoding consumers through descriptive data, 113-114
Deterministic MDS, 418-419
Discrimination testing, 407
Drivers of Liking®, 5
Dual pair, 392
Dual users test, 225-226
Duo-trio, 393
Equal interval scale, 399
Expert taste testing, 13
Factor scores, 361
Flavor Profile Method, 53, 109, 255, 319, 331
Food action rating scale, 154
Food Quality and Preference, 1, 11, 107
Foundations, 391
Four D's of sensory science
   abuses in discrimination testing, 284-287
   analysis of similarity testing, 292-293
   attribute discrimination tests, 281-283
   degree of difference, 268, 274, 289-290
   difference testing, 267, 271, 272-276
   discrimination testing or similarity, 276-280
   dissimilarity and distance, 269-270
   Null Hypothesis in similarity testing, 290-292
   options for the selection of discrimination panelists, 278-280
   simulation in similarity testing, 294-296
Free-Choice Profiling, 317-318
Gridgeman's paradox, 394
Guessing Model, 394
Hedonic scale
   affective scales, 145, 205
   controversial variations, 150-151
   hedonic line scale, 153
   9-point myth, 67, 107
Hierarchical Thurstonian models, 409
Home use test, 16
Ideal, 423, 426
Indirect measurements, 52
Individual ideal point maps, 428
Individual ideals, 425
Information, 392
In-house panel bias, 210
International sensory science
   cross-cultural consumer studies, 39-41
   global consumer responses, 41-43
   lack of sophistication, 31, 45
   scale differences and language translation, 32-34, 45
   sensory preference segmentation, 34-36
   transnational collaboration, 38-41
Internet
   connection device, 469
   data collection, 468-470
   privacy, 470
   sensory science and the internet, 15-17
   web-to-host solution, 470
   wireless, 469
Internet Service Provider (ISP), 469
Invariance of scale categories, 164-166
Jackknife technique, 445-446
Joint space, 425
Journal of Sensory Studies, 1, 11, 107
Journal of Texture Studies, 11
Language development in descriptive analysis
   aspects common to descriptive analysis, 331
   characteristics of developed lexicon, 326-329
   development of individual vs. common lexicons, 317-321
   maintenance of sensory concept, 315
   median bar chart, 333-335
   process of lexicon formation, 323-326
   qualitative and quantitative components of descriptive analysis, 315-316, 340-344
   sensory concept and sensory attribute, 313-314, 332
   types of common lexicons, 319-323
   use of quality control, 332-333
Language taught to descriptive panel, 320
Liking ratings, 427
m-alternative forced choice, 407
Magnitude estimation scale, 181, 188
Mapping, 419
Marketing research and sensory science
   contrasting R&D and marketing approaches, 92-93
   data analysis and expense, 91, 468-470
   objectives, project stages, samples and questionnaires, 89-91
   R&D versus market research, 82-87
   sensory and marketing research, 17-18
   sensory science in consumer testing, 77-78
   turf battles, 78-82, 87-89
Maximum likelihood, 395, 400, 409
Median bar chart, 333
Model, 391
Multiple samples, 407
Overdispersion, 413
Pangborn, Rose Marie, 1-4, 72
Panelist sensitivity, 49
Paper ballots, 459-460
Parameter, 392
Perceptual
   standard deviations, 395, 402, 407
   variability, 395
   variance, 405
Power, 396, 403
Power of the test, 245, 248, 296
Preference, 420
Preference map, 424
Preferential choice, 392
Privacy, 470
Probabilistic, 392-393, 407
   multidimensional scaling, 416, 421
Probabilistic multivariate preference mapping, 420
Product and panelist variability
   consumer variability, 382
   minimizing variability, 382-389
   panelist variability, 377-379, 382
   partitioning of variability sources, 379, 386-388
   product variability, 375-376, 379-381
   simulation and bootstrapping, 388-390
Product cuttings, 50, 130
Product distributions, 401
Psychological and physical scales, 167-168
Psychometric functions, 394, 396
Psychophysics and sensory science
   concept evolution, 104-105
   sensory methods, 103
Qualitative research
   applications, 260
   ethnography, 256
   focus groups, 255-258, 261, 263
   linkage between qualitative and quantitative methods, 264
   projective and elicitation techniques, 259
   seven dimensions of words/phrases, 265
   use and misuse, 259-261
Quality control chart, 295
Quantitative Descriptive Analysis, 53, 62, 109, 318
Quantitative Flavor Profile, 318
Questionnaire design
   components of measuring attitude, 205
   consumer fatigue, 194-195
   length of questionnaire, 193-194, 198, 206
   no preference category, 200-202, 207
   open-ended questions, 202, 461
   order of attribute questions, 197-206
   position of overall liking, 195-196, 205
   preference and liking questions, 199-200
   rules of thumb, 191-192
Rating means, 400
Rating variances, 400
Ratings, 392, 398
References in descriptive analysis
   do panelists use the same terms, 338
   psychological history, 337
   qualitative references, 340-343, 347-348
   quantitative references, 343-346, 347-348
   requirements, caveats and controversies, 344-345
   value of reference samples, 339-348
Repertory Grid Method, 331
Replication in sensory and consumer testing
   amount of information, 308-310
   consistency and mean convergence, 299
   panelist performance, 302-304, 308
   paradox of replication, 300
   product variability, 304-305
   reasons for, 299, 302, 304, 306
   reliability and validity, 300-301, 306
   statistical model, 306-308
   testing, 412
Retasting, 403
Reverse engineering, 361
Role of sensory science
   fads and beliefs, 8-10
   global sensory science, 14, 28, 31
   hazard and remedies, 5
   learning about sensory properties, 20
   new methods/applications, 11-13
   partnership with Marketing/Marketing Research, 17
   sensory education, 7-8, 20
   sensory methods for non-food products, 21
   sensory professional, 6, 13, 27
   sensory science and internet, 15-16, 28, 241, 465, 468-470
   sensory science and psychophysics, 103-106
   trends, 2-5
Same-different, 392, 419, 421
Sample issues in consumer testing
   blocking vs. randomization, 131-133, 141-142
   creating samples in laboratory, 126
   distribution chain samples, 128
   lot and code uniformity, 139
   number of samples per session, 128-131, 137-139
   screening samples prior to using, 136-137
   selecting samples, 125-126, 135-137
   sensory professionals' involvement, 135-136
   shelf-life samples, 141
   training samples, 133-134, 143
Sample size or base size
   calculation of sample size, 247-248
   computer simulation of sample size, 249-253
   cost consideration, 241, 244-245
   information from first judgment, 242
   sample size, 397
   sample size myths, 64-65, 85-86
   SAS code for simulation, 254
   statistical considerations, 245-246, 248-249
   suppressing noise vs. averaging noise, 243, 253
SAS® programming language
   computer simulation, 452-457
   group comparison, 439-440
   input format, 434, 449
   nonlinear regression, 440-445
   paired comparison, 435-438, 450
   polynomial regression, 447-449
Scale means, 400
Scales, 398
Semi-technical lexicon, 321
Sensory methods
   educational needs, 20-21
   future challenges, 467
   modified when resources are limited, 19
   new methods, 11
Sensory mythology
   choice of sensory methods, 57-60, 73
   choice of significance level, 59, 73
   consumer studies and research practices, 64-68
   controlled conditions, 50-51, 63, 73
   descriptive panel data, 61-62, 66
   experts versus consumers, 51-53
   number of samples for evaluation, 49, 63, 72
   order of sample presentation, 67
   role of statistics, 53-54
Sensory proficiency and accreditation, 14-15, 28
Sensory science, 1, 10, 13, 21, 27, 391
Sensory science and marketing research. See Marketing research and sensory science
Sequence effects, 398
Similarity, 416, 420
Software
   configuration, 468-469
   conversion, 469
   data collection systems, 19, 459-463, 464, 468-470
   integrated data analysis, 462
   issues in upgrading, 465
   SAS® programming language, 433
   support and feedback, 462
Spectrum Descriptive Analysis Method, 53, 109
Statistics, 391
Stochastic transitivity, 422
Sweetness, 399
Technical language, 322
Texture Profile Method, 318, 331, 344
3-AFC, 397
Thurstonian, 393, 402, 407
Torgerson's method of triads, 408
Total quality, 45
Training samples, 133-134
Training time in descriptive analysis
   factors affecting training time, 352-354
   impact of training, 351
   length of training time, 352-354, 357
   level of training, 354
Transformation of hedonic responses, 46-47
Transmitted information, 163
Triangular method, 397
2-AFC, 392, 404, 406
2-Alternative Forced Choice, 392
Type I and Type II errors, 59, 73, 248, 280-281, 295-296
Unfolding, 420
Universal panel, 237-238, 353
U.S. Army Natick Laboratories, 106
U.S. Army Quartermaster Corps, 106, 145
Validity and reliability
   construct validity, 98-99, 101
   face validity, 98, 100
   predictive validity, 99, 101
Variability, 392
Web-to-host solution, 470
Wine tasting, 110
Word meaning analysis, 94