COMMUNICATIONS OF THE ACM
cacm.acm.org | 04/2011 | VOL. 54 | NO. 4
Design Principles for Visual Communication
Crowdsourcing Systems on the World-Wide Web
Successful Strategies for IPv6 Rollouts. Really.
Why STM Can Be More Than a Research Toy
Engineering Sensation in Artificial Limbs
Association for Computing Machinery
The 2012 ACM Conference on
Computer Supported Cooperative Work
February 11-15, 2012 | Seattle, Washington
Call for Submissions
CSCW is an international and interdisciplinary conference on the technical and social aspects of communication, collaboration, and coordination. The conference addresses the design and use of technologies that affect groups, organizations, and communities. CSCW topics continue to expand as we increasingly use technologies to live, work and play with others. This year we have adopted a new review process for papers and notes intended to increase their diversity and quality. The submission deadline is early to avoid conflicting with the CHI 2012 deadline. This enables us to include a revision cycle: Qualifying authors can revise their submissions after the initial reviews. For more details about CSCW 2012, please review the Call for Participation on our conference homepage, www.cscw2012.org. To learn about future details as they become available, please follow us on Twitter @ACM_CSCW2012.
Submission Deadlines
Papers & Notes: 3 June 2011
Workshops: 28 July 2011
Panels, Interactive Posters, Demonstrations, CSCW Horizon, Videos, Student Volunteers: 9 September 2011
Doctoral Colloquium: 16 October 2011

Conference Chairs: Steve Poltrock, Carla Simone
Papers and Notes Chairs: Gloria Mark, John Riedl, Jonathan Grudin
Also consider attending WSDM (wsdm2012.org) immediately before CSCW 2012. Sponsored by
http://www.cscw2012.org
ACM Multimedia 2011 Scottsdale, Arizona, USA November 28-December 1, 2011
http://www.acmmm11.org ACM Multimedia 2011, the worldwide premier multimedia conference, solicits submissions in 10 broad areas, including:
multi-modal integration and understanding in the imperfect world; human, social, and educational aspects of multimedia; media analysis and search; scalability in media processing, analysis, and applications; multimedia systems and middleware; media transport and sharing; multimedia security; media authoring and production; location-based and mobile multimedia; arts and contemporary digital culture
Important Dates:
Apr 11, 2011: Full paper submission
Apr 11, 2011: Short paper submission
Apr 11, 2011: Panels, Tutorials
May 09, 2011: Technical demos, Open source software
Jun 06, 2011: Multimedia grand challenge
Nov 28, 2011: Conference starting date
General Chairs: K. Selçuk Candan (Arizona State U.), Sethuraman Panchanathan (Arizona State U.), Balakrishnan Prabhakaran (U. Texas at Dallas)
Get short, timely messages from ACM Multimedia 2011: follow @acmmm11 on Twitter and connect with Acmmm11 on Facebook.
Departments

5 Education Policy Committee Letter
Educating Computing's Next Generation
By Robert B. Schnabel

8 In the Virtual Extension

9 Letters to the Editor
I Want a Personal Information Pod

10 BLOG@CACM
Matters of Design, Part II
Jason Hong discusses how Apple creates well-designed products and what the human-computer interaction community can learn from its methods.

12 CACM Online
ACM on the Move
By Scott E. Delman

27 Calendar

116 Careers

Last Byte

120 Q&A
The Chief Computer
Kelly Gotlieb recalls the early days of computer science in Canada.
By Leah Hoffmann

News

13 The Quest for Randomness
It's not easy to generate a string of numbers that lack any pattern or rule, or even to define exactly what randomness means.
By Gary Anthes

16 Engineering Sensation in Artificial Limbs
Advancements in mobile electronics have led to several prosthetics innovations in recent years, but providing reliable touch sensations to users remains an elusive goal.
By Kirk L. Kroeker

19 Social Games, Virtual Goods
The popularity of virtual goods and currencies in online gaming is changing how people think and act about money.
By Samuel Greengard

23 British Computer Scientists Reboot
After a year of turmoil, computer scientists at King's College London have retained their jobs, but substantial challenges lie ahead.
By Sarah Underwood

Viewpoints

24 Emerging Markets
Managing Global IT Teams: Considering Cultural Dynamics
Successful global IT team managers combine general distributed team management skills enhanced with cultural sensitivity.
By Fred Niederman and Felix B. Tan

28 Historical Reflections
Building Castles in the Air
Reflections on recruiting and training programmers during the early period of computing.
By Nathan Ensmenger

31 Technology Strategy and Management
Platform Wars Come to Social Media
The world can absorb more social media sites, but how many?
By Michael A. Cusumano

34 Kode Vicious
Coder's Block
What does it take to clear the blockage?
By George V. Neville-Neil

36 Viewpoint
Asymmetries and Shortages of the Network Neutrality Principle
What could neutrality achieve?
By José Luis Gómez-Barroso and Claudio Feijóo

Viewpoint (in the Virtual Extension)
Reaching Future Computer Scientists
Think teachers, not students.
By Patricia Morreale and David Joiner

About the Cover: The art of the exploded-view illustration exposes the shape of otherwise hidden geometry and the spatial relationships among parts. This month's cover story traces ways to identify and analyze design principles for creating more powerful and effective visualizations.

Cover illustration by Leonello Calvetti
Practice

38 Returning Control to the Programmer: SIMD Intrinsics for Virtual Machines
Exposing SIMD units within interpreted languages could simplify programs and unleash floods of untapped processor power.
By Jonathan Parri, Daniel Shapiro, Miodrag Bolic, and Voicu Groza

44 Successful Strategies for IPv6 Rollouts. Really.
Knowing where to begin is half the battle.
By Thomas A. Limoncelli, with an introduction by Vinton G. Cerf

49 A Co-Relational Model of Data for Large Shared Data Banks
Contrary to popular belief, SQL and noSQL are really just two sides of the same coin.
By Erik Meijer and Gavin Bierman

Articles' development led by queue.acm.org

Contributed Articles

60 Design Principles for Visual Communication
How to identify, instantiate, and evaluate domain-specific design principles for creating more effective visualizations.
By Maneesh Agrawala, Wilmot Li, and Floraine Berthouzoz

70 Why STM Can Be More Than a Research Toy
Despite earlier claims, Software Transactional Memory outperforms sequential code.
By Aleksandar Dragojević, Pascal Felber, Vincent Gramoli, and Rachid Guerraoui

78 Reflecting on the DARPA Red Balloon Challenge
Finding 10 balloons across the U.S. illustrates how the Internet has changed the way we solve highly distributed problems.
By John C. Tang, Manuel Cebrian, Nicklaus A. Giacobe, Hyun-Woo Kim, Taemie Kim, and Douglas "Beaker" Wickert

Emergency! Web 2.0 to the Rescue! (in the Virtual Extension)
Emergent serendipity fosters volunteerism driven by creative problem solving, not simply following directions.
By Ann Majchrzak and Philip H.B. More

A Research Doctorate for Computing Professionals (in the Virtual Extension)
Looking back on the first decade of the Doctor of Professional Studies in Computing—an ambitious doctoral track for people who want to do research in an industrial setting.
By Fred Grossman, Charles Tappert, Joe Bergin, and Susan M. Merritt

Review Articles

86 Crowdsourcing Systems on the World-Wide Web
The practice of crowdsourcing is transforming the Web and giving rise to a new field.
By AnHai Doan, Raghu Ramakrishnan, and Alon Y. Halevy

Achievements and Challenges in Software Reverse Engineering (in the Virtual Extension)
Deeply understanding the intricacies of software must always come before any considerations for modifying it.
By Gerardo Canfora, Massimiliano Di Penta, and Luigi Cerulo

Research Highlights

98 Technical Perspective: Liability Issues in Software Engineering
By Daniel M. Berry

99 Liability Issues in Software Engineering: The use of formal methods to reduce legal uncertainties
By Daniel Le Métayer, Manuel Maarek, Eduardo Mazza, Marie-Laure Potet, Stéphane Frénot, Valérie Viet Triem Tong, Nicolas Craipeau, and Ronan Hardouin

107 Technical Perspective: Patterns Hidden from Simple Algorithms
By Madhu Sudan

108 Poly-logarithmic Independence Fools Bounded-Depth Boolean Circuits
By Mark Braverman

Illustration by Andrew Clark

Association for Computing Machinery
Advancing Computing as a Science & Profession
communications of the acm
Trusted insights for computing's leading professionals.

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields. Communications is recognized as the most trusted and knowledgeable source of industry information for today's computing professional. Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology, and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications, public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts, sciences, and applications of information technology.

ACM, the world's largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field's premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO John White
Deputy Executive Director and COO Patricia Ryan
Director, Office of Information Systems Wayne Graves
Director, Office of Financial Services Russell Harris
Director, Office of Marketing and Membership David M. Smith
Director, Office of SIG Services Donna Cappo
Director, Office of Publications Bernard Rous
Director, Office of Group Publishing Scott Delman

ACM Council
President Alain Chesnais
Vice-President Barbara G. Ryder
Secretary/Treasurer Alexander L. Wolf
Past President Wendy Hall
Chair, SGB Board Vicki Hanson
Co-Chairs, Publications Board Ronald Boisvert and Jack Davidson
Members-at-Large Vinton G. Cerf; Carlo Ghezzi; Anthony Joseph; Mathai Joseph; Kelly Lyons; Mary Lou Soffa; Salil Vadhan
SGB Council Representatives Joseph A. Konstan; G. Scott Owens; Douglas Terry

Publications Board
Co-Chairs Ronald F. Boisvert; Jack Davidson
Board Members Nikil Dutt; Carol Hutchins; Joseph A. Konstan; Ee-Peng Lim; Catherine McGeoch; M. Tamer Ozsu; Holly Rushmeier; Vincent Shen; Mary Lou Soffa

ACM U.S. Public Policy Office
Cameron Wilson, Director
1828 L Street, N.W., Suite 800, Washington, DC 20036 USA
T (202) 659-9711; F (202) 667-1066

Computer Science Teachers Association
Chris Stephenson, Executive Director
2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA
T (800) 401-1799; F (541) 687-1840

STAFF
Director of Group Publishing
Scott E. Delman
[email protected]

Executive Editor Diane Crawford
Managing Editor Thomas E. Lambert
Senior Editor Andrew Rosenbloom
Senior Editor/News Jack Rosenberger
Web Editor David Roman
Editorial Assistant Zarina Strakhan
Rights and Permissions Deborah Cotton

Art Director Andrij Borys
Associate Art Director Alicia Kubista
Assistant Art Directors Mia Angelica Balaquiot; Brian Greenberg
Production Manager Lynn D'Addesio
Director of Media Sales Jennifer Ruzicka
Public Relations Coordinator Virginia Gold
Publications Assistant Emily Williams

Columnists Alok Aggarwal; Phillip G. Armour; Martin Campbell-Kelly; Michael Cusumano; Peter J. Denning; Shane Greenstein; Mark Guzdial; Peter Harsha; Leah Hoffmann; Mari Sako; Pamela Samuelson; Gene Spafford; Cameron Wilson

Contact Points
Copyright permission: [email protected]
Calendar items: [email protected]
Change of address: [email protected]
Letters to the Editor: [email protected]

Web Site
http://cacm.acm.org

Author Guidelines
http://cacm.acm.org/guidelines

Advertising
ACM Advertising Department
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 869-7440; F (212) 869-0481
Director of Media Sales Jennifer Ruzicka, [email protected]
Media Kit: [email protected]

Editorial Board
Editor-in-Chief
Moshe Y. Vardi
[email protected]

News
Co-chairs Marc Najork and Prabhakar Raghavan
Board Members Hsiao-Wuen Hon; Mei Kobayashi; William Pulleyblank; Rajeev Rastogi; Jeannette Wing

Viewpoints
Co-chairs Susanne E. Hambrusch; John Leslie King; J Strother Moore
Board Members P. Anandan; William Aspray; Stefan Bechtold; Judith Bishop; Stuart I. Feldman; Peter Freeman; Seymour Goodman; Shane Greenstein; Mark Guzdial; Richard Heeks; Rachelle Hollander; Richard Ladner; Susan Landau; Carlos Jose Pereira de Lucena; Beng Chin Ooi; Loren Terveen

Practice
Chair Stephen Bourne
Board Members Eric Allman; Charles Beeler; David J. Brown; Bryan Cantrill; Terry Coatta; Stuart Feldman; Benjamin Fried; Pat Hanrahan; Marshall Kirk McKusick; Erik Meijer; George Neville-Neil; Theo Schlossnagle; Jim Waldo
The Practice section of the CACM Editorial Board also serves as the Editorial Board of acmqueue.

Contributed Articles
Co-chairs Al Aho and Georg Gottlob
Board Members Yannis Bakos; Elisa Bertino; Gilles Brassard; Alan Bundy; Peter Buneman; Andrew Chien; Peter Druschel; Blake Ives; James Larus; Igor Markov; Gail C. Murphy; Shree Nayar; Lionel M. Ni; Sriram Rajamani; Marie-Christine Rousset; Avi Rubin; Fred B. Schneider; Abigail Sellen; Ron Shamir; Marc Snir; Larry Snyder; Veda Storey; Manuela Veloso; Michael Vitale; Wolfgang Wahlster; Andy Chi-Chih Yao

Research Highlights
Co-chairs David A. Patterson and Stuart J. Russell
Board Members Martin Abadi; Stuart K. Card; Jon Crowcroft; Shafi Goldwasser; Monika Henzinger; Maurice Herlihy; Dan Huttenlocher; Norm Jouppi; Andrew B. Kahng; Gregory Morrisett; Michael Reiter; Mendel Rosenblum; Ronitt Rubinfeld; David Salesin; Lawrence K. Saul; Guy Steele, Jr.; Madhu Sudan; Gerhard Weikum; Alexander L. Wolf; Margaret H. Wright

Web
Co-chairs James Landay and Greg Linden
Board Members Gene Golovchinsky; Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

ACM Copyright Notice
Copyright © 2011 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $100.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current Advertising Rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0654.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact [email protected].

Communications of the ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to Communications of the ACM, 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481

Printed in the U.S.A.
education policy committee letter
DOI:10.1145/1924421.1924422
Robert B. Schnabel
Educating Computing’s Next Generation In a sensible world—at least as defined by computer scientists who, as we all know, are eminently sensible people—there would be no need for the ACM Education Policy Committee (EPC). Educational systems, and the policymakers and officials who influence them, would be fully aware that computer science knowledge and skills are among the most essential ingredients of a modern education. They would recognize that not only does this knowledge provide the foundation for modern competency in many others fields ranging from sciences to communications to entertainment and more, but that only through giving students deep computer science (CS) knowledge can we expect to have a new generation that can create—not just consume—the next wave of computing innovations. Educators and policymakers also would be fully cognizant that in conversations about STEM (science, technology, engineering and mathematics) jobs, the current and projected demand for computing workers far outstrips any other area of STEM and faces by far the greatest deficiency of supply relative to demand. Of course, we don’t live in a perfect world. All aspects of our world are influenced by history, and the existence of CS still “only” dates back about half a century, which pales in comparison to math, biology, chemistry, physics, and other sciences. While the higher education system adapted fairly quickly to the existence and importance of CS, the K–12 system has not. More recently, the situation has gotten worse in nations including the U.S. and the U.K. In the U.S (the initial focus of the EPC), the last decade has seen significant declines
in the number of K–12 CS courses, the number of students taking the CS advanced placement exam, and the number of undergraduate CS majors. At the same time, however, the demand for CS professionals continues to grow. In addition, the participation of women and underrepresented minorities remains low at all levels. ACM formed the EPC in 2007 to engage policymakers and the public on public policy issues in CS, including: ˲˲ Reviewing issues that impact science, math, and CS education in K–12 and higher education ˲˲ Determining whether current policies are serving the computing field and recommending improvements ˲˲ Commenting on proposals before governmental bodies that impact computing education ˲˲ Educating policymakers on the importance of computing education ˲˲ Providing expertise on key computing education issues to policymakers While the U.S. is the initial focus, computing education is a global issue and many positions of the committee have global applicability. In its brief history, the EPC has had some significant impacts, including: ˲˲ Co-sponsoring the 2010 report “Running on Empty: The Failure to Teach K-12 Computer Science in the Digital Age,” which examined the extent to which CS education is incorporated into current state education standards and to what extent states allow CS courses to count as a graduation credit
in a required or core subject. Key report findings include that only nine states allow a CS course to count as a core graduation credit and that 14 states have adopted no standards (see http:// www.acm.org/runningonempty/). ˲˲ Leading the development and implementation of CSEd Week. Beginning in 2009, the U.S. House of Representatives endorsed the week of Grace Hopper’s birthday (Dec. 6) to recognize the critical role of computing to bolster CS education at all levels. The 2010 week included over 1,700 pledges of support and 270 activities and events from 45 states and over 30 countries. ˲˲ Leading development of the Computing in the Core coalition, a nonpartisan advocacy coalition of associations, corporations, scientific societies, and other non-profits seeking to elevate the national profile of U.S. K–12 CS education. ˲˲ Holding numerous events in D.C. and at conferences, and meetings with government policymakers, to raise awareness of CS education issues. ˲˲ Commenting on various STEM education studies and policies to assure full attention to CS, with marked impact in some cases. There are, as Robert Frost said, miles to go before we sleep. We welcome your suggestions and support! Robert B. Schnabel, chair of ACM’s Education Policy Committee, is a professor and Dean of the School of Informatics at the Indiana University, Bloomington. © 2011 ACM 0001-0782/11/04 $10.00
ACM, Advancing Computing as a Science and a Profession

Dear Colleague,

The power of computing technology continues to drive innovation to all corners of the globe, bringing with it opportunities for economic development and job growth. ACM is ideally positioned to help computing professionals worldwide stay competitive in this dynamic community.

ACM provides invaluable member benefits to help you advance your career and achieve success in your chosen specialty. Our international presence continues to expand and we have extended our online resources to serve needs that span all generations of computing practitioners, educators, researchers, and students.

ACM conferences, publications, educational efforts, recognition programs, digital resources, and diversity initiatives are defining the computing profession and empowering the computing professional.

This year we are launching Tech Packs, integrated learning packages on current technical topics created and reviewed by expert ACM members. The Tech Pack core is an annotated bibliography of resources from the renowned ACM Digital Library – articles from journals, magazines, conference proceedings, Special Interest Group newsletters, videos, etc. – and selections from our many online books and courses, as well as non-ACM resources where appropriate.

BY BECOMING AN ACM MEMBER YOU RECEIVE:

Timely access to relevant information
Communications of the ACM magazine • ACM Tech Packs • TechNews email digest • Technical Interest Alerts and ACM Bulletins • ACM journals and magazines at member rates • full access to the acmqueue website for practitioners • ACM SIG conference discounts • the optional ACM Digital Library

Resources that will enhance your career and follow you to new positions
Career & Job Center • online books from Safari® featuring O'Reilly and Books24x7® • online courses in multiple languages • virtual labs • e-mentoring services • CareerNews email digest • access to ACM's 34 Special Interest Groups • an acm.org forwarding address with spam filtering

ACM's worldwide network of more than 97,000 members ranges from students to seasoned professionals and includes many renowned leaders in the field. ACM members get access to this network and the advantages that come from their expertise to keep you at the forefront of the technology world.

Please take a moment to consider the value of an ACM membership for your career and your future in the dynamic computing profession.

Sincerely,

Alain Chesnais
President
Association for Computing Machinery
Advancing Computing as a Science & Profession
membership application & digital library order form
Advancing Computing as a Science & Profession
Priority Code: AD10
You can join ACM in several easy ways:
Online: http://www.acm.org/join
Phone: +1-800-342-6626 (US & Canada); +1-212-626-0500 (Global)
Fax: +1-212-944-1318
Or, complete this application and return with payment via postal mail.

Special rates for residents of developing countries: http://www.acm.org/membership/L2-3/
Special rates for members of sister societies: http://www.acm.org/membership/dues.html
Please print clearly

Name
Address
City
State/Province
Postal code/Zip
Country
E-mail address
Area code & Daytime phone
Fax
Member number, if applicable

Purposes of ACM
ACM is dedicated to:
1) advancing the art, science, engineering, and application of information technology
2) fostering the open interchange of information to serve both professionals and the public
3) promoting the highest professional and ethics standards
I agree with the Purposes of ACM:
Signature

ACM Code of Ethics: http://www.acm.org/serving/ethics.html
choose one membership option:

PROFESSIONAL MEMBERSHIP:
o ACM Professional Membership: $99 USD
o ACM Professional Membership plus the ACM Digital Library: $198 USD ($99 dues + $99 DL)
o ACM Digital Library: $99 USD (must be an ACM member)

STUDENT MEMBERSHIP:
o ACM Student Membership: $19 USD
o ACM Student Membership plus the ACM Digital Library: $42 USD
o ACM Student Membership PLUS Print CACM Magazine: $42 USD
o ACM Student Membership w/Digital Library PLUS Print CACM Magazine: $62 USD
All new ACM members will receive an ACM membership card. For more information, please visit us at www.acm.org Professional membership dues include $40 toward a subscription to Communications of the ACM. Student membership dues include $15 toward a subscription to XRDS. Member dues, subscriptions, and optional contributions are tax-deductible under certain circumstances. Please consult with your tax advisor.
RETURN COMPLETED APPLICATION TO: Association for Computing Machinery, Inc. General Post Office P.O. Box 30777 New York, NY 10087-0777 Questions? E-mail us at
[email protected] Or call +1-800-342-6626 to speak to a live representative
Satisfaction Guaranteed!
payment: Payment must accompany application. If paying by check or money order, make payable to ACM, Inc. in US dollars or foreign currency at current exchange rate. o Visa/MasterCard
o American Express
o Check/money order
o Professional Member Dues ($99 or $198)
$ ______________________
o ACM Digital Library ($99)
$ ______________________
o Student Member Dues ($19, $42, or $62)
$ ______________________
Total Amount Due
$ ______________________
Card #
Signature
Expiration date
in the virtual extension DOI:10.1145/1924421.1924424
In the Virtual Extension To ensure the timely publication of articles, Communications created the Virtual Extension (VE) to expand the page limitations of the print edition by bringing readers the same high-quality articles in an online-only format. VE articles undergo the same rigorous review process as those in the print edition and are accepted for publication on merit. The following synopses are from articles now available in their entirety to ACM members via the Digital Library.
contributed article
DOI: 10.1145/1924421.1924449
Emergency! Web 2.0 to the Rescue!
Ann Majchrzak and Philip H.B. More

Beginning Sunday October 21, 2007, San Diego County experienced the worst fires ever reported in a large urban area in the U.S. As seven major fires and five lesser fires raged, Ron Roberts, Chair of the San Diego County Board of Supervisors, said, "We are entering day three of what appears to be one of the worst fires, probably the worst fire in San Diego County history, and easily one of the worst fires in the history of the state of California." The after-action report later said the seven fires collectively caused 10 civilian deaths, 23 civilian injuries, and 89 firefighter injuries, consumed 370,000 acres, or about 13% of the county's total land area, destroyed 1,600 homes, and cost in excess of $1.5 billion. Sociological research on disaster situations has repeatedly recognized the value of volunteers, demonstrating their importance providing physical and emotional assistance, as well as their need for timely information. However, local and federal disaster-management planning and policy implementation have generally ignored the role of volunteers. Because volunteers sometimes add to the chaos of a disaster, disaster officials may even discourage their participation.

contributed article
DOI: 10.1145/1924421.1924450
A Research Doctorate for Computing Professionals
Fred Grossman, Charles Tappert, Joe Bergin, and Susan M. Merritt

The Doctor of Professional Studies (DPS) in Computing at Pace University provides computing and information technology professionals a unique opportunity to pursue a doctoral degree while continuing to work full time. It supports interdisciplinary study among computing areas and applied research in one or more of them, and thereby provides a background highly valued by industry. It is an innovative post-master's doctoral program structured to meet the needs of the practicing computing professional. The DPS in Computing is distinguished from the Ph.D. by focusing upon the advancement of the practice of computing through applied research and development. It is designed specifically for people who want to do research in an industrial setting.

review article
DOI: 10.1145/1924421.1924451
Achievements and Challenges in Software Reverse Engineering
Gerardo Canfora, Massimiliano Di Penta, and Luigi Cerulo

The need for changing existing software has been with us since the first programs were written. Indeed, the necessity for modifying software requires understanding it. Software comprehension is a human-intensive process, where developers acquire sufficient knowledge about a software artifact, or an entire system, so as to be able to successfully accomplish a given task, by means of analysis, guessing, and formulation of hypotheses. In most cases, software comprehension is challenged by the lack of adequate and up-to-date software documentation, often due to limited skills or to resource and timing constraints during development activities. This article discusses the developments of broad significance in the field of reverse engineering, and highlights unresolved questions and future directions. The authors summarize the most important reverse engineering concepts and describe the main challenges and achievements. They provide references to relevant and available tools for the two major activities of a reverse engineering process and outline future trends of reverse engineering in today's software engineering landscape.

viewpoint
DOI: 10.1145/1924421.1924448
Reaching Future Computer Scientists
Patricia Morreale and David Joiner

The decline in undergraduate enrollment at the university level is well documented and it begins in high school. Today's high school students are exposed to traditional math and science curriculums but exposure to computer science and associated computational thinking is frequently absent from the U.S. high school experience. While the importance of computer science to the U.S. national curriculum has been addressed by some states with teacher certification programs, the field cannot wait another generation of students for legislated standards to be implemented. Instead, the authors advocate university faculty reaching out to high school faculty. By holding teacher workshops, the authors have been able to update and enhance many of the ideas current high school faculty have regarding applications of computers in the sciences.
letters to the editor DOI:10.1145/1924421.1924423
I Want a Personal Information Pod
I found much to agree with in Stephen Davies's article "Still Building the Memex" (Feb. 2011) but also feel one of his suggestions went off in the wrong direction. Davies imagined that a system for managing what he termed a "personal knowledge base" would be a "distributed system that securely stores your personal knowledge and is available to you anywhere..." and toward this end dismissed handheld devices due to their hardware limitations while including handheld devices as a way to access the networked distributed system.

Just as Davies might have imagined future built-in network capabilities able to guarantee access anywhere anytime at desirable speeds to the desired information (presumably at reasonable cost), the rest of us, too, can imagine personal devices with all the necessary capabilities and interface control. Such devices would be much closer to Vannevar Bush's Memex vision. Bush was clearly writing about a personal machine to store one's collection of personal information, and a personal device functioning as one's extended memory would be far preferable to a networked distributed system. But why would any of us trust our personal extended memory to some networked distributed resource, given how often we are unable to find something on the Web we might have seen before?

In my own exploration of Bush's Memex vision ("Memex at 60: Internet or iPod?" Journal of the American Society for Information Science and Technology (July 2006), 1233–1242), I took a stab at how such a personal information device might be assembled and function, comparing it to a combination iPod and tablet PC, resulting in a personal information pod. Ultimately though, I do fully agree with Davies as to the desirability of a tool that benefits any of us whose "own mind" is simply "insufficient for retaining and leveraging the knowledge [we] acquire."

Richard H. Veith, Port Murray, NJ
Author’s Response: Veith makes a fair point. For users, what ultimately matters is whether their knowledge base is ubiquitously available and immune to data loss—as provided by the distributed solution I described but that would be handled just as well by a handheld device with synchronized backups. Stephen Davies, Fredericksburg, VA
More Industrial Researchers at Conferences and Workshops
Moshe Y. Vardi's Editor's Letter (Jan. 2011) posed the possibly rhetorical question "Where Have All the Workshops Gone?" Just as business strategy is influenced by tax law, the nature of workshops is determined, in part, by academic- and corporate-funding policies. Corporate attendance at an external (to the organization) conference or workshop is often driven by whether a mainstream research paper has been accepted for publication in its proceedings. Likewise, academic promotions and salary increases at research universities are influenced by whether a paper was published in "first tier" conference or workshop publications. These policies are mainly responsible for the demise of the type of workshop Vardi described. I have found, at least in my company, that enlightened management encourages the free exchange of ideas and best practices at such workshops, even though they are typically closed events due to IP concerns. However, organizing them so they're indeed open to the public and attract attendees and speakers likely requires a change in promotion and tenure policies at universities and a value proposition that encourages participation by industrial researchers.

Brian Berenbach, Princeton, NJ

Correction
Due to an editing error, the letter to the editor by Paul McJones "Credit Due for Tandem's Hardware and OS" (Feb. 2011) transposed technical credit cited for James A. Katzman and Joel F. Bartlett at Tandem in the late 1970s. Katzman and Bartlett were key contributors on hardware and operating system, respectively.

Communications welcomes your opinion. To submit a Letter to the Editor, please limit your comments to 500 words or less and send to [email protected].

© 2011 ACM 0001-0782/11/04 $10.00

Coming Next Month in Communications

Brain-Computer Interfaces for Communication and Control
The Future of Microprocessors: Changing Technology Landscape Shapes the Next 20 Years of Microprocessor Architecture
Proving Program Termination
Privacy-Preserving Network Forensics
An Interview with Steve Furber
Viewpoint: The Importance of Reviewing the Code

And the latest news on domestic-assistance robots, computational metaphysics, large-scale organization of photos, and artificial intelligence for developing nations.
The Communications Web site, http://cacm.acm.org, features more than a dozen bloggers in the BLOG@CACM community. In each issue of Communications, we’ll publish selected posts or excerpts.
Follow us on Twitter at http://twitter.com/blogCACM
doi:10.1145/1924421.1924425
http://cacm.acm.org/blogs/blog-cacm
Matters of Design, Part II
Jason Hong discusses how Apple creates well-designed products and what the human-computer interaction community can learn from its methods.

Jason Hong
"Why is Great Design so Hard (Part Two)?"
http://cacm.acm.org/blogs/blog-cacm/97958
This blog entry is a follow-up to a previous one a few weeks ago ["Why is Great Design so Hard?", which appeared in the BLOG@CACM department in the Feb. 2011 issue] looking at the challenges of integrating great interaction and user experience design into organizations. Basically, my question was, Everyone wants well-designed products, but why do few organizations seem to be able to make it happen? What are the barriers? Why isn't good design more pervasive? If we want good design to spread to more organizations, we need to have a better understanding of what is and isn't working. My last blog entry examined challenges that software companies face in incorporating design into how they make products. This blog entry takes a different tack and looks at what successful organizations are doing right. More specifically, I've been having many discussions with different individuals over the past few weeks about
how Apple does design. Many people perceive Apple as the paragon of design, with clean, sleek interfaces that aren’t just easy to use, but also are fun, attractive, and aesthetically appealing on a visceral level. So, what is Apple doing right? One point repeated by many people was that the culture of design is pervasive throughout Apple. Everyone knows it’s important and that it helps sell products. What surprised me were examples of how this culture was translated into practical everyday work at Apple. Some people mentioned how strong top-down design work of the overall user experience was done up-front, rather than the typical bottom-up user interface tinkering done by most organizations (and then bringing in designers after the system has already been built). One story that stood out in particular really helped me understand just how much prominence was given to design. A former Apple employee that worked on designing hardware recounted how he was given a prototype of a physical form factor by an industrial designer. His team looked at the
shape and size, and said what the designer was asking for was impossible. The industrial designer pushed back and said, “Prove it.” The team iterated on various hardware layouts and got to about 90% of what the industrial designer wanted, and told him if he made a few changes to the form factor, they could make everything fit. Another surprise was how different Apple’s design methods are from “standard” best practices in humancomputer interaction (HCI). For example, a typical method we teach in HCI is to start with ethnographic field studies to gain deep insights into what people do, how they do it, and why they do it. Another best practice is to do iterative user testing with benchmarks, to ensure that people find products useful and usable. From what I can tell, Apple doesn’t use either of these methods. Instead, people described three different methods used at Apple. The first is that Apple preferred having subject matter experts who have many years of experience in the field be part of their teams. For example, for Aperture, Apple’s photo management software, it might be an expert photographer who deals with tens of thousands of photographs on a regular basis. For iMovie, it might be a team of people who edit movie clips for a living. In one sense, this approach might be thought of as an adaptation of participatory design, where the people who will eventually use the software help design the software. Historically, however, participatory design has been used for custom
software for a specific organization, so Apple's use of experts for mass market software is a new twist.

The second is that people at Apple think really long and hard about problems. From that perspective, certain solutions will pop out as being obviously better ways of doing things. Thus, part of Apple's strategy is to guide people toward that way of thinking as well. If you see problems the same way that Apple does, then the structure and organization of an interface will make sense.

The third is that Apple tends to design by principle rather than from data. In HCI classes, we emphasize making design decisions based on evidence as much as possible, for example, from past user studies on previous iterations of the interface or from ethnographic field studies. In contrast, at Apple, more weight is given to design decisions made from first principles.

So, what does this all mean? I have two closing thoughts. First, should we just throw away existing HCI methods for design? Given the sharp contrast between traditional methods in HCI and the methods used at Apple, and given the success of Apple's products, do HCI methods actually matter? One of my colleagues has a good counterargument, which is that Apple's products aren't always the first in an area. The iPod wasn't the first MP3 player, iTunes wasn't the first online music store, and the iPhone wasn't the first smartphone. As such, Apple can learn from the mistakes of others, apply the skills of subject matter experts, and hone existing designs in a proven market. However, for new kinds of hardware or applications that there isn't a lot of precedent for, this style of "think really long and hard" won't be as effective in pinpointing user needs and developing products in new markets.

Second, how much prominence should be given to design within organizations? What is the right mix of design, engineering, and business needs? For example, the so-called "death grip" for iPhones, where holding the phone the wrong way leads to a drop in signal strength, is clearly an engineering problem rather than an interaction design problem. A better mix of design and engineering may have caught the problem long before production and shipping.
Furthermore, it’s actually not clear if Apple’s approach to design is optimal or even replicable. Apple’s approach relies heavily on people at the top of the organization consistently coming up with great ideas and great designs. However, we’ve also seen a tremendous amount of innovation with reasonably good interfaces coming out of Google, Facebook, Amazon, and other high-tech companies. Perhaps there are ways of helping organizations be more user-centric without having to radically restructure their company, a topic I’ll explore in my next blog post. Comments Did you talk to these people and ask them, Why do you think Apple no longer shows up at the annual HCI conference SIGCHI runs? Do you think Apple believes standard HCI practices are not really useful? Why aren’t they part of the conversation at these conferences? —Ed Chi
Jason Hong responds
Apple hasn't had a strong research presence since Apple's Advanced Technology Group (ATG) was closed in 1997 by Steve Jobs. Given the style of products Apple has been pursuing, as well as their success (bigger market cap than Microsoft now!), it's hard to say it was a bad decision for Apple, though it was clearly bad for the research community. My impression as to why Apple isn't part of the conversation is because it's not the style of their work. There are still many organizations where industrial design is seen as the only form of design. It's also not part of their DNA. Steve Jobs is well known for being ultra-secretive, and this outlook just doesn't mesh well with the open sharing in research. In terms of valuing HCI methods, again I think it's just not part of their culture. Given their successes too, it would be hard to say they need to do something different. However, I think the key point was what I mentioned in the blog entry—Apple's recent line of products have been more about perfecting existing products and addressing well-known needs. I don't think this is a bad thing, but it may suggest some new insights as to when and how
we should be applying HCI methods vs. other approaches. Comments Have you seen Alain Breillatt’s article where he documents Apple’s development process based on a talk from a senior engineering manager at Apple? “You Can’t Innovate Like Apple” http://www.pragmaticmarketing.com/ publications/magazine/6/4/you_cant_ innovate_like_apple Basically, Apple bets the company every time it does a new product, and takes high risks to change the market each time. Most other companies wouldn’t risk failure to do these things. Also, some relevant commentary from Bruce Tognazzini on the differences at Apple: http://asktog.com/ columns/082iPad&Mac.html —James Jarrett Jason: If Apple can be successful without using more traditional HCI methods, then it is high time that we take a look at ourselves, and think about the true impact of HCI design and evaluation methodologies. Supposedly, they do no market research, and implies they hardly ever talk to real users during the design process. That means, I guess, there is very little participatory design or iterative refinement. Blasphemy! Ha ha! Speaking of ATG: There are some great research that was done at the ATG before it shut down. Spotlight came from Sherlock, which I believe was transferred from ATG. So is the idea of the Apple Data Detector entity extraction and interaction ideas. James: If they bet the house every time, I can see how they have to be more secretive. Thanks for the link. Very interesting read. —Ed Chi Keep in mind that although successful, Apple provides just one way to think about design. They have a small and narrowly focused product line. Further, they are designing for a (relatively) small user group. There are other consumer products companies that continue to value ethnographic research and use it to inform “good design.” Finding examples of this means thinking beyond tech companies.… —spwyche Jason Hong is an associate professor of computer science at Carnegie Mellon University. © 2011 ACM 0001-0782/11/04 $10.00
cacm online
DOI:10.1145/1924421.1924426
Scott E. Delman

ACM on the Move
Over the coming weeks, ACM will be diving headfirst into the world of mobility. Our strategy is essentially two-pronged. The first is the development of a mobile version of the existing CACM Web site at cacm.acm.org. After launching the new site in March, whenever our readers access the Communications' Web site using any mobile device, they will automatically be brought to the new mobile site, which is a cleaner, more simplified version of the main site that enables quick viewing, saving, and sending of articles without many of the extraneous features and functionality present on the main site. Speed, access, browsability, and sharability are the main goals of this new site. We are not the first scholarly or professional magazine to launch a mobile Web site for our readers, but we are among the first and we are certain the usage of this new site will be significant, if the current traffic generated by mobile devices is any indication. The second component of our mobile strategy is apps. In March, ACM launched its first two apps, which are built on the iOS development platform and include both an iPhone and iPod Touch version and a tablet-sized iPad version. The decision to develop on this platform over other popular platforms such as Android or BlackBerry as our initial foray into mobile application development was based primarily on a quantitative analysis of the devices actually being used to download articles from Communications' Web site at cacm.acm.org. Based on Google Analytics usage reports we generated at the time, iPods, iPhones, and iPads represented the overwhelming majority of the devices being used to download articles and so we opted to accommodate our existing users as our first experiment into mobility. Once we start receiving feedback from the community, both positive and negative, we will take all that we have learned through this process and begin work on developing applications for the Android platform, which is quickly catching up to Apple in terms of penetration on our Web site. Ultimately, the goal in launching mobile applications is to provide a new and meaningful way to interact with articles and commentary published both in Communications and its complementary Web site. Apps offer a unique user experience and ability to easily access and share articles with colleagues, peers, and students and to reach our readers while they are traveling to work, enjoying their coffee in the morning, waiting for a flight, or simply curling up on the couch with an issue of Communications on their iPad. In coming issues we will provide helpful hints on using these apps and the new mobile site to enhance your enjoyment of these new elements of Communications.

ACM Member News
Marin Litoiu Wins IBM Award One of the leading pioneers in cloud computing, Marin Litoiu, associate professor at York University, was named IBM’s Center for Advanced Studies Faculty Fellow of the Year for his research on automating processes in the cloud. The cloud allows multiple users to share both computer hardware and software applications, reducing the expense of maintaining those tools themselves. “It’s easy to say but very hard to do,” says Litoiu. He has built an analytical model that examines applications as they are running, tests different variations of resource allocation, and dynamically adjusts the sharing. Because it takes several minutes to shift resources around, Litoiu is also building a predictive model that looks at shortterm trends in demands for resources, such as memory, CPU time, and network bandwidth, and suggests improvements in provisioning. Litoiu expects cloud computing to become more geographically distributed, so his York lab is trying to emulate a network of “sub-clouds” to examine how thousands of applications running simultaneously, each trying to optimize its own resources, affect each other and perform over networks. He’s received $234,000 from the Natural Sciences and Engineering Research Council of Canada and $266,000 from IBM to upgrade his York lab. One important challenge is figuring out how to use the cloud as a platform on which to develop new applications. Litoiu says Amazon’s approach gives software developers full control of applications, but makes development more difficult, while Google’s model is simpler but less flexible. “It’s not clear what the best programming model is,” he says. But relying on the cloud for software and infrastructure services? “That’s going to evolve rapidly.” —Neil Savage
news
Science | doi:10.1145/1924421.1924427
Gary Anthes
The Quest for Randomness
It’s not easy to generate a string of numbers that lack any pattern or rule, or even to define exactly what randomness means.
Illustration by weknow/Shutterstock
Generating random numbers might seem pretty simple. After all, so many things in our everyday lives occur without pattern or repetition—coin tosses, for example, or the distribution of raindrops on your car windshield. In any case, the whole notion of striving for randomness might seem a bit alien to a computer scientist, who is trained to make processes predictable and repeatable. Cryptographers use random number generators (RNGs) to produce unpredictable keys for coded messages, digital signatures, challenge-response systems, and other technologies at the heart of information security. And researchers in fields such as biology, economics, and physics use them to simulate the real world via Monte Carlo models. Human life and great sums of money may depend on the robustness of these systems, yet it is often difficult to prove that they work correctly—and will continue to do so.

Part of the problem is one of definition. One might say, for example, that a random bit generator (as RNGs are often called) must over time produce half 0s and half 1s, or that it must generate the sequence "00" one-quarter of the time. But, points out Silvio Micali, a computer scientist and cryptography expert at Massachusetts Institute of Technology (MIT), the sequence 0, 1, 2, ... 9 (written in binary) would pass those tests but would hardly be considered "random."
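As a quick illustration of why such frequency tests are too weak on their own, the following sketch (written here for illustration, not taken from the article) applies them to exactly the kind of sequence Micali describes: the binary expansions of 0, 1, 2, and so on, concatenated together.

# Illustrative sketch (not from the article): a "counting" bit sequence
# passes naive frequency tests even though it is completely predictable.
def counter_bits(n):
    """Concatenate the binary expansions of 0..n-1 into one bit string."""
    return "".join(format(i, "b") for i in range(n))

bits = counter_bits(1 << 16)

ones = bits.count("1") / len(bits)
zero_zero = sum(bits[i:i + 2] == "00" for i in range(len(bits) - 1)) / (len(bits) - 1)

# Both ratios sit reasonably near the "ideal" 1/2 and 1/4, and they approach
# those values as the sequence grows, yet anyone who spots the counting
# pattern can predict every following bit with certainty.
print(f"fraction of 1s: {ones:.3f}")
print(f"fraction of '00' pairs: {zero_zero:.3f}")

Passing such tests is therefore necessary but nowhere near sufficient, which is why the definitional work described next focuses instead on whether any efficient observer can predict the next bit.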
Definitions matter, Micali says. "If you define [randomness] correctly, then you can get it. But if you are vague or unclear, you can't possibly get it."
Pseudorandomness, and How to Get It “You can’t start with a blank sheet of paper and generate 100 million random numbers,” says Silvio Micali, a computer scientist at Massachusetts Institute of Technology. “That is very hard.” Instead, he says, one should begin with a relatively small list—say, 100 bits—that are truly random, most likely generated by some physical process such as quantum mechanics. He goes on to define pseudorandom number generators as software that “expands” the randomness in this initial small list, or seed. “Now I want to generate 100 million random bits of very high quality,” he says. “But how do I define ‘high quality’?” He and Manuel Blum, a professor of computer science at Carnegie Mellon University, proposed a definition in which anyone shown the first n bits generated, but not the seed, could not guess what the next bit would be substantially more than 50% of the time. “Substantially more” might be defined as 51%, 50.1%, 50.01%, or some other threshold, depending on one’s security requirements. The Micali-Blum definition also includes the notion of computational resources. “If all the computers in the world must work for one million years to guess the next bit, then it is high quality,” says Micali, who believes the ideal seed contains 300 bits. Guessing the 300-bit seed that generated any given sequence by trial-and-error would mean testing more possibilities than there are elementary particles in the universe. Given the seed list of truly random bits, Micali and Blum prescribed the use of one-way functions to produce the desired sequence of random bits. In cryptography, a one-way function is easy to compute in one direction—for example, determining the product of two large prime numbers—but computationally very difficult to reverse—factoring that product back into the two primes. That is the method used in public-key cryptography, for example. Similarly, the Blum-Micali generator of pseudorandom bits employs the asymmetry between computing discrete logarithms, which is difficult, and the inverse, discrete exponentiation, which is easy. Micali refuses to comment on any of the existing ways that so-called “true” random numbers are generated. “I prefer to be agnostic about seed generation,” he says, “because I find it conceptually difficult to convince myself of their randomness.” —Gary Anthes
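As a concrete, deliberately toy-sized sketch of the construction the sidebar describes, the following fragment expands a short seed by repeated discrete exponentiation and emits one bit per step. The particular prime, generator, and seed are illustrative assumptions only and are far too small for real cryptographic use; this is one common formulation of the Blum-Micali idea, not a production generator.

# Toy Blum-Micali-style generator (illustrative sketch; the parameters are
# far too small for real use). Each step applies discrete exponentiation,
# whose inverse, the discrete logarithm, is believed hard to compute.
P = 2147483647               # the Mersenne prime 2**31 - 1 (toy-sized)
G = 7                        # a primitive root modulo P
HALF = (P - 1) // 2

def blum_micali(seed, nbits):
    """Expand a short random seed into nbits pseudorandom bits."""
    x = seed
    out = []
    for _ in range(nbits):
        x = pow(G, x, P)                    # next state: g**x mod p (easy)
        out.append(1 if x <= HALF else 0)   # hard-core bit of the state
    return out

print(blum_micali(seed=123456789, nbits=32))

Recovering the seed from the output stream would require computing discrete logarithms modulo the prime, which is exactly the one-way asymmetry the sidebar describes; a real deployment would use a far larger prime and a seed drawn from a physical entropy source.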
Another difficulty lies in testing an RNG. Traditionally, a test checks to see if the system in question reliably produces output known in advance to be correct, but when is a stream of random bits correct? The U.S. National Institute of Standards and Technology (NIST) has developed "known-answer tests" for software-based, or deterministic, RNGs, because any given algorithm is expected to produce the same answer every time, given the same input. But convincing tests of the output of nondeterministic RNGs, such as a noise source in hardware, are not so easy to construct. A third problem, related to testing, is that some RNGs initially work correctly but gradually and silently fail over time, introducing biases that corrupt randomness.
The deterministic nature of software, which produces pseudorandom numbers, has led researchers to look for ways to produce "true" randomness, and that always involves hardware or a physical process not governed by rules. Indeed, the Russian mathematician Andrei Kolmogorov defined randomness in a way that rules out software as a way to produce it.
According to Kolmogorov, a string of bits is truly random if the shortest description of the string is as long as the string itself. Researchers have based nondeterministic RNGs on disk head seek times, thermal noise, atomic-scale quantum effects, and other phenomena that are statistically shown to be fundamentally random. Theoretically perfect hardware approaches can suffer from efficiency problems, says MIT's Micali. For example, one can imagine looking at the Brownian movement of a particle suspended in a liquid and counting it as a 0 if it is on the left and a 1 if it is on the right. "To get the first bit is easy," he says, "but to get 1,000 is a little bit hard." A cryptography system in banking or national security might need to generate millions of random bits per second.
A Matter of Timing
A more subtle problem is one of timing. Suppose one samples the randomly fluctuating noise from a diode and records a 1 if it is below a certain threshold and a 0 above. If you sample too often, you are likely to get biased strings like 000011110000 because the noise level doesn't change fast enough. For this and a variety of other reasons, hardware sources are not immune to biases that affect the randomness of their output.
But now Intel Corp. claims to have found a way around the drawbacks of nondeterministic, hardware-based RNGs. It recently announced that it had created a working prototype of a digital circuit, which could be resident within the central processing unit, that harvests randomness, or "entropy," from thermal noise in the chip. "Ours is the first truly, fully digital RNG," claims Ram Krishnamurthy, a senior principal engineer at Intel's Circuits Research Lab. He says conceptually similar devices, known as true random number generators, have been built from analog circuits that are subject to disruption from the process and noise variations in the central processing unit, so they have to be isolated off-chip and connected by a bus that is vulnerable to snooping. They also employ large capacitors that make manufacturing and scaling difficult, Krishnamurthy says.
news Intel’s circuit is implemented in a 45nm complementary metal oxide semiconductor and can generate 2.4 billion truly random bits per second—200 times faster than the best available analog equivalent—while drawing just 7 milliwatts of power, according to Krishnamurthy. And the technology is highly scalable, he says, so that multiple copies of the digital circuit could be coupled in parallel arrays. The technology could be scaled up in this way to directly provide the random numbers needed by large systems, or it could be scaled down to low voltages so as to just provide highentropy seeds for a software-based RNG, Krishnamurthy says. In the latter mode, it would operate at 10 megabits per second and draw just 10 microwatts of power. Krishnamurthy acknowledges that the circuit’s output is subject to process fluctuations—caused by transistor, power supply, and temperature variations—that could introduce bias in its output. But Intel has developed a self-calibrating feedback loop that compensates for those variations. The resulting design operates at a level of entropy of 99.9965% and has passed the NIST tests for randomness, Krishnamurthy says. But more work on these tests is needed, says Elaine Barker, a mathematician in NIST’s Computer Security Division. “The thing we have been really struggling with is how to test the entropy sources, the noise
Some random number generators initially work correctly but silently fail over time, introducing biases that corrupt randomness.
Further Reading Barker, E. and Kelsey, J. Recommendation for random number generation using deterministic random bit generators (revised), NIST Special Publication 800-90, U.S. National Institute of Standards and Technology, March 2007. Blum, M. and Micali, S. How to generate cryptographically strong sequences of pseudorandom bits, SIAM Journal on Computing 13, 4, November 1984. Menezes, A., van Oorschot, P., and Vanstone, S. Pseudorandom bits and sequences, Handbook of Applied Cryptography, CRC Press, Boca Raton, FL, 1996.
sources,” she says. “We are kind of feeling our way along. How do you do general testing for technology when you don’t know what will come along? We are not really sure how good these tests are.” Entropy sources may not produce output that is 100% random, and different test samples from a single source may have different degrees of randomness. “We want to know how much entropy is in 100 bits—100, 50, or two?” Barker says. “And does it continue that way?” Indeed, generating random numbers today clearly lags what is theoretically possible, Micali says. “We are still in the mode of using library functions and strange things that nobody can prove anything about,” he says.
Rukhin, A., Soto, J., Nechvatal, J., Smid, M., Barker, E., Leigh, S., Levenson, M., Vangel, M., Banks, D., Heckert, A., Dray, J., and Vo, S. A statistical test suite for random and pseudorandom number generators for cryptographic applications, NIST Special Publication 800-22, U.S. National Institute of Standards and Technology, April 2010.
Srinivasan, S., Mathew, S., Ramanarayanan, R., Sheikh, F., Anders, M., Kaul, H., Erraguntla, V., Krishnamurthy, R., and Taylor, G. 2.4GHz 7mW all-digital PVT-variation tolerant true random number generator in 45nm CMOS, 2010 IEEE Symposium on VLSI Circuits, Honolulu, HI, June 16–18, 2010.
Gary Anthes is a technology writer and editor based in Arlington, VA.
© 2011 ACM 0001-0782/11/04 $10.00
Society
Predictive Modeling as Preventive Medicine
If an ounce of prevention is worth a pound of cure, then how much prevention would it take to put a dent in the U.S.'s projected $2.8 trillion annual health-care tab? How about $3 million worth? That's the amount the Heritage Provider Network is offering as prize money in a new contest for developers to create algorithms aimed at identifying those patients most likely to require hospitalization in the coming year. (Contest details are available at http://www.heritagehealthprize.com/.)
Previous studies have suggested that early treatment of at-risk patients can dramatically reduce health care expenditures. For example, a well-regarded study in Camden, NJ, demonstrated that hospitals could slash costs by more than 50% for their most frequently hospitalized patients by targeting them with proactive health care before they incurred expensive trips to the emergency room. In hopes of replicating those savings on a larger scale, the contest organizers will
furnish developers with a set of anonymized medical records for 100,000 patients from 2008. Using this data set, developers will try to build a predictive algorithm to pinpoint those patients most likely to have been hospitalized the following year. For example, an effective algorithm might identify patients who have already been diagnosed with diabetes, high cholesterol, hypertension, and premature menopause—and correctly predict the 90% likelihood that such a patient would wind up in the hospital
within the year. "The Heritage Health Prize is a high-profile way to harness the power of predictive modeling and use it to solve one of America's biggest challenges," says Jeremy Howard, chief data scientist for Kaggle, the company managing the contest. If successful, this effort could yield a powerful computational tool to help rein in spiraling health-care costs, potentially saving billions of dollars—or at least a few pounds' worth of cure. —Alex Wright
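The general shape of the task can be sketched with entirely synthetic data. Everything below is a hypothetical stand-in—the feature names, labels, and scoring are illustrative and do not reflect the actual Heritage data schema or the contest's own evaluation metric—but it shows the kind of baseline a contestant might start from: fit a classifier on one year's diagnoses and score each patient's probability of hospitalization the next.

# Hypothetical baseline sketch; column meanings and labels are invented stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_patients = 1000
# Four invented prior-year diagnosis flags per patient
# (e.g., diabetes, high cholesterol, hypertension, prior ER visit).
X = rng.integers(0, 2, size=(n_patients, 4))
# Synthetic label: more prior conditions -> more likely hospitalized next year.
y = (X.sum(axis=1) + rng.normal(0, 1, n_patients) > 2.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
risk = model.predict_proba(X_test)[:, 1]   # per-patient hospitalization probability
print("Held-out AUC:", round(roc_auc_score(y_test, risk), 3))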
Technology | doi:10.1145/1924421.1924428
Kirk L. Kroeker
Engineering Sensation in Artificial Limbs
Advancements in mobile electronics have led to several prosthetics innovations in recent years, but providing reliable touch sensations to users remains an elusive goal.
Researchers working in advanced prosthetics, a field that draws on physics, biochemistry, and neuroscience, are attempting to overcome key engineering challenges and make potentially life-changing artificial limbs capable not only of moving as naturally as healthy human limbs but also of providing sophisticated sensory feedback for their users. Improvements in materials science, electronics, and sensor technologies, along with sizeable government funding in the U.S. and abroad, are contributing to these efforts. However, while leg prosthetics have advanced even to the point where it is now possible for amputees to rival the capabilities of nondisabled athletes in certain sports, the nuanced movements of the human hand remain the most difficult to replicate mechanically.
Advancements in portable power and mobile electronics have led to several prosthetics innovations in recent years. One example is the i-Limb, the result of a project that garnered much media attention when the device came to market in 2007 and was said to be the first commercially available hand prosthesis with five individually powered digits. While the technology for enhancing motor functions continues to improve, providing reliable sensory feedback to users about what an artificial hand might be touching remains a distant goal for scientists and continues to be an active area of research.
The U.S. Defense Advanced Research Projects Agency (DARPA) has been funding extensive work in this area, with the objective being to create artificial limbs that tie directly into the nervous system, facilitating more natural control and reliable sensory feedback.
A prototype created by the Revolutionizing Prosthetics project in the Applied Physics Lab at Johns Hopkins University.
The charter of Advanced Prosthetics, the first DARPA-sponsored project in this field, was to produce a strap-on artificial arm built with the most advanced ready-to-use technology available. That program, run by DEKA Research, resulted in a prosthesis called the "Luke Arm." The second DARPA-sponsored program, Revolutionizing Prosthetics, focused on using new control technologies and neural interface strategies to create a modular arm that could match the performance of a healthy limb. Stuart Harshbarger, chief science officer at Orthocare Innovations, directed DARPA's Revolutionizing Prosthetics project from the Applied Physics Lab at Johns Hopkins University. The research timeline of Harshbarger's program, which came to a close earlier this year, overlapped with DEKA's development and involved collaboration with more than 300 researchers at 30 institutions. "The program made enormous strides in demonstrating that natural and intuitive control of increasingly anthropomorphic artificial limbs is
indeed possible in the near term," says Harshbarger. "But there is much basic engineering work to do to make such systems reliable and affordable for clinical adoption." DARPA projects typically fall into the high-risk, high-reward category, with the idea usually being to show that the impossible can be made possible. Harshbarger says that, by this measure, the demonstrations created for the Revolutionizing Prosthetics program succeeded. The project produced what Harshbarger calls the world's first closed-loop, multisite, multi-electrode cortical interface technologies and the world's first wireless, multi-electrode, motor-decoding applications designed to enable neurophysiology studies that have not been possible with wired setups. "The science and technology components progressed, as proposed, and possibly with greater success than was originally contemplated," says Harshbarger. As with any advanced research, key challenges remain, such as the viability of cortical and peripheral neural interface technologies, but Harshbarger says these issues can be addressed incrementally in later research. The next DARPA-sponsored research in this area will focus specifically on brain-computer interface challenges, where Harshbarger says a significant amount of work remains to be done.
Modular Designs
Given that no two amputations are exactly alike, one of the objectives for researchers working in the Revolutionizing Prosthetics project was to create modular designs so the devices coming out of the lab could accommodate the reality that each user and prosthetic setup would be unique. One research team, for example, focused
on developing a targeted muscle reinnervation (TMR) procedure to restore natural touch sensations. The TMR procedure's modular sensor systems, which measure a mere 5 × 5 millimeters and contain 100 electrodes for recording nerve signals and stimulating nerve fibers, are now entering a commercial transition phase. Harshbarger says that keeping these different projects synchronized and focused on a modular design philosophy required constant communication. "Most people, including myself, believed that coordination of these various research groups and their principal investigators would be like herding cats," he says. "In reality, however, I was surprised and exceptionally proud of how the team focused on the common goals and mission of the project." Another researcher working in this area is Fredrik Sebelius, coordinator of the SmartHand project and a professor in the department of measurement technology and industrial electrical engineering at Lund University. The SmartHand project, spanning multiple research facilities at several universities across Europe, paralleled DARPA's Revolutionizing Prosthetics program in its ambitiousness, but focused exclusively on replicating the capabilities of the human hand. The goal for Sebelius and other researchers working in the project was to create an electromyography-controlled robotic hand that could deliver feedback to the user by stimulating what Sebelius calls the sensoric phantom map. Sebelius describes the sensoric phantom map as a well-organized region of the brain that can be stimulated through nerves in the arm to produce feelings in a missing hand. The idea might appear to be straightforward, but the implementation is far from simple. Sebelius and his team developed and refined their approach to phantom maps by using functional magnetic resonance imaging and a modular neural interface designed to be attached to an arm's nerve bundles. "The sensors on the hand prosthesis deliver tactile information to the subject's phantom map via actuators and, voila, the subject experiences sensation from the missing fingers," he says. The SmartHand project reached the proof-of-concept stage, but Sebelius
says much work needs to be done before the technology can move out of the lab. The main challenges, he says, are biocompatibility and signal loss. As for the SmartHand prosthesis itself, Sebelius says the project will need more funding to make it ready for commercialization. "The robotic hand was only intended for research groups," he says. "It has not undergone production development." Still, Sebelius notes the research that led to the creation of the hand made it possible to achieve crucial advances for ongoing work in nerve-signal recognition and processing. Like Harshbarger's Revolutionizing Prosthetics project, which led to several new technologies that are now spinning off into different research teams, the SmartHand project formally came to an end last year, with several
subprojects gaining independent momentum. Sebelius says he and his team now will focus specifically on the sensory feedback system to design a new module that can integrate the phantom map concept into any prosthetic device. He says he remains optimistic about the work but is careful to note that current signal-processing technology and portable electronics cannot perfectly replicate the performance of a healthy human hand controlled by thousands of nerves. "Hopefully," says Sebelius, "we will be able to achieve a level of sophistication so the user will not see the prosthesis as a tool but rather as an extension of the body." While the potential benefit of such sensory-feedback technology is readily apparent, the cost of prosthetics designed for sensation may be prohibitive.
A myoelectric prosthesis created by researchers in the SmartHand project at Lund University.
Today, advanced prosthetic devices can cost more than $20,000, putting them outside the buying range of amputees living in developing nations. Several organizations are dedicated to creating low-cost prosthetics, especially for nations with large numbers of people who have lost limbs due to landmines. One such project, Mobility for Each One, led by Canadian industrial designer Sébastien Dubois, has developed an energy-return prosthetic foot that can be locally produced for a mere $8. But as yet, there is no project that operates as an analog to the One Laptop Per Child program, seeking to provide low-cost, sophisticated prosthetics to developing nations. "For now, a sophisticated hand prosthesis will remain quite expensive," says Sebelius. Harshbarger offers a similar perspective. "This is a difficult issue and one that I believe in addressing, though our work has largely been on reducing costs for the highest-capability systems for domestic users," he says. As with any modern technology, production costs will come down over time, but the most advanced prosthetic devices are expected to remain expensive.
As the science of prosthetics matures, discussion about the ethics of using the technology as supplementary enhancement for healthy limbs rather than as replacement for lost limbs no doubt will become more complex. Artificial enhancement, a popular trope in speculative and science fiction, has a long history of advocates and detractors. What's likely to turn out to be the first experimental augmentation with a healthy human appears to have been accomplished by British scientist
Kevin Warwick, who, in 2002, had a device implanted in his arm to interface electrodes with his median nerve, allowing him to control a robotic arm by moving his own. Advocating such uses of prosthetics research might be called either visionary or far-fetched, but there is no disputing the notion that modern prosthetics breakthroughs have led to new ways of looking at ability versus disability. In 2008, for example, South African double-leg amputee Oscar Pistorius was ruled ineligible for the Summer Olympics because it was thought that his carbon prosthetics gave him a distinct mechanical advantage over runners with ankles. An appeal led to the ruling being overturned on the basis that there wasn't enough conclusive evidence. And, in the end, Pistorius didn't make the team. But many credit
the brouhaha as helping to reshape global perceptions about disability. Sebelius, for his part, says he remains convinced that, no matter how sophisticated prosthetics technology becomes, it will be difficult to rival the capabilities of healthy, functioning limbs. "Nature has done a pretty good job," he says. "An interesting parallel is artificial intelligence: although we have very fast computers, we are not even close to mimicking human intelligence or learning."
Further Reading
Carrozza, M.C., Cappiello, G., Micera, S., Edin, B.B., Beccai, L., and Cipriani, C. Design of a cybernetic hand for perception and action, Biological Cybernetics 95, 6, Dec. 2006.
Cipriani, C., Antfolk, C., Balkenius, C., Rosén, B., Lundborg, G., Carrozza, M.C., and Sebelius, F. A novel concept for a prosthetic hand with a bidirectional interface, IEEE Transactions on Biomedical Engineering 56, 11, Nov. 2009.
Gray, S.H. Artificial Limbs (Innovation in Medicine; 21st Century Skills), Cherry Lake Publishing, North Mankato, MN, 2008.
Jia, X., Koenig, M.A., Zhang, X., Zhang, J., Chen, T., and Chen, Z. Residual motor signal in long-term human severed peripheral nerves and feasibility of neural signal controlled artificial limb, Journal of Hand Surgery 32, 5, May–June 2007.
Kuniholm, J. Open arms: what prosthetic-arm engineering is learning from open source, crowdsourcing, and the video-game industry, IEEE Spectrum 46, 9, March 2009.
Based in Los Angeles, Kirk L. Kroeker is a freelance editor and writer specializing in science and technology.
© 2011 ACM 0001-0782/11/04 $10.00
Milestones
Japan Prize and Other Awards
The Japan Prize Foundation, National Academy of Engineering (NAE), and U.S. President Obama recently recognized leading computer scientists for their research and leadership.
Japan Prize
Dennis Ritchie, retired, and Ken Thompson, a distinguished engineer at Google, were awarded the 2011 Japan Prize in information and communications for developing Unix. Ritchie and Thompson will split the prize's $600,000 cash award.
NAE Members
The NAE elected nine members in the field of computer science and engineering. They are: Susan Dumais, Microsoft Research; Daphne Koller, Stanford University; Hank Levy, University of Washington; Jitendra Malik, University of California, Berkeley; Nick McKeown, Stanford University; Don Norman, Northwestern University; Ari Requicha, University of Southern California; Fred Schneider, Cornell University; and Mihalis Yannakakis, Columbia University. Jonathan Rose, University of Toronto, was elected as a Foreign Associate.
Presidential Mentoring Award
Maja J. Matarić, a professor of computer science at the University of Southern California (USC), received the U.S. Presidential Award for Excellence in Science, Mathematics, and Engineering Mentoring in recognition of her work with K–12 students, USC students, and faculty colleagues. —Jack Rosenberger
Society | doi:10.1145/1924421.1924429
Samuel Greengard
Social Games, Virtual Goods
The popularity of virtual goods and currencies in online gaming is changing how people think and act about money.
Click into the world of social online gaming and, these days, you're almost certain to encounter much more than avatars, demons, and battlefields. As growing throngs of players dive into these environments—in games as diverse as FarmVille, Mafia Wars, World of Warcraft, and Second Life—they're often buying and selling virtual goods using virtual currencies. It's a multibillion-dollar trend. According to market research firm In-Stat, the demand for virtual goods hit $7.3 billion in 2010, up from $2.1 billion in 2007. Moreover, In-Stat predicts the figure will reach $14 billion by 2014. Another consulting firm, Magid and Associates, reports that the percentage of Web users trading virtual goods will grow from about 13% to 21% through 2011. Virtual goods also accounted for the vast majority of social-gaming revenues last year, with 60% of total revenues, more than lead generation offers (26%) and advertising (14%), according to eMarketer.com.
"It's clear that people are willing to pay real money for all sorts of virtual goods," explains Vili Lehdonvirta, researcher at the Helsinki Institute for Information Technology and visiting scholar at the University of Tokyo. "As in the physical world, objects take on meaning based on perceived status, scarceness, and emotional value." In today's gaming universe, coveted items include homes, weapons, space stations, and, yes, horse manure. Yet behind the façade of glitzy interfaces and alluring virtual objects lies a medium that's driving changes in the way governments, researchers, legal experts, and the public think about goods—and in the way people behave.
Due to the growing popularity of games such as FarmVille from Zynga, consumers’ demand for virtual goods reached $7.3 billion last year, according to In-Stat.
virtual currencies aren’t going to overtake their physical counterparts, they are making a mark on society and affecting everything from culture to business to government. “The casual game sector is growing enormously,” observes Edward Castronova, associate professor of telecom-
telecommunications at Indiana University. "It's not just your teenager in his mom's basement fragging monsters all night. It's everyone else. These games fill little holes in the day—while people wait for a bus or phone call." Virtual economies, he says, are global and real. They're forcing marketers, tax collectors, and others
to reevaluate their thinking and adapt to a rapidly shifting landscape.
Game On
It would be unwise to dismiss social gaming with virtual goods as a fluke or passing fad. Although the buying and selling of objects within games has existed in one form or another since the late 1990s, social gaming has exploded in popularity during the last couple of years. In-Stat industry analyst Vahid Dejwakh says that four main factors have contributed to this new virtual world order: the emergence of gaming on social networking sites such as Facebook; the rise of smartphones as casual gaming platforms; the fact that virtual goods gaming models increasingly allow people to play for free; and the availability of gaming platforms to all segments of society. "Today's social games are simple to learn, but they are usually complicated to master," he says. A slew of companies have established a virtual toehold on the social gaming market. Among them are Slide (purchased by Google for $182 million last August), which has tens of millions of users and offers games such as SuperPoke! and FunWall on Facebook; Blizzard Entertainment, which claims more than 12 million players for World of Warcraft; and Zynga, which earns more than $600 million in annual revenue through diversions like FarmVille, Treasure Isle, Mafia Wars,
and Live Poker. In fact, Zynga now has more than 225 million people visiting its sites every month. These social games are ushering in changes that increasingly reverberate through the physical and virtual worlds. Not only is society using the Internet in new ways, but the definition of ownership and what constitutes a possession is evolving. "People feel a sense of material value for objects they own online," says Lehdonvirta. "As a result, designers are consciously introducing objects to see how they can influence behavior." Yet, Lehdonvirta and others believe that it's a mistake to view online and offline worlds as distinctly separate. "Virtual objects are very real, and they're always intertwined with the physical world. They are entirely material in the sense that they're stored on magnetic disks and presented to users as photons that hit their eyes," says Lehdonvirta.
"We see them as things, just as we see physical objects as things." This fact, along with basic supply-and-demand economics, has spurred players to spend anywhere from several hundred to several thousand U.S. dollars for items such as horse manure in Ultima Online and houses in Second Life. Even seemingly mundane items command attention in these virtual games. For example, in Habbo Hotel it's possible to purchase any number of trophies that all look identical. However, a virtual trophy that the game publisher awarded to a famous user has many times the value of a trophy with no history. The key to the value is that it's possible to click on the trophy and view its ownership history. Likewise, in MapleStory, different colored boots are available at various prices. While only players fully understand this system—that certain colored pixels are more valuable than others—everyone agrees to play by a set of rules. In reality, Lehdonvirta says, this is no different from the physical world where paper money and objects have no intrinsic value. "We all simply buy into a system and abide by it," he notes.
Virtual Currency Issues
Although thinking and behavior in social games largely mirrors the physical world, this doesn't mean that social games with virtual goods aren't changing things—and introducing new issues and challenges.
In Memoriam
Ken Olsen, DEC President and CEO, 1926–2011
Ken Olsen, who cofounded Digital Equipment Corporation (DEC) and turned it into the second-largest computer company in the world during the 1980s, died on February 6 in Indianapolis, IN. Olsen graduated from the Massachusetts Institute of Technology (MIT) with bachelor's and master's degrees in electrical engineering and, after working in MIT's famed Lincoln Laboratory for seven years, cofounded DEC with fellow Lincoln Lab colleague Harlan Anderson in 1957. At the time, the data-processing industry was dominated by IBM, which manufactured and sold million-dollar, room-size mainframe computers. In the 1960s, DEC launched the second revolution in computing with its invention of the minicomputer, including the PDP and VAX machines, and prospered. In the 1980s, DEC reached annual sales of $14 billion, with more than 110,000 employees in 97 nations. However, like other minicomputer companies such as Data General and Wang, DEC was upended by the personal computer revolution, and floundered badly. Olsen
resigned as president and CEO of DEC in 1992, and six years later the company was acquired by Compaq Computer for $9.6 billion. In retrospect, one of Olsen's greatest and lasting achievements at DEC was the creation of a company culture in which creativity, innovation, and employee empowerment were fostered and highly treasured. And as legions of DEC employees, including Gordon Bell, Jim Gray, and Radia Perlman, joined other software and hardware companies, DEC's business culture and values were
transmitted throughout the computer industry. Many of Olsen’s obituaries noted that Fortune Magazine proclaimed him “the most successful entrepreneur in the history of American business” in 1986. Near the end of his life, however, Olsen saw himself differently. “Even though I have been an entrepreneur,” he said, “I have always been a scientist first and foremost. Science is more than a study of molecules and calculations; it is the love of knowledge and the continued search for truth.” —Jack Rosenberger
For example, in China, an instant messaging and game publisher, Tencent Inc., introduced a virtual currency called Q coins, which wound up being used in the physical world, as well as in Tencent games and other virtual sites. In some cases, Q coins were reportedly used for gambling and pornography. These virtual currencies—created by game publishers primarily for the purpose of making micro-transactions feasible and controlling the environments within their games—are growing in use and stature. With Q coins being increasingly exchanged for real yuan, the basic money unit in China, Chinese banking officials and government agencies have become concerned about virtual money challenging the renminbi's legitimacy. "It's a gray area that governments and regulatory bodies are taking a hard look at," says Andrew Schneider, president and cofounder of Live Gamers, a company that develops and manages virtual currencies for gaming companies. "The last thing a sovereign government wants is a shadow currency that could have real-world economic implications." Government officials, along with scholars and legal experts, are attempting to sort through a tangle of issues, including what constitutes revenue, what happens to unclaimed property, whether it's possible to pass items on to next of kin, and what tax implications exist. "Although a framework for how to deal with these issues already exists in the physical world," Schneider says, "how they are applied to the virtual world remains somewhat murky." It's hardly an abstract point. A number of people playing Second Life have earned hundreds of thousands of U.S. dollars selling real estate or providing services within the game. One German couple, operating as the avatar known as Anshe Chung, became real-world millionaires in 2006 and has since opened a virtual studio with 80 employees to develop virtual environments within games such as Second Life. Others have bought and sold virtual goods on eBay and other sites—sometimes for thousands of dollars. Meanwhile, businesses have sprung up—mostly in China—around developing characters and building environments for those who prefer to pay rather than play. Theodore Seto, professor of law at Loyola Law School, says any virtual
transaction is potentially taxable in most countries. "Whenever you receive value in exchange for other value, you have a taxable exchange," he says. However, as with frequent flyer miles, hotel points, and other closed currencies, tax authorities haven't figured out what to do about virtual earnings. In certain instances, when an individual withdraws funds from a game (some games allow this, some don't) or pays someone to develop a character, the transaction is typically subject to taxation. These virtual economies are spilling over into the business world. Not only have gaming companies popped up, many of them thriving off virtual economies, but conventional businesses increasingly view environments, such as Gaia Online, Habbo, and Facebook games, as ideal places to market their products and services. Among the brick-and-mortar companies advertising online are Coca-Cola, Wells Fargo, and Starwood Hotels. What's more, charity is becoming part of the picture. Zynga's Sweet Seeds initiative, for example, recently raised more than $1 million for Haitian children. One interesting matter is how virtual goods and virtual currencies will evolve and adapt. Facebook, which boasts more than 500 million users, offers Facebook Credits that span more than 200 games and applications. Other companies are also introducing megacurrencies that span several games. This approach boosts spending and creates greater "stickiness." A person who becomes bored with one game can sell the virtual goods and take her or his loot to another game. A few companies,
including Tapjoy, Gambit, and TrialPay, have introduced exchanges that enable consumers to convert Amazon Mechanical Turk jobs into virtual currencies. In fact, Lehdonvirta and others believe it's possible to use virtual goods and money within games to effect social change in areas as diverse as diet, health care, and CO2 emissions. Lehdonvirta and researchers in Japan have already created prototype games that have demonstrated the viability of such a concept. "Virtual objects influence behavior in a similar way to material objects," he says. "The virtuality of objects is not as important as the meanings that people attach to them." As for virtual economies, Schneider believes they will have a significant impact in the future. "But it's important to recognize that they are largely artificial economies that a game publisher controls," he says. "You can examine all the economic theory you like—some principles apply and others do not—but it's important to recognize that, at the end of the day, a game publisher has almost total control of its economy."
Further Reading
Guo, Y. and Barnes, S. Why people buy virtual items in virtual worlds with real money, ACM SIGMIS Database 38, 4, Nov. 2007.
Hamari, J. and Lehdonvirta, V. Game design as marketing: how game mechanics create demand for virtual goods, International Journal of Business Science and Applied Management 5, 1, 2010.
Nojima, M. Pricing models and motivations for MMO play, Proceedings of DiGRA 2007, Tokyo, Japan, Sept. 24–28, 2007.
Wang, Y. and Mainwaring, S.D. Human-currency interaction: learning from virtual currency use in China, Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems, Florence, Italy, April 5–10, 2008.
Yamabe, T., Lehdonvirta, V., Ito, H., Soma, H., Kimura, H., and Nakajima, T. Applying pervasive technologies to create economic incentives that alter consumer behavior, Proceedings of the 11th International Conference on Ubiquitous Computing, Orlando, FL, Sept. 30–Oct. 3, 2009.
Samuel Greengard is an author and journalist based in West Linn, OR.
© 2011 ACM 0001-0782/11/04 $10.00
Academia | doi:10.1145/1924421.1924430
Sarah Underwood
British Computer Scientists Reboot
After a year of turmoil, computer scientists at King's College London have retained their jobs, but substantial challenges lie ahead.
Early last year, the future of computer science at King's College London (KCL) was in jeopardy. More than 20 academics in the computer science department were at risk of losing their jobs, along with a group of world-renowned researchers who previously made up the Group of Logic, Language and Computation (GLLC) that spanned the computer science and philosophy departments. The potential devastation resulted from government funding cuts of £1.1 billion in higher education by 2013 and, in the case of KCL's computer scientists, the poor performance in research rankings of the School of Physical Sciences and Engineering in which they worked.
A year later, the outlook is more positive. No compulsory redundancies were made in computer science, and no one from the former GLLC was made redundant. In addition, as a result of restructuring relating to performance rather than cost cutting, the university's computer scientists have moved to the Department of Informatics in the new School of Natural & Mathematical Sciences. Michael Luck, who stepped into the role of head of informatics in January, replacing Andrew Jones when his term ended, is leading the performance challenge with the goal of reaching the top quartile in the Research Assessment Exercise, a U.K. government review of the quality of university research. Jones acknowledges the goal will be difficult. "The hierarchy of the college hopes we will be in the top quartile in the next research assessment exercise [in 2013], but that is challenging as change in the department is substantial in terms of people, integration, and culture," says Jones. "We can make some improvements in the short term,
significant improvements in three years, and hopefully be in the top 10% in five years.” Some of the department’s difficulties are due to last year’s unrest, which saw campus trade unions staging industrial action against compulsory redundancies, global academic shock at the prospect of GLLC members losing their jobs, and media stories criticizing KCL management for, among other things, suggesting brutal job cuts while paying large salaries to top executives. The restructuring program that transformed KCL’s School of Physical Sciences & Engineering, of which computer science was a part, into the School of Natural & Mathematical Sciences, gave the university’s computer scientists a fresh start in the newly formed Department of Informatics. Here, they work alongside academics from the robotics and telecommunications groups that were part of the Division of Engineering, which is being closed as a result of poor performance
in the Research Assessment Exercise. In addition, computational linguistics was saved and remains part of the School of Arts & Humanities. Luck is conscious that while KCL is one of the top 25 universities in the world according to the Quacquarelli Symonds ranking, his department does not match that claim nor compete successfully with University College London and Imperial College. But he is driving change and believes the combination of computer science, robotics, and telecommunications will provide the critical mass needed to compete successfully, and that research-led teaching, coupled with collaboration with other groups across the college, will improve performance. Whether this will be enough to carry Luck and his team up the rankings, however, remains uncertain.
Sarah Underwood is a technology writer based in Teddington, U.K.
© 2011 ACM 0001-0782/11/04 $10.00
viewpoints
doi:10.1145/1924421.1924431
Fred Niederman and Felix B. Tan
Emerging Markets
Managing Global IT Teams: Considering Cultural Dynamics
Successful global IT team managers combine general distributed team management skills enhanced with cultural sensitivity.
Working with a globally distributed IT team can offer substantial benefits—sometimes it is the only way to address certain tasks—but it also presents challenges. Many studies have addressed how to work well in these situations: notably, Nunamaker and colleagues present nine principles for virtual teamwork7 (see Table 1). Although this advice is likely to be helpful in general, in our view it insufficiently addresses differences in managing teams involving IT staff from different cultures.
Using A Cultural Lens
Applying a cultural lens to Table 1, we can see several areas where this generally applicable advice needs further refinement to account for variations in cultural orientation.a These adaptations include:
a We follow Hofstede's5 broadly used cultural dimensions throughout this column. Readers are directed to this book for a full treatment of this topic.
• Reward structures. In general there is significant variance in what individuals regard as a positive reward. We expect these differences to be even more pronounced when cultural differences are accounted for. For example, in an individualistic culture like the U.S. a corner office may be viewed as a great incentive and reward, whereas prestige or a leadership title may be more valued in collectivist cultures as found in Asia. Through wide-ranging studies of IT employees, Couger2,3 and
colleagues frequently found differences in motivators between workers in different countries. In lower-wage countries, issues with compensation are weighted much more heavily, while issues with challenge and promotion are more heavily valued by IT staff in higher-wage countries.
• Be more explicit. This admonition may come easily to people of a low-context culture such as Australia where individuals tend to speak bluntly without elaboration. But it may take away some richness from those more comfortable in a high-context culture such as the Middle East. Perhaps, in spite of cultural issues, explicit communication is a necessary ingredient for successful globally distributed IT teams. If so, distributed team members in high-context countries may be hard pressed to use the systems in the manner intended. In order to win contracts and successfully complete projects, individuals in such (often developing) countries may simply need to conform to norms that are not preferred. This suggests the need for designers to consider creation of
alternative tools and practices that incorporate a wider variety of cultural norms.
• Train to self-facilitate. This may be difficult advice for people from high power distance cultures such as those in Asia that value hierarchy. It may be that global IT team leaders working with large numbers of people from such cultures need to extend their range of controlling strategies to successfully run such distributed teams. A transition period of moving from highly directive to more self-governing teams will be required. Managers also need to be careful in responding to early efforts at self-facilitation. Often self-facilitating groups go in unexpected and even potentially harmful directions. The manager must accept some amount of non-optimal decisions and behaviors but understand how to facilitate the group to grow self-direction within acceptable bounds.
Cultural Skills
Having suggested that some general principles for managing globally distributed IT teams must be modified by
a cultural lens, we want to look more broadly at what cultural skills are critical for successful management of such teams where membership crosses national, organizational, or professional borders. Before delving into particular skills, we want to be clear that these virtual teams come in many stripes and flavors with their dynamics varying as a result. They include: IT-focused communities of interest within or across organizations; ad hoc and project IT teams charged with creating particular results on a one-time basis; and ongoing teams with portfolios of projects, as found with offshoring activities ranging from help desks through sophisticated application development. Global IT team dynamics will vary based on the degree to which group membership is voluntary. As participation is more voluntary, the leader will typically have less leverage to be directive. This distinction is clearly drawn by contrasting loose communities of interest pertaining to hardware, software, information policy,
applications, or IT industry firms' interest groups with focused IT development teams internal to a firm. Between these extremes are the many IT project teams that individuals may simultaneously be assigned to that compete for employee attention. Even within a firm, it is often a mistake to assume that an individual "required" to participate in a group will in fact devote full attention to the matter. It is safest to assume that even when participation is required, enthusiasm and proactive contribution are optional and require persuasion, rewards, or both to maximize individual input. Distributed team dynamics will also vary with the degree to which the commitment is long term versus short term. Is there an expectation that the team will develop a single product (or fulfill a particular mission) or continue to function together over time, perhaps monitoring a range of applications or continuing to add features for future releases? If expectation of continuation is high, there is greater rationale for investing in relationships, effective processes, and stronger tools than where the only concern is getting the particular task completed.
Table 1. Principles for effective virtual teamwork (from Nunamaker7).
1. Realign reward structures for virtual teams
2. Find new ways to focus attention on task
3. Design activities that cause people to get to know each other
4. Build a virtual presence
5. Agree on standards and terminology
6. Leverage anonymity when appropriate
7. Be more explicit
8. Train teams to self-facilitate
9. Embed collaboration technology into everyday work
In the short run, this may increase the costs of the particular project, but in the long run it may create the basis for efficient performance of repeated tasks. Team members from long-term orientation cultures will likely be easy to persuade to invest in team-building activities. Those from a short-term orientation culture may see these investments as a waste of effort. Dynamics will also vary with group size and the number of cultures represented. One IT development project familiar to the authors had more than 70 team members from 10 countries
on three continents. It was extremely difficult to find approaches workable for the whole group. This task was particularly challenging as part of the team was co-located and others were distant, the work was highly interdependent, major software tools were proprietary and thus available only at the headquarters site, and the European team leaders took a highly "self-facilitate" approach that resulted in much uncertainty, redundancy, and ill feeling. Given that the project was never completed and that bitterness remains among some team members to the present, it is not clear what solution would have fully addressed this situation. However, it is clear that not directly addressing cultural issues was not a successful strategy.
In our view there are five essential facets of all team and project activities that carry over and are particularly pronounced in the global IT team setting. These are supervision of tasks, communication, time and agenda, work process, and the environment (see Table 2).
The Cross-Cultural Manager
It stands to reason that managers of all globally distributed IT teams need basic skills in all five of the Table 2 areas. We believe the cross-cultural IT team manager needs two additional skills.
Table 2. Facets of global IT team and project activities.
Task supervision: intelligent division and assignment of tasks, monitoring task completion for completeness and quality, integrating completed tasks into larger team products.
Communication supervision: ensuring continued participation, full and complete exchange of information, surfacing and resolution of conflict, conducting synchronous and asynchronous interchanges.
Time and agenda supervision: setting up intelligent time frames for tasks, reorganizing critical paths for sequencing of activities, ensuring priority attention to the project relative to other group member obligations.
Work process supervision: ensuring an understanding and buy-in for common goals, negotiating common tools and work processes, maintenance of common bodies of knowledge and version control from which all members draw.
Environment variation supervision: ensuring that terms are used consistently even where their reference varies in different locations (for example, accounting terms that pertain to local taxation or reporting within different countries or regions), ensuring smooth acceptance of varied holidays and labor rules, working with time zones and other physical geography issues.
The cross-cultural distributed IT team manager must recognize that there are often many approaches to doing almost anything. Following the systems concept of equifinality, there are many ways to arrive at a destination. The culturally sensitive manager must recognize the existence of these multiple paths. They will accumulate knowledge regarding how to evaluate the different benefits, costs, and risks of different approaches. And they will recognize that an unfamiliar approach may in fact be the best one for a particular circumstance. Thus, there are circumstances where introducing a "foreign" cultural approach may result in desired outcomes. Mexican group facilitators reported success when purposely introducing more formal processes into meeting support systems in order to shift the culture of meetings to greater standardization, focus on outcomes, and retention of discussed ideas.6
In considering multiple paths, however, the manager encounters inevitable trade-offs between standardization and customization. Developing common standards while avoiding significant loss of employee productivity can be challenging. Decoupling issues can often provide a solution to this trade-off. For example, a co-located subgroup may discuss a project in their home language but create software modules conforming to standard formats. The worldwide project management director for a huge global IT company described her firm as setting a project management standard and training all new managers to this standard. At the same time successful "legacy" managers were encouraged to move to the standard, but were given much latitude for nonstandard but effective methods. In the short run there continued to be quite a bit of diversity in approaches, with the expectation that a standard would emerge over time.
The cross-cultural distributed IT team manager must develop a broad set of approaches to the five teamwork facets and become skilled at selectively applying these. This means developing varied approaches to monitoring employee work, to setting benchmarks and to measuring progress. An example of this occurred among Irish IT workers at a U.S. multinational who balked at using time cards rather than simply
working to finish particular tasks.8 Such a practice was not common among these workers, even though quite natural for U.S. firm management. In the end, a compromise was reached to use the time cards, but on a flexible schedule so that workers could arrive or leave early or late as long as their times were recorded. It also means taking care regarding the negotiation of a common vocabulary, recognizing and resolving conflicts, soliciting and responding to feedback, and the selection of communication tools. Groups will vary in their preferred communication modes as well as their understanding of messages within and across groups, and across cultures. For instance, Chinese staff frequently prefer to use instant messaging and telephone, while Australians prefer to use email for communicating.4 Tactics for managing cultural communication differences in global IT teams include establishing norms for practices in using communication tools based on face-to-face training and setting up a shared and consistent project-related vocabulary approved by members of each nationality team.1
Sensitivity to decision-making preferences can also help groups stay on track. As a rule of thumb, the global IT team manager will want to be clear about the degree to which he or she solely makes key decisions, consults with team members, then makes decisions, and delegates decision making to team members. Difficulties in implementing decisions and taking actions can come from fuzziness in communicating how such decisions are to be made as much as from the chosen approach itself. In our experience a "demand" for full adherence to the IT team manager's unilateral decisions can be smoothly (or poorly) communicated even with highly "low power distance" personnel. Retention of full decision making by the manager can be communicated clearly in terms of overall project effectiveness or the employees' self-interest.
Conclusion
In sum, the role of the global IT team manager requires general distributed team management skills enhanced with cultural sensitivity.
ager needs awareness of “equifinality” and a large palette of approaches and methods that can be applied as circumstances vary. This should include sensitivity to varied ways to approaching tasks, communication, time and agenda, and work processes, while remaining aware of and adjusting for environmental concerns. The excellent culturally sensitive manager knows or learns when and how to apply these varied techniques as the particular situation demands. References 1. Casey, V. Developing trust in virtual software development teams. Journal of Theoretical and Applied Electronic Commerce Research 5, 2 (Feb. 2010), 41–58. 2. Couger, J.D. and O’Callaghan, R. Comparing the motivation of Spanish and Finnish computer personnel with those of the United States. European Journal of Information Systems 3, 4 (1994), 285–292. 3. Couger, J.D. et al. Commonalities in motivating environments for programmer/analysts in Austria, Israel, Singapore, and the U.S.A. Information & Management 18, 1 (Jan. 1990), 41–47. 4. Guo, Z, Tan, F.B., Turner, T., and Xu, H. An exploratory investigation into instant messaging perceptions and preferences in two distinct cultures. IEEE Transactions on Professional Communication 51, 4 (Dec. 2008), 1–20. 5. Hofstede, G.H., Hofstede, G-J., and Minkov, M. Cultures and Organizations: Software for the Mind. Third Edition McGraw Hill, New York, NY, 2005. 6. Niederman, F. Facilitating computer-supported meetings: An exploratory comparison of U.S. and Mexican facilitators. Journal of Global Information Management 5, 1 (Jan. 1997), 17–26. 7. Nunamaker, J.F., Jr., Reinig, B.A., and Briggs, R.O. Principles for effective virtual teamwork. Commun. ACM 52, 4 (Apr. 2009), 113–117. 8. Weisinger, J.Y. and Trauth, E.M. Situating culture in the global information sector. Information Technology and People 15, 4 (Apr. 2002), 306. Fred Niederman (
[email protected]) is Shaughnessy Professor of MIS at St. Louis University, MO. Felix B. Tan (
[email protected]) is a professor of information systems at Auckland University of Technology in New Zealand.
Calendar of Events

April 18–20
International Conference on Multimedia Retrieval, Trento, Italy. Sponsored: SIGMM. Contact: Francesco G.B. De Natale, Email: [email protected]

April 20–22
International Conference on Fundamentals of Software Engineering, Tehran, Iran. Contact: Marjan Sirjani, Email: [email protected]

May 1–4
International Symposium on Networks-on-Chips, Pittsburgh, PA. Sponsored: SIGDA, SIGARCH, SIGBED. Contact: Diana Marculescu, Email: [email protected]

May 2–6
The Tenth International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan. Contact: Liz Sonenberg, Email: l.sonenberg@unimelb.edu.au

May 3–6
16th International Conference on Animation, Effects, Games and Interactive Media, Stuttgart, Germany. Contact: Thomas Haegele, Email: thomas.haegele@filmakademie.de

May 6–9
International Conference on Computer Supported Education, Noordwijkerhout, Netherlands. Contact: Monica Saramago, Email: [email protected]

May 7–8
International Conference on Cloud Computing and Services Science, Noordwijkerhout, Netherlands. Contact: Monica Saramago, Email: [email protected]

May 7–9
International Conference on Cloud Computing and Services Science, Noordwijkerhout, Netherlands. Contact: Monica Saramago, Email: [email protected]
Copyright held by author.
viewpoints
doi:10.1145/1924421.1924432
Nathan Ensmenger
Historical Reflections
Building Castles in the Air
Reflections on recruiting and training programmers during the early period of computing.
In the early months of 1956, an intriguing series of advertisements appeared in the New York Times, the Los Angeles Times, and Scientific American. Although the ads had been placed by the IBM Corporation, they were not selling any IBM products or services; rather, they were soliciting assistance from the public. Recent advances in electronic computing had created an unprecedented demand for computer programmers, the advertising copy declared, and in bold letters inquired of its readers, “Are you the man to command the electronic giants?” Because computer programming was a “new and dynamic” field, the advertisement argued, “there are no rigid qualifications.” Those individuals who enjoyed chess, music, or mathematics might consider themselves particularly suitable candidates, but the only real requirements were that you had an “orderly mind” and a “lively imagination.”5

[Figure: IBM computer programmer recruitment advertisement, circa 1956. Historical document courtesy of IBM.]

These 1956 advertisements were not, of course, the first appeal for computer programmer talent launched by the IBM Corporation, or for that matter, by a whole host of computer manufacturers and users. Already by the late 1950s the growing demand for computer programmers was emerging as a critical problem for the industry. But what makes this particular recruitment campaign significant was its emphasis on creativity and imagination. In fact, the ads were so intriguing as to attract the attention of the writers of the “Talk of the Town” column in the New Yorker magazine. Struck by the seeming incongruity between the appeal to both musicians and mathematicians, the “Talk of the Towners” themselves “made bold to apply” to the IBM manager in charge of programmer recruitment. “Not that we wanted a programming job, we told him; we just wondered if anyone else did.”3

The IBM manager they spoke to was Robert W. Bemer, a “fast-talking, sandy-haired man of about 35,” who, by virtue of his eight years of experience, was already considered, in the fast-paced world of electronic computing, “an old man with a long beard.” It was from
viewpoints Bemer that they learned of the 15,000 existing computer programmers. An experienced programmer himself, Bemer nevertheless confessed astonishment at the unforeseen explosion into being of a programming profession, which even to him seemed to have “happened overnight.” And for the immediate future, at least, it appeared inevitable that the demand for programmers would only increase. With obvious enthusiasm, Bemer described a near future in which computers were much more than just scientific instruments, where “every major city in the country will have its community computer,” and where citizens and businessmen of all sorts “grocers, doctors, lawyers” would “all throw problems to the computer and will all have their problems solved.” The key to achieving such a vision, of course, was the availability of diverse and wellwritten computer programs. And therein lay the rub for recruiters like Bemer: in response to the calls for computer programmers he had circulated in high-profile national newspapers and journals, he had received exactly seven replies. That IBM considered this an excellent return on its investment highlights the peculiar nature of the emerging programming profession. Of the seven respondents to IBM’s advertisements, five were experienced programmers lured away from competitors. This kind of poaching occurred regularly in the computer industry, and although this was no doubt a good thing from the point of view of these well-paid and highly mobile employees, it only exacerbated the recruitment and retention challenges faced by their employers. The other two respondents were new trainees, only one of whom proved suitable in the long-term. The first was a chess player who was really “interested only in playing chess,” and IBM soon “let him go back to his board.” The second “knew almost nothing about computing,” but allegedly had an IQ of 172, and, according to Bemer “he had the kind of mind we like...[he] taught himself to play the piano when he was 10, working on the assumption that the note F was E. Claims he played that way for years. God knows what his neighbors went through, but you can see that it shows a nice independent talent for the systematic translation of values.”
Eventually the advertising campaign and subsequent New Yorker coverage did net IBM additional promising programmer trainees, including an Oxford-trained crystallographer, an English Ph.D. candidate from Columbia University, an ex-fashion model, a “proto-hippie” and, of course, numerous chess players, including Arthur Bisguier, the U.S. Open Chess champion, Alex Bernstein, a U.S. Collegiate champion, and Sid Noble, the self-proclaimed “chess champion of the French Riviera.”4 The only characteristics these aspiring programmers appeared to have in common were their top scores on a series of standard puzzle-based aptitude tests, the ability to impress Bob Bemer as being “clever,” and, of course, the self-confidence to respond to vague but intriguing help-wanted ads. Identifying Candidates The haphazard manner in which IBM recruited its own top programmers, and the diverse character and backgrounds of these recruits, reveals much about the state of computer programming at the end of its first decade of existence. A quick tour through the industry literature from the late 1950s and early 1960s reveals numerous similar attempts to identify those unique individuals with the “right stuff” to become programmers. A 1962 advertisement from the RCA Systems Division, for example, appealed to those individuals with “a mind deep enough for Kant, broad enough for science fiction, and sufficiently precise to enjoy the esoteric language of the computer.”6 A tall order, indeed, particularly in a period in which very
few people even knew what a computer language was, much less whether or not they would enjoy working with one. The difficulty that firms like IBM and RCA had in defining what skills, knowledge, or training were essential to computer programming reflects both a specific moment in the history of electronic computing—the birth pangs of an industry that was only just coming into being—but also a more general truth about the distinctive character of computer programming as a discipline. From its origins, and to this day, computer programming has been regarded alternatively, and often simultaneously, as both a “black art” and an industrial discipline. Its practitioners run the gamut from Ph.D. computer scientists to self-taught 16-yearold “hackers.” As an intellectual and occupational activity, computer programming represents a peculiar amalgam of art, science, and engineering. In the period when IBM was still referring to computers as “electronic giants,” of course, the problem of training and recruiting programmers was particularly apparent. If we take Bob Bemer’s reckoning as accurate, each new computer installation would require a support staff of at least 30 programmers. Since almost all computer programs in this period were effectively custom-developed—the packaged software industry would only begin to emerge in the late 1960s—every purchase of a computer required the corresponding hire of new programming personnel. Even if we were to halve Bemer’s estimates, the predicted industry demand for computer programmers in 1960 would top 80,000. And where were all of these new programmers to come from? There were, in this period, no established programs in computer science; what formal academic training that did exist for computer programmers generally occurred within mathematics or electrical engineering departments (or was provided by the computer manufacturers). For the most part, programmers in this period learned on the job, developing their own tools, and inventing their own idiosyncratic techniques and solutions. As John Backus would later describe the situation, programming in the 1950s was “a black art,” “a private arcane matter,” in which “general pro-
gramming principles were largely nonexistent [and] each problem required a unique beginning at square one.”1 It was not clear to anyone in this period what skills, training, or expertise were required to make a good programmer. It is no wonder, then, that corporate employers like IBM had difficulty identifying potential programmer trainees, and had to rely instead on vague notions of “orderliness” and “imagination.” What emerges over the course of the next decade were a set of mechanisms for assuring an adequate supply of well-trained, experienced computer programmers. These included the establishment of formal programs in computer science, the establishment of professional societies, the publication of journals (it is no coincidence, for example, that Communications of the ACM first appeared in 1958). New technologies were developed that made computer programming less idiosyncratic, and new languages were introduced to help rationalize the process of programming (the first FORTRAN compiler was delivered in 1957, and the specification for COBOL was written in 1959). By the late 1960s computing was its own science, and proposals for incorporation of “software engineering” techniques were being developed. Employers such as IBM and RCA no longer had to look to chess players, musicians, and philosophers to staff their programming projects. Nevertheless, the sense that computer programming was, at least in part, a distinctively creative activity, and that the very best programmers were gifted with a unique and idiosyncratic ability, remained a defining feature of the discipline. Throughout the 1960s and 1970s many corporate recruiters continued to rely heavily on the use of aptitude tests and personality tests aimed at identifying the “twinkle in the eye,” the “indefinable enthusiasm,” that marked those individuals possessed by “the programming bug that meant…we’re going to take a chance on him despite his background.”7 The IBM Programmer Aptitude Test (PAT), which had been developed by personnel psychologists in 1955 to help identify promising “programmer types,” continued to be used (albeit modified) well into the 1970s.
It has been estimated that as many as 80% of the employers working in the field in that period had taken some form of the IBM PAT. Conclusion The curious persistence, well beyond the early, immature decades of electronic computing, of the notion that good computer programmers are “born, not made” can perhaps be interpreted as a sign of an ongoing lack of intellectual or professional rigor among programming personnel. But it is more likely, and more justified by the historical evidence, that the creative tension between art and science in computer programming has been one of the keys to the enormous productivity of the programming community, and to innovation in the software industry. In The Mythical Man-Month, for example, Frederick Brooks famously suggested that “the programmer, like the poet, works only slightly removed from pure-thought stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures.”2 Far from criticizing the notion that programmers required a “lively imagination,” Brooks was embracing the fact. Like the IBM recruiters of an earlier era, he was identifying in computer programming those intangible qualities that made it one of the most novel and intriguing of the new technical disciplines to emerge during the great computer revolution of the mid20th century. References 1. Backus, J. Programming in America in the 1950s: Some personal impressions. In A History of Computing in the Twentieth Century: A Collection of Essays. N. Metropolis, J. Howlett, and G-C. Rota, Eds., Academic Press, NY, 1980, 125–135. 2. Brooks, F.P. The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, NY, 1975, 20. 3. Gill, B. and Logan, A. Talk of the Town. New Yorker 5 (Jan. 1957), 18–19. 4. Halpern, M.I. Memoirs (Part 1). IEEE Annals of the History of Computing 13, 1 (1991), 101–111. 5. IBM Corporation. Are you the man to command electronic giants? New York Times (May 13, 1956). 6. RCA Corporation. Deep enough for Kant. 1962. 7. The Computer Personnel Research Group. Datamation 9, 1 (1963), 130. Nathan Ensmenger (
[email protected]) is an assistant professor at the University of Pennsylvania and the author of The Computer Boys Take Over: Computers, Programmers, and the Politics of Technical Expertise (MIT Press, 2010). Copyright held by author.
viewpoints
doi:10.1145/1924421.1924433
Michael A. Cusumano
Technology Strategy and Management
Platform Wars Come to Social Media
The world can absorb more social media sites, but how many?
Over the past half-century, we have seen a dramatic evolution in computing and communications platforms. These have changed our lives first in the research lab and the corporate office, then in the home, and now everywhere. In the 1960s and 1970s, we saw the emergence of timesharing systems over mainframe computers and their private networks. In the 1980s, we saw the birth of networked personal computers and workstations as well as advances in home-use recording and playback devices for multimedia. In the 1990s, we had an explosion of activity on the Internet with the Web as a graphical user interface. In the 2000s, we have seen enormous activity—email, texting, Web searching, sharing of photos and digital music files, location-based services, and many other applications—on laptops and mobile devices, especially smartphones. And, increasingly over the past decade, but especially over the past several years, we have seen the emergence of social networking systems ranging from MySpace to Facebook, LinkedIn, Twitter, and Foursquare, as well as Groupon (which recently turned down a $6.5 billion purchase offer from Google). Social media networks are, in different degrees, new kinds of platforms that facilitate communication and offer new systems for texting and sending email
as well as sharing files. They enable computing through access to different applications and databases. What is less well understood is how these platforms compete for user attention and revenue with other computing and communications platforms, such as for personal computing (Microsoft Windows and the Apple Macintosh), general mobile communications (RIM Blackberry, Nokia Symbian, Google Android, Microsoft Windows, and others), tra-
ditional email (Microsoft Outlook and Google Gmail), Internet search (Google, Firefox, Microsoft Bing), and file sharing (Flickr, among others). There truly is a new war going on, with lots of players and no clear path to victory. How to win or preserve territory requires the same kind of thinking we saw in previous platform battles. (See my January 2010 Communications column “The Evolution of Platform Thinking.”) But, when it comes to social networking, we should keep two thoughts in mind. One is for all the would-be billionaires who are now trying to create their own social media empires. When we look at Facebook, or even Groupon, we are looking at the survivors of a competition that has already gone on for years, since the late 1990s. Like startup companies in other domains, most disappear completely or fall into obscurity. The world can absorb more social media networks, but how many? In addition, nearly all rely on advertising (including searches of their privately controlled or “walled” content) for revenue. Advertisers want to see volume and results. Because of the power of network effects and positive feedback, a relatively small number of sites will probably draw most of the user traffic and advertising dollars. So a large number of social networking businesses cannot survive with a traditional advertising business model, unless they are non-profit or no-profit.
viewpoints Of greater concern from the business point of view is that advertisers on social media sites such as Facebook get poor responses—about one-fiftieth the click-through rates compared to what Google sees for its sponsored ads along with Internet searches.a It seems that people who want to purchase goods and services usually begin their search on Google or another search engine, and so ads that appear on the screen with search results are targeted and often highly relevant. Ads that appear on social media sites are also related to the content the user is seeing or creating, but these users seem much more interested in communicating with their friends or sharing information than making a purchase. Social network users may even see ads as more of a nuisance and an intrusion into their privacy, rather than as helpful leads to help them in shopping. Groupon, founded in 2006 and the fastest company to go from zero to $1 billion in revenues, has a different business model, and this has made it unusually successful. The company solicits businesses such as restaurants, retail shops, or local services such as hair and nail salons to offer discounts, one a day in specific markets. The strategy, in effect, is to apply online social networking technology to the coupon business, with some special characteristics. The discounts take effect only if a certain number of people come to the store to purchase the good or the service. Groupon then takes a percentage of the revenue. So customers are encouraged to tell their friends about the deal, lest no one get any discount. Unfortunately for Groupon, the strategy is easy to copy. (LivingSocial and some 500 sites around the world also do this.) And too many customers can swamp a small unprepared vendor. Nonetheless, Groupon has a head start, a proven business model, and a lot of “mind share” that continues to attract and keep users and clients. But it still needs to build a defensible platform and a broad ecosystem to help maintain this leadership position. The second thought I have expressed before: In a platform battle,
the best platform, not the best product (or service), wins. We have seen this with VHS over Betamax, MS-DOS and Windows PCs over the Macintosh, the Intel x86 microprocessor over noncompatible chips, and even Mattel’s Barbie doll over newcomers such as the Bratz and Moxie Girlz families of dolls from MGA Entertainment. Accordingly, we know something about what it takes to win and whether a stalemate is likely to occur. The latter seems to be the case in video-game platforms with Sony (PlayStation), Microsoft (Xbox), and Nintendo (Wii). Google is also challenging the “open, but not open” or “closed, but not closed” strategies of nearly all computing and communications platforms, ranging from desktop and mobile operating systems to social networks and Internet TV and video. Successful Platform Attributes Some attributes of winning platforms relate to technology strategy—how open versus how closed the interfaces to the platform are for application developers or users who want to share or import information across different systems. Another factor is the degree of modularity and potential richness in the platform. Platform companies must make it relatively easy for outside firms or individuals to add functionality and content as well as to create compelling new products or services that utilize features of the platform. Another key attribute depends on the interaction of platform openness, modularity, feature richness, and strategic efforts to increase the size and vibrancy of the applications ecosystem. A strong ecosystem is usually essential to generate powerful “network effects” between the platform and the complementary
a T. Eisenmann et al. “Facebook’s Platforms,” Harvard Business School Case #9-808-128, 2009, p. 1.
products and services that geometrically increase the platform’s value to users as more complements (and more users) appear. The platform must also accommodate “sticky” content and user behaviors that make it possible to search and attract advertisers. Let’s take Facebook as an example. Mark Zuckerberg, Time magazine’s 2010 Person of the Year, founded the company in 2004 with some friends while he was an undergraduate at Harvard University. At the end of 2010, Facebook had some 1,700 employees, probably $1 billion in annual revenues, and a market value of some $40 billion (of which Microsoft owns 1.6%).b It had more than 500,000 applications in its ecosystem. Most importantly, in December 2010 Facebook had around 600 million users—with 70% located outside the U.S. The aggregate number was up from 100 million in August 2008. This means Facebook has been acquiring new users at the rate of more than 200 million per year. It is not inconceivable that, in just a few more years, nearly everyone in the world with a personal computer and an Internet connection or a smartphone will have a Facebook account. How likely is this to happen? How much money is this likely to be worth to Facebook investors? We can examine aspects of these questions and see how the future might play out by using the Winner-Take-All-orMost (WTAoM) framework.c First, to win in a platform war, it helps if a platform generates strong network effects. Facebook has these, in spades. There are very strong “indirect” network effects in the sense that, the more your friends join the network, the more peer pressure there is on you to join. Some people (especially younger people but not only them) spend a good part of their day chatting and exchanging photos and video clips with their friends, and friends of their friends— b Some of the data on Facebook is from the Harvard case cited previously, the Facebook Web site (http://www.facebook.com), and the Wikipedia entry on “Facebook.” c See T. Eisenmann, G. Parker, and M. Van Alstyne, “Strategies for Two-Sided Markets,” Harvard Business Review 84, 10 (2006), 92–101; and M. Cusumano, Staying Power: Six Enduring Principles for Managing Strategy and Innovation in an Uncertain World (Oxford University Press, 2010), pp. 22–67.
viewpoints on Facebook. The peer pressure and the system keeps you in the network and encourages you (by automating the suggestion process!) to bring more of your friends into the network. There are also very strong “direct” network effects in that applications built to run on the Facebook operating system and that access special features and database content within the platform only run within Facebook. The technical platform elements here consist of the Facebook Connect APIs and proprietary query and mark-up languages for applications developers. Google is challenging Facebook by promoting its own more “open” application development standards for social networking sites, the OpenSocial platform, and trying to bring other sites together in a coalition that puts pressure on Facebook to join and open up. Be that as it may for the moment, at present, Facebook is strong on this first measure of generating its own network effects. Second, a winning platform must minimize the opportunities for competitors to fragment the market through exploiting differentiation strategies or segmentation niches. Here is where Facebook has a big challenge. There are thousands of social media sites that do different things and some are quite large in their user bases. They bring together people interested in art and music (MySpace), instant communications (Twitter), selling product and service discounts (Groupon, LivingSocial), locating their friends in real time (Foursquare), or connecting with other professionals (LinkedIn, Plaxo). These sites are based in the U.S.; other countries, such as China, Korea, Japan, India, and Brazil have some domestic-only social networking sites that are very large as well and quickly growing in popularity. It should be noted that Facebook is banned in several countries, including China, Iran, and Pakistan. So Facebook will never get 100% of the global market like Japan Victor did with VHS. It will not even get an Intel-like 85% of the market as long as there are specialized sites and geographic or political barriers to world domination. Third, a winning platform must make it difficult or costly for users or ecosystem partners—mainly developers of complementary application
products and services, and advertisers—to use more than one platform, that is, to have more than one “home.” This “multi-homing” also fragments the market for “eyeball” attention and advertising dollars. It is similar in some ways but different from the notion of “switching costs.” On Facebook, for example, switching costs for users are quite high. This is good for platform leadership. Facebook users are not likely to switch, suddenly, to another social networking site for their daily communications and move years of accumulated content—unless all of their friends simultaneously do so as well. At the same time, however, Facebook users tend to have more than one social networking account and spend lots of time on these other sites, for different purposes. They may use LinkedIn to look for professional contacts or Twitter to follow the news and particular events or people. They may go to Groupon or LivingSocial every day to look for bargains. They may follow their favorite musical band or get movie reviews on MySpace or another site. They go to Foursquare to find out where their friends are and get recommendations for restaurants or stores— now. In the future, it is possible that Facebook will offer these kinds of features and services itself—which would be a strategy to reduce differentiation and niche opportunities for competitors. But, at present, there is nothing in the Facebook platform that technically makes multi-homing difficult for individual users. On the other hand, application developers who invest in learning the Facebook APIs and programming languages will find it time
consuming to rewrite their apps for more than one platform. Google is also putting a lot of pressure on Facebook to use more standard technologies. So there is some difficulty with multihoming for application developers but it is not clear how important this is or how long Facebook can hold out. Conclusion In short, the conditions are not quite present for a VHS-like or a Wintel-like domination of social networking by one company. There are still opportunities for competitors to differentiate themselves and room for users and application developers to spread themselves around. The powerful network effects do suggest that large numbers of people will continue to sign up and use Facebook. But advertising is not so effective and nowhere near as profitable as packaged software (Microsoft) or fully automated digital services (Google), at least not yet. Nonetheless, Facebook and several other social networking sites have staying power and will continue to fight among themselves and with Google as well as Microsoft, Yahoo, and others for user eyeballs and their fair share of Internet advertising. My guess is that Facebook will end up like Google—with 65% or so of the global market, a winner-take-most scenario. But Facebook as a company needs to explore better ways to make money as well as develop the capabilities to quickly add features and services that mimic Twitter, LinkedIn, MySpace, Foursquare, and Groupon. It may want to enter directly into commercial activities and try drawing user attention and advertising revenue from the two other most popular Internet sites, Amazon and eBay, or from the online classified ad specialist, Craigslist. The possibilities are nearly endless for a platform with 600 million users, and counting. But nothing is guaranteed. Facebook will still have to fight off competition from the social networking innovations that no one has thought of yet. Michael A. Cusumano (
[email protected]) is a professor at the MIT Sloan School of Management and School of Engineering and author of Staying Power: Six Enduring Principles for Managing Strategy and Innovation in an Uncertain World (Oxford University Press, 2010). Copyright held by author.
viewpoints
doi:10.1145/1924421.1924434
George V. Neville-Neil
Kode Vicious
Coder’s Block
What does it take to clear the blockage?

Dear KV,
I am a manager of a small development group and one of my programmers seems to spend his whole day staring at his screen but not actually writing code. He then spends weekends and nights in the office and eventually checks in code that actually works, but whenever I ask him why he is just staring during the day, he replies, “Coder’s block,” and then continues to stare. It’s kind of creepy. Is there any such thing as coder’s block?
Not Blocked but Confused
Dear NBC, Programming is a creative endeavor, and therefore the short answer is yes, there is such a thing as coder’s block. Not only is there such a thing, but there are also various types and sources of coder’s block, some of which I will cover here. If any of these apply to your programmer, you’ll be able to help clear the blockage, since that is what managers are supposed to do. Perhaps the easiest source of coder’s block to see and understand is distrac-
tion. Any modern office environment is a hotbed of distractions, including ringing phones, talking coworkers, people who come by your desk to ask questions (many of which they could answer themselves by reading documentation), meetings, and, of course, well-meaning managers who drop by to ask, “How’s it going?” All but the most trivial coding tasks require quiet and concentration, and if programmers do not get those, then they are not going to be able to build up the intellectual head of steam they need to solve complex problems. Time free from distraction also has to be sufficient to the task. Ask your programmer how long he gets to sit and think between inter-
ruptions, and you will probably find it is less than one hour. While some programs can be designed and finished in an hour, they are few and far between. Giving someone who works for you a few distraction-free hours per day is one way to help prevent coder’s block. One piece of advice for all you programmers out there is to touch code first thing when you come into work. Do not read email, open your instant messaging client, or do anything else that can suck you into distractions that are not coding. When you sit down to work, sit down to work. Email, instant messaging, and social networking are anathema to concentrating on hard problems.
viewpoints Sometimes coder’s block is brought on by programmers taking on more work than they can handle. Either the problems are too complex, or they just do not know where to start, or both. I don’t know if you have noticed most coders, but they tend to have a large, personal store of hubris that often gets them into trouble, both at and outside of work. If a programmer is staring at a problem for hours on end and making no progress, you can ask if he or she has tried to break down the problem, and perhaps have the person show you how he or she is breaking it down. Do not do this in a nagging way that distracts the programmer (see above). One of the main ways to annoy a coder is to constantly ask, “How’s it going?” You are allowed to ask this once per day, and not when the person looks like he or she is actually concentrating. You might want to ask this question at lunch. Another type of coder’s block comes from fear of failure. Most people want to do a good job and be recognized for the good work they do. This is especially true of people who do knowledge work, such as programmers, because the only measures of quality are how cleverly and cleanly they have built something. Since software is nearly infinitely malleable, many coders get stuck trying to come up with the absolute neatest, cleverest way to implement something. The need for perfection leads to many false starts, writing a page of code and then doing what writers used to do when they had a typewriter writer’s block: they crumple up the one page they have finally written and throw it in the trash. When people used paper to write, it was easy to see that they had writer’s block, because their trash cans would be overflowing with crumpled pieces of paper. If you picked up a few of these, you would see that none contained a full page of text; the writers probably got only halfway down a page before they tore it out of the typewriter and threw it away. In our modern world you cannot actually see the false starts, because no one in his or her right mind would check them in. If you find a programmer making a lot of false starts at a piece of code, give that person something else to work on. One of the best ways to overcome coder’s block is to look at anoth-
er unrelated piece of code. You want the coder’s mind to remain in a coding mode but to relax its grip on the problem that is causing the angst. Finally, sometimes coder’s block comes from some sort of emotional problem, usually outside the job, although job stress can also lead to emotional imbalances. I find that jobs, in general, lead to emotional imbalance, as does waking up in the morning, commuting, and talking to stupid people. Sometimes you have to just tell someone to take some time off, so that his or her mind will clear and the coder’s block will lift. Of course, if you find a programmer is just staring at the screen, you might want to take a surreptitious glance at what he or she has written. If it is anything like, “All work and no play make Jack a dull coder,” then it is time to hide the fire axes and call for an intervention. KV
Dear KV,
I am working on some very old C code that needs to be extended, but there is an annoying complication. This code manipulates an in-memory structure that other programs have to read, but there is a requirement that older binaries still be able to work with the new code. I don’t think this is possible, since once I change the structure the older binaries are definitely going to fail. Is there a way around this?
Backwardly Structured

Dear Backward,
There is a small, but nonzero, chance that you may yet be able to pull this off, but it depends on the structure you are working with. Sometimes when people build systems they expect are going to be extended, they place extra space into a structure for future expansion. Of course I say “sometimes” because although all computer science programs sing paeans to extensibility, they often leave out the practical part of just how one makes code extensible. If you are lucky, then the structure you need to modify has some extra space, usually labeled as padding, which you can use for your new purposes. All you would need to do is subtract some of the padding and add new elements to the structure that are equal in size to the padding you have removed. Make sure to add the new parts of the structure after the preexisting parts, because if you do not, then your older binaries will be reading your new type of data where they expect their old data to be. That might lead to a crash, which is the good case because you will be able to find the bug; but it might simply lead to getting subtly incorrect results, and those types of bugs are incredibly annoying to search out. And make sure you comment your change and be very clear about the change in your commit message. Messing with in-memory structures in this way can lead to hard-to-find bugs, as I mentioned, and the first thing someone who comes across this type of bug will want to do is to try the code without your change, to see if it is the source of the problem.
KV
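A minimal sketch of the padding approach KV describes, with hypothetical structure and field names (the real layout will of course differ); the C11 static assertions document the invariant that keeps old binaries working:

#include <stdint.h>
#include <stddef.h>

/* Version 1 of the shared in-memory structure: the original author
 * reserved eight bytes of padding for future expansion. */
struct shared_record_v1 {
    uint32_t id;
    uint32_t flags;
    char     payload[48];
    uint8_t  padding[8];       /* reserved space */
};

/* Version 2: the new member is carved out of the padding and placed
 * after all preexisting members, so the size and the offsets of the
 * old fields are unchanged and old binaries still read what they expect. */
struct shared_record_v2 {
    uint32_t id;
    uint32_t flags;
    char     payload[48];
    uint32_t timestamp;        /* new field, lives in the former padding */
    uint8_t  padding[4];       /* what is left of the reserve */
};

/* Compile-time checks that the layout really is backward compatible. */
_Static_assert(sizeof(struct shared_record_v2) == sizeof(struct shared_record_v1),
               "structure size changed; old binaries would break");
_Static_assert(offsetof(struct shared_record_v2, payload) ==
               offsetof(struct shared_record_v1, payload),
               "offset of a preexisting member changed");

Old binaries see the same overall size and the same offsets for the fields they know about; they simply keep ignoring the bytes they have always treated as padding, while new binaries interpret part of that reserve as the new field.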
Related articles on queue.acm.org

IM, Not IP (Information Pollution)
Jakob Nielsen
http://queue.acm.org/detail.cfm?id=966731

Coding Smart: People vs. Tools
Donn M. Seeley
http://queue.acm.org/detail.cfm?id=945135

A Conversation with Joel Spolsky
http://queue.acm.org/detail.cfm?id=1281887

George V. Neville-Neil (
[email protected]) is the proprietor of Neville-Neil Consulting and a member of the ACM Queue editorial board. He works on networking and operating systems code for fun and profit, teaches courses on various programming-related subjects, and encourages your comments, quips, and code snips pertaining to his Communications column. Copyright held by author.
viewpoints
doi:10.1145/1924421.1924435
José Luis Gómez-Barroso and Claudio Feijóo
Viewpoint
Asymmetries and Shortages of the Network Neutrality Principle
What could neutrality achieve?
The debate on network neutrality has reached sufficient notoriety to eliminate the need for detailed explanation. A simple definition will suffice: “network neutrality” is understood as the principle by which the owners of broadband networks would not be allowed to establish any type of discrimination or preference over the traffic transmitted through them. What is indeed interesting to remember is the origin of the debate. In February 2002, the U.S. Federal Communications Commission (FCC) launched a proposal considering DSL connections as an “information service” and, thus, not subject to the access requirements telephone companies must fulfill. The following month, the FCC launched a similar draft for cable networks and requested comments on what the regulatory regime that would finally prevail should look like. The comments received by the FCC, particularly those of the High Tech Broadband Coalition (HTBC) Group, which integrates different associations and partnerships of the ICT industry, represent the starting point of the subsequent controversy.9

We recall the beginning of the controversy because, since then and perhaps as an inheritance of this beginning, it has been restricted to fixed broadband networks. Indeed, academic papers, political opinions, and media comments have appeared in favor of or against network neutrality but have always shared one issue: the opinions provided continue to limit the fight to the scope of traditional networks (telephone or cable). Take as the most prominent example the two articles recently published by Communications on the topic. Van Schewick and Farber’s Point/Counterpoint explicitly played on the landline carriers’ court.6 In his less prescriptive, more descriptive (regulatory) Viewpoint, Larouche uses a broader term (ISPs) but takes a similar approach.3

It is not our intention to provide new arguments underlining the virtues of the supporting or opposing positions. The matter we would like to stress is the narrow-mindedness of the approach that is adopted repeatedly. It must be remembered that the defenders of network neutrality base their arguments on the need to avoid closing the door to any innovation: the Internet would simply be a platform necessary for the competition between application developers (see, for example, Weinstein7). With this idea in mind, there are three axes
viewpoints toward which the debate could, or should, extend. First, it is interesting to recall the opinion of Zittrain, for whom “Internet is better conceptualized as a generative grid that includes both PCs and networks rather than as an open network indifferent to the configuration of its endpoints.”10 This way, what occurs in the pipes would be only a part of the problem and what he defines as “PC openness” could be set out. In fact, this is the path where the more theoretical journey has taken place, thanks basically to the work of open source software supporters. However, it is surprising that connections with network neutrality have not been explored in greater depth. And this is not just about PCs. As other electronic devices (consider mobile handhelds, and more specifically, smartphones) win the favor of consumers, so does their role as intermediary (always neutral?) between applications or content creators and possible clients. Second, quite outstanding is the fact that, until the FCC’s Notice of Proposed Rulemaking, released Oct. 22, 2009, and with a few significant exceptions (such as Frieden1, Wu8), no one had considered wireless connections in their analyses. It can be said that, although not specifically stated, a certain degree of consensus for “naturally” extending the current arguments to Wi-Fi or WiMax networks has probably existed. But what about mobile operators? If there is a service where network neutrality has been really breached, it is the data access service provided by mobile operators. The extreme discrimination model (“walled garden”) is still used by operators throughout the world. Walled gardens have been accepted as another of the possible business strategies of companies operating in the sector, and almost no one has torn their hair out. Without going to the extreme represented by walled gardens, are there any circumstances, as the FCC asks, in which it could be reasonable for a wireless network to block video, VoIP, or peer-to-peer applications? Last, the focus must be targeted toward some of the applications that have become the de facto door for accessing the Internet and that can be considered as instrumental as the
network itself. It must be considered that for every business project where it is vital that the network owner does not discriminate the traffic generated by the application, there are probably another 100 whose concern is limited to the order in which they appear in a specific portal or search engine. And here one can find a clear asymmetry between what is supported “downward” and what actually happens “upstairs.” Naturally, search engines, portals, and aggregators of diverse content or directories quickly joined the defenders of network neutrality…a fact that does not prevent them from prioritizing certain customers: those with which they have a “special relationship” or, simply, those that pay. That is exactly the opposite of what these same companies demand from network operators. Let us clarify that discrimination is not always arbitrary or unfair. In particular, search engines must follow some regulatory guidelines2 but still discrimination is possible. Along the same line, Pasquale5 or Odlyzko4 have already discussed the parallelism between the basic principles of transparency that should similarly inform carriage and search regulation. Note that the use of the condition in the title of this column means the authors do not call for the imposition of a “universal” neutrality affecting networks, equipment, software, and applications (could we call it gatekeeper neutrality?). Without engaging in such a discussion, what we do assert is that the analysis, in its current dimension, does not go far enough. Neutrality has other facets, certainly. Another neutrality, often presented as a cornerstone for correct regula-
tion—technological neutrality—already demands an initial reconsideration of the reasoning. But even beyond that, in a convergent industry where agents from “different worlds” often fight to conquer the same links in the value chain (integrating as many activities as possible for a tighter control of the value network), the rules of the game should be equal for everyone. While most of these agents are judged ex post, and their activity is limited only when it harms competition or hampers the development of the markets, others have to comply with stringent ex ante regulations. This does not necessarily imply that the antitrust-oriented approach should be extended to any party. On the contrary, a general forward-looking regulatory model that would guarantee certain rights and abilities (such as privacy and interoperability) would probably be needed. In any case, what is clear is that in a convergent scenario the coexistence of different regulatory criteria does not seem fair. And one must not forget that convergence, the same convergence that never seems to arrive, cannot be slowed down. References 1. Frieden, R. The costs and benefits of separating wireless telephone service from handset sales and imposing network neutrality obligations. In Proceedings of the First ITU-T Kaleidoscope Academic Conference Innovations in NGN: Future Network and Services (Geneva, May 12–13, 2008), 273–278. 2. Gasser, U. Regulating search engines: Taking stock and looking ahead. Yale Journal of Law & Technology 8 (2006), 202–234. 3. Larouche, P. The network neutrality debate hits Europe. Commun. ACM 52, 5 (May 2009), 22–24. 4. Odlyzko, A. Network neutrality, search neutrality, and the never-ending conflict between efficiency and fairness in markets. Review of Network Economics 8, 1 (2009), 40–60. 5. Pasquale, F. Internet nondiscrimination principles: Commercial ethics for carriers and search engines. University of Chicago Legal Forum 2008 (2008), 263–300. 6. Van Schewick, B. and Farber, D. Network neutrality nuances. Commun. ACM 52, 2 (Feb. 2009), 31–37. 7. Weinstein, L. Ma Bell’s revenge: The battle for network neutrality. Commun. ACM 50, 1 (Jan. 2007), 128. 8. Wu, T. Wireless Carterfone. International Journal of Communications 1 (2007), 389–426. 9. Yoo, C. Network neutrality and the economics of congestion. Georgetown Law Journal 94, 6 (2006), 1847–1908. 10. Zittrain. J.L. The generative Internet. Harvard Law Review 119, 7 (2006), 1974–2040. José Luis Gómez-Barroso (
[email protected]) is an associate professor in the Department of Applied Economics and Economic History at the National University of Distance Education, Spain. Claudio Feijóo (
[email protected]) is a professor at the Universidad Politécnica de Madrid, Spain, and a Senior Member of ACM. Copyright held by author.
practice
doi:10.1145/1924421.1924437
Article development led by queue.acm.org
Returning Control to the Programmer: SIMD Intrinsics for Virtual Machines

Exposing SIMD units within interpreted languages could simplify programs and unleash floods of untapped processor power.

by Jonathan Parri, Daniel Shapiro, Miodrag Bolic, and Voicu Groza

Server and workstation hardware architecture is continually improving, yet interpreted languages—most importantly, Java—have failed to keep pace with the proper utilization of modern processors. SIMD (single instruction, multiple data) units are available in nearly every current desktop and server processor and are greatly underutilized, especially with interpreted languages. If multicore processors continue their current growth pattern, interpreted-language performance will begin to fall behind, since current native compilers and languages offer better automated SIMD optimization and direct SIMD mapping support. As each core in commercial x86 multicore processors includes a dedicated SIMD unit, the performance disparity will grow exponentially as long as the available SIMD units remain underutilized in interpreted-language environments. Software design and computer architecture have seen the evolution of parallel data processing, but it has not been fully utilized within the interpreted-language domain. Taking full ad-
vantage of the available system architecture and features, especially SIMD units, can be very challenging for the developer. Such features are often disregarded in favor of costly but scalable measures to increase performance, mainly the addition of processing power. Virtual machines and interpreted languages have consistently abstracted and moved away from the underlying system architecture. SIMD instructions were added to modern processors to increase instruction-level parallelism. In a SIMD instruction, multiple and unique data elements can be manipulated with one common operation. SIMD units typically include an additional register bank with a large register width. These individual registers can be subdivided to hold multiple data elements of varying data types.
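To make the register subdivision concrete, the short C fragment below (an illustration added here, not code from the article) uses Intel's SSE intrinsics, one of the instruction-set extensions discussed in this article, to pack four 32-bit floats into a single 128-bit register and add them to four others with one instruction. It assumes an x86 processor with SSE and a compiler that provides <xmmintrin.h>.

#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_add_ps, ... */
#include <stdio.h>

int main(void)
{
    float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float b[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
    float r[4];

    __m128 va = _mm_loadu_ps(a);     /* four floats in one 128-bit register */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vr = _mm_add_ps(va, vb);  /* one instruction performs four additions */
    _mm_storeu_ps(r, vr);

    printf("%.1f %.1f %.1f %.1f\n", r[0], r[1], r[2], r[3]);
    return 0;
}

The same pattern applies to the other data types mentioned here: the register is simply subdivided differently (for example, into eight 16-bit integers).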
Developers have begun to take note that SIMD instruction capabilities are underutilized in the CPU.4 In interpreted languages, a portion of the softwareto-hardware mapping occurs on the fly and in real time, leaving little time for thorough instruction-level parallelism analysis and optimization. Bytecode compilation can be used to identify parallelism opportunities but has proven to be ineffective in many realistic scenarios. Exhaustive automated SIMD identification and vectorization is too computationally intensive to occur within JIT (just-in-time) compilers. It is now common to find vector operations on arrays being performed in native languages with the help of SIMD-based instruction-set extensions (for example, AltiVec, SSE, VIS), general-purpose computing on graphics cards (for example, Nvidia CUDA,
ATI STREAM), and memory modules. AltiVec, SSE, and VIS are well-known SIMD instruction sets. AltiVec is commonly found on PowerPC systems, SSE on x86-64, and VIS within SPARC. We propose that generic userknown parallel operation calls be mapped directly in virtual machine internals to native SIMD calls. By allowing generic SIMD vector access within interpreted languages, the user is given more control over parallelism, while still allowing for automated SIMD vectorization by the JIT compiler. Future virtual machine developments and language specifications must adapt to make better use of SIMD units. Language support for SIMD will not impact support for processors that do not contain SIMD operations. Instead the virtual machine must translate the generic calls into sequential instructions
practice when no SIMD hardware is detected: the status quo. Resource utilization is a key tenet of the cloud-computing paradigm that is now attracting many companies. Maximizing the utilization potential of cloud resources is vital to lowering cost and increasing performance. Because of their popularity and interoperability, interpreted languages, mainly Java, are frequently chosen for cloud computing. Interpreted languages within cloud environments do not expose processor architecture features to the programmer, and as a result, the generated code is rarely optimized for the target platform processor resources. More efficient use of existing datacenter resources would surely result in cost reduction. Interpreted languages such as Java, Flash, PHP, and JavaScript may all benefit from such an approach. There are many potential arguments for and against SIMD intrinsics in interpreted languages, but it is clear that change is needed.
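The following is a minimal sketch, not the authors' implementation, of the kind of dispatch such a mapping implies at the native level: a generic element-wise addition bound to an SSE path when the hardware supports it and to the sequential "status quo" path otherwise. The function names are hypothetical, and the runtime check uses GCC's __builtin_cpu_supports, so it assumes GCC on an x86 target compiled with SSE enabled (the default on x86-64).

#include <xmmintrin.h>   /* SSE intrinsics */

/* Hypothetical generic entry point: callers neither know nor care
 * whether a SIMD unit is present. */
static void vec_add_scalar(const float *a, const float *b, float *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];                 /* sequential fallback path */
}

static void vec_add_sse(const float *a, const float *b, float *dst, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4)                /* four floats per SSE operation */
        _mm_storeu_ps(dst + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    for (; i < n; i++)                        /* scalar tail for leftover elements */
        dst[i] = a[i] + b[i];
}

void vec_add(const float *a, const float *b, float *dst, int n)
{
    if (__builtin_cpu_supports("sse"))        /* GCC-specific CPU feature test */
        vec_add_sse(a, b, dst, n);
    else
        vec_add_scalar(a, b, dst, n);
}

A JIT compiler could make the same decision once at startup and bind the generic call to whichever version the host processor supports.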
Existing Solutions As the number of cores on modern processors grows ever higher, so do the number of available SIMD units. Most native compilers allow for explicit declaration of SIMD instructions in the form of intrinsics, while others go even further by extending the compiler front end with vector pragmas.5 This closes the gap between the application and the underlying architecture. One study has shown it is possible to create a common set of macros for calling many different SIMD units, including MMX, SSE, AltiVec, and TriMedia.13 This set of macros, called MMM, provides an abstraction for SIMD unit programming that makes programs more portable across hardware platforms. Although this initial approach did provide code portability, it did not address interpreted languages. A similar approach is also available at the compiler level.10 Whereas some compilers can selectively specialize code at compile time using late binding,3 it is simpler to let the user make parallelism explicit within the code, resolving the parallelism at runtime. Interpreted languages do not expose vector functionality to the pro40
grammer in a transparent way. This is becoming an important problem, since high-performance computations are increasingly being carried out using languages such as Java rather than Fortran.2 One solution addressing this performance gap for interpreted languages is AMD’s Aparapi,1 which provides thread-level parallelism and access to video-card acceleration but no SIMD support. Intel has also exposed some native SIMD support in its recent Intel Math Kernel Library in the form of a library package.6 There is an argument for writing your own custom native interface for SIMD access rather than using a standardized API or virtual machine extensions. One of the simplest solutions out there today is to find hotspots in Java code using runtime profiles and add JNI (Java Native Interface) calls to the offending code, effectively offloading the inefficient operation to faster native code. Unfortunately, this approach works only when the processor architecture is known to the programmer, and it adds compatibility problems when upgrading to a new architecture. Arguments for Change The arguments for supporting the inclusion of premapped vector intrinsics within interpreted languages are faster application runtime, lower cost, smaller code size, fewer coding errors, and a more transparent programming experience. Hardware continues to change, but interpreted languages are not keeping pace with these changes, often relying solely on the default data path. The integration of SIMD instructions into virtual machines and their matching language specifications allows for a larger utilization of available resources. Direct vector operation mappings will yield higher utilization of the SIMD unit, and therefore the code will run faster. Multiprocessors today and in the future will contain a SIMD hardware core for each processor, magnifying the disparity between sequential and parallel code. For example, in Intel’s Core i7-920, an x86-64 processor with four physical cores, each core contains an SSE 4.2 instruction-set extension. A virtual machine can take advantage of four SSE units at once, out-
outperforming the four single-processor cores on throughput. As cloud-computing providers such as Amazon and Google have begun to take hold, resource utilization has become a key issue. These organizations do not know ahead of time the potential client workloads; therefore, resource utilization improvements made by programming languages will offer great benefit. As the interpreted languages running on these clusters improve, the need to expand the cloud is reduced, with a corresponding cost savings. As the user gains more control over vectorized data types, the coding experience when exploiting SIMD operations becomes more transparent. The code size, measured in lines of code, will be reduced because the machinery for SIMD operations on vectors now lives inside the virtual machine instead of inside every program. This is the same reason that stacks, queues, arrays, hash maps, strings, and other data structures reduce code size: these classes encapsulate and abstract complexity. This encapsulation and abstraction also means fewer coding errors. Whereas the state of the art requires the user to make a native interface call and write C/C++ code to access the SIMD instructions explicitly, a standardized API and the proposed virtual machine integration expose SIMD intrinsics while avoiding complex debugging.
Addressing SIMD Inclusion Concerns
Platform compatibility can be achieved by adding native SIMD mappings to the virtual machines of target platforms. There should always exist a default mapping where SIMD operations are mapped at run time to sequential code if the appropriate hardware is not available. Virtual machines such as the HotSpot JVM (Java virtual machine) from Oracle already contain optimizations to identify SIMD opportunities on the fly. The JIT compiler must run at a near real-time pace and, therefore, cannot spend a lot of time finding an optimal SIMD mapping at runtime. Simple SIMD operations written in Java are easily picked up by the JVM,
but more complex sequences of vector operations can be easily overlooked.11 Many complex applications are not transformed well into parallel SIMD code by optimizing compilers. We argue that the developer often knows which operations can be executed in parallel and should be able to code them in a standardized fashion that will utilize the available SIMD functionality. The user knows that the source and destination of the operation are arrays of the same type and length, and this type of operation is most beneficial when sequences of SIMD operations can be chained together, avoiding transfers between the SIMD and main CPU register files.
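To make the distinction concrete, here is a minimal Java sketch of our own (not code from the article): the element-wise loop below is the kind of pattern a JIT auto-vectorizer may recognize on its own, while the chained multiply-add-scale sequence is the kind of pattern the authors argue a developer should be able to express explicitly as a single SIMD transaction, keeping intermediate results in the SIMD register file.

// A sketch contrasting auto-vectorizable code with a chained sequence.
class VectorLoops {
    // Element-wise addition: a simple loop that a JIT auto-vectorizer
    // may map to SIMD instructions without any help from the programmer.
    static void add(float[] dst, float[] a, float[] b) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = a[i] + b[i];
        }
    }

    // A chained multiply, add, and scale over the same arrays. Written as
    // three separate passes, each pass moves data between memory and
    // registers again; expressed as one explicit SIMD transaction, the
    // intermediate results could stay in the SIMD register file.
    static void chainedMulAddScale(float[] dst, float[] a, float[] b, float scale) {
        float[] tmp = new float[dst.length];
        for (int i = 0; i < tmp.length; i++) tmp[i] = a[i] * b[i];
        for (int i = 0; i < tmp.length; i++) tmp[i] = tmp[i] + a[i];
        for (int i = 0; i < dst.length; i++) dst[i] = tmp[i] * scale;
    }
}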
Another source of resistance that we have heard from industry and research personnel concerns the possibility of using a dedicated graphics card for GPGPU (general-purpose computing on graphics processing units). Examples of this technology are Nvidia's CUDA and ATI's STREAM. In the context of Flash, Adobe recently announced that its ActionScript can now use Nvidia graphics cards.8 This alternative approach is legitimate for those users who have a graphics card supporting the required features. Unfortunately, it means the user has to purchase an external video card that is often expensive and requires additional power. In the cloud-computing context this can add up to a large number of video cards with associated costs. Also note that each GPU transaction requires system bus access, whereas the SIMD unit is right on the processor die and ships with nearly every processor. Future generations of processors may include GPUs on the die, but until that is the case for existing infrastructures, SIMD is low-hanging fruit that is not fully utilized for getting more computations per core. Most significant is the fact that GPUs capable of such a feat are not as widely distributed as processors: Intel commands a 50% market share with its integrated graphics products.12 As multicore systems become more prominent, the possibility of using multiple SIMD units becomes even clearer. The arguments for resisting the inclusion of SIMD intrinsics in interpreted languages can be overcome with standardized parallel virtual-machine extensions and specifications. Hopefully, these ideas will become readily available in future language evolutions.
Using SIMD in Java
Figure: High-level overview of the jSIMD API. [The diagram shows Java user code and standard Java calling the jSIMD API inside the JVM; the API generates a native runtime library and crosses a JNI bridge to SIMD mappings in native binaries (built per target with gcc), which operate on data in the shared memory space.]
In response to the need for SIMD in interpreted languages, we designed an API called jSIMD for mapping Java code to SIMD instructions using vectorized data of various data types. JNI is a feature provided within Java to allow access to native code. It is used in the back end as a bridge between the Java-familiar API seen by the programmer and the SIMD mappings compiled as native code. Using an image-processing program as an example, we observed a 34% speedup over traditional Java code. Earlier tests on simpler and purely mathematical constructs have yielded speedups of two to three times.11 An overview of the API is shown in the accompanying figure. Once a transaction of operations on vectors is built up, the user code tells the API to initiate the desired operations. The API identifies the available operating system and SIMD units in order to decide when to execute API calls in Java and when to pass the calls to the dynamic libraries with SIMD mappings for parallel execution. If parallel execution is possible, the API makes JNI calls to dynamic libraries, which carry out SIMD instructions on data in the Java memory space. The SIMD native library can be recompiled for a different target architecture through gcc or by using a prepackaged binary. Generic source code has been used to facilitate simple cross-compilation.
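For readers unfamiliar with that bridge, the sketch below shows the standard JNI pattern on the Java side: a class declares native entry points and loads the shared library that implements them in C. This is only an illustration of the general mechanism; the library and method names are ours, not the actual jSIMD internals, and the call fails with UnsatisfiedLinkError unless a matching native library is on java.library.path.

// Minimal JNI bridge sketch (illustrative names, not the real jSIMD code).
public class VecOps {
    static {
        // Loads libvecops.so / vecops.dll from java.library.path at class-load time.
        System.loadLibrary("vecops");
    }

    // Adds b to a element-wise in native code, which is free to use SSE/AltiVec.
    public static native void addInPlace(float[] a, float[] b);

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {4f, 3f, 2f, 1f};
        addInPlace(a, b); // throws UnsatisfiedLinkError if the library is missing
        System.out.println(java.util.Arrays.toString(a));
    }
}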
Motivating Example
Consider a motivating example that uses the jSIMD API to obtain a speedup over an out-of-the-box Java solution. Alpha blending, used for creating transitions in video processing or for blending two individual images, is one example of an algorithm that can be moderately accelerated through the use of SIMD instructions. There are many such parallel data-processing applications that are easy to write using the SIMD paradigm. Examples of inherently parallel SIMD tasks include 3D graphics, real-time physics, video transcoding, encryption, and scientific applications. Select versions of these applications are usually supported by custom native code in the virtual machine, whereas our solution gives the programmer the ability to express any algorithm, not just the ones built into the interpreter.
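For reference, a scalar alpha-blend inner loop looks like the minimal Java sketch below (our own illustration, assuming channel values have already been unpacked into arrays). Every output element depends only on the corresponding inputs, which is exactly the kind of data parallelism a SIMD unit can process several lanes at a time.

// Scalar alpha blend of one color channel: out = alpha*src + (1 - alpha)*dst.
// Each element is independent, so the loop maps naturally onto SIMD lanes.
class AlphaBlend {
    static void blendChannel(int[] src, int[] dst, int[] out, float alpha) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.round(alpha * src[i] + (1 - alpha) * dst[i]);
        }
    }

    public static void main(String[] args) {
        int[] src = {255, 128, 0, 64};
        int[] dst = {0, 128, 255, 192};
        int[] out = new int[4];
        blendChannel(src, dst, out, 0.25f);
        System.out.println(java.util.Arrays.toString(out)); // [64, 128, 191, 160]
    }
}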
Execution profiles were obtained using Intel's VTune Performance Analyzer,7 which can be used to profile and analyze various executable applications. We used it to observe the number and types of SSE calls performed by the JVM alone. An alpha-blending program was executed using several standard-size images (640 × 480 to 1,920 × 1,080 pixels), with 1,000 samples for each test, on an Intel Core 2 Duo E6600 with 2GB of DDR2 RAM running Windows XP Pro SP3. Using jSIMD resulted in an average speedup of 34% and, as expected, a large number of SSE calls. No SIMD instructions were executed by the out-of-the-box Java solution, whereas with the jSIMD API the number of retired SIMD instructions was in the millions, saving several milliseconds per frame. For video transcoding this is a significant performance improvement. The linear relationship between retired SIMD instructions and pixel count means that the API works well at both large and small scales. These observations show that exposing SIMD intrinsics improves execution time by issuing more SIMD instructions. The results from the current jSIMD implementation yielded a speedup below the anticipated level, based upon a maximum of four concurrent operations within SSE for the data types and processor that we used. The speedup is still significant, considering that no changes to the underlying system architecture were needed and that the changes to the user code were relatively simple and natural. As it is impossible to guarantee that arrays remain pinned in the JVM9 because of the garbage collector, memory copies occur occasionally, as confirmed through analysis.
Making it Happen
Some of the problems that arose during the development of the jSIMD API were dependencies between SIMD code and regular Java code, and multiple instantiations of the API. The integrity of the vector registers during program execution is another area of concern. We found that even though Java does make SIMD calls on its own, the JVM will not interrupt the JNI call to our API, and therefore it will not replace any of the contents of the SSE registers on the fly.
The use of SIMD registers can be inefficient unless data transfer between memory and the SIMD unit is reduced. Looking at lists of SIMD operations as transactions allows for further analysis, weighing the performance gain against the overhead cost. One drawback to our approach is that interleaving SSE calls with regular Java code may cause thrashing of the register files. Our current solution requires the programmer to write all SSE code in one continuous block so that the JVM does not need to execute while the JNI call is performed. When calling the API to perform a sequence of SIMD operations, the API packages the operations into a transaction using a simple sequential list-scheduling algorithm and then passes all of the instructions and data by reference to the C program, which executes the SIMD instructions. Dependencies with regular Java code, such as casting before an API execute statement, must occur outside of a transaction unless they are done using the API. Dependency and anti-dependency resolution will further improve execution time and utilization.
Looking to the Future with SIMD
Interpreted languages can expose vector functionality to the programmer, and the results will be faster, smaller, and simpler code, as demonstrated by a practical application of this approach using Java. Furthermore, better SIMD utilization within cloud-computing infrastructures has the potential to reduce costs significantly. Improving the scheduling algorithm within individual transactions is a future direction that should further increase performance and throughput. Another clear next step is to take advantage of multiple cores at the same time in a real cloud-computing infrastructure. Our results can be generalized and included in many virtual machines. For example, Flash would clearly benefit from further manual SIMD intervention by the developer for ActionScript computational segments. PHP and JavaScript can also derive benefits from such an approach in order to increase the speed of Web applications. More generally, if you create a virtual machine, you should allow explicit access
to generic SIMD instructions. Since you have paid for the SIMD unit inside your server, you might as well let your programmers use it. Although this work is still in progress, we are confident that it will be widely adopted for interpreted languages. Users can easily identify parallel operations on vectors and arrays. Interpreted languages need not be forcefully architecture-agnostic and opaque. The time has come for virtual machines to embrace their underlying architectures so that data centers and high-performance applications can fully utilize the available processing infrastructure. It is reasonable to expect existing virtual-machine instruction sets to include generic SIMD operations. Returning some control to the programmer, especially for inherently parallel vector operations, is an easy step toward a transparent programming experience. This argument should not be confused with the age-old assembly-versus-C sentiment. Generic SIMD mappings can retain the abstraction that already exists in interpreted languages while keeping the user unaware of the exact hardware mapping that is taking place. Users are simply given more control to specify what they believe to be parallel code, avoiding the rediscovery that runtime algorithmic approaches must otherwise perform. SIMD units, available in nearly every desktop and server processor, are underutilized, especially when using interpreted languages. Giving more control to the programmer and the programming syntax allows for successful and simple mappings with performance increases. Virtual machines must address the current challenges of integrating SIMD support.
Related articles on queue.acm.org

GPUs: A Closer Look
Kayvon Fatahalian, Mike Houston
http://queue.acm.org/detail.cfm?id=1365498

Scalable Parallel Programming with CUDA
John Nickolls, Ian Buck, Michael Garland, and Kevin Skadron
http://queue.acm.org/detail.cfm?id=1365500

Data-Parallel Computing
Chas. Boyd
http://queue.acm.org/detail.cfm?id=1365499

References
1. AMD. Aparapi; http://developer.amd.com/zones/java/aparapi/.
2. Amedro, B., Bodnartchouk, V., Caromel, D., Delb, C., Huet, F. and Taboada, G.L. Current state of Java for HPC. Sophia Antipolis, France, 2008; http://hal.inria.fr/docs/00/31/20/39/PDF/RT-0353.pdf.
3. Catanzaro, B., Kamil, S.A., Lee, Y., Asanović, K., Demmel, J., Keutzer, K., Shalf, J., Yelick, K.A. and Fox, A. SEJITS: Getting productivity and performance with selective embedded JIT specialization. Technical Report UCB/EECS-2010-23. EECS Department, University of California, Berkeley; http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-23.html.
4. Cheema, M.O. and Hammami, O. Application-specific SIMD synthesis for reconfigurable architectures. Microprocessors and Microsystems 30, 6 (2006), 398–412.
5. Codeplay. VectorC Compiler Engine; http://www.codeplay.com.
6. Intel Software Network. Intel AVX optimization in Intel MKL V10.3, 2010; http://software.intel.com/en-us/articles/intel-avx-optimization-in-intel-mkl-v103/.
7. Intel Software Network. Intel VTune Amplifier XE, 2010; http://software.intel.com/en-us/intel-vtune/.
8. Nvidia. Adobe and Nvidia announce GPU acceleration for Flash player, 2009; http://www.nvidia.com/object/io_1243934217700.html.
9. Oracle. JNI enhancements introduced in version 1.2 of the Java 2 SDK, 2010; http://download.oracle.com/javase/1.3/docs/guide/jni/jni-12.html#GetPrimitiveArrayCritical.
10. Orc (Oil Runtime Compiler); http://code.entropywave.com/projects/orc/.
11. Parri, J., Desmarais, J., Shapiro, D., Bolic, M. and Groza, V. Design of a custom vector operation API exploiting SIMD within Java. In Proceedings of the Canadian Conference on Electrical and Computer Engineering (May 2010).
12. Ranganathan, L. 3D gaming on Intel Integrated Graphics, 2009; http://software.intel.com/en-us/articles/3d-gaming-on-intel-integrated-graphics/.
13. Rojas, J.C. Multimedia macros for portable optimized programs. Ph.D. dissertation, Northeastern University, 2003.

Jonathan Parri ([email protected]) is a Ph.D. candidate at the University of Ottawa and a senior member of the Computer Architecture Research Group. His current research focuses on design space exploration in the hardware/software domain, targeting embedded and traditional systems.

Daniel Shapiro ([email protected]) is a Ph.D. candidate at the University of Ottawa and a senior member of the Computer Architecture Research Group. His research interests include custom processor design, instruction-set extensions, IT security, and biomedical engineering.

Miodrag Bolic ([email protected]) is an associate professor at the School of Information Technology and Engineering, University of Ottawa, where he also serves as director of the Computer Architecture Research Group. His research interests include computer architectures, radio frequency identification, and biomedical signal processing.

Voicu Groza ([email protected]) works in the School of Information Technology and Engineering at the University of Ottawa, where he is co-director of the Computer Architecture Research Group. His research interests include hardware/software co-design, biomedical instrumentation and measurement, and reconfigurable computing.
© 2011 ACM 0001-0782/11/04 $10.00
doi:10.1145/1924421.1924438
Article development led by queue.acm.org
Knowing where to begin is half the battle. By Thomas A. Limoncelli, with an introduction by Vinton G. Cerf
Successful Strategies for IPv6 Rollouts. Really.
The design of TCP/IP began in 1973 when Robert Kahn and I started to explore the ramifications of interconnecting different kinds of packet-switched networks. We published a concept paper in May 1974,2 and a fairly complete specification for TCP was published in December 1974.1 By the end of 1975, several implementations had been completed and many problems were identified. Iteration began, and by 1977 it was concluded that TCP (by now called Transmission Control Protocol) should be split into two protocols: a simple Internet Protocol that carried datagrams end to end through packet networks interconnected through gateways; and a TCP that managed the flow and sequencing of packets exchanged between hosts on the contemplated Internet. This split allowed for the possibility of
real-time but possibly lossy and unsequenced packet delivery to support packet voice, video, radar, and other real-time streams. By 1977, I was serving as program manager for what was then called the Internetting research program at DARPA (U.S. Defense Advanced Research Projects Agency) and was confronted with the question, "How much address space is needed for the Internet?" Every host on every network was assumed to need an address consisting of a "network part" and a "host part" that could uniquely identify a particular computer on a particular network. Gateways connecting the networks of the Internet would understand these addresses and would know how to route Internet packets from network to network until they reached the destination network, at which point the final gateway would direct the Internet packet to the correct host on that network. A debate among the engineers and scientists working on the Internet ran for nearly a year without a firm conclusion. Some suggested 32-bit addresses (8 bits of network, 24 bits of host), some said 128 bits, and others wanted variable-length addresses. The last choice was rejected by programmers who didn't want to fiddle around finding the fields of an Internet packet. The 128-bit choice seemed excessive for an experiment that involved only a few networks to begin with. By this time, the research effort had reached its fourth iteration (the IP layer protocol was called IPv4), and as program manager, I felt a need to get on with live testing and final design of TCP and IP. In lieu of consensus, I chose 32 bits of address. I thought 4.3 billion potential addresses would be adequate for conducting the experiments to prove the technology. If it worked, then we could go back and design the production version. Of course, it is now 2011, and the experiment never exactly ended. The Internet Corporation for Assigned Names and Numbers (ICANN) succeeded Jonathan Postel as the
The IPv6 Internet: Lumeta Corporation created this map of the IPv6 Internet using data collected on Feb. 5, 2009. Each end node can represent a handful of computers on a small network, or perhaps a large company with hundreds of thousands of hosts. (http://www.lumeta.com/IPv6/)
operator of what was and still is called the IANA (Internet Assigned Numbers Authority). IANA historically allocated large chunks of address space to end users or, after the commercialization of the Internet, to Internet service providers (ISPs). With the creation of the Regional Internet Registries (for Internet addresses), IANA typically allocated 24-bit subsets of the IP address space (sufficient for 16 million hosts) to one of the five regional registries, which, in turn, allocated space to ISPs or, in some cases, very large end users. As this article was being written, ICANN announced that it had just allocated the last five of these large 24-bit chunks of space. The Internet Engineering Task Force (IETF) recognized in the early 1990s that there was a high probability that the address space would be exhausted by the rapid growth of the Internet, and it concluded several years
of debate and analysis with the design of a new, extended address format called IPv6. (IPv5 was an experiment in stream applications that did not scale and was abandoned.) IPv6 had a small number of new features and a format intended to expedite processing, but its principal advantage was 128 bits each of source and destination host addresses. This is enough for 340 trillion trillion trillion addresses—enough to last for the foreseeable future. The IPv6 format is not backwards compatible with IPv4, since an IPv4-only host does not have the 128 bits of address space needed to refer to an IPv6-only destination. It is therefore necessary to implement a dual-stack design that allows hosts to speak either protocol for the period that both are in use. Eventually, address space will not be available for additional IPv4 hosts, and IPv6-only hosts will become necessary. Hopefully, ISPs will be able to implement IPv6 support before the actual exhaustion of IPv4 addresses, but it will be necessary to allow for dual-mode operation for some years to come. World IPv6 Day is scheduled for June 8, 2011, at which time as many ISPs as are willing and able will turn on their IPv6 support to allow end users and servers to test the new protocol on a global scale for a day. The move to IPv6 is one of the most significant changes to the Internet architecture since it was standardized in the late 1970s and early 1980s. It will take dedicated effort by many to ensure that users, servers, and Internet service and access providers are properly equipped to manage concurrent operation of the old and new protocols. Here, Thomas Limoncelli considers steps that can be taken to achieve this objective. —Vinton G. Cerf
Strategies for Moving to IPv6
Someday the U.S. will run out of three-digit telephone area codes and will be forced to add a digit. As Vint Cerf explained, the Internet is facing a similar situation with its address structure. Often predicted and long ignored, the problem is now real. We have run out of 32-bit IP addresses (IPv4) and are moving to the 128-bit address format of IPv6. This section looks at some
communi cations o f th e acm
| april 2 0 1 1 | vol . 5 4 | no. 4
strategies of organizations that are making the transition. The strategies that work tend to be those that focus on specific applications or Web sites rather than trying to convert an entire organization. The biggest decision for many organizations is simply knowing where to begin. In this article, I consider three possible strategies. The first scenario is a cautionary tale against what might be your first instinct. Though fictional, we've seen this story played out in various forms. The other two examples have proven to be more successful approaches. Knowing this, we would offer the following advice to a business contemplating the transition to IPv6: start with a small, well-defined project that has obvious value to the business.
Story 1: "Upgrade Everything!"
While having a grand plan of upgrading everything is noble and well intentioned, it is a mistake to think this is a good first experiment. There is rarely any obvious value to it (annoys management), it is potentially biting off more than you can chew (annoys you), and mistakes affect people that you have to see in the cafeteria every day (annoys coworkers). This strategy usually happens something like this: someone runs into the boss's office and says, "Help! Help! We have to convert everything to IPv6." This means converting the network equipment, Domain Name System (DNS), Dynamic Host Configuration Protocol (DHCP) system, applications, clients, desktops, and servers. It's a huge project that will touch every device on the network. These people sound like Chicken Little claiming the sky is falling. These people are thrown out of the boss's office. A better approach is to go to the boss and say, "There's one specific thing I want to do with IPv6. Here's why it will help the company." These people sound focused and determined. They usually get funding. Little does the boss realize this "one specific thing" requires touching many dependencies. These include the network equipment, DNS, DHCP, and so on—yes, the same list of things
This visualization, from the Cooperative Association for Internet Data Analysis (CAIDA), represents macroscopic snapshots of IPv4 and IPv6 Internet topology samples captured in January 2009.
that Chicken Little was spouting off about. The difference is these people got permission to do it. Which leads us to...
Story 2: Work from the outside in.
Fundamentally, this second strategy is to start with your organization's external Web presence. Usually an external Web server is hidden behind a hardware device known as a load balancer. When Web browsers connect to your Web site, they are really connecting to the load balancer. It relays the connection (being a "man in the middle") to the actual Web server. While doing that, it performs many functions, most importantly load-sharing the incoming Web traffic among two or more redundant Web servers. In this strategy the goal is simple: upgrade to IPv6 every component on the path from your ISP to your load balancer and let the load balancer translate to IPv4 for you. Modern load balancers can receive IPv6 connections in the "front" and send IPv4 connections out the "back" to your Web servers. That is, your load balancer can be a translator that permits you to
offer IPv6 service without requiring you to change your Web servers. Those upgrades can be phase two. This is a bite-size project that is achievable. It has a real, tangible value that you can explain to management without being too technical: "The coming wave of IPv6-only users will have faster access to our Web site. Without this upgrade, those users will have slower access to our site because of the IPv4/v6 translators that ISPs are setting up as a Band-Aid." That is an explanation that a nontechnical executive can understand. Management may be unconvinced that there will be IPv6-only users. Isn't everyone "dual stack," as previously described? Most are, but LTE ("4G") phones and the myriad other LTE-equipped mobile devices will eventually be IPv6-only. ARIN (American Registry for Internet Numbers) advised LTE providers that IPv4 depletion is imminent, and LTE providers have prepared for the day when new LTE users will be IPv6-only.10 Obviously this new wave of IPv6-only users will want to access IPv4-only sites, so the carriers are setting up massive farms of servers to do the translation.4
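As a quick way to see where a given site stands, the minimal Java sketch below (ours, not from the article) resolves a hostname and reports which address families it publishes. The hostname is a placeholder, and note that the JVM's resolver settings and the local network stack can affect whether AAAA results are returned.

import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;

// Reports whether a hostname resolves to IPv4 (A) and/or IPv6 (AAAA) addresses.
public class DualStackCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "www.example.com"; // placeholder
        boolean v4 = false, v6 = false;
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            if (addr instanceof Inet4Address) v4 = true;
            if (addr instanceof Inet6Address) v6 = true;
        }
        System.out.println(host + " -> IPv4: " + v4 + ", IPv6: " + v6);
    }
}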
There are two problems with this translation approach.3 First, the translation is expected to be slow. Second, geolocation will mistakenly identify users as being where the server farm is, not where the user is. That means if your Web site depends on advertising that is geotargeted, the advertisements will be appropriate for the location of the server farm, not the location of your users. Since LTE is mostly used in mobile devices, this is particularly pressing. Therefore, if your company wants to ensure that the next million or so new users have fast access to your Web site,9 or if revenue depends on advertising, then management should be concerned. Most CEOs can understand simple, nontechnical value statements such as "New people coming onto the Internet will have better access to our site," or "It is required to ensure that high-paying, geotargeted advertisements continue to work." A project like this requires only modest changes: a few routers, some DNS records, and so on. It is also a safe place to make changes because your external Web presence has a good, solid testing regimen for making changes
in a test environment, which are then gated through a QA environment before hitting the production environment. Right? Once IPv6 is enabled from the ISP to the load balancer, and the load balancer is accepting IPv6 connections but sending out IPv4 connections to the Web farm, new opportunities present themselves. As each Web server becomes IPv6 ready, the load balancer no longer needs to translate for that host. Eventually your entire Web farm is dual stack. This technique provides a throttle to control the pace of change. You can make small changes, one at a time, testing along the way. In doing so you will have upgraded the routers, DNS server, and other components. While your boss would shriek if you had asked to change every layer of your network stack, you have essentially done just that. Of course, once you have completed this and shown that the world didn't end, developers will be more willing to test their code under IPv6. You might need to enable IPv6 on the path to the QA lab. That's another bite-size project. Another path will be requested. Then another. Then the LAN that the developers use. Then it makes sense to do it everywhere. You've now achieved the goal of the person from Story 1, but you've gotten management approval. During Google's IPv6 efforts we learned that this strategy works really well. Most importantly, we learned that it turned out to be easier and less expensive than expected.8
Story 3: "One Thing"
This story involves a strategic approach in which an organization picked a single application—its "one thing"—and mounted a focused effort to move it to IPv6. Again, being focused appealed to management and still touched on many of the upgrades requested by our Chicken Little. Comcast presented a success story at the 2008 Google IPv6 Symposium,6 demonstrating how it chose one strategic thing to upgrade: the set-top box management infrastructure. Every set-top box needs an IP address so the management system can reach it. That's more IPv4 addresses than Comcast could reasonably get allocated. Instead it used IPv6. If you get Internet service from Comcast, the set-top box
communi cations o f th e acm
on your TV set is IPv6 even though the cable modem sitting next to it is providing IPv4 Internet service.7 Comcast had to get IPv6 working for anything that touches the management of its network: provisioning, testing, monitoring, billing. Wait, billing? Well, if you are touching the billing system, you are basically touching a lot of things. Ooh, shiny dependencies. (This is why we put "one thing" in quotes.) The person from Story 1 must be jealous. At the same symposium Nokia presented a success story that also involved finding "one thing," which turned out to be power consumption. Power consumption, you say? Yes. Its phones waste battery power by sending out pings to keep the NAT (network address translation) session alive. By switching to IPv6, Nokia did not need to send out pings—no NAT, no need to keep the NAT session alive. Its phones can turn off their antennae until they have data to send. That saves power. In an industry where battery life is everything, any executive can see the value. (A video from Google's IPv6 summit details Nokia's success.5) In the long term we should be concerned with converting all our networks and equipment to IPv6. The pattern we see, however, is that successful projects have selected one specific thing to convert, and let all the dependencies come along for the ride.
Summary
IPv4 address space is depleted. People who have been ignoring IPv6 for years need to start paying attention. It is real—and really important. IPv6 deployment projects seem to be revealing two successful patterns and one unsuccessful pattern. The unsuccessful pattern is to scream that the sky is falling and ask for permission to upgrade "everything." The lessons we have learned:
1. Proposals to convert everything sound crazy and get rejected. There is no obvious business value in making such a conversion at this time.
2. Work from the outside in. A load balancer that does IPv6-to-IPv4 translation will let you offer IPv6 to external customers now, gives you a "fast win" that will bolster future projects, and provides a throttle to control the pace of change.
3. Proposing a high-value reason (your "one thing") to use IPv6 is most likely to get management approval. There are no simple solutions, but there are simple explanations. Convert that "one thing" and keep repeating the value statement that got the project approved, so everyone understands why you are doing this. Your success here will lead to other projects.
For a long time IPv6 was safe to ignore as a "future requirement." Now that IPv4 address space is depleted, it is time to take this issue seriously. Yes, really.

Related articles on queue.acm.org

DNS Complexity
Paul Vixie
http://queue.acm.org/detail.cfm?id=1242499

A Conversation with Van Jacobson
http://queue.acm.org/detail.cfm?id=1508215

Principles of Robust Timing over the Internet
Julien Ridoux, Darryl Veitch
http://queue.acm.org/detail.cfm?id=1773943

References
1. Cerf, V., Dalal, Y. and Sunshine, C. RFC 675. Specification of Internet Transmission Control Program, 1974.
2. Cerf, V. and Kahn, R.E. A protocol for packet network intercommunication. IEEE Transactions on Communications 22, 5 (1974), 637–648.
3. Donley, C., Howard, V., Kuarsingh, V., Chandrasekaran, A. and Ganti, V. Assessing the impact of NAT444 on network applications; http://tools.ietf.org/html/draft-donley-nat444-impacts-01/.
4. Doyle, J. Understanding carrier-grade NAT (2009); http://www.networkworld.com/community/node/44989.
5. Google IPv6 Conference. IPv6, Nokia, and Google (2008); http://www.youtube.com/watch?v=o5RbyK0m5OY.
6. Google IPv6 Implementors Conference; http://sites.google.com/site/ipv6implementors/2010/agenda.
7. Kuhne, M. IPv6 monitor: an interview with Alain Durand (2009); https://labs.ripe.net/Members/mirjam/content-ipv6-monitor.
8. Marsan, C.D. Google: IPv6 is easy, not expensive (2009); http://www.networkworld.com/news/2009/032509-google-ipv6-easy.html.
9. Miller, R. The billion-dollar HTML tag (2009); http://www.datacenterknowledge.com/archives/2009/06/24/the-billion-dollar-html-tag/.
10. Morr, D. T-Mobile is pushing IPv6. Hard (2010); http://www.personal.psu.edu/dvm105/blogs/ipv6/2010/06/t-mobile-is-pushing-ipv6-hard.html.
Vinton G. Cerf is Google's vice president and chief Internet evangelist. As one of the "Fathers of the Internet," Cerf is the codesigner of the Internet's TCP/IP protocols and architecture. He holds a B.S. degree in mathematics from Stanford University and M.S. and Ph.D. degrees in computer science from UCLA.

Thomas A. Limoncelli is an author, speaker, and system administrator. His books include The Practice of System and Network Administration (Addison-Wesley, 2007) and Time Management for System Administrators (O'Reilly, 2005). He works at Google in New York City and blogs at http://EverythingSysadmin.com/.

© 2011 ACM 0001-0782/11/04 $10.00
doi:10.1145/1924421.1924436
Article development led by queue.acm.org
Contrary to popular belief, SQL and noSQL are really just two sides of the same coin. by Erik Meijer and Gavin Bierman
A Co-Relational Model of Data for Large Shared Data Banks
Fueled by their promise to solve the problem of distilling valuable information and business insight from big data in a scalable and programmer-friendly way, noSQL databases have been one of the hottest topics in our field recently. With a plethora of open source and commercial offerings (Membase,
CouchDB, Cassandra, MongoDB, Riak, Redis, Dynamo, BigTable, Hadoop, Hive, Pig, among others) and a surrounding cacophony of technical terms (Paxos, CAP, Merkle trees, gossip, vector clocks, sloppy quorums, MapReduce, and so on), however, it is hard for businesses and practitioners to see the forest for the trees. The current noSQL market satisfies the three characteristics of a monopolistically competitive market: the barriers to entry and exit are low; there are many small suppliers; and these suppliers produce technically heterogeneous, highly differentiated products.12
Monopolistically competitive markets are inconsistent with the conditions for perfect competition. Hence, in the long run, monopolistically competitive firms will make zero economic profit. In the early 1970s, the database world was in a similar sorry state.14 An overabundance of database products exposed many low-level implementation details and, as database people like to say, forced programmers to work at the physical level instead of the logical level. The landscape changed radically when Ted Codd proposed a new data model and a structured query language (SQL) based on the
Figure 1. Object graph for Products collection. [The graph shows a single product with Title "The Right Stuff," Author "Tom Wolfe," Year 1979, and Pages 320, plus a Keywords collection ("Book," "Hardcover," "American") and a Ratings collection ("****," "4 stars"), with each string stored as its own character-array object.]
Figure 2. Object graph result for books with four-star rating. [The result pairs the Title "The Right Stuff" with the product's Keywords collection ("Book," "Hardcover," "American"); these nodes are shared with the original object graph of Figure 1.]
mathematical concept of relations and foreign-/primary-key relationships.4 In the relational model, data is stored in conceptually simple containers (tables of rows), and queries over this data are expressed declaratively, without knowledge of the underlying physical storage organization. Codd's relational model and SQL allowed implementations from different vendors to be (near) perfect substitutes, and hence provided the conditions for perfect competition. Standardizing on
the relational model and SQL created a secondary network effect around complementary producers such as educators, tool vendors, consultants, etc., all targeting the same underlying mathematical principles. Differences between actual relational database implementations and SQL dialects became to a large extent irrelevant.7 Today, the relational database market is a classic example of an oligopoly. The market has a few large players (Oracle, IBM, Microsoft, MySQL), the
barriers to entry are high, and all existing SQL-based relational database products are largely indistinguishable. Oligopolies can retain high profits in the long run; today the database industry is worth an estimated $32 billion and still growing in the double digits. In this article we present a mathematical data model for the most common noSQL databases—namely, key/value relationships—and demonstrate that this data model is the mathematical dual of SQL's relational data model of foreign-/primary-key relationships. Following established mathematical nomenclature, we refer to the dual of SQL as coSQL. We also show how a single generalization of the relational algebra over sets—namely, monads and monad comprehensions—forms the basis of a common query language for both SQL and noSQL. Despite common wisdom, SQL and coSQL are not diabolically opposed, but instead deeply connected via beautiful mathematical theory. Just as Codd's discovery of relational algebra as a formal basis for SQL shifted the database industry from a monopolistically competitive market to an oligopoly and thus propelled a billion-dollar industry around SQL and foreign-/primary-key stores, we believe that our categorical data-model formalization and monadic query language will allow the same economic growth to occur for coSQL key-value stores.
Objects Versus Tables
To set the scene, let's look at a simple example of products with authors and recommendations as found in the Amazon SimpleDB samples, and implement it using both object graphs and relational tables. While we don't often think of it this way, the RAM for storing object graphs is actually a key-value store where keys are addresses (l-values) and values are the data stored at some address in memory (r-values). Languages such as C# and Java make no distinction between r-values and l-values, unlike C or C++, where the distinction is explicit. In C, the pointer dereference operator *p retrieves the value stored at address p in the implicit global store. In the rest of this article we conveniently confuse the words object (graph) and key-value store.
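As a toy illustration of that last point (ours, in Java rather than the C# used in the rest of the article), a program's heap can be modeled explicitly as a map from addresses to stored records; object identity then lives in the keys rather than in the values themselves.

import java.util.HashMap;
import java.util.Map;

// A toy "heap": keys play the role of addresses (l-values) and the stored
// records are the values (r-values); dereferencing is just a map lookup.
public class ToyHeap {
    public static void main(String[] args) {
        Map<Integer, Object> heap = new HashMap<>();
        heap.put(0, new String[] {"The Right Stuff", "Tom Wolfe"}); // record at "address" 0
        heap.put(1, new int[] {1979, 320});                         // record at "address" 1

        Object record = heap.get(0); // analogous to *p in C
        System.out.println(((String[]) record)[0]); // prints: The Right Stuff
    }
}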
In C# (or any other modern object-oriented language) we can model products using the following class declaration, which for each product has scalar properties for title, author, publication date, and number of pages, and which contains two nested collections—one for keywords and another for ratings:

class Product {
  string Title;
  string Author;
  int Year;
  int Pages;
  IEnumerable<string> Keywords;
  IEnumerable<string> Ratings;
}
Given this class declaration, we can use object initializers to create a product and insert it into a collection using collection initializers:

var _1579124585 = new Product {
  Title = "The Right Stuff",
  Author = "Tom Wolfe",
  Year = 1979,
  Pages = 320,
  Keywords = new[]{ "Book", "Hardcover", "American" },
  Ratings = new[]{ "****", "4 stars" },
};

var Products = new[]{ _1579124585 };
The program produces in memory the object graph shown in Figure 1. Note that the two nested collections for the Keywords and Ratings properties are both represented by actual objects with their own identity. Using the LINQ (Language Integrated Query) comprehension syntax introduced in C# 3.0,11 we can find the titles and keywords of all products that have four-star ratings using the following query:

var q = from product in Products
        where product.Ratings.Any(rating => rating == "****")
        select new{ product.Title, product.Keywords };
The LINQ comprehension syntax is just syntactic sugar for a set of standard query operators that can be defined in any modern programming language with closures, lambda expressions (written here as rating => rating == "****"), or inner classes, such as Objective-C, Ruby, Python, JavaScript, Java, or C++. The C# compiler translates the previous query to the following de-sugared target expression:

var q = Products
        .Where(product => product.Ratings.Any(rating => rating == "****"))
        .Select(product => new{ product.Title, product.Keywords });
The various values in the query result, in particular the Keywords collection, are fully shared with the original object graph, and the result is a perfectly valid object graph, shown in Figure 2. Now let’s redo this example using the relational model. First of all, we must normalize our nested Product class into three flat tables, for Products, Keywords, and Ratings respectively, as shown in Figure 3. Each value in the relational world requires a new primary key property (here all called ID). Furthermore, the Keywords and Ratings tables require an additional foreign-key property ProductID that encodes the one-to-many association between Products and Keywords
and Ratings. Later we will decorate these class declarations with additional metadata to reflect the underlying database tables. In most commercial relational database systems, tables are defined by executing imperative CREATE TABLE DDL (data definition language) statements. As usual in the relational world, we do not model the individual collections of keywords and ratings for each product as separate entities, but instead we directly associate multiple keywords and ratings with a particular product. This shortcut works only for one-to-many relationships. The standard practice for many-to-many relationships (multivalued functions) requires intersection tables containing nothing but pairs of foreign keys to link the related rows. Perhaps surprisingly for a "declarative" language, SQL does not have expressions that denote tables or rows directly. Instead we must fill the three tables in an imperative style using loosely typed DML statements, which we express in C# as shown in Figure 4. These DML statements create three tables filled with the rows shown in Figure 5.
Figure 3. Data declaration for Product database.

class Products {
  int ID;
  string Title;
  string Author;
  int Year;
  int Pages;
}

class Keywords {
  int ID;
  string Keyword;
  int ProductID;
}

class Ratings {
  int ID;
  string Rating;
  int ProductID;
}

Figure 4. Inserting values into Product database.

Products.Insert( 1579124585, "The Right Stuff", "Tom Wolfe", 1979, 320 );

Keywords.Insert( 4711, "Book", 1579124585 );
Keywords.Insert( 1843, "Hardcover", 1579124585 );
Keywords.Insert( 2012, "American", 1579124585 );

Ratings.Insert( 787, "****", 1579124585 );
Ratings.Insert( 747, "4 stars", 1579124585 );
Figure 5. Relational tables for Product database.

Products
  ID          Title            Author     Year  Pages
  1579124585  The Right Stuff  Tom Wolfe  1979  304

Keywords
  ID    Keyword    ProductID
  4711  Book       1579124585
  1843  Hardcover  1579124585
  2012  American   1579124585

Ratings
  ID   Rating   ProductID
  787  ****     1579124585
  747  4 stars  1579124585
An important consequence of normalizing a single type into separate tables is that the database must maintain referential integrity to ensure that the foreign-/primary-key relationships across related tables remain synchronized across mutations to the tables and rows, that the primary key of every row remains unique within its table, and that foreign keys always point to a valid primary key. For example, we cannot delete a row in the Products table without also deleting the related rows in the Keywords and Ratings tables. Referential integrity implies a closed-world assumption where transactions on the database are serialized by (conceptually) suspending the world synchronously, applying the required changes, and resuming the world again when referential integrity has been restored successfully, rolling back any changes otherwise. Assuming a closed world is, as we claim, both a strength and a weakness of the relational model. On the one hand, it simplifies the life of developers via ACID (atomicity, consistency, isolation, durability) transactions (although in practice, for efficiency, one must often deal with much weaker isolation levels) and allows for impressive (statistics-based) query optimization. The closed-world assumption, however, is in direct
Figure 6. Tabular result for books with four-star ratings.

  Title            Keyword
  The Right Stuff  Book
  The Right Stuff  Hardcover
  The Right Stuff  American
contradiction with distribution and scale-out. The bigger the world, the harder it is to keep closed. Returning to our example, we present the naïve query to find the titles and keywords of all products that have four stars, expressed directly in terms of the relational model. It creates the cross-product of all possible combinations of Products, Keywords, and Ratings, and selects only the title and keyword where the keyword and rating are related to the product and the rating has four stars:

var q = from product in Products
        from rating in Ratings
        from keyword in Keywords
        where product.ID == rating.ProductID
           && product.ID == keyword.ProductID
           && rating.Rating == "****"
        select new{ product.Title, keyword.Keyword };
The result of this query is the row set shown in Figure 6. Disappointingly, this row set is not itself normalized. In fact, to return the normalized representation of our object-graph query, we need to perform two queries (within a single transaction) against the database: one to return the title and its primary key, and a second query that selects the related keywords. What we observe here is SQL's lack of compositionality—the ability to arbitrarily combine complex values from simpler values without falling outside the system. By looking at the grammar definition for table rows, we can immediately see that SQL lacks compositionality; since there is no recursion, rows can contain only scalar values:

row ::= new { …, name = scalar, … }

Compare this with the definition for anonymous types, where a row can contain arbitrary values, including other rows (or nested collections):

value ::= new { …, name = value, … } | scalar
SQL is rife with noncompositional features. For example, the semantics of NULL is a big mess: why does adding the number 13 to a NULL value, 13+NULL, return NULL, but summing the same two values, SUM(13, NULL), return 13? Also, even though query optimizers in modern SQL implementations are remarkably powerful, the original query will probably run in cubic time when implemented via three nested loops that iterate over every value in the three tables. A seasoned SQL programmer would instead use explicit join syntax to ensure that the query is as efficient as our object-based query:

var q = from product in Products
        join rating in Ratings on product.ID equals rating.ProductID
        where rating.Rating == "****"
        select product into fourstarproduct
        join keyword in Keywords on fourstarproduct.ID equals keyword.ProductID
        select new{ fourstarproduct.Title, keyword.Keyword };
Depending on how the nesting of a join's result is encoded using flat result sets, the SQL programmer must choose among various flavors of INNER, OUTER, LEFT, and RIGHT joins.
Impedance Mismatch
In 1984 George Copeland and David Maier recognized the impedance mismatch between the relational and the object-graph models just described,5 and in the quarter century since, we have seen an explosion of O/R
(object-relational) mappers that attempt to bridge the gap between the two worlds. A more skeptical view of O/R mappers is that they are undoing the damage caused by normalizing the original object model into multiple tables. For our running example this means that we have to add back information to the various tables to recover the relationships that existed in the original model. In this particular case we use the LINQ-to-SQL custom metadata annotations; other O/R mappers use similar annotations, which often can be inferred from naming conventions of types and properties.

[Table(name="Products")]
class Product {
  [Column(PrimaryKey=true)] int ID;
  [Column] string Title;
  [Column] string Author;
  [Column] int Year;
  [Column] int Pages;

  private EntitySet<Rating> _Ratings;
  [Association(Storage="_Ratings", ThisKey="ID", OtherKey="ProductID",
               DeleteRule="ONDELETECASCADE")]
  ICollection<Rating> Ratings { … }

  private EntitySet<Keyword> _Keywords;
  [Association(Storage="_Keywords", ThisKey="ID", OtherKey="ProductID",
               DeleteRule="ONDELETECASCADE")]
  ICollection<Keyword> Keywords { … }
}

[Table(name="Keywords")]
class Keyword {
  [Column(PrimaryKey=true)] int ID;
  [Column] string Keyword;
  [Column(IsForeignKey=true)] int ProductID;
}

[Table(name="Ratings")]
class Rating {
  [Column(PrimaryKey=true)] int ID;
  [Column] string Rating;
  [Column(IsForeignKey=true)] int ProductID;
}
Note that the resulting object model is necessarily more complex than we started with, since we are forced to introduce explicit representations for the Rating and Keyword collections that did not exist in our original object model. The existence of the various foreign- and primary-key properties is further evidence that the O/R mapping abstraction is leaky. Aside from those small differences, the net result of all this work is that we can now write the query to find all products nearly as concisely as we did before normalization:

var q = from product in Products
        where product.Ratings.Any(rating => rating.Rating == "****")
        select new{ product.Title, product.Keywords };
Since the results must be rendered as object graphs, the O/R mapper will make sure that the proper nested result structures are created. Unfortunately, not every O/R mapper does this efficiently.9 It is not only the programmer who needs to recover the original structure of the code. The database implementer must also jump through hoops to make queries execute efficiently by building indexes that avoid the potential cubic effect that we observed earlier. For one-to-many relationships, indexes are nothing more than nested collections resulting from precomputing joins between tables to quickly find all the rows whose foreign keys point to a row with a particular primary key. Since the relational model is not closed under composition, however, the notion of index has to be defined outside the model. Two natural indexes correspond
to the relationships between Products and Ratings and between Products and Keywords, respectively. For each product in the Products table, the ratings index contains the collection of all related ratings:

from rating in Ratings
where rating.ProductID == product.ID
select rating;
Similarly, for each product in the Products table, the keywords index contains the collection of all keywords related to that product:

from keyword in Keywords
where keyword.ProductID == product.ID
select keyword;
If we visualize the indexes as additional columns on the Products table, the reversal of the original relationships between the tables becomes apparent. Each row in the Products table now has a collection of foreign keys pointing to the Keywords and Ratings tables, much as in the original object graph, as shown in Figure 7. One of the advantages touted for normalization over hierarchical data is the ability to perform ad-hoc queries—that is, to join tables on conditions not envisioned in the original data model. For example, we could try to find all pairs of products where the length of the title of one product is the same as the length of the author's name in the other, using the following query:

from p1 in Products
from p2 in Products
where p1.Title.Length == p2.Author.Length
select new{ p1, p2 };

Without an index, however, this query requires a full table scan and hence takes quadratic time in the length of the Products table. The ability to create indexes makes a closed-world assumption. For example, if we modify the previous ad-hoc query to find all pairs of Web pages where one page has a URL referencing the other, it should be obvious that building an index for this join is quite a hard task when you do not have the whole Web available inside a single database:

from p1 in WWW
from p2 in WWW
where p2.Contains(p1.URL)
select new{ p1, p2 };

Summarizing what we have learned so far, we see that in order to use a relational database, starting with a natural hierarchical object model, the designer needs to normalize the data model into multiple types that no longer reflect the original intent; the application developer must re-encode the original hierarchical structure by decorating the normalized data with extra metadata; and, finally, the database implementer has to speed up queries over the normalized data by building indexes that essentially re-create the original nested structure of the data as well.
Figure 7. Keyword and Ratings index on Products table. [The Products row (ID 1579124585, The Right Stuff, Tom Wolfe, 1979, 304) is shown extended with index columns: a Keywords index referencing keys 4711, 1843, and 2012 in the Keywords table (Book, Hardcover, American) and a Ratings index referencing keys 787 and 747 in the Ratings table (****, 4 stars).]
noSQL is coSQL
At this point it feels like the conceptual dissonance between the key-value and foreign-/primary-key data models is insurmountable. That would be a pity, since clearly each has its strengths and weaknesses. Wouldn't it be great if we could give a more mathematical explanation of where the relational model shines and where the object-graph model works best? As it turns out, we can find the answer to this question by taking a closer look at the (in-memory) structures created for our running example in both models. Let's start by slightly simplifying the object-graph example. We do so by removing the object identity of the Ratings and Keywords collections to reflect more directly how they are modeled in the relational world. We inline the Keywords and Ratings items directly into the parent Product, as if we had value-based collections. Pictorially, we move from the diagram on the left to the one on the right in Figure 8. For the tabular representation, we show explicit arrows for the relationship from foreign keys to primary keys. Again, pictorially, we move from the diagram on the left to the one on the right in Figure 9. When we do a side-by-side comparison of the two rightmost diagrams, we notice two interesting facts. First, in the object-graph model, the
identity of objects is intensional—that is, object identity is not part of the values themselves but determined by their keys in the store. In the relational model, object identity is extensional—that is, object identity is part of the value itself, in the form of a primary key. ˲˲ Modulo the notion of object identity, the two representations are extremely similar; the only difference is that the arrows are reversed! At this point, it appears there is a strong correspondence between these two representations: they both consist of a collection of elements (objects or rows) and a collection of arrows between the elements, the only difference being the direction of the arrows. Is there some precise way of describing such a situation? Fortunately, there is an entire area of mathematics designed exactly for this: category theory.1 Obviously, the precise formalization of SQL and noSQL as categories is outside the scope of this article, but it is illustrative to learn a little bit of category theory nonetheless. Category theory arose from studies of mathematical structures and an attempt to relate classes of structures by considering the relations between them. Thus, category theory expresses mathematical concepts in terms of objects, arrows between objects, and the composition of
!(a && b) == (!a) || (!b)
!(a || b) == (!a) && (!b)
Other examples of dualities in computer science are between reference counting and tracing garbage collection, between call-by-value and call-byname, between push- and pull-based collections, and between transactional memory and garbage collection, among many others. Formally, given a category C of objects and arrows, we obtain the dual category co(C) by reversing all the arrows in C. If a statement T is true in C, then its dual statement co(T) is true in the dual category co(C). In the context of this article, we can read “opposite statement” for “dual statement”.
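To make one of these dualities concrete, here is a sketch (ours, simplified from the actual .NET definitions, which also carry details such as Reset and IDisposable) of the shapes of the pull-based enumeration interfaces and the push-based observation interfaces. Ignoring error and completion signalling, each can be obtained from the other by reversing the direction of every member arrow.

// Pull: the consumer asks for the next value (cf. IEnumerable<T>/IEnumerator<T>).
interface IPullCollection<T> { IPullIterator<T> GetEnumerator(); }
interface IPullIterator<T>   { bool MoveNext(); T Current { get; } }

// Push: the producer hands values to a registered observer (cf. IObservable<T>/IObserver<T>).
interface IPushCollection<T> { System.IDisposable Subscribe(IPushObserver<T> observer); }
interface IPushObserver<T>   { void OnNext(T value); void OnError(System.Exception error); void OnCompleted(); }

The correspondence between the iterator and subject/observer patterns is worked out in reference 10.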
In the SQL category, child nodes point to parent nodes when the foreign key of a child node equals the primary key of the parent node (Figure 10). In the noSQL category, the arrows are reversed. Parent nodes point to child nodes when the child pointer in the parent equals the address of the child node in the store (see Figure 11). In other words, the noSQL category is the dual of the SQL category—noSQL is really coSQL. The implication of this duality is that coSQL and SQL are not in conflict, like good and evil. Instead they are two opposites that coexist in harmony and can transmute into each other like yin and yang. Interestingly, in Chinese philosophy yin symbolizes open and hence corresponds to the open world of coSQL, and yang symbolizes closed and hence corresponds to the closed world of SQL. Because SQL and coSQL are mathematically dual, we can reason precisely about the tradeoffs between the two instead of relying on rhetoric and anecdotal evidence. The accompanying table gives a number of statements and their duals as they hold for SQL and coSQL, respectively. If we really take the duality to heart, we may also choose to (but don't have to) fine-tune our model for key-value stores to reflect the duality between values and computations, and that between synchronous ACID and asynchronous BASE (basically available, soft state, eventually consistent).13 Looking up a value given its address or key in a key-value store can
involve a computation, which in a truly open world has potential latency and may fail. For example, in the C# language getters and setters, known as properties, can invoke arbitrary code. Perhaps an even better example of a computation-driven key-value store with long latency and high probability of failure (always able to handle a 404) is the Web, with URIs (Uniform Resource Identifiers) as keys, "resources" as values, and the HTTP verbs as a primitive query and data-manipulation language. On the other hand, in a C-like key-value memory model, we usually make the simplifying assumption that a key lookup in memory takes constant time and does not fail.

Traversing a relationship in the closed world of the relational model involves comparing two values for equality, which is guaranteed to succeed because of referential integrity; and vice versa, referential consistency dictates that relationships are value-based. Otherwise, we could never be sure that referential consistency actually holds. Note that comparing keys by value requires that objects in the SQL category are strongly typed, at least enough to identify primary and foreign keys; and dually, since we do not need to know anything about the value of a coSQL object to find it using its key, objects in the coSQL world can be loosely typed.
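The difference between these two worlds can be sketched in C# as follows; this is our illustration only, using http://example.com/ as a placeholder key. The in-memory lookup is assumed to be constant-time and total, while the Web lookup is an asynchronous computation that may be slow or fail with a 404.

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

class LookupDemo
{
    // Closed world: key lookup assumed cheap and failure-free.
    static string LookupLocal(Dictionary<string, string> store, string key) => store[key];

    // Open world: the Web as a key-value store keyed by URI.
    // The lookup is a computation with latency that may fail.
    static async Task<string> LookupWeb(HttpClient http, string uri)
    {
        var response = await http.GetAsync(uri);   // latency
        response.EnsureSuccessStatusCode();        // throws on 404 and friends
        return await response.Content.ReadAsStringAsync();
    }

    static async Task Main()
    {
        var store = new Dictionary<string, string> { ["1579124585"] = "The Right Stuff" };
        Console.WriteLine(LookupLocal(store, "1579124585"));

        using var http = new HttpClient();
        try   { Console.WriteLine((await LookupWeb(http, "http://example.com/")).Length); }
        catch (HttpRequestException e) { Console.WriteLine("lookup failed: " + e.Message); }
    }
}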
Figure 10. SQL arrow from child to parent.

Figure 11. coSQL arrow from parent to child.
Relationship to the Real World

Our abstract model of the SQL category did not impose any restrictions on the structure of rows; we assumed only that we could determine a primary or foreign key to relate two rows. In the typical relational model we would further impose the constraint that rows consist of flat sequences of scalar values (the so-called First Normal Form, or 1-NF). If we dualize relations in 1-NF, then we get a key-value model where values consist of either scalars or keys or collections of scalars or keys. Surprisingly, this is precisely the Amazon SimpleDB data model (see Figure 12). If we assume that rows in the foreign-/primary-key model are simply blobs and keys are strings, then the dual key-value model is exactly the HTML5 key-value storage model:

interface Storage {
  readonly attribute unsigned long length;
  getter DOMString key(in unsigned long index);
  getter any getItem(in DOMString key);
  setter creator void setItem(in DOMString key, in any data);
  deleter void removeItem(in DOMString key);
  void clear();
}
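A minimal C# sketch of the dualized 1-NF model described above (our own modeling, not Amazon's or the W3C's API): an item maps attribute names to one or more scalar values, so a multi-valued attribute such as Keyword needs no separate table.

using System;
using System.Collections.Generic;

class DualOf1NF
{
    static void Main()
    {
        // One item: attribute name -> one or more scalar values (cf. Figure 12).
        var item = new Dictionary<string, List<string>>
        {
            ["Title"]   = new List<string> { "The Right Stuff" },
            ["Author"]  = new List<string> { "Tom Wolfe" },
            ["Year"]    = new List<string> { "1979" },
            ["Pages"]   = new List<string> { "304" },
            ["Keyword"] = new List<string> { "Book", "Hardcover", "American" },
            ["Rating"]  = new List<string> { "****", "4 stars" }
        };

        // A store: item key -> item.
        var products = new Dictionary<string, Dictionary<string, List<string>>>
        {
            ["product/1579124585"] = item
        };

        foreach (var keyword in products["product/1579124585"]["Keyword"])
            Console.WriteLine(keyword);
    }
}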
A Little More Category Theory

So far we have discussed the basic data models for SQL and coSQL, but we have not yet touched upon queries. By applying a little more category theory we can show how a single abstraction, monads and monad comprehensions, can be used as a unified query language for both SQL and coSQL.
Consequences of the duality between SQL and coSQL.

SQL | coSQL
Children point to parents | Parents point to children
Closed world | Open world
Entities have identity (extensional) | Environment determines identity (intensional)
Necessarily strongly typed | Potentially dynamically typed
Synchronous (ACID) updates across multiple rows | Asynchronous (BASE) updates within single values
Environment coordinates changes (transactions) | Entities responsible to react to changes (eventually consistent)
Value-based, strong reference (referentially consistent) | Computation-based, weak reference (expect 404)
Not compositional | Compositional
Query optimizer | Developer/pattern
Figure 12. Amazon SimpleDB representation of Products collection.
To talk about queries, we need to be more precise about what we mean by collections of values. Pure relational algebra is based on sets of rows, but actual relational databases use multisets (bags) or ordered multisets (permutations). To model collections abstractly, we look at sets/bags/permutations of rows and apply the category theory dictum: "What is the interface that these various collections of rows implement?" and "How do we generalize queries based on such an interface?"

First, let us stick with simple set collections. When we write a SQL query such as

SELECT F(a,b) FROM as AS a, bs AS b WHERE P(a,b)
the SQL compiler translates that pretty syntax into a relational-algebra expression in terms of selection, projection, joins, and Cartesian product. As is the custom in the relational world, the various operations are denoted using Greek symbols:

πF(σP(as×bs))

There is no reason to restrict the relational algebra operators to work over just sets (or bags, or permutations) of rows. Instead, we can define similar operators on any collection M<T> of values of arbitrary type T. The interface for such abstract collections M<T> is a straightforward generalization of that of sets. It allows us to create an empty collection using the constant ∅; create a singleton collection of type M<T> given some value of type T using the function {_} ∈ T→M<T> (the notation T→M<T> denotes a function/closure/lambda expression that maps an argument value of type T to a result collection of type M<T>); and combine two collections into a larger collection using the binary operator ∪ (depending on the commutativity and idempotence of ∪, we obtain the various sorts of collections such as lists, permutations, bags, and sets):

∅ ∈ M<T>
{_} ∈ T → M<T>
∪ ∈ M<T> × M<T> → M<T>
Using these constructors, we can generalize the traditional relational-algebra operators (selection σP, projection πF, and Cartesian product ×) to operate over generalized collections using the following signatures:

σP ∈ M<T> × (T → bool) → M<T>
πF ∈ M<T> × (T → S) → M<S>
× ∈ M<T> × M<S> → M<T×S>

In fact, if we assume a single operator for correlated subqueries, which we call SelectMany but which is often called CROSS APPLY in SQL,

SelectMany ∈ M<T> × (T → M<S>) → M<S>
then we can define the other operators in terms of this single one, as follows:

σP(as) = SelectMany(as, a ⇒ P(a) ? {a} : ∅)
πF(as) = SelectMany(as, a ⇒ {F(a)})
as×bs = SelectMany(as, a ⇒ π(b ⇒ (a,b), bs))

Rather incredibly, an interface of this shape is well known in category theory. It is called a monad, where the type constructor M<_> is a functor of the monad; the constructor {_} is the unit of the monad; SelectMany is the bind of the monad; and ∅ and ∪ are the neutral element and addition, respectively. For the rest of us, they are just the signatures for methods defined on an abstract interface for collections. This is no theoretical curiosity. We can play the same syntactic tricks that SQL does with relational algebra, but using monads instead. Such monad comprehensions have been recognized as a versatile query language by both functional programmers and database researchers.8 LINQ queries are just a more familiar SQL-like syntax for monad comprehensions. Instead of Greek symbols, LINQ uses human-readable identifiers such as xs.Where(P) for σP(xs) and xs.Select(F) for πF(xs). To accommodate a wide range of queries, the actual LINQ standard query operators contain additional operators for aggregation and grouping, such as

Aggregate ∈ M<T> × (T×T → T) → T
GroupBy ∈ M<T> × (T → K) → M<K×M<T>>
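These equations transcribe almost directly into C# over IEnumerable<T>. The following sketch is ours: we use CrossApply, Sigma, Pi, and Cross as our own method names (to avoid colliding with the built-in LINQ operators) and derive selection, projection, and Cartesian product from the single SelectMany-shaped operator exactly as above.

using System;
using System.Collections.Generic;
using System.Linq;

static class MiniRelationalAlgebra
{
    // The single operator (SelectMany / CROSS APPLY): M<T> x (T -> M<S>) -> M<S>.
    public static IEnumerable<S> CrossApply<T, S>(this IEnumerable<T> source, Func<T, IEnumerable<S>> f)
    {
        foreach (var t in source)
            foreach (var s in f(t))
                yield return s;
    }

    // Selection: sigmaP(as) = SelectMany(as, a => P(a) ? {a} : empty).
    public static IEnumerable<T> Sigma<T>(this IEnumerable<T> xs, Func<T, bool> p)
        => xs.CrossApply(a => p(a) ? new[] { a } : Enumerable.Empty<T>());

    // Projection: piF(as) = SelectMany(as, a => {F(a)}).
    public static IEnumerable<S> Pi<T, S>(this IEnumerable<T> xs, Func<T, S> f)
        => xs.CrossApply(a => new[] { f(a) });

    // Cartesian product: as x bs = SelectMany(as, a => pi(b => (a,b), bs)).
    public static IEnumerable<(T, S)> Cross<T, S>(this IEnumerable<T> xs, IEnumerable<S> ys)
        => xs.CrossApply(a => ys.Pi(b => (a, b)));
}

Any collection type that supplies this one operator, together with the empty and singleton constructors and ∪, supports the whole generalized algebra sketched here.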
Any data source that implements the standard LINQ query operators can be queried using comprehension syntax, including both SQL and coSQL data sources, as we show in the next section. The .NET framework defines a pair of standard interfaces, IEnumerable<T> and IQueryable<T>, that are often implemented by data sources to support querying, but it is by no means necessary to use these particular interfaces. Other standard interfaces that support the LINQ query operators include the IObservable<T> and IQbservable<T> interfaces that make it possible to use LINQ for complex event processing.10

Scalability and Distribution

In contrast to most treatments of noSQL, we did not mention scalability as a defining characteristic of coSQL.
The openness of the coSQL model eases scalable implementations across large numbers of physically distributed machines. When using LINQ to query SQL databases, typically similar rows are stored in tables of some concrete type that implements the IQueryable<T> interface. Relationships between rows are lifted to bulk operations over collections that perform joins across these tables; hence, queries take any number of related tables and from those produce a new one. Pictorially for three tables, this is shown in Figure 13.

Because relationships in SQL cross different tables, it is nontrivial to partition the single closed world into independent worlds of "strongly connected" components that can be treated independently. In a closed world, however, query optimizers can leverage all the information and statistics that are available about the various tables. This allows users to write declarative queries focusing on the "what" and letting the system take care of the "how."

In the coSQL case, a typical scenario is to have a single collection of type IQueryable<S> of (pointers to) self-contained denormalized "documents" of type S. In that case, queries have type IQueryable<S> → R (see Figure 14). When there are no cross-table relationships, collections {x0, x1, …, xn–1} ∈ M<S> that are the source of coSQL queries can be naturally horizontally partitioned, or sharded, into individual subcollections {x0}∪{x1}∪…∪{xn–1}, and each such subcollection {xi} can be distributed across various machines on a cluster. For a large subset of coSQL queries, the shape of the query closely follows the shape of the data. Such homomorphic queries map a collection xs = {x0}∪{x1}∪…∪{xn–1} to the value f(x0)⊕f(x1)⊕…⊕f(xn–1)—that is, they are of the form xs.Select(f).Aggregate(⊕) for some function f ∈ S→R and binary operator ⊕ ∈ R×R→R.

Figure 13. Foreign key relationships between three relational tables.

Figure 14. Collection of coSQL documents.
Figure 15. Signature for MapReduce in DryadLINQ.

MapReduce ∈ IQueryable<S>              // source
          × (S → IEnumerable<M>)       // mapper
          × (M → K)                    // key selector
          × (K × IEnumerable<M> → R)   // reducer
          → IQueryable<R>
In fact, Richard Bird's first homomorphism lemma3 says that any function h ∈ M<S>→R is a homomorphism with respect to ∪ if and only if it can be factored into a map followed by a reduce: h(xs) = xs.Select(f).Aggregate(⊕). Mathematics dictates that coSQL queries are performed using MapReduce.6 Practical implementations of MapReduce usually slightly generalize Bird's lemma to use SelectMany instead of Select so that the map phase can return a collection instead of a single value, and insert an intermediate GroupBy as a way to "write" equivalence classes of values from the map phase into the key-value store for subsequent processing in the reduce phase, and then aggregate over each subcollection:

xs.SelectMany(f).GroupBy(s).Select((k,g) ⇒ g.Aggregate(⊕k))
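The following C# sketch (ours, not DryadLINQ) spells this out for two shards of the running example: a homomorphic count evaluated shard by shard and then combined, followed by the generalized SelectMany/GroupBy/Aggregate form that counts documents per keyword.

using System;
using System.Linq;

class ShardedQueries
{
    static void Main()
    {
        // Two shards of a denormalized collection of (title, keyword) documents.
        var shard1 = new[] { ("The Right Stuff", "Book"), ("The Right Stuff", "Hardcover") };
        var shard2 = new[] { ("The Right Stuff", "American") };
        var shards = new[] { shard1, shard2 };

        // Homomorphic form h(xs) = xs.Select(f).Aggregate(op):
        // here f maps every document to 1 and op is +, so h is a count.
        // Each shard is reduced locally; the partial results are then combined.
        var total = shards.Select(shard => shard.Select(d => 1).Aggregate(0, (a, b) => a + b))
                          .Aggregate(0, (a, b) => a + b);

        // Generalized MapReduce form: map (SelectMany), shuffle (GroupBy), reduce (Aggregate).
        var perKeyword = shards.SelectMany(shard => shard)          // map phase (here just flattening)
                               .GroupBy(doc => doc.Item2)           // group by keyword
                               .Select(g => new { Keyword = g.Key, Count = g.Aggregate(0, (n, d) => n + 1) });

        Console.WriteLine(total);
        foreach (var entry in perKeyword)
            Console.WriteLine("{0}: {1}", entry.Keyword, entry.Count);
    }
}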
For example, DryadLINQ15 uses the type PartitionedTable<S>:IQueryable<S> to represent the partitioned input collection for LINQ queries and then implements MapReduce over the partitioned collection using the function illustrated in Figure 15. In an open world where collections are distributed across the network, it is much harder for a query optimizer to perform a global optimization taking into account latency, errors, etc. Hence, most coSQL databases rely on explicit programmatic queries of a certain pattern such as MapReduce that can be executed reliably on the target physical machine configuration or cluster.

Conclusion

The nascent noSQL market is extremely fragmented, with many competing vendors and technologies. Programming, deploying, and managing noSQL solutions requires specialized and low-level knowledge that does not easily carry over from one vendor's product to another. A necessary condition for the network effect to take off in the noSQL database market is the availability of a common abstract mathematical data model and an associated query language for noSQL that removes product differentiation at the logical level and
instead shifts competition to the physical and operational level. The availability of such a common mathematical underpinning of all major noSQL databases can provide enough critical mass to convince businesses, developers, educational institutions, etc. to invest in noSQL. In this article we developed a mathematical data model for the most common form of noSQL—namely, key-value stores as the mathematical dual of SQL’s foreign-/primary-key stores. Because of this deep and beautiful connection, we propose changing the name of noSQL to coSQL. Moreover, we show that monads and monad comprehensions (i.e., LINQ) provide a common query mechanism for both SQL and coSQL and that many of the strengths and weaknesses of SQL and coSQL naturally follow from the mathematics. In contrast to common belief, the question of big versus small data is orthogonal to the question of SQL versus coSQL. While the coSQL model naturally supports extreme sharding, the fact that it does not require strong typing and normalization makes it attractive for “small” data as well. On the other hand, it is possible to scale SQL databases by careful partitioning.2 What this all means is that coSQL and SQL are not in conflict, like good and evil. Instead they are two opposites that coexist in harmony and can transmute into each other like yin and yang. Because of the common query language based on monads, both can be implemented using the same principles.
Acknowledgments
Many thanks to Brian Beckman, Jimmy "the aggregator" Nilsson, Bedarra Dave Thomas, Ralf Lämmel, Torsten Grust, Maarten Fokkinga, Rene Bouw, Alexander Stojanovic, and the anonymous referee for their comments that drastically improved the presentation of this paper, and of course to Dave Campbell for supporting work on all cool things LINQ.

Related articles on queue.acm.org

A Conversation with Erik Meijer and Jose Blakeley
http://queue.acm.org/detail.cfm?id=1394137

BASE: An Acid Alternative
Dan Pritchett
http://queue.acm.org/detail.cfm?id=1394128

Bridging the Object-Relational Divide
Craig Russell
http://queue.acm.org/detail.cfm?id=1394139

References
1. Awodey, S. Category Theory (2nd edition). Oxford University Press, 2010.
2. Baker, J., Bond, C. et al. Megastore: Providing scalable, highly available storage for interactive services. In Proceedings of the Conference on Innovative Data Systems Research, 2011.
3. Bird, R. An introduction to the theory of lists. In Logic of Programming and Calculi of Discrete Design, M. Broy, ed. Springer-Verlag, 1987, 3–42.
4. Codd, E.F. A relational model of data for large shared data banks. Commun. ACM 13 (June 1970).
5. Copeland, G. and Maier, D. Making Smalltalk a database system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, 1984.
6. Fokkinga, M. MapReduce—a two-page explanation for laymen; http://www.cs.utwente.nl/~fokkinga/mmf2008j.pdf.
7. Ghosh, R.A. An economic basis for open standards (2005); flosspols.org.
8. Grust, T. Monad comprehensions: A versatile representation for queries. In The Functional Approach to Data Management, P. Gray, L. Kerschberg, P. King, and A. Poulovassilis, eds. Springer-Verlag, 2003, 288–311.
9. Grust, T., Rittinger, J., and Schreiber, T. Avalanche-safe LINQ compilation. Proceedings of the VLDB Endowment 3, 1–2 (2010).
10. Meijer, E. Subject/Observer is dual to iterator. Presented at FIT: Fun Ideas and Thoughts at the Conference on Programming Language Design and Implementation (2010); http://www.cs.stanford.edu/pldi10/fit.html.
11. Meijer, E., Beckman, B., and Bierman, G. LINQ: Reconciling objects, relations, and XML in the .NET framework. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, New York, 2006.
12. Pirayoff, R. Economics Micro & Macro. Cliffs Notes, 2004.
13. Pritchett, D. BASE: An Acid alternative. ACM Queue (July 2008).
14. Stonebraker, M. and Hellerstein, J.M. What goes around comes around. In Readings in Database Systems (4th edition), M. Stonebraker and J.M. Hellerstein, eds. MIT Press, Cambridge, MA, 2005, 2–41.
15. Yu, Y., Isard, M., et al. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of Operating Systems Design and Implementation, 2008.

Erik Meijer ([email protected]) has been working on "Democratizing the Cloud" for the past 15 years. He is perhaps best known for his work on the Haskell language and his contributions to LINQ and the Reactive Framework (Rx).

Gavin Bierman ([email protected]) is a senior researcher at Microsoft Research Cambridge focusing on database query languages, type systems, semantics, programming language design and implementation, data model integration, separation logic, and dynamic software updating.
© 2011 ACM 0001-0782/11/04 $10.00
contributed articles

How to identify, instantiate, and evaluate domain-specific design principles for creating more effective visualizations.

By Maneesh Agrawala, Wilmot Li, and Floraine Berthouzoz
Design Principles for Visual Communication

Visual communication via diagrams, sketches, charts, photographs, video, and animation is fundamental to the process of exploring concepts and disseminating information. The most-effective visualizations capitalize on the human facility for processing visual information, thereby improving comprehension, memory, and inference. Such visualizations help analysts quickly find patterns lurking within large data sets and help audiences quickly understand complex ideas. Over the past two decades a number of books10,15,18,23 have collected examples of effective visual displays. One thing is evident from inspecting them: the best are carefully crafted by skilled human designers. Yet even with the aid of computers, hand-designing effective visualizations is time-consuming and
requires considerable effort. Moreover, the rate at which people worldwide generate new data is growing exponentially year to year. Gantz et al.5 estimated we collectively produced 161 exabytes of new information in 2006, and the compound growth rate between 2007 and 2011 would be 60% annually. We are thus expected to produce 1,800 exabytes of information in 2011, 10 times more than the amount we produced in 2006. Yet acquiring and storing this data is, by itself, of little value. We must understand it to produce real value and use it to make decisions. The problem is that human designers lack the time to hand-design effective visualizations for this wealth of data. Too often, data is either poorly visualized or not visualized at all. Either way, the results can be catastrophic; for example, Tufte24 explained how Morton Thiokol engineers failed to visually communicate the risks of launching the Challenger Space Shuttle to NASA management in 1986, leading to the vehicle's disastrous failure. While Robison et al.20 argued the engineers must not be blamed for the Challenger accident, better communication of the risks might have prevented the disaster. Skilled visual designers manipulate the perception, cognition, and
key insights

• Design principles connect the visual design of a visualization with the viewer's perception and cognition of the underlying information the visualization is meant to convey.

• Identifying and formulating good design principles often requires analyzing the best hand-designed visualizations, examining prior research on the perception and cognition of visualizations, and, when necessary, conducting user studies into how visual techniques affect perception and cognition.

• Given a set of design rules and quantitative evaluation criteria, we can use procedural techniques and/or energy optimization to build automated visualization-design systems.
doi:10.1145/1924421.1924439
communicative intent of visualizations by carefully applying principles of good design. These principles explain how visual techniques can be used to either emphasize important information or de-emphasize irrelevant details; for example, the most important information in a subway map is the sequence of stops along each line and the transfer stops that allow riders to change lines. Most subway passengers do not need to know the true geographic path of each line. Based on this insight, map designer Harry Beck redesigned the map of the London Underground in 1933 using two main principles: straightening the subway lines and evenly spacing
the stops to visually emphasize the sequence of stops and transfer points (see Figure 1). Such design principles connect the visual design of a visualization with the viewer's perception and cognition of the underlying information the visualization is meant to convey. In the field of design, there is a long-standing debate regarding the interaction of aesthetic and functional properties of designed artifacts. We do not seek to engage in this debate here; rather, we focus on how particular design choices affect the perception and cognition of the visualization, not the aesthetic style of the visualization.
Figure 2. Hand-designed cutaway and exploded-view illustrations (left) design the cuts and explosions to emphasize the shape of the missing geometry and spatial relationships among parts. Our system incorporates such principles to generate interactive cutaway and exploded-view illustrations (middle, right).
Figure 1. Harry Beck’s map of the London Underground from 1933. Beck straightened the lines and more evenly spaced the stops to visually emphasize the sequence of stops along each line.
Accordingly, we use the term "design principle" as a shorthand for guidelines that help improve viewers' comprehension of visually encoded information. Design principles are usually not strict rules, but rules of thumb that might even oppose and contradict one another. For instance, Beck did not completely straighten the subway lines; he included a few turns in them to give viewers a sense of a line's overall spatial layout. Skilled visual designers implicitly apply the relevant design principles and balance the trade-offs between them in an iterative process of creating example designs, critiquing the examples, and improving the designs based on the critiques. Designers usually do not directly apply an explicitly defined set of design principles. The principles are a form of tacit knowledge that designers learn by creating and studying examples. It is far more common for books on visual design to contain visual examples rather than explicit design principles.

Many of the analysts and end users inundated with data and charged with creating visualizations are not trained designers. Thus, our work aims to identify domain-specific design principles, instantiating them within automated visualization design systems that enable non-designers to create effective visual displays. While other researchers have considered specific ways to use cognitive design principles to generate visualizations (see the online appendix), we have been developing a general, three-stage approach for creating visualization design systems:
Stage 1. Identify design principles. We identify domain-specific design principles by analyzing the best hand-designed visualizations within a particular information domain. We connect this analysis with research on perception and cognition of visualizations;

Stage 2. Instantiate design principles. We encode the design principles into algorithms and interfaces for creating visualizations; and

Stage 3. Evaluate design principles. We measure improvements in information processing, communication, and decision making that result from our visualizations. These evaluations also serve to validate the design principles.

We have used this three-stage approach to build automated visualization design systems in two domains: cartographic visualization and technical illustration. In the domain of cartographic visualizations we have developed automated algorithms for creating route maps1,3,12 and tourist maps of cities.8 In the domain of technical illustration we have developed automated techniques for generating assembly instructions of furniture and toys2,9 and for creating interactive cutaway and exploded-view illustrations of complex mechanical, mathematical, and biological objects.11,13,14,19 Here, we focus on articulating the techniques we have used to identify and evaluate the design principles for each domain.
Figure 3. Exploded views of complex mathematical surfaces are designed to reveal local geometric features (such as symmetries, self-intersections, and critical points).
These techniques generalize to other domains, and applying our three-stage approach will result in a better understanding of the strategies people use to make inferences from visualizations.

Stage 1. Identify Design Principles

Design principles are prescriptive rules describing how visual techniques affect the perception and cognition of the information in a display. In some cases, they are explicitly outlined in books; for example, books on photography techniques explain the rules for composing pleasing photographs (such as cropping images of people just below the shoulders or near the waist, rather than at the neck or the knees). Researchers have directly applied them to build a variety
of automated photo-manipulation algorithms (see the online appendix for examples). However, our experience is that design principles are rarely stated so explicitly. Thus, we have developed three strategies for extracting and formulating domain-specific design principles: (1) analyze the best hand-designed visualizations in the domain, (2) examine prior research on the perception and cognition of visualizations, and, when necessary, (3) conduct new user studies that investigate how visual techniques affect perception and cognition. Hand-designed visualizations. We have found that a useful first step in identifying design principles is to analyze examples of the best visualizations in the domain. This analysis is designed to find similarities and recurring patterns in the kinds of information the visualizations highlight, as well as the techniques used to emphasize the information. Consider the problem of depicting the internal structure of complex mechanical, mathematical, anatomical, and architectural objects. Illustrators often use cutaways and exploded views to reveal such structure. They carefully choose the size and shape of cuts, as well as the placement of the parts relative to one another, to expose and highlight the internal structure and spatial relationships between parts. We have analyzed a large corpus of cutaways and exploded views to identify the principles and conventions expert illustrators commonly use to generate these images.11,13,14,19 Our process for
Figure 4. Hand-designed "how things work" illustrations (a) use motion arrows and frame sequences to convey the motion and interactions of the parts within a mechanical assembly. Our system analyzes a geometric model (b) of a mechanical assembly to infer the motion and interactions of the parts, then generates the motion arrows and frame sequences (c–d) necessary to depict how the assembly works.
identifying these principles is based on three main objectives: Style independence. In order to identify a general set of principles we could apply to a variety of complex 3D objects, we looked for visual techniques common across different artistic styles and types of objects; Generative rules. To ensure that we could apply the principles in a generative manner to create cutaways or exploded views, we formed explicit, well-defined rules describing when and how each principle should be applied. We designed the rules to be as general as possible while remaining consistent with the evidence from the example illustrations; and Perceptual/cognitive rationale. We motivated each principle by hypoth-
esizing a perceptual or cognitive rationale explaining how the convention helps viewers better understand the structure of the 3D object depicted. Through this analysis, we identified a set of general, perceptually motivated design principles for creating cutaways and exploded views. For instance, the size and shape of cuts in a cutaway illustration are often designed to not only reveal internal parts but to help viewers mentally reconstruct any occluding geometry that has been removed. Thus, illustrators cut radially symmetric objects with wedge-shape cutaways that emphasize the object’s cylindrical structure. Similarly, rectangular objects are cut with object-aligned cutting planes, or box cuts; skin-like covering sur-
faces are cut using window cuts; and long tubular structures are cut using transverse tube cuts. Illustrations of complex mathematical surfaces often use exploded views in which each slice is positioned to reveal local geometric features (such as symmetries, self-intersections, and critical points). We have also examined "how things work" illustrations designed to show the movement and interaction of parts within a mechanical assembly. The hand-designed illustrations often use diagrammatic motion arrows and sequences of frames to help viewers understand the causal chains of motion that transform a driving force into mechanical movement. After identifying the design principles, we implemented them algorithmically within interactive systems for generating cutaways, exploded views, and how-things-work illustrations (see Figures 2, 3, and 4).

We applied a similar approach to identify the design principles for depicting route maps that provide directions from one location to another1,3 and destination maps that show multiple routes from all around a region to a single location (such as an airport or a popular restaurant).12 We analyzed a variety of such hand-drawn maps and found they are often far more useful than computer-generated driving directions (available at sites like maps.bing.com and maps.google.com) because they emphasize roads, turning points, and local landmarks. These hand-designed maps significantly distort the distance, angle, and shape of roads while eliminating many details that would only serve to clutter the
map. Tufte23 pointed out that triptiks and subway maps similarly distort the shape of routes and eliminate unnecessary detail. Hand-designed destination maps include only the major routes to a location rather than all possible routes. These maps progressively increase the level of detail, showing only the highways far from the destination while including arterial roads and finally the residential roads near the destination. Both route and destination maps typically use multi-perspective rendering in which the roads are drawn in top-down plan view while important landmark buildings are drawn from a side view so their facades are visible.

Although analyzing hand-designed visualizations is often a good initial approach for identifying design principles, this strategy also involves limitations. In some cases it may be tempting to form generative rules that are too specific and do not apply outside the range of analyzed examples. In other cases the rules may be so general it is unclear how to apply them to specific examples. Such difficulties often arise when the perceptual or cognitive rationale behind a particular visual technique is not clear. In the context of route maps, for example, although our analysis revealed that mapmakers often distort road length, angle, and shape, it was not immediately clear how such distortions improved the perception and cognition of route maps. Similarly, we have found that one of the challenges in analyzing hand-designed visualizations is to factor
out differences due to artistic style. Designers may choose visual attributes (such as font type, color palette, and line width) for aesthetic reasons whereby one font may simply look nicer than another to the designer. Although such aesthetic design choices are important considerations, the goal of our analysis is to determine how the design choices improve the perception and cognition of the information, rather than how these choices improve aesthetics. The difficulty is that these design choices often affect both the aesthetics of the display and the perception and cognition of the information; how to separate the two effects is not always clear. In light of these limitations and challenges, we have found it is often useful to connect our observations and hypotheses from the analysis of hand-designed examples with relevant work from perception and cognitive psychology. These connections serve to clarify the perceptual or cognitive rationale for the design principles.

Prior work in perception and cognition. In some cases, prior research in perception and cognition suggests or formalizes the appropriate design principles; for example, cognitive psychologists have shown that people think of routes as a sequence of turns25 and that when following a route the exact length of a road is far less important than properly executing the turns. The topology of the route is more important than its absolute geometry. This insight helps explain why hand-drawn maps often distort geometry—
distance, angle, and shape of roads—to ensure that all roads and turning points are visible, but almost never modify the topology of the route. In this case, the prior research confirmed and formalized the perceptual/cognitive rationale for the visual techniques we first noticed when analyzing hand-drawn route maps. Based on the resulting design principles, we developed LineDrive (http://vis.berkeley.edu/LineDrive), a fully automated system for rendering route maps in the style of hand-drawn maps.3 LineDrive has been publicly accessible since October 2000, and surveys have shown that for navigation tasks users strongly prefer LineDrive maps to computer-generated maps drawn at a fixed scale (see Figure 5).

Figure 5. A computer-generated route map rendered at a fixed scale does not depict (left) all the turns necessary for navigation. A hand-designed map (middle) emphasizes the turning points by exaggerating the lengths of short roads and simplifying the shape of roads. Our LineDrive system incorporates these design principles (right) into an automated map-design algorithm.

Researchers have also found that navigators familiar with a geographic area (such as cab drivers) plan routes hierarchically.4 They first select the highways necessary to get close to the destination, then the arteries, and finally the residential streets. Such hierarchical planning corresponds to the progressive increase in road detail we first identified in hand-designed destination maps. We recently applied this level-of-detail principle in conjunction with the distortion principles to build an automated system for generating destination maps.12 As in LineDrive, we produced a map that looks hand-drawn but that eliminates clutter while preserving the information necessary for anyone in the surrounding region to reach the destination.
Figure 7. A hand-designed tourist map of San Francisco emphasizes semantically, visually, and structurally important landmarks, paths, districts, nodes, and edges, using multi-perspective rendering to ensure the facades of buildings are visible (left). Our tourist-map design system is based on these principles and similarly emphasizes the information most important for tourists in this map of San Francisco (right).
Figure 6. A general-purpose computer-generated map of San Francisco (left) is not an effective destination map because it is cluttered with extraneous information and neighborhood roads disappear. Our destination map (right) includes only the relevant highways, arterials, and residential roads required to reach a destination. The layout and rendering style further emphasize the information required to reach it.

Our destination maps are available through the Bing Maps Web site (http://vis.berkeley.edu/DestMap) (see Figure 6). We applied a similar approach to automatically generating maps for tourists visiting a new city.8 Prior work on mental representations of cities16 showed that people consider five main elements: landmarks, paths, districts, nodes, and edges. However, a map with every instance of such elements would be cluttered with excessive detail. The most-effective tourist maps include only those elements that are semantically meaningful (such as the home of a well-known writer), visually distinctive (such as an oddly shaped or colored building), or placed in a structurally important location (such as a building at a prominent intersection).22 After choosing the elements to include in the map, mapmakers usually apply a variety of cartographic-generalization techniques, including simplification, displacement, deformation, and selection. Cognitive psychologists and cartographers studying the cognition of maps have shown such generalizations improve clarity because they emphasize the most important map elements while preserving spatial relationships between these elements.17 Our tourist-map-design system is based on these design principles. Input consists of a geometric model of a city, including streets, bodies of water, parks, and buildings (with textures). The system automatically determines the importance of map elements
using top-down Web-based information-extraction techniques to compute semantic importance and bottom-up vision-based image/geometry analysis to compute visual and structural importance. It then generates a map that emphasizes the most important map elements, using a combination of multi-perspective rendering and cartographic generalization to highlight the important landmarks, paths, districts, nodes, and edges while de-emphasizing less-important elements (see Figure 7).

Experiments on perception and cognition. In some domains, new perception and cognition research is required to provide the rationale for the design principles. Working with cognitive psychologist Barbara Tversky, we developed a methodology for conducting human-subject experiments to understand how people think about and communicate the information within a domain. We first applied this methodology to identify the design principles for creating assembly instructions for everyday objects (such as furniture and toys).2,9 The experiments are conducted in three phases:

Production. Participants create visualizations for a given domain. In the context of assembly instructions, they assembled a TV stand without instructions using only a photograph of the assembled stand as a guide. They then drew a set of instructions showing how to assemble it;

Preference. Participants rate the effectiveness of the visualizations.
In the assembly-instructions project, a new set of participants assembled the TV stand, without instructions. They then rated the quality of the instructions created by the first set of participants, redrawn to control for clarity, legibility, and aesthetics; and

Comprehension. Participants use the ranked visualizations, and we test for improvements in learning, comprehension, and decision making. In the assembly-instructions project, yet another set of participants assembled the TV stand, this time using the instructions rated in the preference phase. Tests showed the highly rated instructions were easier to use and follow; participants spent less time assembling the TV stand and made fewer errors.

Following these experiments, we look for commonalities in the highly rated visualizations to identify the design principles. In the context of assembly instructions, we identified three main principles: (1) use a step-by-step sequence of diagrams showing one primary action in each diagram; (2) use guidelines and arrows to depict the actions required to fit parts together; and (3) ensure that the parts added in each step are visible. Our automated assembly-instruction-design system is based on these principles (see Figure 8). Tversky and Lee25 have studied mental representations of maps using a similar methodology, where subjects first draw maps to familiar locations, then other subjects rate the effectiveness of the maps.
Figure 8. We asked subjects to assemble a TV stand and then create instructions for a novice explaining how to assemble it (left, middle). Analyzing hand-drawn instructions, we found that diagrammatic, step-by-step instructions using guidelines and arrows to indicate the actions required for assembly and providing good visibility for the attached parts are easiest to use and follow. Our system automatically generates assembly instructions (right) based on these principles.
Stage 2. Instantiate Design Principles

Designing a visualization usually requires choosing visual properties or attributes for each element in the display; for example, to create a route map, the designer must choose attributes, including position, size, and orientation for each road, landmark, and label that appears in the map. Similarly, to create a cutaway illustration, the designer must choose how and where to cut each structure that occludes the target part. Because there are many possible choices for each attribute, the design space of possible visualizations is usually quite large. To build automated visualization design systems, we treat the relevant design principles as guidelines for making these design decisions. The principles help us navigate through the design space and obtain an effective design. Most design principles are stated as qualitative guidelines, rather than as procedures we can directly instantiate in an automated design algorithm. The challenge is to transform such high-level principles into implementable algorithms.

Design principles generally fall into two categories: design rules and evaluation criteria. Design rules separate the design space into regions containing effective designs and those containing inviable designs. They are essentially hard constraints in the design space. In creating route maps, for example, designers commonly adjust the turn angle to emphasize the orientation of the turn, to the left or to the right. However, adjusting the turn angle so much that a left turn appears to be a right turn or vice versa is unacceptable. This design rule puts a hard constraint on how much designers are able to adjust the turn angle.

Evaluation criteria quantify the effectiveness of some aspect of the visualization. We can assess the overall effectiveness of a visualization by considering a set of evaluation criteria covering all major aspects of the visual design. In creating an exploded view, for instance, designers must balance two such criteria: part separation and compactness. A good exploded view separates the parts so all of them are visible, yet the visualization must also remain compact and maintain a roughly square aspect ratio to make the best use of available screen space. To quantify the overall effectiveness of an exploded view we measure the visibility of each part, as well as the compactness of the overall visualization. Similarly, in designing route maps, designers must ensure that all roads are visible. To quantify this criterion, we compute the length of each road in the map and check the length is greater than some minimum visibility threshold. The number of roads longer than the threshold length is a quantitative measure of the effectiveness of the map with respect to this criterion.

Given a set of design rules and quantitative evaluation criteria, we can use procedural techniques to build an automated visualization design system; for example, our system for designing cutaways and exploded views is driven exclusively by procedural techniques. In this case, we encode the design rules as a decision tree describing how to cut or explode away occluding parts based on their geometry. Another approach is to consider visualization design as an energy-minimizing optimization problem. In this case, we treat the design rules as hard constraints that define the boundaries of the design space and the evaluation criteria as soft constraints that guide the system to the optimal visualization. While this optimization-based approach is general, we have found it essential to develop a set of design rules and evaluation criteria that sufficiently limit the design space so it is feasible to complete the optimization. Both LineDrive and our assembly-instruction design system use such an energy-minimizing optimization.
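As a purely illustrative sketch (our own simplification, not the LineDrive or assembly-instructions implementation), a hard design rule and two quantified criteria for a route-map layout might be encoded as follows; the weights are arbitrary, and a generic optimizer (for example, simulated annealing) would then search the space allowed by the hard constraints for a low-energy layout.

using System;
using System.Linq;

class RouteMapEnergySketch
{
    // Design rule (hard constraint): a rendered turn may be exaggerated,
    // but a left turn must never come out looking like a right turn.
    static bool TurnOrientationPreserved(double originalAngle, double renderedAngle)
        => Math.Sign(originalAngle) == Math.Sign(renderedAngle);

    // Evaluation criterion: fraction of roads rendered shorter than a
    // minimum visible length (0 is best, 1 is worst).
    static double InvisibleRoadPenalty(double[] renderedLengths, double minVisibleLength)
        => renderedLengths.Count(len => len < minVisibleLength) / (double)renderedLengths.Length;

    // Overall energy: a weighted sum of soft criteria (weights chosen for illustration);
    // lower is better.
    static double Energy(double[] renderedLengths, double aspectRatioError, double minVisibleLength)
        => 1.0 * InvisibleRoadPenalty(renderedLengths, minVisibleLength)
         + 0.5 * aspectRatioError;
}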
Stage 3. Evaluate Design Principles

The final stage of our approach is to measure the usefulness of the visualizations produced by our automated design systems. We consider several such measures, including feedback from users in the form of qualitative interviews and quantitative usage statistics. In some cases, we have also conducted more-formal user studies to check how well the visualizations improve information processing, communication, and decision making.
User feedback. We find it is critical to involve users early on and conduct qualitative interviews and surveys to check their overall impressions of the visualizations produced by our systems. Such feedback is essential for identifying problems and ensuring our design principles and the visualizations converge on effective designs. The interviews and surveys provide high-level checks of the effectiveness of our design principles and allow us to tweak the principles when not quite right; for example, early on building LineDrive, we asked users to rate handcrafted prototype route-map designs, finding that 79 out of 90 respondents preferred the distorted LineDrive prototypes to maps drawn to scale1 and confirming that users thought the distorted maps were useful. Continual feedback and evaluation yields more-effective algorithms and tools.

Another approach is to release the visualization on the Web, then check usage statistics; for example, at its peak, LineDrive was serving more than 750,000 maps per day and became the default route-mapping style for MapBlast, an early Web-based provider of route maps. Such public feedback is a strong test of effectiveness, as ineffective solutions are quickly rejected. We also recognize that usage statistics are at best an indirect measure of effectiveness. Many excellent solutions remain little-used due to a variety of external forces that have little to do with the usefulness or effectiveness of a visualization.

User studies. To quantitatively assess the effectiveness of a visualization, we conduct user studies comparing visualizations created with our design algorithms to the best hand-designed visualizations in the domain; for example, we have compared our computer-designed instructions to factory-produced instructions and hand-drawn instructions for assembling a TV stand, finding that users completed the assembly task about 35% faster and made 50% fewer errors using our instructions. In addition to completion time and error rate, it is also possible to use eye-tracking to determine how a visualization affects the way people scan and process information.6,21 Such eye-tracking studies help us evaluate the effectiveness
communi cations o f th e acm
Many other information domains could benefit from a deeper understanding of the ways visual-display techniques affect the perception and cognition of information.
| April 2 0 1 1 | vol . 5 4 | no. 4
of low-level design choices in creating visualizations. Rigorous user studies are especially important because they also serve to validate the effectiveness of the design principles on which the visualizations are based. However, how to design such quantitative studies is not always clear. How should one visualization be compared against another visualization? For example, in the domain of anatomical illustrations it is not clear how to compare our cutaway illustrations against hand-designed illustrations. What task should we ask users to perform using the two illustrations? One approach might be to measure how quickly and accurately viewers locate a particular organ of the body. However, if the task is to learn the location of the organ, then both illustrations would label the organ, and with labels, speed and accuracy are unlikely to differ significantly. Our cutaways and exploded views are also designed to convey the layering relationship between parts. So, an alternative task might be to ask viewers to indicate the layering relationships between parts. But how can we ask them to complete this task without leading them to an answer? For many domains, like anatomical illustrations, developing a new methodology is necessary for evaluating the effectiveness of visualizations and validating underlying design principles.

Conclusion

The approach we've outlined for identifying, instantiating, and evaluating design principles for visual communication is a general methodology for combining findings about human perception and cognition with automated design algorithms. The systems we've built for generating route maps, tourist maps, and technical illustrations demonstrate this methodology can be used to develop effective automated visualization-design systems. However, there is much room for extending our proposed approach, and we hope researchers improve on the methods we have described. Future work can take several directions:

Many other information domains could benefit from a deeper understanding of the ways visual-display techniques affect the perception and
cognition of information. We commonly encounter a variety of different types of information, including cooking recipes, budgets and financial data, dance steps, tutorials on using software, explanations of strategies and plays in sports, and political polling numbers. Effective visualizations of such everyday information could empower citizens to make better decisions. We have focused our work on identifying domain-specific design principles. An open challenge is to generalize them across multiple domains. One approach might be to first identify domain-specific design principles in very different domains, then look for commonalities between the domain-specific principles; for example, we recently developed an automated system for generating tutorials explaining how to manipulate photographs using Photoshop and GIMP.7 The design principles for photo-manipulation tutorials are similar to those we identified for assembly instructions and include step-by-step sequences of screenshots and highlighting actions through arrows and other diagrammatic elements. Finding such similarities in design principles across multiple domains may indicate more general principles are at work. Though we presented three strategies for identifying design principles, other strategies may be possible as well. The strategies we presented all require significant human effort to identify commonalities in hand-designed visualizations, synthesize the relevant prior studies in perception and cognition, and conduct such studies. Moreover, the Internet makes a great deal of visual content publicly available, often with thousands of example visualizations within an individual information domain. Thus, a viable alternative strategy for identifying design principles may be to learn them from a large collection of examples using statistical machine-learning techniques. We have taken an initial step in this direction, with a project designed to learn how to label diagrams from a few examples.26 One advantage of this approach is that skilled designers often find it easier to create example visualizations than explicitly describe design principles.
Techniques for evaluating the effectiveness of visualizations and validating the design principles could also be improved. Design principles are essentially models that predict how visual techniques affect perception and cognition. However, as we noted, it is not always clear how to check the effectiveness of a visualization. More sophisticated evaluation methodology could provide stronger evidence for these models and thereby experimentally validate the design principles.
Acknowledgments
We would like to thank David Bargeron, Michael Cohen, Brian Curless, Pat Hanrahan, John Haymaker, Julie Heiser, Olga Karpenko, Jeff Klingner, Johannes Kopf, Niloy Mitra, Mark Pauly, Doantam Phan, Lincoln Ritter, David Salesin, Chris Stolte, Robert Sumner, Barbara Tversky, Dong-Ming Yan, and Yong-Liang Yang for their contributions to this research. Jeff Heer, Takeo Igarashi, and Tamara Munzner provided excellent suggestions and feedback on early drafts of this article. Figure 4a is from The New Way Things Work by David Macaulay: compilation copyright © 1988, 1998 Dorling Kindersley, Ltd., London; illustrations copyright © 1988, 1998 David Macaulay. Used by permission of Houghton Mifflin Harcourt Publishing Company. All rights reserved.
References
1. Agrawala, M. Visualizing Route Maps. Ph.D. Thesis, Stanford University, Stanford, CA, 2002; http://graphics.stanford.edu/papers/maneesh_thesis/
2. Agrawala, M., Phan, D., Heiser, J., Haymaker, J., and Klingner, J. Designing effective step-by-step assembly instructions. In Proceedings of SIGGRAPH (San Diego, July 27–31). ACM Press, New York, 2003, 828–837.
3. Agrawala, M. and Stolte, C. Rendering effective route maps: Improving usability through generalization. In Proceedings of SIGGRAPH (Los Angeles, Aug. 12–17). ACM Press, New York, 2001, 241–250.
4. Chase, W.G. Spatial representations of taxi drivers. In Acquisition of Symbolic Skills, D.R. Rogers and J.A. Sloboda, Eds. Plenum Press, New York, 1983, 391–405.
5. Gantz, J., Chute, C., Manfrediz, A., Minton, S., Reinsel, D., Schlichting, W., and Toncheva, A. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. IDC White Paper, Mar. 2008; http://www.emc.com/about/destination/digital universe/
6. Goldberg, J., Stimson, M., Lewenstein, M., Scott, N., and Wichansky, A. Eye tracking in Web search tasks: Design implications. In Proceedings of the Eye Tracking Research and Applications Symposium (New Orleans, Mar. 25–27). ACM Press, New York, 2002, 51–58.
7. Grabler, F., Agrawala, M., Li, W., Dontcheva, M., and Igarashi, T. Generating photo manipulation tutorials
by demonstration. ACM Transactions on Graphics 27, 3 (Aug. 2009), 66:1–66:9. 8. Grabler, F., Agrawala, M., Sumner, R.W., and Pauly, M. Automatic generation of tourist maps. ACM Transactions on Graphics 27, 3 (Aug. 2008), 100:1–100:11. 9. Heiser, J., Phan, D., Agrawala, M., Tversky, B., and Hanrahan, P. Identification and validation of cognitive design principles for automated generation of assembly instructions. In Proceedings of Advanced Visual Interfaces (Gallipoli, Italy, May 25–28). ACM Press, New York, 2004, 311–319. 10. Hodges, E. The Guild Handbook of Scientific Illustration. Van Nostrand Reinhold, New York, 1989. 11. Karpenko, O., Li, W., Mitra, N., and Agrawala, M. Exploded-view diagrams of mathematical surfaces. IEEE Transactions on Visualization and Computer Graphics 16, 6 (Oct. 2010), 1311–1318. 12. Kopf, J., Agrawala, M., Salesin, D., Bargeron, D., and Cohen, M.F. Automatic generation of destination maps. ACM Transactions on Graphics 29, 6 (Dec. 2010), 158:1–158:12. 13. Li, W., Agrawala, M., Curless, B., and Salesin, D. Automated generation of interactive exploded-view diagrams. ACM Transactions on Graphics 27, 3 (Aug. 2008), 101:1–101:11. 14. Li, W., Ritter, L., Agrawala, M., Curless, B., and Salesin, D. Interactive cutaway illustrations of complex 3D models. ACM Transactions on Graphics 26, 3 (July 2007), 31:1–31:11. 15. London, B. and Upton, J. Photography. Longman Publishing Group, New York, 1997. 16. Lynch, K. The Image of the City. The MIT Press, Cambridge, MA, 1960. 17. MacEachren, A.M. How Maps Work. The Guilford Press, New York, 1995. 18. Mijksenaar, P. and Westendorp, P. Open Here: The Art of Instructional Design. Joost Elffers Books, New York, 1999. 19. Mitra, N.J., Yang, Y., Yan, D., Li, W., and Agrawala, M. Illustrating how mechanical assemblies work. ACM Transactions on Graphics 29, 4 (July 2010), 29:58:1–58:12. 20. Robison, W., Boisjoly, R., Hoeker, D., and Young, S. Representation and misrepresentation: Tufte and the Morton Thiokol engineers on the Challenger. Science and Engineering Ethics 8, 1 (Jan. 2002), 59–81. 21. Santella, A., Agrawala, M., DeCarlo, D., Salesin, D., and Cohen, M. Gaze-based interaction for semiautomatic photo cropping. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Montréal, Apr. 24–27). ACM Press, New York, 771–780. 22. Sorrows, M. and Hirtle, S. The nature of landmarks for real and electronic spaces. In Proceedings of the International Conference on Spatial Information Theory (Stade, Germany, Aug. 25–29). Springer-Verlag, London, U.K., 1999, 37–50. 23. Tufte, E. Visual Explanations. Graphics Press, Cheshire, CT, 1997. 24. Tufte, E. Envisioning Information. Graphics Press, Cheshire, CT, 1990. 25. Tversky, B. and Lee, P. Pictorial and verbal tools for conveying routes. In Proceedings of the International Conference on Spatial Information Theory (Stade, Germany, Aug. 25–29). Springer-Verlag, London, U.K., 1999, 51–64. 26. Vollick, I, Vogel, D., Agrawala, M., and Hertzmann, A. Specifying label layout by example. In Proceedings of the ACM Symposium on User Interface Software and Technology (Newport, RI, Oct. 7–10). ACM Press, New York, 2007, 221–230.
Maneesh Agrawala ([email protected]) is an associate professor in the Electrical Engineering and Computer Sciences Department of the University of California, Berkeley. Wilmot Li ([email protected]) is a research scientist in the Creative Technologies Lab of Adobe Systems Inc., San Francisco, CA. Floraine Berthouzoz (floraine.berthouzoz@gmail.com) is a Ph.D. candidate in the Electrical Engineering and Computer Sciences Department of the University of California, Berkeley. © 2011 ACM 0001-0782/11/04 $10.00
doi:10.1145/1924421.1924440
Despite earlier claims, Software Transactional Memory outperforms sequential code. by Aleksandar Dragojević, Pascal Felber, Vincent Gramoli, and Rachid Guerraoui
Why STM Can Be More Than A Research Toy
While multicore architectures are increasingly the norm in CPUs, concurrent programming remains a daunting challenge for many. The transactional-memory paradigm simplifies concurrent programming by enabling programmers to focus on high-level synchronization concepts, or atomic blocks of code, while ignoring low-level implementation details. Hardware transactional memory has shown promising results for leveraging parallelism4 but is restrictive, handling only transactions of limited size or requiring some system events or CPU instructions to be executed outside transactions.4 Despite attempts to address these issues,19 TM systems fully implemented in hardware are unlikely to be commercially available for at least the next few
years. More likely is that future deployed TMs will be hybrids containing a software component and a hardware component. Software transactional memory15,23 circumvents the limitations of HTM by implementing TM functionality fully in software. Moreover, several STM implementations are freely available and appealing for concurrent programming.1,5,7,11,14,16,20 Yet STM credibility depends on the extent to which it enables application code to leverage multicore architectures and outperform sequential code. Cascaval et al.’s 2008 article3 questioned this ability and suggested confining STM to the status of “research toy.” STMs indeed introduce significant runtime overhead: Synchronization costs. Each read (or write) of a memory location from inside a transaction is performed by a call to an STM routine for reading (or writing) data. With sequential code, this access is performed by a single CPU instruction. STM read and write routines are significantly more expensive than corresponding CPU instructions, as they typically “bookkeep” data about every access. STMs check for conflicts, log access, and, in case of a write, log the current (or old) value of the data. Some of these operations use expensive synchronization instructions and access shared metadata, further increasing their cost. Compiler overinstrumentation. Using
key insights
• STM is improving in terms of performance, often outperforming sequential, nontransactional code, when running with just four CPU cores.
• Parallel applications exhibiting high contention are not the primary target for STM, so they are not the best benchmarks for evaluating STM performance.
• STM with support for compiler instrumentation and explicit, nontransparent privatization outperforms sequential code in all but one workload we used and still supports a programming model that is easy to use by typical programmers.
an STM, programmers must insert STM calls for starting and ending transactions in their code and replace all memory accesses from inside transactions by STM calls for reading and writing memory locations. This process, called “instrumentation,” can be manual, in which case a programmer manually replaces all memory references with STM calls or else lets it be performed by an STM compiler. With compilers, a programmer needs to specify only which sequences of statements must be atomic by enclosing them in transactional blocks. The compiler generates code invoking appropriate STM read/write calls. While an STM compiler significantly reduces programming complexity, it can also degrade the performance of resulting programs (compared to manual instrumentation) due to overinstrumentation3,8,25; basically, the compiler instruments the code conservatively with unnecessary calls
to STM functions, as it cannot precisely determine which instructions access shared data. Transparent privatization. Making certain shared data private to a certain thread, or “privatization,” is typically used to allow nontransactional accesses to some data, to either support legacy code or improve performance by avoiding the cost of STM calls when accessing private data. Unless certain precautions are taken, privatization can result in race conditions.24 Two approaches to prevent them: Either a programmer explicitly marks transactions that privatize data or STM transparently ensures all transactions safely privatize data. Explicit privatization puts additional burden on the programmer, while transparent privatization incurs runtime overhead,25 especially when no data is actually privatized. Many research papers1–3,5,7,13,16,18,20 have discussed the scalability of STM
with increasing numbers of threads, but few compare STM to sequential code or address whether STM is a viable option for speeding execution of applications. Two notable exceptions are Minh et al.2 and Cascaval et al.3 In the former, STM was shown on a hardware simulator to outperform sequential code in most STAMP2 benchmarks. The latter reported on a series of experiments on real hardware where STM performed worse than sequential code and even implied by the article's title that STM is only a research toy. A close look at the experiments reveals that Cascaval et al. considered a subset of the STAMP benchmark suite configured in a specific manner while using up to only eight threads. We have since gone a step further, comparing STM performance to sequential code using a larger and more diverse set of benchmarks and real
hardware supporting higher levels of concurrency. Specifically, we experimented with a state-of-the-art STM implementation, SwissTM7 running three different STMBench712 workloads, all 10 workloads of the STAMP (0.9.10)2 benchmark suite, and four microbenchmarks, all encompassing both large- and small-scale workloads. We considered two hardware platforms: a Sun Microsystems UltraSPARC T2 CPU machine (referred to as SPARC in the rest of this article) supporting 64 hardware threads and a four quad-core AMD Opteron x86 CPU machine (referred to as x86 in the rest of this article) supporting 16 hardware threads. Finally, we also considered all combinations of privatization and compiler support for STM (see Table 1). This constitutes the most exhaustive performance comparison of STM to sequential code published to date. The experiments in this article (summarized in Table 2) show that STM does indeed outperform sequential code in most configurations and benchmarks, offering a viable paradigm for concurrent programming; STM with manually instrumented benchmarks and explicit privatization outperforms sequential code by up to 29 times on SPARC with 64 concurrent threads and by up to nine times on x86 with 16 concurrent threads. More important, STM performs well with a small number of threads on many benchmarks; for example, STM-ME outperforms sequential code with four threads on 14 of 17 workloads on our SPARC machine and on 13 of 17 workloads on our x86 machine. Basically, these results support early hope about the good performance of STM and should motivate further research. Our results contradict the results of Cascaval et al.3 for three main reasons:
• STAMP workloads in Cascaval et al.3 presented higher contention than default STAMP workloads;
• We used hardware supporting
Table 1. STM support.

Model     Instrumentation   Privatization
STM-ME    manual            explicit
STM-CE    compiler          explicit
STM-MT    manual            transparent
STM-CT    compiler          transparent
more threads and, in the case of x86, did not use hyperthreading; and
• We used a state-of-the-art STM implementation more efficient than those used in Cascaval et al.3
Clearly, and despite rather good STM performance in our experiments, there is room for improvement, and we use this article to highlight promising directions. Also, while use of STM involves several programming challenges3 (such as ensuring weak or strong atomicity, semantics of privatization, and support for legacy binary code), alternative concurrency programming approaches (such as fine-grain locking and lock-free techniques) are no easier to use than STM. Such a comparison was covered previously11,13,15,23 and is beyond our scope here.
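To make the per-access synchronization and instrumentation costs discussed earlier concrete, the sketch below contrasts a sequential integer-set lookup with a manually instrumented version in the STM-ME style. The API names (stm_begin, stm_read, stm_write, stm_commit) are hypothetical stand-ins, not SwissTM's actual interface, and abort/retry handling is omitted; the only point is that every shared access becomes a library call that logs and validates the access.

```cpp
#include <cstdint>

// Hypothetical word-based STM interface -- illustrative only, not SwissTM's
// actual API. Each call logs the access and checks for conflicts, which is
// the per-access bookkeeping cost discussed in the text.
void          stm_begin();                                           // start a transaction
void          stm_commit();                                          // validate the read set and commit
std::intptr_t stm_read(const std::intptr_t* addr);                   // transactional load
void          stm_write(std::intptr_t* addr, std::intptr_t value);   // transactional store

struct Node { std::intptr_t value; Node* next; };

// Sequential lookup: every access is a single load instruction.
bool contains_seq(const Node* head, std::intptr_t v) {
    for (const Node* n = head; n != nullptr; n = n->next)
        if (n->value == v) return true;
    return false;
}

// STM-ME style: the programmer manually routes every shared access through
// the STM library (abort/retry handling omitted for brevity).
bool contains_stm(Node* const* head, std::intptr_t v) {
    stm_begin();
    const Node* n = reinterpret_cast<const Node*>(
        stm_read(reinterpret_cast<const std::intptr_t*>(head)));
    while (n != nullptr) {
        if (stm_read(&n->value) == v) { stm_commit(); return true; }
        n = reinterpret_cast<const Node*>(
            stm_read(reinterpret_cast<const std::intptr_t*>(&n->next)));
    }
    stm_commit();
    return false;
}
```

Even in this sketch, the transactional lookup pays one library call per list node where the sequential version pays one load instruction, which is where the single-thread slowdown relative to sequential code comes from.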
conflicts eagerly. The two-phase contention manager uses different algorithms for short and long transactions. This design was chosen to provide good performance across a range of workloads7; Privatization. We implemented privatization support in SwissTM using a simple validation-barriers scheme described in Spear et al.24 To ensure safe privatization, each thread, after committing transaction T, waits for all other concurrent transactions to commit, abort, or validate before executing application code after T; and Compiler instrumentation. We used Intel’s C/C++ STM compiler1,18 for generating compiler instrumented benchmarks.a We conducted our experiments using the following benchmarks: STMBench7. STMBench712 is a synthetic STM benchmark that models realistic large-scale CAD/CAM/CASE workloads, defining three different workloads with different amounts of contention: read-dominated (10% write operations), read/write (60% write operations), and write-dominated (90% write operations). The main characteristics of STMBench7 are a large data structure and long transactions compared to other STM benchmarks. In this sense, STMBench7 is very challenging for STM implementations; STAMP. Consisting of eight different applications representative of
Evaluation Settings We first briefly describe the STM library used for our experimental evaluation, SwissTM,7 along with benchmarks and hardware settings. Note that our experiments with other state-of-the-art STMs5,14,18,20 on which we report in the companion technical report,6 confirm the results presented here; SwissTM and the benchmarks are available at http://lpd.epfl.ch/site/research/tmeval. The STM we used in our evaluation reflects three main features: Synchronization algorithm. SwissTM7 is a word-based STM that uses invisible (optimistic) reads, relying on a time-based scheme to speed up readset validation, as in Dice et al.5 and Riegel et al.21 SwissTM detects read/ write conflicts lazily and write/write
a Intel’s C/C++ STM compiler generates only x86 code so was not used in our experiments on SPARC.
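For comparison with the manual approach sketched earlier, compiler instrumentation lets the programmer mark only the atomic region and leaves call generation to the compiler. The fragment below is a sketch: the __tm_atomic keyword shown in the comment follows one edition of Intel's STM compiler prototype but should be treated as an assumption, since the exact syntax varies across versions, and the stm_* calls are hypothetical names, not any compiler's actual runtime interface.

```cpp
struct Account { int balance; };

// With an STM compiler the programmer only marks the atomic region, roughly:
//
//     void deposit(Account* a, int amount) {
//         __tm_atomic { a->balance += amount; }   // keyword per one Intel prototype; syntax varies
//     }
//
// Conceptually the compiler expands that block into STM calls like the
// hypothetical ones below. If it cannot prove an access is thread-private,
// it instruments the access anyway, which is the source of overinstrumentation.
struct TxnHandle;
TxnHandle* stm_begin_txn();
int        stm_read_i32(TxnHandle* tx, const int* addr);
void       stm_write_i32(TxnHandle* tx, int* addr, int value);
void       stm_commit_txn(TxnHandle* tx);

void deposit_instrumented(Account* a, int amount) {
    TxnHandle* tx = stm_begin_txn();               // start the transaction
    int b = stm_read_i32(tx, &a->balance);         // transactional load
    stm_write_i32(tx, &a->balance, b + amount);    // transactional store
    stm_commit_txn(tx);                            // validate and commit (abort/retry omitted)
}
```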
Table 2. Summary of STM speedup over sequential code.

                          STM-ME              STM-MT              STM-CE              STM-CT
Hardware    Hw threads    Min   Max   Avg     Min   Max   Avg     Min   Max   Avg     Min   Max   Avg
SPARC       64            1.4   29.7  9.1     1.2   23.6  5.6     —     —     —       —     —     —
x86         16            0.54  9.4   3.4     0.34  5.2   1.8     0.8   9.3   3.1     0.5   5.3   1.7
All threads execute on the same chip with SPARC, so the inter-thread communication costs less, and sequential performance of a single thread on SPARC is much lower. STM-ME delivers good performance on both SPARC and x86 architectures, clearly showing STM-ME algorithms scale and perform well in different settings. However, also important is that, while STM-ME outperforms sequential code on all the benchmarks, some achieved speedups are not very impressive (such as 1.4 times with 64 threads on the ssca2 benchmark). These lower speedups confirm that STM, despite showing great promise for some types of concurrent workloads, is not the best solution for all concurrent workloads.
Figure 1. STM-ME performance: speedup over sequential code on (1a) SPARC, with 1 to 64 threads, and (1b) x86, with 1 to 16 threads, across the STMBench7, STAMP, and microbenchmark workloads.
STM-ME Performance
Figure 1a outlines STM-ME (manual instrumentation with explicit privatization) speedup over sequential, noninstrumented code on SPARC, showing that STM-ME delivers good performance with a small number of threads, outperforming sequential code on 14 of 17 workloads with four threads. The figure also outlines that STM outperforms sequential code on all benchmarks we used by up to 29 times on the vacation low benchmark. The experiment shows that the
less contention the workload exhibits, the more benefit is expected from STM; for example, STM outperforms sequential code by more than 11 times on a read-dominated workload of STMBench7 and less than two times for a write-dominated workload of the same benchmark. On x86 (see Figure 1b), STM-ME outperforms sequential code on 13 workloads with four threads. Overall, STM clearly outperforms sequential code on all workloads, except on the challenging, high-contention STMBench7 write workload. The performance gain, compared to sequential code, is lower than on SPARC (up to nine times speedup on x86 compared to 29 times on SPARC) for two reasons:
real-world workloads, STAMP2 offers a range of workloads and is widely used to evaluate TM systems. STAMP applications can be configured with different parameters defining different workloads. In our experiments, we used 10 workloads from the STAMP 0.9.10 distribution, including low- and high-contention workloads for kmeans and vacation applications and one workload for all other applications. The exact workload settings we used are specified in the companion technical report6; and Microbenchmarks. To evaluate low-level overhead of STMs (such as the cost of synchronization and logging) with smaller-scale workloads, we used four microbenchmarks that implement an integer set using different data structures. Every transaction executes a single lookup, insert, or remove of a randomly chosen integer from the value range. Initially, the data structures were filled with 2^16 elements randomly chosen from a range of 2^17 values. During the experiments, 5% of the transactions were insert operations, 5% were remove operations, and 90% were search operations. It is important to note that these benchmarks are TM benchmarks, most using transactions extensively (with the exception of labyrinth and to a lesser extent genome and yada). Applications that would use transactions to simplify synchronization, but in which only a small fraction of execution time would be spent in transactions, would benefit from STM more than the benchmarks we used. In a sense, the benchmarks we used represent a worst-case scenario for STM usage. For each experiment, we computed averages from at least five runs.
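The integer-set microbenchmark mix described above is straightforward to reproduce; the sketch below shows a per-thread driver loop under the stated parameters (2^16 initial elements, keys drawn from a range of 2^17 values, 5% inserts, 5% removes, 90% lookups). The set_* operations stand for the transactional integer-set implementations used in the article (hash table, red-black tree, skip list, linked list) and are assumed, not shown.

```cpp
#include <cstdint>
#include <random>

// Transactional integer-set interface; the implementations are assumed to
// exist and to run each operation as a single transaction, as in the text.
struct IntSet;
bool set_insert(IntSet* s, int key);
bool set_remove(IntSet* s, int key);
bool set_contains(IntSet* s, int key);

// Initialization (single-threaded): insert distinct random keys until the
// set holds 2^16 elements, drawn from [0, 2^17).
void populate(IntSet* set, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> key_dist(0, (1 << 17) - 1);
    int inserted = 0;
    while (inserted < (1 << 16))
        if (set_insert(set, key_dist(rng))) ++inserted;   // count only new keys
}

// Per-thread driver for the workload mix described in the text:
// 5% inserts, 5% removes, 90% lookups over keys in [0, 2^17).
void worker(IntSet* set, std::uint64_t num_ops, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> key_dist(0, (1 << 17) - 1);
    std::uniform_int_distribution<int> op_dist(0, 99);
    for (std::uint64_t i = 0; i < num_ops; ++i) {
        const int key = key_dist(rng);
        const int op  = op_dist(rng);
        if (op < 5)       set_insert(set, key);    // 5% of transactions
        else if (op < 10) set_remove(set, key);    // 5% of transactions
        else              set_contains(set, key);  // 90% of transactions
    }
}
```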
Contradicting earlier results. The results reported by Cascaval et al.3 indicated STMs do not perform well on three of the STAMP applications we also used: kmeans, vacation, and genome. In our experiments, STM delivered good performance on all three. Three main reasons for such a considerable difference are: Workload characteristics. A close look at the experimental settings in Cascaval et al.3 reveals their workloads had higher contention than the default STAMP workloads. STM usually has the lowest performance in highly contended workloads, consistent with our previous experiments, as in Figure 1. To evaluate the impact of workload characteristics, we ran both default STAMP workloads and STAMP workloads from Cascaval et al.3 on a machine with two quad-core Xeon CPUs that was more similar to the machine in Cascaval et al.3 than to the x86 machine we used in our earlier experiments. Figure 2a outlines slowdown of workloads from Cascaval et al.3 compared to default STAMP workloads; we used both
low- and high-contention workloads for kmeans and vacation. Workload settings from Cascaval et al.3 do indeed degrade STM-ME performance. The performance impact is significant in kmeans (around 20% for high- and up to 200% for low-contention workloads) and in vacation (30% to 50% in both). The performance is least affected in genome (around 10%). Different hardware. We used hardware configurations with support for more hardware threads—64 and 16 in our experiments—compared to eight in Cascaval et al.3 This significant difference lets STM perform better, as there is more parallelism at the hardware level to be exploited. Our x86 machine does not use hyperthreading, while the one used by Cascaval et al.3 does; hardware-thread multiplexing in hyperthreaded CPUs can hamper performance. To evaluate the effect, we ran the default STAMP workloads on a machine with two single-core hyperthreaded Xeon CPUs. Figure 2b outlines the slowdown on a hyperthreaded machine compared
Figure 2. Impact of the experimental settings of Cascaval et al.3 on STM-ME performance: (2a) slowdown with the workload settings of Cascaval et al.3 relative to the default STAMP workloads; (2b) slowdown on a hyperthreaded machine relative to a similar machine without hyperthreading (genome, kmeans, and vacation workloads).
to a similar machine without hyperthreading. The figure shows that hyperthreading has a significant effect on performance, especially with higher thread counts. Slowdown in genome with four threads is around 65% and on two vacation workloads around 40%. The performance difference in kmeans workloads is significant, even with a single thread, due to differences in CPUs not related to hyperthreading. Still, even with kmeans, slowdown with four threads is much higher than with one and two threads. More-efficient STM. Part of the performance difference is due to a more efficient STM implementation. The results reported by Dragojević et al.7 suggest that SwissTM has better performance than TL2, performing comparably to the IBM STM in Cascaval et al.3 We also experimented with TL2,5 McRT-STM,1 and TinySTM.20 Tim Harris of Microsoft Research provided us with the Bartok STM14 performance results on a subset of STAMP. All these experiments confirm our general conclusion about good STM performance on a range of workloads.6 Further optimizations. In some workloads, performance degraded when we used too many concurrent threads. One possible alternative for improving performance in these cases would be to modify the thread scheduler so it avoids running more concurrent threads than is optimal for a given workload, based on the information provided by the STM runtime.
STM-MT Performance
The validation barriers we use to ensure privatization safety require frequent communication among all threads in the system and can degrade performance due to the time threads spend waiting for one another and the increased number of cache misses. A similar technique is known to significantly affect performance of STM in certain cases,25 as confirmed by our experiments. We must highlight the fact that none of the benchmarks we used requires privatization. We thus measured the worst case: Supporting transparent privatization incurs overhead without the performance benefits of reading and writing privatized data outside
still scales and performs well on a range of applications. We also conclude that reducing costs of cache-coherence traffic by having more cores on a single chip reduces the cost of transparent
privatization, resulting in better performance and scalability. Further optimizations. Two recent proposals16,17 aim to improve scalability of transparent privatization by employ-
Table 3. Transparent privatization costs (1 − speedupSTM-MT / speedupSTM-ME).

           SPARC                 x86
Threads    Min   Max   Avg       Min   Max   Avg
1          0     0.06  0         0     0.45  0.08
2          0.02  0.47  0.16      0.03  0.58  0.29
4          0.03  0.59  0.26      0.06  0.64  0.4
8          0.03  0.66  0.32      0.08  0.69  0.48
16         0     0.75  0.35      0.17  0.85  0.51
32         0     0.77  0.34      —     —     —
64         0     0.8   0.35      —     —     —
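The validation-barrier scheme behind these costs (after committing, a thread waits until every concurrent transaction commits, aborts, or validates before it runs nontransactional code) can be sketched as follows. This is a simplified illustration of the idea, assuming a global commit counter and per-thread start timestamps; it is not SwissTM's actual implementation. The cross-thread polling it requires is exactly the source of the waiting and cache-miss overhead discussed in the text.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

constexpr int kMaxThreads = 64;

// Global commit timestamp plus, per thread, the timestamp observed when its
// current transaction started (0 means no transaction in flight).
std::atomic<std::uint64_t> global_ts{1};
struct alignas(64) Slot { std::atomic<std::uint64_t> start_ts{0}; };
Slot active[kMaxThreads];

void txn_start(int tid) {
    active[tid].start_ts.store(global_ts.load(std::memory_order_acquire),
                               std::memory_order_release);
}

std::uint64_t txn_commit(int tid) {            // returns this transaction's commit timestamp
    std::uint64_t ts = global_ts.fetch_add(1, std::memory_order_acq_rel) + 1;
    active[tid].start_ts.store(0, std::memory_order_release);
    return ts;                                 // pass to privatization_barrier()
}

void txn_abort(int tid) {
    active[tid].start_ts.store(0, std::memory_order_release);
}

// Privatization barrier executed by a thread after committing transaction T:
// wait until every other thread either has no transaction in flight or has
// (re)started/validated at a timestamp no older than T's commit. Only then may
// the thread touch the privatized data nontransactionally.
void privatization_barrier(int self, std::uint64_t my_commit_ts) {
    for (int t = 0; t < kMaxThreads; ++t) {
        if (t == self) continue;
        while (true) {
            std::uint64_t ts = active[t].start_ts.load(std::memory_order_acquire);
            if (ts == 0 || ts >= my_commit_ts) break;   // finished, or already past T
            std::this_thread::yield();                  // still a concurrent transaction
        }
    }
}
```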
Figure 3. STM-MT performance: speedup over sequential code on (3a) SPARC, with 1 to 64 threads, and (3b) x86, with 1 to 16 threads, across the STMBench7, STAMP, and microbenchmark workloads.
of transactions. Also, the measured performance costs are specific to our choice of privatization technique and implementation; ways to reduce privatization costs have been proposed.16,17 Figure 3a shows performance of STM-MT (manual instrumentation with transparent privatization) with SPARC, conveying that transparent privatization affects the performance of STM significantly but that STM-MT still performs well, outperforming sequential code on 11 of 17 workloads with four threads and on 13 workloads with eight threads. Also, STM-MT outperforms sequential code on all benchmarks. However, performance is not as good; STM-MT outperforms sequential code by up to 23 times compared to 29 times with STM-ME and by 5.6 times on average compared to 9.1 times with STM-ME. Our experiments show performance for some workloads (such as ssca2) is unaffected but also that privatization costs can be as high as 80% (such as in vacation low and yada). Also, in general, costs increase with the number of concurrent threads, affecting both performance and scalability of STM. Table 3 summarizes the costs of transparent privatization with SPARC. We repeated the experiments with the x86 machine (see Figure 3b), with results confirming that STM-MT has lower performance than STM-ME. STM-MT outperforms sequential code on eight of 17 workloads with four threads and on 14 workloads with eight threads. Overall, transparent privatization overhead reduces STM performance below performance of sequential code in three benchmarks: STMBench7 read/write, STMBench7 write, and kmeans high. Note that performance is affected most with microbenchmarks due to cache contention for shared privatization metadata induced by small transactions. Our experiments show that privatization costs can be as high as 80% and confirm that transparent privatization costs increase with the number of threads. The cost of transparent privatization is higher on our four-CPU x86 machine than on SPARC, due mainly to the higher costs of inter-thread communication; Table 3 lists the costs of transparent privatization with x86. While the effect of transparent privatization can be significant, STM-MT
Table 4. Compiler instrumentation cost with x86 (1 − speedupSTM-CE / speedupSTM-ME).

Threads    Min   Max   Avg
1          0     0.42  0.16
2          0     0.4   0.17
4          0     0.4   0.11
8          0     0.47  0.11
16         0     0.44  0.17
ing partially visible reads. By making readers only partially visible, the cost of reads is reduced, compared to fully visible reads, while improving the scalability of privatization support. To implement partially visible readers, Marathe et al.17 used timestamps, while Lev et al.16 used a variant of SNZI counters.10 In addition, Lev et al.16 avoided use of centralized privatization metadata to improve scalability.
STM-CE Performance
Compiler instrumentation often replaces more memory references with STM load and store calls than is strictly necessary, resulting in reduced performance of generated code, or "overinstrumentation."3,8,25 Ideally, the compiler would replace memory accesses with STM calls only when they reference shared data. However, the compiler does not have information about all uses of variables in the whole program or semantic information about variable use typically available only to the programmer (such as which vari-
ables are private to some threads and which are read-only). For this reason, the compiler conservatively generates more STM calls than necessary; unnecessary STM calls reduce performance because they are more expensive than the CPU instructions they replace. Figure 4 outlines STM-CE (compiler instrumentation with explicit privatization) speedup over sequential code,b showing that STM-CE has good performance, outperforming sequential code on 10 of 14 workloads with four threads and on 13 workloads with eight threads. Overall, STM-CE outperforms sequential code in all benchmarks but kmeans high, though it scales well on that workload and promises to outperform sequential code with additional hardware threads. The cost of compiler instrumentation is about 20% for all workloads but kmeans, where it is about 40%. Also, on some workloads (such as labyrinth, ssca2, and hashtable), compiler instrumentation does not introduce significant costs, and the performance of STM-ME and STM-CE is almost the same. In our experiments the costs of compiler instrumentation are approximately the same for all thread counts, conveying that compiler instrumentation does not affect STM scalability; Table 4 summarizes costs introduced by compiler instrumentation.
b The data presented here does not include STMBench7 workloads due to the limitations of the STM compiler we used.
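A typical overinstrumentation case is a location that is actually thread-private or transaction-local, which the compiler must instrument anyway without whole-program information. The fragment below illustrates this using the same hypothetical stm_* calls as in the earlier sketch; it is a conceptual expansion, not the output of any particular STM compiler.

```cpp
struct TxnHandle;
int  stm_read_i32(TxnHandle* tx, const int* addr);
void stm_write_i32(TxnHandle* tx, int* addr, int value);

// 'local' lives on this thread's stack and never escapes the function, so the
// barriers on it are unnecessary; without proof of that, however, a
// conservative compiler instruments it like shared memory.
int checksum(TxnHandle* tx, const int* shared, int n) {
    int local = 0;                               // thread-private accumulator
    for (int i = 0; i < n; ++i) {
        int v   = stm_read_i32(tx, &shared[i]);  // necessary: genuinely shared data
        int cur = stm_read_i32(tx, &local);      // unnecessary: nothing else can see 'local'
        stm_write_i32(tx, &local, cur + v);      // unnecessary for the same reason
    }
    return stm_read_i32(tx, &local);             // a programmer would simply return 'local'
}
```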
Figure 4. STM-CE performance with 16-core x86: speedup over sequential code with 1 to 16 threads across the STAMP and microbenchmark workloads.
The additional overheads introduced by compiler instrumentation remain acceptable, as STM-CE outperforms sequential code on 10 of 14 workloads with only four threads and on all but one workload overall. Further optimizations. Ni et al.18 described optimizations that replace full STM load and store calls with specialized, faster versions of the same calls; for example, some STMs perform fast reads of memory locations previously accessed for writing inside the same transaction. While the compiler we used supports these optimizations, we have not yet implemented the lower-cost STM barriers in SwissTM. Compiler data structure analysis was used by Riegel et al.22 to optimize the code generated by the Tanger STM compiler. Adl-Tabatabai et al.1 proposed several optimizations in the Java context to eliminate transactional accesses to immutable data and data allocated inside current transactions. Harris et al.14 used flow-sensitive interprocedural compiler analysis, as well as runtime log filtering in Bartok-STM, to identify objects allocated in the current transaction and eliminate transactional accesses to them. Eddon and Herlihy9 used dataflow analysis of Java programs to eliminate some unnecessary transactional accesses. STM-CT performance. We also performed experiments with STM-CT (using both compiler instrumentation and transparent privatization) but defer the results to the companion technical report.6 Our experiments showed that, despite the high costs of transparent privatization and compiler overinstrumentation, STM-CT outperformed sequential code on all but four workloads out of 14. However, STM-CT requires higher thread counts to outperform sequential code than previous STM variants for the same workloads, as it outperformed sequential code in only five of 14 workloads with four threads. The overheads of STM-CT are largely a simple combination of STM-CE and STM-MT overheads; the same techniques for reducing transparent privatization and compilation overheads are applicable here. Programming model. The experiments we report here imply that STM-CE (compiler instrumentation with explicit privatization) may be the most
appropriate programming model for STM. STM-ME is likely too tedious and error-prone for use in most applications and might instead be appropriate only for smaller applications or performance-critical sections of code. Clearly, an STM compiler is crucial for usability, yet transparent privatization support might not be absolutely needed from STM. It seems that programmers make a conscious decision to privatize a piece of data, rather than let the data be privatized by accident. This might imply that, for the programmer, explicitly marking privatizing transactions would not require much additional effort. Apart from semantic issues, our experiments show that STM-CE offers good performance while scaling well.c
c Because the experiments reported in Cascaval et al.3 were conducted with STM variants not supporting transparent privatization, our observation about the programming model does not alter the performance comparisons covered earlier in the article.
Conclusion
We reported on the most exhaustive evaluation to date of the ability of an STM to outperform sequential code, showing it can deliver good performance across a range of workloads and multicore architectures. Though we do not claim STM is a silver bullet for general-purpose concurrent programming, our results contradict Cascaval et al.3 and suggest STM is usable for a range of applications. They also support the initial hopes about STM performance and should motivate further research in the field. Many improvements promise to boost STM performance further, making it even more appealing; for example, static segregation of memory locations, depending on whether or not the locations are shared, can minimize compiler instrumentation overhead, partially visible reads can improve privatization performance, and reduction of accesses to shared data can enhance scalability.
Acknowledgments
We are grateful to Tim Harris for running the Bartok-STM experiments, to Calin Cascaval for providing us with the experimental settings of Cascaval et al.,3 and to Yang Ni for running experi-
ments with McRT-STM, confirming our speculation about the effect of hyperthreading. We also thank Hillel Avni, Derin Harmanci, Michał Kapałka, Patrick Marlier, Maged Michael, and Mark Moir for their comments. This work was funded by the Velox FP7 European Project and the Swiss National Science Foundation grant 200021-116745/1. References 1. Adl-Tabatabai, A.-R., Lewis, B.T., Menon, V., Murphy, B.R., Saha, B., and Shpeisman, T. Compiler and runtime support for efficient software transactional memory. In Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation (Ottawa, June 10–16). ACM Press, New York, 2006, 26–37. 2. Cao Minh, C., Chung, J., Kozyrakis, C., and Olukotun, K. STAMP: Stanford Transactional Applications for Multiprocessing. In Proceedings of the 2008 IEEE International Symposium on Workload Characterization (Seattle, Sept. 14–16). IEEE Computer Society, Washington, D.C., 2008, 35–46. 3. Cascaval, C., Blundell, C., Michael, M.M., Cain, H.W., Wu, P., Chiras, S., and Chatterjee, S. Software transactional memory: Why is it only a research toy? Commun. ACM 51, 11 (Nov. 2008), 40–46. 4. Dice, D., Lev, Y., Moir, M., and Nussbaum, D. Early experience with a commercial hardware transactional memory implementation. In Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (Washington, D.C., Mar. 7–11). ACM Press, New York, 2009, 157–168. 5. Dice, D., Shalev, O., and Shavit, N. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing (Stockholm, Sept. 18–20). Springer-Verlag, Berlin, 2006, 194–208. 6. Dragojevic´, A., Felber, P., Gramoli, V., and Guerraoui, R. Why STM Can Be More Than a Research Toy. Technical Report LPD-REPORT-2009-003. EPFL, Lausanne, Switzerland, 2009. 7. Dragojevic´, A., Guerraoui, R., and Kapalka, M. Stretching transactional memory. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (Dublin, 2009). ACM Press, New York, 2009, 155–165. 8. Dragojevic´, A., Ni, Y., and Adl-Tabatabai, A.-R. Optimizing transactions for captured memory. In Proceedings of the 21st ACM Symposium on Parallelism in Algorithms and Architectures (Calgary, Aug. 11–13). ACM Press, New York, 2009, 214–222. 9. Eddon, G. and Herlihy, M. Language support and compiler optimizations for STM and transactional boosting. In Proceedings of the Fourth International Conference on Distributed Computing and Internet Technology (Bangalore, Dec. 17–20). Springer-Verlag, Berlin, 2007, 209–224. 10. Ellen, F., Lev, Y., Luchangco, V., and Moir, M. Snzi: Scalable nonzero indicators. In Proceedings of the 26th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (Portland, OR, Aug. 12–15). ACM Press, New York, 2007, 13–22. 11. Felber, P., Gramoli, V., and Guerraoui, R. Elastic transactions. In Proceedings of the 23rd International Symposium on Distributed Computing (Elche/Elx, Spain, Sept. 23–25). Springer-Verlag, Berlin, 2009, 93–107. 12. Guerraoui, R., Kapalka, M., and Vitek, J. STMBench7: A benchmark for software transactional memory. In Proceedings of the Second ACM SIGOPS/EuroSys European Conference on Computer Systems (Lisbon, Mar. 21–23). ACM Press, New York, 2007, 315–324. 13. Harris, T. and Fraser, K. Language support for lightweight transactions. In Proceedings of the 18th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (Anaheim, CA, Oct. 26–30). 
ACM Press, New York, 2003, 388–402. 14. Harris, T., Plesko, M., Shinnar, A., and Tarditi, D. Optimizing memory transactions. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (Ottawa, June 10–16). ACM Press, New York, 2006, 14–25.
15. Herlihy, M., Luchangco, V., Moir, M., and Scherer III, W.N. Software transactional memory for dynamicsized data structures. In Proceedings of the 22nd ACM Symposium on Principles of Distributed Computing (Boston, July 13–16). ACM Press, New York, 2003, 92–101. 16. Lev, Y., Luchangco, V., Marathe, V., Moir, M., Nussbaum, D., and Olszewski, M. Anatomy of a scalable software transactional memory. In Proceedings of the Fourth ACM SIGPLAN Workshop on Transactional Computing (Raleigh, NC, Feb. 15, 2009). 17. Marathe, V.J., Spear, M.F., and Scott, M.L. Scalable techniques for transparent privatization in software transactional memory. In Proceedings of the 37th International Conference on Parallel Processing (Portland, OR, Sept. 8–12). IEEE Computer Society, Washington, D.C., 2008, 67–74. 18. Ni, Y., Welc, A., Adl-Tabatabai, A.-R., Bach, M., Berkowits, S., Cownie, J., Geva, R., Kozhukow, S., Narayanaswamy, R., Olivier, J., Preis, S., Saha, B., Tal, A., and Tian, X. Design and implementation of transactional constructs for C/C++. In Proceedings of the 23rd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (Nashville, Oct. 19–23). ACM Press, New York, 2008, 195–212. 19. Rajwar, R., Herlihy, M., and Lai, K. Virtualizing transactional memory. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (Madison, WI, June 4–8). IEEE Computer Society, Washington, D.C., 2005, 494–505. 20. Riegel, T., Felber, P., and Fetzer, C. Dynamic performance tuning of word-based software transactional memory. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Salt Lake City, Feb. 20–23). ACM Press, New York, 2008, 237–246. 21. Riegel, T., Felber, P., and Fetzer, C. A lazy snapshot algorithm with eager validation. In Proceedings of the 20th International Symposium on Distributed Computing (Stockholm, Sept. 18–20). Springer-Verlag, Berlin, 2006, 284–298. 22. Riegel, T., Fetzer, C., and Felber, P. Automatic data partitioning in software transactional memories. In Proceedings of the 20th ACM Symposium on Parallelism in Algorithms and Architectures (Münich, June 14–16). ACM Press, New York, 2008, 152–159. 23. Shavit, N. and Touitou, D. Software transactional memory. In Proceedings of the 14th ACM SIGACTSIGOPS Symposium on Principles of Distributed Computing (Ottawa, Aug. 20–23). ACM Press, New York, 1995, 204–213. 24. Spear, M.F., Marathe, V.J., Dalessandro, L., and Scott, M.L. Privatization techniques for software transactional memory. In Proceedings of the 26th Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (Portland, OR, Aug. 12–15). ACM Press, New York, 2007, 338–339; extended version available as TR 915, Computer Science Department, University of Rochester, Feb. 2007. 25. Yoo, R.M., Ni, Y., Welc, A., Saha, B., Adl-Tabatabai, A.-R., and Lee, H.H.S. Kicking the tires of software transactional memory: Why the going gets tough. In Proceedings of the 20th ACM Symposium on Parallelism in Algorithms and Architectures (Münich, June 14–16). ACM Press, New York, 2008, 265–274.
Aleksandar Dragojević (aleksandar.dragojevic@epfl.ch) is a Ph.D. student in the Distributed Programming Laboratory of the École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland. Pascal Felber ([email protected]) is a professor in the Computer Science Department of the Université de Neuchâtel, Neuchâtel, Switzerland. Vincent Gramoli ([email protected]) is a postdoctoral researcher in the Distributed Computing Laboratory in the École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, and a postdoctoral researcher in the Computer Science Department of the Université de Neuchâtel, Neuchâtel, Switzerland. Rachid Guerraoui ([email protected]) is a professor in the Distributed Programming Laboratory in the École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland. © 2011 ACM 0001-0782/11/04 $10.00
Finding 10 balloons across the U.S. illustrates how the Internet has changed the way we solve highly distributed problems. by John C. Tang, Manuel Cebrian, Nicklaus A. Giacobe, Hyun-Woo Kim, Taemie Kim, and Douglas "Beaker" Wickert
Reflecting on the DARPA Red Balloon Challenge
The 2009 DARPA Red Balloon Challenge (also known as the DARPA Network Challenge) explored how the Internet and social networking can be used to solve a distributed, time-critical, geo-location problem. Teams had to find 10 red weather balloons deployed at undisclosed locations across the continental U.S. The first team to correctly identify the locations of all 10 would win a $40,000 prize. A team from the Massachusetts Institute of Technology (MIT) won in less than nine hours (http://networkchallenge.darpa.mil/). Here, we reflect on lessons learned from the strategies used by the various teams. The Challenge commemorated the 40th anniversary of the first remote log-in to the ARPANet (October 29, 1969), an event widely heralded as the birth of the Internet. The Challenge was designed to identify
how more recent developments (such as social media and crowdsourcing) could be used to solve challenging problems involving distributed geolocations. Since the Challenge was announced only about one month before the balloons were deployed, it was not only a timed contest to find the balloons but also a time-limited challenge to prepare for the contest. Both the diffusion of how teams heard about the Challenge and the solution itself demonstrated the relative effectiveness of mass media and social media. The surprising efficiency of applying social networks of acquaintances to solve widely distributed tasks was demonstrated in Stanley Milgram's celebrated work9 popularizing the notion of "six degrees of separation"; that is, it typically takes no more than six intermediaries to connect any arbitrary pair of people. Meanwhile, the Internet and other communication technologies have emerged that increase the ease and opportunity for connections. These developments have enabled crowdsourcing—aggregating bits of information across a large number of users to create productive value—as a popular mechanism for creating encyclopedias of information (such as Wikipedia) and solving other highly distributed problems.1 The Challenge was announced at the "40th Anniversary of the Internet" event (http://www.engineer.ucla.edu/IA40/index.html). On December 5, 2009, at 10:00 a.m. Eastern time, 10 numbered, eight-foot-diameter red weather balloons were deployed at moored loca-
key insights
• Crowdsourcing, social networking, and traditional media enabled teams to quickly find 10 weather balloons scattered across the U.S.
• Besides finding the balloons, distinguishing correct balloon sightings from misleading claims turned out to be an important part of the effort.
• Variations in the strategies of the competing teams reflected differences in how social media can be tailored to fit a given task.
doi:10.1145/1924421.1924441
tions across the continental U.S. (see Figure 1). DARPA selected readily accessible public sites where the balloons would be visible from nearby roads, each staffed by a DARPA agent who would issue a certificate validating each balloon location. While general information about the Challenge (such as date and time of the deployment, with a picture of a balloon) had already been distributed, some details were not announced (such as DARPA’s banner on
each balloon and an attendant at each balloon issuing certificates). Teams submitted their guesses to a DARPA Web site and were given feedback as to which balloons had been identified correctly. While the balloons were scheduled to be taken down at 5:00 p.m. local time, DARPA was prepared to re-deploy them a second day and leave the submission process open for up to a week until a team identified all 10. The team from MIT correctly identi-
fied the location of all of them in eight hours, 52 minutes, 41 seconds. A team from the Georgia Tech Research Institute (GTRI) placed second by locating nine balloons within nine hours. Two more teams found eight balloons, five teams found seven balloons, and the iSchools team (representing Pennsylvania State University, University of Illinois at Urbana-Champaign, University of Pittsburgh, Syracuse University, and University of North Carolina at
Figure 1. Locations in the DARPA Red Balloon Challenge.
Figure 2. Example recursive incentive-structure process for the MIT team.
Chapel Hill) finished tenth by locating six balloons. Two months later, at the Computer-Supported Cooperative Work Conference (http://www.cscw2010.org/) in Savannah, GA, a special session dedicated to lessons learned from the Challenge brought together representatives from the winning MIT team, the GTRI team, and the iSchools team to compare and contrast among the strategies and experiences across the teams. There, members of the MIT and iSchools teams reflected on their strategies, how they validated their 80
balloon sightings, and the role of social networking tools in their process. While the GTRI team was unavailable for this article, we report on what they shared at the CSCW session and published elsewhere.6,11,12
MIT Team
The MIT team learned about the Challenge only a few days before the balloons were deployed and developed a strategy that emphasized both speed (in terms of number of people recruited) and breadth (covering as much U.S. geography as possible). They set up a
| April 2 0 1 1 | vol . 5 4 | no. 4
platform for viral collaboration that used recursive incentives to align the public’s interest with the goal of winning the Challenge. This approach was inspired by the work of Peter S. Dodds et al.5 that found that success in using social networks to tackle widely distributed search problems depends on individual incentives. The work of Mason and Watts7 also informed the use of financial incentives to motivate crowdsourcing productivity. The MIT team’s winning strategy was to use the prize money as a financial incentive structure rewarding not only the people who correctly located balloons but also those connecting the finder to the MIT team. Should the team win, they would allocate $4,000 in prize money to each balloon. They promised $2,000 per balloon to the first person to send in the correct balloon coordinates. They promised $1,000 to the person who invited that balloon finder onto the team, $500 to whoever invited the inviter, $250 to whoever invited that person, and so on. Any remaining reward money would be donated to charity. Figure 2 outlines an example of this recursive incentive structure. Alice joins the team and is given an invite link, like http://balloon.mit.edu/alice. Alice then emails her link to Bob, who uses it to join the team as well. Bob gets a unique link, like http://balloon. mit.edu/bob, and posts it on Facebook. His friend Carol sees it, signs up, then twitters about http://balloon.mit. edu/carol. Dave uses Carol’s link to join, then spots one of the DARPA balloons. Dave is the first person to report the balloon’s location to the MIT team, helping it win the Challenge. Once that happens, the team sends Dave $2,000 for finding the balloon. Carol gets $1,000 for inviting Dave, Bob gets $500 for inviting Carol, and Alice gets $250 for inviting Bob. The remaining $250 is donated to charity. The recursive incentive structure differed from the direct-reward option of giving $4,000 per balloon found in two key ways: First, a direct reward might actually deter people from spreading the word about the MIT team, as any new person recruited would be extra competition for the reward. Second, it would eliminate people living outside the U.S., as there
was no possibility of them spotting a balloon. These two factors played a key role in the success of the MIT approach, as illustrated by the fact that the depth of the tree of invites went up to 15 people, and approximately one of three tweets spreading information about the team originated outside the U.S. Distributing the reward money more broadly motivated a much larger number of people (more than 5,000) to join the team, including some from outside of the U.S. who could be rewarded for simply knowing someone who could find a balloon. This strategy combined the incentive of personal gain with the power of social networks to connect people locating each balloon with the MIT team. The MIT team received more than 200 submissions of balloon sightings, of which 30 to 40 turned out to be accurate. Given the considerably noisy submission data, including deliberate attempts at misdirection, the team had to develop a strategy to accurately identify the correct sightings. It did not have time to build a sophisticated machine-learning system to automate the process, nor did it have access to a trusted human network to verify balloon sightings. Instead, most of its strategies relied on using human reasoning to analyze the information submitted with the balloon sightings and eliminate submissions with inconsistencies. The first strategy was to observe the patterns of submissions about a certain balloon site. Since the balloons were all located in public spaces, each one tended to elicit multiple submissions. Multiple submissions at a specific location increased the probability of a report being accurate. However, those deliberately falsifying balloon sightings also submitted multiple sightings for each false location. To filter out these submissions, the team observed differing patterns in how balloon locations were reported (see Figure 3). Multiple submissions about a real balloon location tended to differ a little from one another, reflecting natural variation in representing a certain location: address, crossroads, nearby landmarks. Malicious submissions tended to have identical representations for a single location, making them suspicious. Another simple strategy the team
used involved comparing the IP address of the submission with where a balloon was reported found; for example, one submission reporting a balloon in Florida came from an IP address in the Los Angeles area. A simple IP trace and common sense filtered out such false submissions.
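This kind of IP-based sanity check is easy to mechanize. The sketch below assumes the submitter's IP address has already been geolocated to approximate coordinates by some external service (the article does not name one) and flags reports whose great-circle distance from the claimed balloon position is implausibly large; the 500km threshold is an arbitrary illustrative choice, and the MIT team applied this kind of reasoning by hand rather than in code.

```cpp
#include <cmath>

struct LatLon { double lat_deg; double lon_deg; };

// Great-circle distance in kilometers (haversine formula).
double distance_km(LatLon a, LatLon b) {
    constexpr double kEarthRadiusKm = 6371.0;
    constexpr double kPi = 3.14159265358979323846;
    constexpr double kDegToRad = kPi / 180.0;
    const double dlat = (b.lat_deg - a.lat_deg) * kDegToRad;
    const double dlon = (b.lon_deg - a.lon_deg) * kDegToRad;
    const double h = std::sin(dlat / 2) * std::sin(dlat / 2) +
                     std::cos(a.lat_deg * kDegToRad) * std::cos(b.lat_deg * kDegToRad) *
                     std::sin(dlon / 2) * std::sin(dlon / 2);
    return 2.0 * kEarthRadiusKm * std::asin(std::sqrt(h));
}

// Flag a submission as suspicious if the submitter's IP geolocates far from
// the claimed balloon position (threshold is illustrative only).
bool looks_inconsistent(LatLon claimed_balloon, LatLon ip_location) {
    constexpr double kMaxPlausibleKm = 500.0;
    return distance_km(claimed_balloon, ip_location) > kMaxPlausibleKm;
}
```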
Many submissions included pictures, some contrived to confirm misleading submissions. Most altered pictures involved shots of a balloon from a distance and lacked the DARPA agent holding the balloon and the DARPA banner (an unannounced detail). Figure 4 shows examples of authentic and
Figure 3. Typical real (top) and false (bottom) locations of balloons, with bottom map depicting five submissions with identical locations.
Figure 4. Typical real (left) and contrived (center and right) pictures of balloons.
While the MIT team succeeded in quickly collecting balloon-sighting data, it relied on human analysis of that data to detect inconsistencies or patterns reflecting attempts to mislead with false sightings. These analyses exposed bogus balloon sightings, allowing the team to identify the accurate sightings by process of elimination.

GTRI Team
In contrast, the GTRI team (also known as the "I Spy a Red Balloon" team) was one of the quickest to launch a Web site and start recruiting a network of participants, eventually growing to about 1,400 people. It even explored partnering with a major shipping company to leverage its network of drivers covering the U.S. to find and report balloon locations. Ultimately, the company declined over "concerns about driver safety and focus on the job."12 The GTRI team instead promoted its own Web page, registered a Google Voice number, and formed a Facebook group to communicate with participants searching for balloons for the team. A major aspect of the GTRI strategy was to promote the visibility of
the team so anyone spotting a balloon would be more likely to report it to the team. Besides activating the team Web site three weeks before launch day, it leveraged mass media coverage of the team and search-engine rank optimization for the Web site to make its participation in the Challenge readily discoverable. This approach capitalized on the team's longer lead time, since it started preparing for the contest early. The team also declared it would donate all prize money to charity, appealing to the intrinsic motivation of altruism to encourage people to help the team. While the GTRI team also had to validate the accuracy of reported sightings, it believed its charitable intentions deterred submission of false reports to the team.11 The strategy focused on personally confirming balloon sightings. Where possible, the team had a direct conversation with the balloon spotter to verify a report, creating a social situation in which it was more difficult to fabricate balloon sightings. If the team could not personally contact a balloon spotter, it called nearby businesses to solicit help validating sightings. Such cold calls produced mixed results.
Figure 5. Fabricated photo posted during the challenge (left) (http://twitpic.com/s9kun) and photo taken by a pre-recruited observer in Albany, NY (right).
Figure 6. Photo Mapping with Google Maps and Panoramio (Location: Chaparral Park, Scottsdale, AZ); photo report from Twitter (http://twitpic.com/s9ffv) (left); Google Maps with Panoramio photos (center); and view-in image from Panoramio (right) (http://www.panoramio.com/photo/2412514).
Some were obliging, while others simply dismissed the request. In essence, the GTRI team largely relied on social persuasion of strangers, either potential balloon spotters or people in the vicinity of a balloon sighting, to validate balloon locations. While the GTRI team correctly identified nine balloons, it had no record of a report of the tenth balloon (in Katy, TX) being submitted to the team. The mechanism for personally validating balloon sightings (and perhaps its charitable intentions) seemed to engender more social cooperation, but the effort fell short of eliciting a report of all 10 balloons.

iSchools Team
The iSchools team formed about two weeks before the launch date, recruiting observers from member organizations for direct search for the balloons and employing Open Source Intelligence methods8 for cyberspace search. Confirmation techniques were a key element of the iSchools team's ability to locate six of the 10 balloons, helping it claim tenth place. Most of the six were located through the cyberspace-search approach, using humans as sensors in a participatory sensing experiment,2 an approach suited to situations where directly recruiting observers in advance of an event is problematic or impossible. The team tried to use the wide geographic footprint of its member organizations. Since it included colleges and universities from across the continental U.S., it had a good chance of recruiting observers wherever DARPA placed balloons. Current students, faculty, and staff, as well as alumni, were recruited through messages sent to email lists, Twitter feeds, and Facebook groups, when available. Only a handful of pre-registered observers actively participated during launch day, yielding only a single valid balloon location through direct search. In the cyberspace-search approach, a group of analysts sought evidence of balloon-sighting reports accessible on publicly available Internet sites, including public Twitter feeds, Web sites of competing teams, and any other source they could access without hacking. This approach was the primary source of data for finding the other balloon locations.
Evidence was gathered from all sources, compiled, and manually evaluated. The validity of evidence was assessed based on both the content of the data and the reputation of its source; for example, solitary tweets without detail sent from new Twitter accounts with no followers were discounted, while those from established users with geo-tagged photographs attached were given a higher-reliability assessment. The cyberspace-search approach also used a Twitter-capture system to store and search tweets about the Challenge offline, as well as a custom Web crawler set to record data from the publicly accessible parts of competing teams' Web sites. Since analyzing the data captured by the crawler required more time, it would have been more helpful had the Challenge continued longer than the one day the balloons were deployed. The Twitter-capture system, however, turned out to be more helpful, as it revealed locations from users who allowed their smartphones to embed geo-data with their tweets. Unlike manual geotags on photos, such embedded geo-data was difficult to falsify with the tools available at the time and was therefore deemed more reliable. Several reports were confirmed as false through a combination of photograph analysis and secondary confirmation by dispatched observers. Observers from the pre-event recruiting effort were used where possible (such as detecting the fabricated photo in Figure 5 of the balloon over Albany, NY). Where no pre-recruited observer was available, the command staff and cyberspace-search staff called and recruited observers from the iSchools Caucus member organizations, family, and friends known to be near unconfirmed sightings. This technique was used to confirm the valid location in Portland, OR, and to disqualify the fabricated image of Providence, RI, in Figure 4, right, and the non-DARPA balloon over Royal Oak, MI, in Figure 4, center.
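The source-reputation weighting described above can be approximated with a handful of features. The scoring function below is purely illustrative: the features mirror the heuristics in the text, but the weights and caps are invented for the example.

def report_credibility(account_age_days, followers, has_geotag, has_photo):
    """Toy credibility score for a balloon report found on Twitter.

    Brand-new accounts with no followers are discounted, while
    established accounts posting geo-tagged photographs score higher.
    Weights are illustrative, not the iSchools team's actual values.
    """
    score = 0.0
    score += min(account_age_days / 365.0, 2.0)   # up to 2 points for account age
    score += min(followers / 100.0, 2.0)          # up to 2 points for followers
    if has_photo:
        score += 1.0
    if has_geotag:
        score += 2.0          # geo-data embedded by the phone is hard to fake
    return score

# A solitary tweet from a day-old, follower-less account vs. an
# established user posting a geo-tagged photo:
print(report_credibility(1, 0, has_geotag=False, has_photo=False))    # ~0.003
print(report_credibility(900, 250, has_geotag=True, has_photo=True))  # 7.0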
In one case, a competing team unintentionally leaked details on its Web site of an accurate balloon sighting in Scottsdale, AZ. An attempt to cover up the leak and misdirect others into thinking the balloon was in another state created an inconsistency in the story posted with the photograph. To identify the true location of the sighting, the iSchools team triangulated information across many social networking sites. Following geographical clues in the original posting,3,10 the team confirmed the true identity and likely home location of the original poster. The location of the balloon was then confirmed by matching the original text description (in the park near the poster's house) and comparing the poster's photograph of the balloon with photographs of the park on Panoramio (see Figure 6). This illustrates the potential of piecing together bits of publicly available information across disparate sources in a timely way to solve a piece of the puzzle. The iSchools team found that cyberspace-search techniques are effective and inexpensive. Especially in situations where observers cannot be recruited in advance, existing observer networks and publicly available information can be leveraged to address intelligence tasks. Moreover, the iSchools team's approach can be applied in intelligence and law enforcement, especially where grassroots organizations are better able to recruit and motivate observers. The team also learned that secondary confirmation techniques must be employed to overcome deception. During the Challenge, secondary observers, photograph analysis, and metadata analysis were combined to assess the validity of scarce data. Social networking tools have provided public access to large numbers of people and enough data to enable both discovery and independent verification of intelligence information.
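One simple way to combine such independent checks (photograph analysis, metadata analysis, and a secondary observer) is to require agreement from a minimum number of them and let any failed check veto the report. The rule below is a sketch for illustration, not the team's actual procedure.

def accept_sighting(photo_consistent, metadata_consistent, observer_confirmed,
                    required=2):
    """Accept a balloon sighting only if enough independent checks agree.

    Each argument is True (check passed), False (check failed), or None
    (check unavailable). A single failed check vetoes the report.
    """
    checks = [photo_consistent, metadata_consistent, observer_confirmed]
    if any(c is False for c in checks):
        return False
    return sum(1 for c in checks if c is True) >= required

# Fabricated photo, no observer nearby: rejected.
print(accept_sighting(False, None, None))                      # False
# Consistent photo and metadata plus a dispatched observer: accepted.
print(accept_sighting(True, True, observer_confirmed=True))    # True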
Reflections
This experience generated insights at several levels. Diffusion of the Challenge itself demonstrated the complementary roles of traditional mass media and social media. Comparing the strategies of the three teams at the CSCW panel yielded interesting contrasts and implications for how to validate submitted information, adding to DARPA's reflection across all participating teams. Diffusion of the Challenge through mass media and social media channels provided a good comparison of the relative roles of traditional and social media methods in network mobilization. The initial announcement at the "40th Anniversary of the Internet" event in October and some widely circulated blog posts (at mssv.net and Slashdot) generated a steady trickle of traffic to the DARPA Web site, averaging about 1,000 hits per day. Initial expectations that the diffusion of the Challenge would progress virally were not realized until the final week before balloon launch. A steep increase in Web-site hits corresponded with the appearance of a story in the New York Times on November 30, 2009, with traffic increasing to an average of 20,000 hits a day in the last week before the balloons were launched. Diffusion of the Challenge (itself an experiment in social media) showed how traditional mass media and social media channels are complementary. At least for this target audience and in this time frame, it took a combination of mass media and social media to effectively disseminate information to the intended audience.
Approaches used by three teams—MIT, GTRI, and iSchools—in the Challenge.

Team     | Motivation                    | Balloon Validation                                             | Social Media Usage                           | Mass Media Usage
MIT      | Extrinsic financial incentive | Human analysis of submitted information                        | Form team network, attract balloon sightings | Publicity on launch day may have attracted balloon sightings
GTRI     | Intrinsic altruism            | Direct personal verification                                   | Form team network, attract balloon sightings | Publicity before launch may have added to team and attracted balloon sightings
iSchools | None                          | Human and machine analysis of submitted and public information | Mined for published information              | None
While mass media played a key role in making the general public aware of the Challenge, social media were an important factor in the viral diffusion of Challenge information, especially among the teams relying on them to quickly recruit and connect participants. Reflecting across the three teams revealed similarities and interesting differences in strategy and implications for each team's operations. All three set up co-located operations centers where a core team assembled on launch day to actively monitor the real-time chatter in social networking feeds to learn of balloon sightings and possible clues to their validity. The bulk of the effort involved analyzing the balloon-sighting information to determine which reports were accurate. Beyond these similarities, the table here summarizes the main differences in how the teams motivated participants, validated balloon sightings, and used social and mass media. The MIT team aligned individual incentives with connecting a social network so it would grow quickly and autonomously. Financial incentives served as extrinsic motivation to work with strangers, both in quickly recruiting the network and in activating the network to locate the balloons. The MIT team also developed strategies for verifying the accuracy of reported balloon sightings, largely by analyzing the sighting information submitted to the team. The GTRI team took advantage of its early start and relied on a combination of social media and mass media coverage to make the team's quest visible to the vast audience of potential participants. But GTRI's network size from three weeks of recruiting was far smaller than the network the MIT team recruited in three days. While it is difficult to determine the causes (such as motivational incentives and social connections), the wide range of responses to the MIT and GTRI teams shows the great variability in dissemination that is somewhat characteristic of social media today. The iSchools team mined publicly available information through Twitter to identify balloon sightings. In this sense, the team did not offer any motivation or incentives to attract
people to help the team but exploited information people made public voluntarily. The advent of social media tools has made a wealth of information publicly available, and the iSchools team's strategy demonstrated that this information could be mined to tackle a time-urgent problem. While the strategies of the MIT and GTRI teams relied on social media tools to quickly extend their reach to people who could help solve the problem, the iSchools' data-mining strategy would have been impossible without the social networking tools that elicited data to be made publicly available in the first place. However, since the information providers had no motivation to help the iSchools team win, the team had perhaps the most challenging job of identifying accurate sightings among the wide range of noisy information circulating through Twitter. The team was able to identify five balloons simply through publicly available information, performing better than many teams that actively recruited members. Its approach is most relevant for tackling problems where advance preparation, direct recruiting, and financial incentives are inappropriate. Together, the three teams exhibited a range of strategies that relied on intrinsic or extrinsic motivation and on proactive recruiting or reactive data mining. While social networking tools played a role (to varying degrees) in data collection for all teams, the data generated could not be trusted without first verifying its accuracy. The teams' strategies for validation also varied but relied largely on analyzing the internal consistency of the data or independently verifying balloon sightings, often through social networking tools or trusted social connections. The MIT team's approach enabled it to solve the game-like problem within a day, while the iSchools team had planned for more extensive data-mining tools that would be useful in a longer-lived challenge. Comparing the teams highlights the different ways social media were used to recruit participants, collect balloon sightings, and validate balloon-sighting data.
DARPA View
While the DARPA Web site registered more than 4,000 individuals (from 39 countries) as participants in the Challenge, interview data and team estimates of network size indicate that more than 350,000 people participated in some way. Based on the 922 balloon-sighting submissions to the DARPA Web site and team interview data, DARPA tracked 58 teams that were able to correctly locate at least two balloons. Following the Challenge, DARPA conducted 53 interviews with team leaders who had competed in the Challenge. These interviews supplemented the quantitative submission-log data collected on the DARPA Web site with qualitative data about participating teams' strategies, social and technical tools used, network size, mobilization speed, and important social-dynamic factors. This data enabled DARPA to reflect on the experience of teams beyond the three in the CSCW session.4 The Challenge clearly demonstrated the variety, efficiency, and effectiveness of crowdsourcing solutions to a distributed, geo-located, time-urgent problem. The network mobilization time was far faster than DARPA program managers expected, requiring days instead of weeks. The MIT team constructed a motivated network exceeding 5,000 individuals from four initial nodes in just a few days. Other teams that built around existing networks were able to mobilize them in a day. In one case, a highly connected individual successfully mobilized his contacts through Twitter in less than an hour. As impressive as their use of the network to discover balloons was, many teams also used it for precise, targeted dispatching to verify balloon sightings. Balloon verification, from initial report to confirmation by a targeted dispatch, typically took less than two hours. While the power of social networks and the manner in which they are poised to transform our society have been gaining attention, the Challenge revealed several promising means for using them to mobilize groups of people for a specific purpose. It also demonstrated the speed at which social networks could be used to solve challenging, national geo-location problems. This potential has profound implications for a variety of applications, from natural disaster response to quickly locating missing children.
However, the Challenge also demonstrated that this wealth of data is very noisy, reflecting the need for better search methods and verification algorithms. Much of the transformative potential of social networks lies in the promise of democratizing information and capabilities that had previously been the exclusive purview of privileged government or corporate institutions. The Challenge demonstrated that geospatial intelligence is potentially available to anyone with an Internet connection, not just to government intelligence analysts. Social media and crowdsourcing practices have given almost any individual the potential to tap the inherent power of information. However, along with that power comes the need to cultivate a concomitant sense of responsibility for its appropriate and constructive use. As indicated by recent events, like the information disclosed through WikiLeaks and the role of social networking in civil uprisings, appropriate use of these new tools reflects an evolving debate.

Acknowledgments
We thank the Computer Supported Cooperative Work 2010 conference co-chairs Kori Inkpen and Carl Gutwin for supporting its "Reflecting on the DARPA Red Balloon Challenge" session. The MIT team thanks Alex (Sandy) Pentland, Riley Crane, Anmol Madan, Wei Pan, and Galen Pickard from the Human Dynamics Laboratory at the MIT Media Lab. The iSchool team is grateful for the support and advice provided by John Yen, David Hall, Wade Shumaker, Anthony Maslowski, Gregory Traylor, Gregory O'Neill, Avner Ahmad, Madian Khabsa, Guruprasad Airy, and Leilei Zhu of Penn State University; Maeve Reilly and John Unsworth of the University of Illinois; Martin Weiss of the University of Pittsburgh; Jeffrey Stanton of Syracuse University; and Gary Marchionini of the University of North Carolina. We also acknowledge Ethan Trewhitt and Elizabeth Whitaker from GTRI for their participation in the CSCW session. We also thank Peter Lee and Norman Whitaker from DARPA and the DARPA Service Chiefs' Program Fellows who conceived and executed the Challenge:
Col. Phillip Reiman (USMC), CDR Roger Plasse (USN), CDR Gus Gutierrez (USN), MAJ Paul Panozzo (USA), Maj. Jay Orson (USAF), Timothy McDonald (NGA), Capt. Derek Filipe (USMC), and CPT Deborah Chen (USA).

References
1. Brabham, D.C. Crowdsourcing as a model for problem solving. Convergence: The International Journal of Research into New Media Technologies 14, 1 (2008), 75–90.
2. Burke, J., Estrin, D., Hansen, M., Parker, A., Ramanathan, N., Reddy, S., and Srivastava, M.B. Participatory sensing. In Proceedings of the Workshop on World-Sensor-Web at Sensys (Boulder, CO, Oct. 31, 2006).
3. Buyukkokten, O., Cho, J., Garcia-Molina, H., Gravano, L., and Shivakumar, N. Exploiting geographical location information of Web pages. In the Informal Proceedings of the International Workshop on the Web and Databases (Providence, RI, June 28, 1999), 91–96.
4. Defense Advanced Research Projects Agency. DARPA Network Challenge Project Report. DARPA, Arlington, VA, Feb. 16, 2010; https://networkchallenge.darpa.mil/ProjectReport.pdf
5. Dodds, P.S., Muhamad, R., and Watts, D.J. An experimental study of search in global social networks. Science 301, 5634 (Aug. 8, 2003), 827–829.
6. Greenemeier, L. Inflated expectations: Crowdsourcing comes of age in the DARPA Network Challenge. Scientific American (Dec. 21, 2009); http://www.scientificamerican.com/article.cfm?id=darpa-networkchallenge-results
7. Mason, W.A. and Watts, D.J. Financial incentives and the 'performance of crowds.' In Proceedings of the KDD Workshop on Human Computation (Paris, June 28–July 1). ACM Press, New York, 2009, 77–85.
8. Mercado, S. A venerable source in a new era: Sailing the sea of OSINT [Open Source Intelligence] in the information age. Studies in Intelligence 48, 3 (2004), 45–55.
9. Milgram, S. The small world problem. Psychology Today 1, 1 (May 1967), 60–67.
10. Padmanabhan, V. and Subramanian, L. Determining the geographic location of Internet hosts. ACM SIGMETRICS Performance Evaluation Review 29, 1 (June 2001), 324–325.
11. Tang, J.C. Special Session: Reflecting on the DARPA Red Balloon Challenge. At the 2010 ACM Conference on Computer Supported Cooperative Work (Savannah, GA, Feb. 9, 2010); http://www.cscw2010.org/program/special.php
12. Trewhitt, E. Preparing for the DARPA Network Challenge (Feb. 24, 2010); http://cacm.acm.org/blogs/blog-cacm/76324-preparing-for-the-darpa-networkchallenge/fulltext
John C. Tang ([email protected]) is a senior researcher at Microsoft Research, Mountain View, CA.
Manuel Cebrian ([email protected]) is an assistant research scientist in the Department of Computer Science & Engineering, University of California at San Diego, La Jolla, CA; he was at the Massachusetts Institute of Technology, Cambridge, MA, at the time of this work.
Nicklaus A. Giacobe ([email protected]) is a Ph.D. candidate and research associate in the College of Information Sciences and Technology at The Pennsylvania State University, University Park, PA.
Hyun-Woo Kim ([email protected]) is a Ph.D. candidate in the College of Information Sciences and Technology at The Pennsylvania State University, University Park, PA.
Taemie Kim ([email protected]) is a Ph.D. candidate in the Media Laboratory of the Massachusetts Institute of Technology, Cambridge, MA.
Douglas "Beaker" Wickert ([email protected]) is a Major in the U.S. Air Force and a program fellow in the Transformational Convergence Technology Office of the Defense Advanced Research Projects Agency, Arlington, VA.

© 2011 ACM 0001-0782/11/04 $10.00
review articles doi:10.1145/1924421.1924442
The practice of crowdsourcing is transforming the Web and giving rise to a new field. by AnHai Doan, Raghu Ramakrishnan, and Alon Y. Halevy
Crowdsourcing Systems on the World-Wide Web
Crowdsourcing systems enlist a multitude of humans to help solve a wide variety of problems. Over the past decade, numerous such systems have appeared on the World-Wide Web. Prime examples include Wikipedia, Linux, Yahoo! Answers, and Mechanical Turk-based systems, and much effort is being directed toward developing many more. As is typical for an emerging area, this effort has appeared under many names, including peer production, user-powered systems, user-generated content, collaborative systems, community systems,
social systems, social search, social media, collective intelligence, wikinomics, crowd wisdom, smart mobs, mass collaboration, and human computation. The topic has been discussed extensively in books, popular press, and academia.1,5,15,23,29,35 But this body of work has considered mostly efforts in the physical world.23,29,30 Some do consider crowdsourcing systems on the Web, but only certain system types28,33 or challenges (for example, how to evaluate users12). This survey attempts to provide a global picture of crowdsourcing systems on the Web. We define and classify such systems, then describe a broad sample of systems. The sample
key insights
Crowdsourcing systems face four key challenges: How to recruit contributors, what they can do, how to combine their contributions, and how to manage abuse. Many systems “in the wild” must also carefully balance openness with quality.
The race is on to build general crowdsourcing platforms that can be used to quickly build crowdsourcing applications in many domains. Using these, we can already build databases previously unimaginable at lightning speed.
ranges from relatively simple, well-established systems such as reviewing books to complex emerging systems that build structured knowledge bases to systems that "piggyback" onto other popular systems. We discuss fundamental challenges such as how to recruit and evaluate users, and how to merge their contributions. Given the space limitation, we do not attempt to be exhaustive. Rather, we sketch only the most important aspects of the global picture, using real-world examples. The goal is to further our collective understanding—both conceptual and practical—of this important emerging topic. It is also important to note that many crowdsourcing platforms have been built. Examples include Mechanical Turk, Turkit, Mob4hire, uTest, Freelancer, eLance, oDesk, Guru, Topcoder, Trada, 99designs, Innocentive, CloudCrowd, and CrowdFlower. Using these platforms, we can quickly build crowdsourcing systems in many domains. In this survey, we consider these systems (that is, applications), not the crowdsourcing platforms themselves.
Crowdsourcing Systems
Defining crowdsourcing (CS) systems turns out to be surprisingly tricky. Since many view Wikipedia and Linux as well-known CS examples, a natural starting point is to say that a CS system enlists a crowd of users to explicitly collaborate to build a long-lasting artifact that is beneficial to the whole community. This definition, however, appears too restricted. It excludes, for example, the ESP game,32 where users implicitly collaborate to label images as a side effect while playing the game. ESP clearly benefits from a crowd of users. More importantly, it faces the same human-centric challenges as Wikipedia and Linux, such as how to recruit and evaluate users, and how to combine their contributions. Given this, it seems unsatisfactory to consider only explicit collaborations; we ought to allow implicit ones as well. The definition also excludes, for example, an Amazon Mechanical Turk-based system that enlists users to find a missing boat in thousands of satellite images.18 Here, users do not build any artifact, arguably nothing is long lasting, and no community exists either
Ten Thousand Cents is a digital artwork by Aaron Koblin that creates a representation of a $100 bill. Using a custom drawing tool, thousands of individuals, working in isolation from one another, painted a tiny part of the bill without knowledge of the overall task.
(just users coming together for this particular task). And yet, like ESP, this system clearly benefits from users, and faces similar human-centric challenges. Given this, it ought to be considered a CS system, and the goal of building artifacts ought to be relaxed into the more general goal of solving problems. Indeed, it appears that in principle any non-trivial problem can benefit from crowdsourcing: we can describe the problem on the Web, solicit user inputs, and examine the inputs to develop a solution. This system may not be practical (and better systems may exist), but it can arguably be considered a primitive CS system. Consequently, we do not restrict the type of collaboration nor the target problem. Rather, we view CS as a general-purpose problem-solving method. We say that a system is a CS system if it enlists a crowd of humans to help solve a problem defined by the system owners, and if in doing so, it addresses the following four fundamental challenges:
A sample of basic CS system types on the World-Wide Web.

Explicit collaboration, standalone architecture, must recruit users:
• Evaluating (review, vote, tag). Examples: reviewing and voting at Amazon; tagging Web pages at del.icio.us and Google Co-op. Target problem: evaluating a collection of items (e.g., products, users). Comments: humans as perspective providers; no or loose combination of inputs.
• Sharing (items; textual knowledge; structured knowledge). Examples: Napster, YouTube, Flickr, CPAN, programmableweb.com; mailing lists, Yahoo! Answers, QUIQ, ehow.com, Quora; Swivel, Many Eyes, Google Fusion Tables, Google Base, bmrb.wisc.edu, galaxyzoo, Piazza, Orchestra. Target problem: building a (distributed or central) collection of items that can be shared among users. Comments: humans as content providers; no or loose combination of inputs.
• Networking. Examples: LinkedIn, MySpace, Facebook. Target problem: building social networks. Comments: humans as component providers; loose combination of inputs.
• Building artifacts (software; textual knowledge bases; structured knowledge bases; systems; others). Examples: Linux, Apache, Hadoop; Wikipedia, openmind, Intellipedia, ecolicommunity; Wikipedia infoboxes/DBpedia, IWP, Google Fusion Tables, YAGO-NAGA, Cimple/DBLife; Wikia Search, mahalo, Freebase, Eurekster; newspaper at Digg.com, Second Life. Target problem: building physical artifacts. Comments: humans can play all roles; typically tight combination of inputs; some systems ask both humans and machines to contribute.
• Task execution. Examples: finding extraterrestrials, elections, finding people, content creation (e.g., Demand Media, Associated Content). Target problem: possibly any problem.

Implicit collaboration, standalone architecture, must recruit users:
• Play games with a purpose; bet on prediction markets; use private accounts; solve captchas; buy/sell/auction, play massive multiplayer games. Examples: ESP; intrade.com, Iowa Electronic Markets; IMDB private accounts; recaptcha.net; eBay, World of Warcraft. Target problems: labeling images; predicting events; rating movies; digitizing written text; building a user community (for purposes such as charging fees, advertising). Comments: humans can play all roles; input combination can be loose or tight.

Implicit collaboration, piggyback on another system, need not recruit users:
• Keyword search; buy products; browse Web sites. Examples: Google, Microsoft, Yahoo; recommendation feature of Amazon; adaptive Web sites (e.g., Yahoo! front page). Target problems: spelling correction, epidemic prediction; recommending products; reorganizing a Web site for better access. Comments: humans can play all roles; input combination can be loose or tight.
• How to recruit and retain users?
• What contributions can users make?
• How to combine user contributions to solve the target problem?
• How to evaluate users and their contributions?
Not all human-centric systems address these challenges. Consider a system that manages car traffic in Madison, WI. Its goal is to, say, coordinate the behaviors of a crowd of human drivers (who already exist within the system) in order to minimize traffic jams. Clearly, this system does not want to recruit more human drivers (in fact, it wants far fewer of them). We call such systems crowd management (CM) systems. CM techniques (a.k.a., "crowd
coordination”31) can be relevant to CS contexts. But the two system classes are clearly distinct. In this survey we focus on CS systems that leverage the Web to solve the four challenges mentioned here (or a significant subset of them). The Web is unique in that it can help recruit a large number of users, enable a high degree of automation, and provide a large set of social software (for example, email, wiki, discussion group, blogging, and tagging) that CS systems can use to manage their users. As such, compared to the physical world, the Web can dramatically improve existing CS systems and give birth to novel system types.
Classifying CS systems. CS systems can be classified along many dimensions. Here, we discuss nine dimensions we consider most important. The two that immediately come to mind are the nature of collaboration and type of target problem. As discussed previously, collaboration can be explicit or implicit, and the target problem can be any problem defined by the system owners (for example, building temporary or permanent artifacts, executing tasks). The next four dimensions refer respectively to how a CS system solves the four fundamental challenges described earlier: how to recruit and retain users; what can users do; how to combine
review articles their inputs; and how to evaluate them. Later, we will discuss these challenges and the corresponding dimensions in detail. Here, we discuss the remaining three dimensions: degree of manual effort, role of human users, and standalone versus piggyback architectures. Degree of manual effort. When building a CS system, we must decide how much manual effort is required to solve each of the four CS challenges. This can range from relatively little (for example, combining ratings) to substantial (for example, combining code), and clearly also depends on how much the system is automated. We must decide how to divide the manual effort between the users and the system owners. Some systems ask the users to do relatively little and the owners a great deal. For example, to detect malicious users, the users may simply click a button to report suspicious behaviors, whereas the owners must carefully examine all relevant evidence to determine if a user is indeed malicious. Some systems do the reverse. For example, most of the manual burden of merging Wikipedia edits falls on the users (who are currently editing), not the owners. Role of human users. We consider four basic roles of humans in a CS system. Slaves: humans help solve the problem in a divide-and-conquer fashion, to minimize the resources (for example, time, effort) of the owners. Examples are ESP and finding a missing boat in satellite images using Mechanical Turk. Perspective providers: humans contribute different perspectives, which when combined often produce a better solution (than with a single human). Examples are reviewing books and aggregating user bets to make predictions.29 Content providers: humans contribute self-generated content (for example, videos on YouTube, images on Flickr). Component providers: humans function as components in the target artifact, such as a social network, or simply just a community of users (so that the owner can, say, sell ads). Humans often play multiple roles within a single CS system (for example, slaves, perspective providers, and content providers in Wikipedia). It is important to know these roles because that may determine how to recruit. For example, to use humans as perspective providers, it is important to recruit a
Compared to the physical world, the Web can dramatically improve existing crowdsourcing systems and give birth to novel system types.
diverse crowd where each human can make independent decisions, to avoid “group think.”29 Standalone versus piggyback. When building a CS system, we may decide to piggyback on a well-established system, by exploiting traces that users leave in that system to solve our target problem. For example, Google’s “Did you mean” and Yahoo’s Search Assist utilize the search log and user clicks of a search engine to correct spelling mistakes. Another system may exploit user purchases in an online bookstore (Amazon) to recommend books. Unlike standalone systems, such piggyback systems do not have to solve the challenges of recruiting users and deciding what they can do. But they still have to decide how to evaluate users and their inputs (such as traces in this case), and to combine such inputs to solve the target problem. Sample CS Systems on the Web Building on this discussion of CS dimensions, we now focus on CS systems on the Web, first describing a set of basic system types, and then showing how deployed CS systems often combine multiple such types. The accompanying table shows a set of basic CS system types. The set is not meant to be exhaustive; it shows only those types that have received most attention. From left to right, it is organized by collaboration, architecture, the need to recruit users, and then by the actions users can take. We now discuss the set, starting with explicit systems. Explicit Systems: These standalone systems let users collaborate explicitly. In particular, users can evaluate, share, network, build artifacts, and execute tasks. We discuss these systems in turn. Evaluating: These systems let users evaluate “items” (for example, books, movies, Web pages, other users) using textual comments, numeric scores, or tags.10 Sharing: These systems let users share “items” such as products, services, textual knowledge, and structured knowledge. Systems that share products and services include Napster, YouTube, CPAN, and the site programmableweb.com (for sharing files, videos, software, and mashups, respectively). Systems that share textual knowledge include mailing lists, Twitter, how-to
repositories (such as ehow.com, which lets users contribute and search how-to articles), Q&A Web sites (such as Yahoo! Answers2), and online customer support systems (such as QUIQ,22 which powered Ask Jeeves' AnswerPoint, a Yahoo! Answers-like site). Systems that share structured knowledge (for example, relational, XML, RDF data) include Swivel, Many Eyes, Google Fusion Tables, Google Base, many e-science Web sites (such as bmrb.wisc.edu, galaxyzoo.org), and many peer-to-peer systems developed in the Semantic Web, database, AI, and IR communities (such as Orchestra8,27). Swivel, for example, bills itself as the "YouTube
of structured data,” which lets users share, query, and visualize census- and voting data, among others. In general, sharing systems can be central (such as YouTube, ehow, Google Fusion Tables, Swivel) or distributed, in a peer-to-peer fashion (such as Napster, Orchestra). Networking: These systems let users collaboratively construct a large social network graph, by adding nodes and edges over time (such as homepages, friendships). Then they exploit the graph to provide services (for example, friend updates, ads, and so on). To a lesser degree, blogging systems are also networking systems in that bloggers often link to other bloggers.
A key distinguishing aspect of systems that evaluate, share, or network is that they do not merge user inputs, or do so automatically in relatively simple fashions. For example, evaluation systems typically do not merge textual user reviews. They often merge user inputs such as movie ratings, but do so automatically using some formulas. Similarly, networking systems automatically merge user inputs by adding them as nodes and edges to a social network graph. As a result, users of such systems do not need (and, in fact, often are not allowed) to edit other users’ input. Building Artifacts: In contrast, systems that let users build artifacts such
as Wikipedia often merge user inputs tightly, and require users to edit and merge one another’s inputs. A well-known artifact is software (such as Apache, Linux, Hadoop). Another popular artifact is textual knowledge bases (KBs). To build such KBs (such as Wikipedia), users contribute data such as sentences, paragraphs, Web pages, then edit and merge one another’s contributions. The knowledge capture (k-cap.org) and AI communities have studied building such KBs for over a decade. A well-known early attempt is openmind,28 which enlists volunteers to build a KB of commonsense facts (for example, “the sky is blue”). Re-
cently, the success of Wikipedia has inspired many “community wikipedias,” such as Intellipedia (for the U.S. intelligence community) and EcoliHub (at ecolicommunity.org, to capture all information about the E. coli bacterium). Yet another popular target artifact is structured KBs. For example, the set of all Wikipedia infoboxes (that is, attribute-value pairs such as city-name = Madison, state = WI) can be viewed as a structured KB collaboratively created by Wikipedia users. Indeed, this KB has recently been extracted as DBpedia and used in several applications (see dbpedia.org). Freebase.com builds an open structured database, where us-
The Sheep Market by Aaron Koblin is a collection of 10,000 sheep made by workers on Amazon’s Mechanical Turk. Workers were paid $0.02 (USD) to “draw a sheep facing to the left.” Animations of each sheep’s creation may be viewed at TheSheepMarket.com.
ers can create and populate schemas to describe topics of interest, and build collections of interlinked topics using a flexible graph model of data. As yet another example, Google Fusion Tables (tables.googlelabs.com) lets users upload tabular data and collaborate on it by merging tables from different sources, commenting on data items, and sharing visualizations on the Web. Several recent academic projects have also studied building structured
review articles KBs in a CS fashion. The IWP project35 extracts structured data from the textual pages of Wikipedia, then asks users to verify the extraction accuracy. The Cimple/DBLife project4,5 lets users correct the extracted structured data, expose it in wiki pages, then add even more textual and structured data. Thus, it builds structured “community wikipedias,” whose wiki pages mix textual data with structured data (that comes from an underlying structured KB). Other related works include YAGONAGA,11 BioPortal,17 and many recent projects in the Web, Semantic Web, and AI communities.1,16,36 In general, building a structured KB often requires selecting a set of data sources, extracting structured data from them, then integrating the data (for example, matching and merging “David Smith” and “D.M. Smith”). Users can help these steps in two ways. First, they can improve the automatic algorithms of the steps (if any), by editing their code, creating more training data,17 answering their questions12,13 or providing feedback on their output.12,35 Second, users can manually participate in the steps. For example, they can manually add or remove data sources, extract or integrate structured data, or add even more structured data, data not available in the current sources but judged relevant.5 In addition, a CS system may perform inferences over its KB to infer more structured data. To help this step, users can contribute inference rules and domain knowledge.25 During all such activities, users can naturally cross-edit and merge one another’s contributions, just like in those systems that build textual KBs. Another interesting target problem is building and improving systems running on the Web. The project Wikia Search (search.wikia.com) lets users build an open source search engine, by contributing code, suggesting URLs to crawl, and editing search result pages (for example, promoting or demoting URLs). Wikia Search was recently disbanded, but similar features (such as editing search pages) appear in other search engines (such as Google, mahalo.com). Freebase lets users create custom browsing and search systems (deployed at Freebase), using the community-curated data and a suite of development tools 92
(such as the Metaweb query language and a hosted development environment). Eurekster.com lets users collaboratively build vertical search engines called swickis, by customizing a generic search engine (for example, specifying all URLs the system should crawl). Finally, MOBS, an academic project,12,13 studies how to collaboratively build data integration systems, those that provide a uniform query interface to a set of data sources. MOBS enlists users to create a crucial system component, namely the semantic mappings (for example, “location” = “address”) between the data sources. In general, users can help build and improve a system running on the Web in several ways. First, they can edit the system’s code. Second, the system typically contains a set of internal components (such as URLs to crawl, semantic mappings), and users can help improve these without even touching the system’s code (such as adding new URLs, correcting mappings). Third, users can edit system inputs and outputs. In the case of a search engine, for instance, users can suggest that if someone queries for “home equity loan for seniors,” the system should also suggest querying for “reverse mortgage.” Users can also edit search result pages (such as promoting and demoting URLs, as mentioned earlier). Finally, users can monitor the running system and provide feedback. We note that besides software, KBs, and systems, many other target artifacts have also been considered. Examples include community newspapers built by asking users to contribute and evaluate articles (such as Digg) and massive multi-player games that build virtual artifacts (such as Second Life, a 3D virtual world partly built and maintained by users). Executing Tasks: The last type of explicit systems we consider is the kind that executes tasks. Examples include finding extraterrestrials, mining for gold, searching for missing people,23,29,30,31 and cooperative debugging (cs.wisc.edu/cbi, early work of this project received the ACM Doctoral Dissertation Award in 2005). The 2008 election is a well-known example, where the Obama team ran a large online CS operation asking numerous volunteers to help mobilize voters. To apply CS to
a task, we must find task parts that can be “crowdsourced,” such that each user can make a contribution and the contributions in turn can be combined to solve the parts. Finding such parts and combining user contributions are often task specific. Crowdsourcing the parts, however, can be fairly general, and plaforms have been developed to assist that process. For example, Amazon’s Mechanical Turk can help distribute pieces of a task to a crowd of users (and several recent interesting toolkits have even been developed for using Mechanical Turk13,37). It was used recently to search for Jim Gray, a database researcher lost at sea, by asking volunteers to examine pieces of satellite images for any sign of Jim Gray’s boat.18 Implicit Systems: As discussed earlier, such systems let users collaborate implicitly to solve a problem of the system owners. They fall into two groups: standalone and piggyback. A standalone system provides a service such that when using it users implicitly collaborate (as a side effect) to solve a problem. Many such systems exist, and the table here lists a few representative examples. The ESP game32 lets users play a game of guessing common words that describe images (shown independently to each user), then uses those words to label images. Google Image Labeler builds on this game, and many other “games with a purpose” exist.33 Prediction markets23,29 let users bet on events (such as elections, sport events), then aggregate the bets to make predictions. The intuition is that the “collective wisdom” is often accurate (under certain conditions)31 and that this helps incorporate inside information available from users. The Internet Movie Database (IMDB) lets users import movies into private accounts (hosted by IMDB). It designed the accounts such that users are strongly motivated to rate the imported movies, as doing so bring many private benefits (such as they can query to find all imported action movies rated at least 7/10, or the system can recommend action movies highly rated by people with similar taste). IMDB then aggregates all private ratings to obtain a public rating for each movie, for the benefit of the public. reCAPTCHA asks users to solve captchas to prove they are humans (to gain access to a site), then leverages
review articles the results for digitizing written text.34 Finally, it can be argued that the target problem of many systems (that provide user services) is simply to grow a large community of users, for various reasons (such as personal satisfaction, charging subscription fees, selling ads, selling the systems to other companies). Buy/ sell/auction websites (such as eBay) and massive multiplayer games (such as World of Warcraft) for instance fit this description. Here, by simply joining the system, users can be viewed as implicitly collaborating to solve the target problem (of growing user communities). The second kind of implicit system we consider is a piggyback system that exploits the user traces of yet another system (thus, making the users of this latter system implicitly collaborate) to solve a problem. For example, over time many piggyback CS systems have been built on top of major search engines, such as Google, Yahoo!, and Microsoft. These systems exploit the traces of search engine users (such as search logs, user clicks) for a wide range of tasks (such as spelling correction, finding synonyms, flu epidemic prediction, and keyword generation for ads6). Other examples include exploiting user purchases to recommend products,26 and exploiting click logs to improve the presentation of a Web site.19 CS Systems on the Web We now build on basic system types to discuss deployed CS systems on the Web. Founded on static HTML pages, the Web soon offered many interactive services. Some services serve machines (such as DNS servers, Google Map API server), but most serve humans. Many such services do not need to recruit users (in the sense that the more the better). Examples include pay-parkingticket services (for city residents) and room-reservation services. (As noted, we call these crowd management systems). Many services, however, face CS challenges, including the need to grow large user bases. For example, online stores such as Amazon want a growing user base for their services, to maximize profits, and startups such as epinions.com grow their user bases for advertising. They started out as primitive CS systems, but quickly improved over time with additional CS features (such as reviewing, rating,
networking). Then around 2003, aided by the proliferation of social software (for example, discussion groups, wiki, blog), many full-fledged CS systems (such as Wikipedia, Flickr, YouTube, Facebook, MySpace) appeared, marking the arrival of Web 2.0. This Web is growing rapidly, with many new CS systems being developed and non-CS systems adding CS features. These CS systems often combine multiple basic CS features. For example, Wikipedia primarily builds a textual KB. But it also builds a structured KB (via infoboxes) and hosts many knowledge sharing forums (for example, discussion groups). YouTube lets users both share and evaluate videos. Community portals often combine all CS features discussed so far. Finally, we note that the Semantic Web, an ambitious attempt to add structure to the Web, can be viewed as a CS attempt to share structured data, and to integrate such data to build a Web-scale structured KB. The World-Wide Web itself is perhaps the largest CS system of all, encompassing everything we have discussed. Challenges and Solutions Here, we discuss the key challenges of CS systems: How to recruit and retain users? Recruiting users is one of the most important CS challenges, for which five major solutions exist. First, we can require users to make contributions if we have the authority to do so (for example, a manager may require 100 employees to help build a company-wide system). Second, we can pay users. Mechanical Turk for example provides a way to pay users on the Web to help with a task. Third, we can ask for volunteers. This solution is free and easy to execute, and hence is most popular. Most current CS systems on the Web (such as Wikipedia, YouTube) use this solution. The downside of volunteering is that it is hard to predict how many users we can recruit for a particular application. The fourth solution is to make users pay for service. The basic idea is to require the users of a system A to “pay” for using A, by contributing to a CS system B. Consider for example a blog website (that is, system A), where a user U can leave a comment only after solving a puzzle (called a captcha) to prove that U is a human. As a part of the puzzle, we
review articles can ask U to retype a word that an OCR program has failed to recognize (the “payment”), thereby contributing to a CS effort on digitizing written text (that is, system B). This is the key idea behind the reCAPTCHA project.34 The MOBS project12,13 employs the same solution. In particular, it ran experiments where a user U can access a Web site (such as a class homepage) only after answering a relatively simple question (such as, is string “1960” in “born in 1960” a birth date?). MOBS leverages the answers to help build a data integration system. This solution works best when the “payment” is unintrusive or cognitively simple, to avoid deterring users from using system A. The fifth solution is to piggyback on the user traces of a well-established system (such as building a spelling correction system by exploiting user traces of a search engine, as discussed previously). This gives us a steady stream of users. But we must still solve the difficult challenge of determining how the traces can be exploited for our purpose. Once we have selected a recruitment strategy, we should consider how to further encourage and retain users. Many encouragement and retention (E&R) schemes exist. We briefly discuss the most popular ones. First, we can provide instant gratification, by immediately showing a user how his or her contribution makes a difference.16 Second, we can provide an enjoyable experience or a necessary service, such as game playing (while making a contribution).32 Third, we can provide ways to establish, measure, and show fame/trust/ reputation.7,13,24,25 Fourth, we can set up competitions, such as showing top rated users. Finally, we can provide ownership situations, where a user may feel he or she “owns” a part of the system, and thus is compelled to “cultivate” that part. For example, zillow.com displays houses and estimates their market prices. It provides a way for a house owner to claim his or her house and provide the correct data (such as number of bedroomss), which in turn helps improve the price estimation. These E&R schemes apply naturally to volunteering, but can also work well for other recruitment solutions. For example, after requiring a set of users to contribute, we can still provide instant gratification, enjoyable experi94
ence, fame management, and so on, to maximize user participation. Finally, we note that deployed CS systems often employ a mixture of recruitment methods (such as bootstrapping with “requirement” or “paying,” then switching to “volunteering” once the system is sufficiently “mature”). What contributions can users make? In many CS systems the kinds of contributions users can make are somewhat limited. For example, to evaluate, users review, rate, or tag; to share, users add items to a central Web site; to network, users link to other users; to find a missing boat in satellite images, users examine those images. In more complex CS systems, however, users often can make a far wider range of contributions, from simple low-hanging fruit to cognitively complex ones. For example, when building a structured KB, users can add a URL, flag incorrect data, and supply attribute-value pairs (as low-hanging fruit).3,5 But they can also supply inference rules, resolve controversial issues, and merge conflicting inputs (as cognitively complex contributions).25 The challenge is to define this range of possible contributions (and design the system such that it can gather a critical crowd of such contributions). Toward this goal, we should consider four important factors. First, how cognitively demanding are the contributions? A CS system often has a way to classify users into groups, such as guests, regulars, editors, admins, and “dictators.” We should take care to design cognitively appropriate contribution types for different user groups. Low-ranking users (such as guests, regulars) often want to make only “easy” contributions (such as answering a simple question, editing one to two sentences, flagging an incorrect data piece). If the cognitive load is high, they may be reluctant to participate. High-ranking users (such as editors, admins) are more willing to make “hard” contributions (such as resolving controversial issues). Second, what should be the impact of a contribution? We can measure the potential impact by considering how the contribution potentially affects the CS system. For example, editing a sentence in a Wikipedia page largely affects only that page, whereas revis-
ing an edit policy may potentially affect millions of pages. As another example, when building a structured KB, flagging an incorrect data piece typically has less potential impact than supplying an inference rule, which may be used in many parts of the CS system. Quantifying the potential impact of a contribution type in a complex CS system may be difficult.12,13 But it is important to do so, because we typically have far fewer high-ranking users such as editors and admins (than regulars, say). To maximize the total contribution of these few users, we should ask them to make potentially high-impact contributions whenever possible. Third, what about machine contributions? If a CS system employs an algorithm for a task, then we want human users to make contributions that are easy for humans, but difficult for machines. For example, examining textual and image descriptions to decide if two products match is relatively easy for humans but very difficult for machines. In short, the CS work should be distributed between human users and machines according to what each of them is best at, in a complementary and synergistic fashion. Finally, the user interface should make it easy for users to contribute. This is highly non-trivial. For example, how can users easily enter domain knowledge such as "no current living person was born before 1850" (which can be used in a KB to detect, say, incorrect birth dates)? A natural language format (such as in openmind.org) is easy for users, but difficult for machines to understand and use, and a formal language format has the reverse problem. As another example, when building a structured KB, contributing attribute-value pairs is relatively easy (as Wikipedia infoboxes and Freebase demonstrate). But contributing more complex structured data pieces can be quite difficult for naive users, as this often requires them to learn the KB schema, among others.5
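As a minimal illustration of how such a domain-knowledge statement could be operationalized once it has been captured from a user (all names and the fact encoding below are invented for this example; this is not code from any of the systems discussed), a checker for the rule "no current living person was born before 1850" might look as follows:

EARLIEST_PLAUSIBLE_BIRTH_YEAR = 1850

def check_birth_date(fact):
    """Return a list of problems for a fact such as
    {"entity": "J. Doe", "attribute": "birth_year", "value": 1790, "living": True}."""
    problems = []
    if fact.get("attribute") == "birth_year":
        year = fact.get("value")
        if fact.get("living") and isinstance(year, int) and year < EARLIEST_PLAUSIBLE_BIRTH_YEAR:
            problems.append(
                f"{fact.get('entity')}: living person with birth year {year} violates the rule "
                f"'no current living person was born before {EARLIEST_PLAUSIBLE_BIRTH_YEAR}'"
            )
    return problems

# The second fact below would be flagged for review rather than silently accepted.
facts = [
    {"entity": "A. Turing", "attribute": "birth_year", "value": 1912, "living": False},
    {"entity": "J. Doe", "attribute": "birth_year", "value": 1790, "living": True},
]
for f in facts:
    for p in check_birth_date(f):
        print(p)

The point of the sketch is only that such a rule, once expressed in some machine-usable form, can be applied automatically to every contributed fact; how users enter the rule in the first place remains the hard interface problem discussed above.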
How to combine user contributions? Many CS systems do not combine contributions, or do so in a loose fashion. For example, current evaluation systems do not combine reviews, and combine numeric ratings using relatively simple formulas. Networking systems simply link contributions (homepages and friendships) to form a social network graph. More complex CS systems, however, such as those that build software, KBs, systems, and games, combine contributions more tightly. Exactly how this happens is application dependent. Wikipedia, for example, lets users manually merge edits, while ESP does so automatically, by waiting until two users agree on a common word. No matter how contributions are combined, a key problem is to decide what to do if users differ, such as when three users assert "A" and two users "not A." Both automatic and manual solutions have been developed for this problem. Current automatic solutions typically combine contributions weighted by some user scores. The work12,13 for example lets users vote on the correctness of system components (the semantic mappings of a data integration system in this case20), then combines the votes weighted by the trustworthiness of each user. The work25 lets users contribute structured KB fragments, then combines them into a coherent probabilistic KB by computing the probabilities that each user is correct, then weighting contributed fragments by these probabilities.
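The weighted-vote idea can be pictured with a short, self-contained sketch (invented user names and scores; it is not the MOBS or probabilistic-KB implementation): each asserted value accumulates the trust weights of the users who proposed it, and the heaviest value wins.

from collections import defaultdict

def combine_votes(votes, trust):
    """votes: list of (user, asserted_value); trust: dict user -> weight in [0, 1].
    Returns (winning_value, total_weight_per_value)."""
    totals = defaultdict(float)
    for user, value in votes:
        totals[value] += trust.get(user, 0.5)  # unknown users get a neutral weight
    winner = max(totals, key=totals.get)
    return winner, dict(totals)

# Three users assert "A", two assert "not A"; the outcome depends on trust, not on counts.
votes = [("u1", "A"), ("u2", "A"), ("u3", "A"), ("u4", "not A"), ("u5", "not A")]
trust = {"u1": 0.2, "u2": 0.3, "u3": 0.2, "u4": 0.9, "u5": 0.8}
print(combine_votes(votes, trust))  # "not A" wins with weight 1.7 against 0.7

The same shape generalizes to richer contributions than yes/no votes, as long as the candidate values can be enumerated and the per-user weights can be estimated.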
Manual dispute management solutions typically let users fight and settle among themselves. Unresolved issues then percolate up the user hierarchy. Systems such as Wikipedia and Linux employ such methods. Automatic solutions are more efficient. But they work only for relatively simple forms of contributions (such as voting), or forms that are complex but amenable to algorithmic manipulation (such as structured KB fragments). Manual solutions are still the currently preferred way to combine "messy" conflicting contributions. To further complicate the matter, sometimes not just human users, but machines also make contributions. Combining such contributions is difficult. To see why, suppose we employ a machine M to help create Wikipedia infoboxes.35 Suppose on Day 1 M asserts population = 5500 in a city infobox. On Day 2, a user U may correct this into population = 7500, based on his or her knowledge. On Day 3, however, M may have managed to process more Web data, and obtained higher confidence that population = 5500 is indeed correct. Should M override U's assertion? And if so, how can M explain its reasoning to U? The main problem here is that it is difficult for a machine to enter into a manual dispute with a human user. The currently preferred method is for M to alert U, and then leave it up to U to decide what to do. But this method clearly will not scale with the number of conflicting contributions. How to evaluate users and contributions? CS systems often must manage malicious users. To do so, we can use a combination of techniques that block, detect, and deter. First, we can block many malicious users by limiting who can make what kinds of contributions. Many e-science CS systems, for example, allow anyone to submit data, but only certain domain scientists to clean and merge this data into the central database. Second, we can detect malicious users and contributions using a variety of techniques. Manual techniques include monitoring the system by the owners, distributing the monitoring workload among a set of trusted users, and enlisting ordinary users (such as flagging bad contributions on message boards). Automatic methods typically involve some tests. For example, a system can ask users questions for which it already knows the answers, then use the answers of the users to compute their reliability scores.13,34 Many other schemes to compute users' reliability/trust/fame/reputation have been proposed.9,26 Finally, we can deter malicious users with threats of "punishment." A common punishment is banning. A newer, more controversial form of punishment is "public shaming," where a user U judged malicious is publicly branded as a malicious or "crazy" user for the rest of the community (possibly without U's knowledge). For example, a chat room may allow users to rate other users. If the (hidden) score of a user U goes below a threshold, other users will only see a mechanically garbled version of U's comments, whereas U continues to see his or her comments exactly as written. No matter how well we manage malicious users, malicious contributions often still seep into the system. If so, the CS system must find a way to undo those.
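The gold-question scheme mentioned above can likewise be sketched in a few lines (again with invented names and data, not any particular system's code): users are scored by how often they agree with answers the system already knows.

def reliability_scores(answers, gold):
    """answers: dict user -> {question_id: answer}; gold: dict question_id -> known answer.
    Returns dict user -> fraction of seeded questions answered correctly (None if no evidence)."""
    scores = {}
    for user, user_answers in answers.items():
        seeded = [q for q in user_answers if q in gold]
        if not seeded:
            scores[user] = None  # the user has not yet seen any seeded question
            continue
        correct = sum(1 for q in seeded if user_answers[q] == gold[q])
        scores[user] = correct / len(seeded)
    return scores

gold = {"q1": "yes", "q2": "no"}
answers = {
    "alice": {"q1": "yes", "q2": "no", "q9": "maybe"},
    "bob":   {"q1": "no",  "q2": "no"},
}
print(reliability_scores(answers, gold))  # {'alice': 1.0, 'bob': 0.5}

Scores computed this way can then feed the trust-weighted combination discussed earlier, or trigger manual review of a user's contributions when the score drops below a threshold.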
If the system does not combine contributions (such as reviews) or does so only in a loose fashion (such as ratings), undoing is relatively easy. If the system combines contributions tightly, but keeps them localized, then we can still undo with relatively simple logging. For example, user edits in Wikipedia can be combined extensively within a single page, but kept localized to that page (not propagated to other pages). Consequently, we can undo with page-level logging, as Wikipedia does. However, if the contributions are pushed deep into the system, then undoing can be very difficult. For example, suppose an inference rule R is contributed to a KB on Day 1. We then use R to infer many facts, apply other rules to these facts and other facts in the KB to infer more facts, let users edit the facts extensively, and so on. Then on Day 3, should R be found incorrect, it would be very difficult to remove R without reverting the KB to its state on Day 1, thereby losing all good contributions made between Day 1 and Day 3. At the other end of the user spectrum, many CS systems also identify and leverage influential users, using both manual and automatic techniques. For example, productive users in Wikipedia can be recommended by other users, promoted, and given more responsibilities. As another example, certain users of social networks highly influence buy/sell decisions of other users. Consequently, some work has examined how to automatically identify these users, and leverage them in viral marketing within a user community.24
Conclusion
We have discussed CS systems on the World-Wide Web. Our discussion shows that crowdsourcing can be applied to a wide variety of problems, and that it raises numerous interesting technical and social challenges. Given the success of current CS systems, we expect that this emerging field will grow rapidly. In the near future, we foresee three major directions: more generic platforms, more applications and structure, and more users and complex contributions. First, the various systems built in the past decade have clearly demonstrated the value of crowdsourcing. The race is
now on to move beyond building individual systems, toward building general CS platforms that can be used to develop such systems quickly. Second, we expect that crowdsourcing will be applied to ever more classes of applications. Many of these applications will be formal and structured in some sense, making it easier to employ automatic techniques and to coordinate them with human users.37–40 In particular, a large chunk of the Web is about data and services. Consequently, we expect that using crowdsourcing to build structured databases and structured services (Web services with formalized input and output) will receive increasing attention. Finally, we expect that many techniques will be developed to engage an ever broader range of users in crowdsourcing, and to enable them, especially naïve users, to make increasingly complex contributions, such as creating software programs and building mashups (without writing any code), and specifying complex structured data pieces (without knowing any structured query languages).
References 1. AAAI-08 Workshop. Wikipedia and artificial intelligence: An evolving synergy, 2008. 2. Adamic, L.A., Zhang, J., Bakshy, E. and Ackerman, M.S. Knowledge sharing and Yahoo answers: Everyone knows something. In Proceedings of WWW, 2008. 3. Chai, X., Vuong, B., Doan, A. and Naughton, J.F. Efficiently incorporating user feedback into information extraction and integration programs. In Proceedings of SIGMOD, 2009. 4. The Cimple/DBLife project; http://pages.cs.wisc.edu/~anhai/projects/cimple. 5. DeRose, P., Chai, X., Gao, B.J., Shen, W., Doan, A., Bohannon, P. and Zhu, X. Building community Wikipedias: A machine-human partnership approach. In Proceedings of ICDE, 2008. 6. Fuxman, A., Tsaparas, P., Achan, K. and Agrawal, R. Using the wisdom of the crowds for keyword generation. In Proceedings of WWW, 2008. 7. Golbeck, J. Computing and applying trust in Web-based social network, 2005. Ph.D. Dissertation, University of Maryland. 8. Ives, Z.G., Khandelwal, N., Kapur, A., and Cakir, M. Orchestra: Rapid, collaborative sharing of dynamic data. In Proceedings of CIDR, 2005. 9. Kasneci, G., Ramanath, M., Suchanek, M. and Weikum, G. The yago-naga approach to knowledge discovery. SIGMOD Record 37, 4, (2008), 41–47. 10. Koutrika, G., Bercovitz, B., Kaliszan, F., Liou, H. and Garcia-Molina, H. Courserank: A closed-community social system through the magnifying glass. In The 3rd Int'l AAAI Conference on Weblogs and Social Media (ICWSM), 2009. 11. Little, G., Chilton, L.B., Miller, R.C. and Goldman, M. Turkit: Tools for iterative tasks on mechanical turk, 2009. Technical Report. Available from glittle.org. 12. McCann, R., Doan, A., Varadarajan, V., and Kramnik, A. Building data integration systems: A mass collaboration approach. In WebDB, 2003. 13. McCann, R., Shen, W. and Doan, A. Matching schemas in online communities: A Web 2.0 approach. In Proceedings of ICDE, 2008. 14. McDowell, L., Etzioni, O., Gribble, S.D., Halevy, A.Y., Levy, H.M., Pentney, W., Verma, D. and Vlasseva, S.
Mangrove: Enticing ordinary people onto the semantic web via instant gratification. In Proceedings of ISWC, 2003. 15. Mihalcea, R. and Chklovski, T. Building sense tagged corpora with volunteer contributions over the Web. In Proceedings of RANLP, 2003. 16. Noy, N.F., Chugh, A. and Alani, H. The CKC challenge: Exploring tools for collaborative knowledge construction. IEEE Intelligent Systems 23, 1, (2008) 64–68. 17. Noy, N.F., Griffith, N. and Munsen, M.A. Collecting community-based mappings in an ontology repository. In Proceedings of ISWC, 2008. 18. Olson, M. The amateur search. SIGMOD Record 37, 2 (2008), 21–24. 19. Perkowitz, M. and Etzioni, O. Adaptive web sites. Comm. ACM 43, 8 (Aug. 2000). 20. Rahm, E. and Bernstein, P.A. A survey of approaches to automatic schema matching. VLDB J. 10, 4, (2001), 334–350. 21. Ramakrishnan, R. Collaboration and data mining, 2001. Keynote talk, KDD. 22. Ramakrishnan, R., Baptist, A., Ercegovac, A., Hanselman, M., Kabra, N., Marathe, A. and Shaft, U. Mass collaboration: A case study. In Proceedings of IDEAS, 2004. 23. Rheingold, H. Smart Mobs. Perseus Publishing, 2003. 24. Richardson, M. and Domingos, P. Mining knowledgesharing sites for viral marketing. In Proceedings of KDD, 2002. 25. Richardson, M. and Domingos, P. Building large knowledge bases by mass collaboration. In Proceedings of K-CAP, 2003. 26. Sarwar, B.M., Karypis, G., Konstan, J.A. and Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of WWW, 2001. 27. Steinmetz, R. and Wehrle, K. eds. Peer-to-peer systems and applications. Lecture Notes in Computer Science. 3485; Springer, 2005. 28. Stork, D.G. Using open data collection for intelligent software. IEEE Computer 33, 10, (2000), 104–106. 29. Surowiecki, J. The Wisdom of Crowds. Anchor Books, 2005. 30. Tapscott, D. and Williams, A.D. Wikinomics. Portfolio, 2006. 31. Time. Special Issue Person of the year: You, 2006; http://www.time.com/time/magazine/ article/0,9171,1569514,00.html. 32. von Ahn, L. and Dabbish, L. Labeling images with a computer game. In Proc. of CHI, 2004. 33. von Ahn, L. and Dabbish, L. Designing games with a purpose. Comm. ACM 51, 8 (Aug. 2008), 58–67. 34. von Ahn, L., Maurer, B., McMillen, C., Abraham, D. and Blum, M. Recaptcha: Human-based character recognition via Web security measures. Science 321, 5895, (2008), 1465–1468. 35. Weld, D.S., Wu, F., Adar, E., Amershi, S., Fogarty, J., Hoffmann, R., Patel, K. and Skinner, M. Intelligence in Wikipedia. AAAI, 2008. 36. Workshop on collaborative construction, management and linking of structured knowledge (CK 2009), 2009. http://users.ecs.soton.ac.uk/gc3/iswc-workshop. 37. Franklin, M, Kossman, D., Kraska, T, Ramesh, S, and Xin, R. CrowdDB: Answering queries with crowdsourcing. In Proceedings of SIGMOD 2011. 38. Marcus, A., Wu, E. and Madden, S. Crowdsourcing databases: Query processing with people. In Proceedings of CRDR 2011. 39. Parameswaran, A., Sarma, A., Garcia-Molina, H., Polyzotis, N. and Widom, J. Human-assisted graph search: It’s okay to ask questions. In Proceedings of VLDB 2011. 40. Parameswaran, A. and Polyzotis, N. Answering queries using humans, algorithms, and databases. In Proceedings of CIDR 2011. AnHai Doan ([email protected]) is an associate professor of computer science at the University of Wisconsin-Madison and Chief Scientist at Kosmix Corp. Raghu Ramakrishnan ([email protected]) is Chief Scientist for Search & Cloud Computing, and a Fellow at Yahoo! 
Research, Silicon Valley, CA, where he heads the Community Systems group. Alon Y. Halevy ([email protected]) heads the Structured Data Group at Google Research, Mountain View, CA. © 2011 ACM 0001-0782/11/04 $10.00
research highlights
p. 98 Technical Perspective: Liability Issues in Software Engineering
By Daniel M. Berry
p. 99 Liability Issues in Software Engineering: The use of formal methods to reduce legal uncertainties
By Daniel Le Métayer, Manuel Maarek, Eduardo Mazza, Marie-Laure Potet, Stéphane Frénot, Valérie Viet Triem Tong, Nicolas Craipeau, and Ronan Hardouin
p. 107 Technical Perspective: Patterns Hidden from Simple Algorithms
By Madhu Sudan
p. 108 Poly-logarithmic Independence Fools Bounded-Depth Boolean Circuits
By Mark Braverman
research highlights
doi:10.1145/1924421.1924443
Technical Perspective Liability Issues in Software Engineering By Daniel M. Berry
The following paper by Le Métayer et al. addresses one technical issue in a large and serious problem in the production of mass-market software (MMSW), that of the lack of liability by the producers of this MMSW. MMSW is software that is sold—actually licensed—to the consumer on the open market. The Introduction puts the problem in perspective. It points out correctly that "Software contracts usually include strong liability limitations or even exemptions of the providers for damagesa caused by their products." The authors observe correctly that this lack of liability "does not favor the development of high quality software because the software industry does not have sufficient economical incentives to apply stringent development and verification methods."
a The paper uses the word "damages" both as the law does to mean the cost of the effect of a failure and as the vernacular does to mean the harm caused by the use of a product.
It must be emphasized that the contracts being discussed are for MMSW. Such a contract is generally in the form of a shrink-wrapped, or click-yes-to-buy, end-user license agreement (EULA), which the customer must agree to in order to touch or download the MMSW. This EULA typically says that the producer warrants only the medium on which the software is supplied and nothing at all about the software's functions, and that the producer's liability is limited to the cost of the product, that is, you get your money back. While the authors do not explicitly say so, the contracts of the paper do not include those for bespoke software (BSW) developed to do specific functions for one customer who intends to be the sole user of the BSW when it is finished. In the contract for such BSW, the producer promises to deliver specific functions and agrees to be liable should the BSW be delivered late, with functions missing or incorrectly implemented, and so on. The difference between contracts
for MMSW and contracts for BSW is the power of the customer in the contract negotiations. For BSW, the customer has a lot of power, able to go to a competitor if it finds that it is negotiating with an unreasonable producer. For MMSW, the producer dictates the terms, its lawyers having written the EULA even before the MMSW is put on the market. Basically the producer says to customers “Take it or leave it!”’ The paper cites several calls for MMSW producers to warrant the behavior of their MMSW and to be subject to liability for the behavior of their MMSW, as are manufacturers of consumer electro-mechanical devices. Among these calls was a paper I wrote in 2000, comparing the warranties for typical mass-market consumer appliances, such as vacuum cleaners, to the EULAs for typical MMSW. After observing that the appliances in my house were more reliable than the MMSW on my computers, I concluded that appliance manufacturers warrant and accept liability because they are required to do so by law in most jurisdictions and that MMSW producers warrant nothing and accept no liability because they are not required to do so by law and buyers show that they accept the situation by continuing to buy existing MMSW. The following paper contributes a way to automatically apportion liability among the stakeholders of a system S constructed for the mass market. It describes the stakeholders of S as including the user and the producers of any of S’s needed hardware, software, and Internet supplied services. The paper assumes that all potentially liable stakeholders of S have executed an informal agreement, expressed in natural language. For each particular kind of failure of S to deliver promised behavior, the informal agreement apportions liability among the stakeholders based on the contributions of the user, hardware, software, and services to the failure. The informal agreement is translated into essentially a program P, written in set theory and predicate
logic. When S fails to deliver promised behavior and the affected customer seeks compensation for that failure, P is executed to compute each liable stakeholder’s portion of the damages payable to the customer. If the informal agreement is written in precise legalistic style with careful uses of “if,” “then,” “and,” and “or,” then its translation into P seems straightforward. Execution of P examines the trace of events leading to the failure in order to identify which components or user contributed to the failure and then it apportions the liability to the identified stakeholders. The paper gives a formal model of the information that must be in traces to allow this identification and apportionment. This trace examination is an extension of the traditional method of finding a bug’s source by examining a trace of the execution leading to the bug. These traces are negative scenarios,b and traditional methods of identifying a system’s failure modesc can be used to find these negative scenarios. This contribution is an interesting application of lightweight formal methods.1 The paper’s formal model of liability is intentionally strong enough for formalizing liability apportionment agreements, but not for traditional uses of formal methods, such as verifying correctness. The paper’s technique seems to be workable and should help manage liability for systems that are constructed out of components supplied by multiple stakeholders. b http://discuss.it.uts.edu.au/pipermail/reonline/2006-January/000219.html/ c http://en.wikipedia.org/wiki/Failure_mode_ and_effects_analysis/
References
1. Feather, M.S. Rapid application of lightweight formal methods for consistency analyses. IEEE Transactions on Software Engineering 24 (1998). IEEE Computer Society, Los Alamitos, CA, 949-959.
Daniel M. Berry ([email protected]) is a professor at the Cheriton School of Computer Science, University of Waterloo, Ontario, Canada.
© 2011 ACM 0001-0782/11/04 $10.00
Liability Issues in Software Engineering
doi:10.1145/1924421.1924444
The use of formal methods to reduce legal uncertainties By Daniel Le Métayer, Manuel Maarek, Eduardo Mazza, Marie-Laure Potet, Stéphane Frénot, Valérie Viet Triem Tong, Nicolas Craipeau, and Ronan Hardouin
Abstract This paper reports on the results of a multidisciplinary project involving lawyers and computer scientists with the aim to put forward a set of methods and tools to (1) define software liability in a precise and unambiguous way and (2) establish such liability in case of incident. The overall approach taken in the project is presented through an electronic signature case study. The case study illustrates a situation where, in order to reduce legal uncertainties, the parties wish to include in the contract specific clauses to define as precisely as possible the share of liabilities between them for the main types of failures of the system. 1. INTRODUCTION Software contracts usually include strong liability limitations or even exemptions of the providers for damages caused by their products. This situation does not favor the development of high-quality software because the software industry does not have sufficient economical incentives to apply stringent development and verification methods. Indeed, experience shows that products tend to be of higher quality and more secure when the actors in position to influence their development are also the actors bearing the liability for their defects.1 The usual argument to justify this lack of liability is the fact that software products are too complex and versatile objects whose expected features (and potential defects) cannot be characterized precisely, and which therefore cannot be treated as traditional (tangible) goods. Admittedly, this argument is not without any ground: It is well known that defining in an unambiguous, comprehensive, and understandable way the expected behavior of systems integrating a variety of components is quite a challenge, not to mention the use of such definition as a basis for a liability agreement. Taking up this challenge is precisely our objective: Our aim is to study liability issues both from the legal and technical points of view and to put forward a formal framework to (1) define liability in a precise and unambiguous way and (2) establish such liability in case of incident. Note that we focus on liabilities for software flaws here and do not consider infringement or any other intellectual property right liabilities which involve very different issues. Obviously, specifying all liabilities in a formal framework is neither possible nor desirable. Usually, the parties wish to express as precisely as possible certain aspects which are of prime importance for them and prefer to
state other aspects less precisely (either because it is impossible to foresee at contracting time all the events that may occur or because they do not want to be bound by too precise commitments). Taking this requirement into account, we provide a set of tools and methods to be used on a need basis in the contract drafting process (as opposed to a monolithic, "all or nothing" approach). Our model is based on execution traces which are abstractions of the log files of the system. In a nutshell, liability is specified as a function taking as parameters a claim and an execution trace and returning a set of "responsible" actors. This set of actors (ideally a singleton) depends on the claim and the errors occurring in the trace. Both errors and claims are expressed as trace properties. The liability function can be made as precise or detailed as necessary by choosing the claims and errors relevant for a given situation.
The presentation of this paper is based on a case study: an electronic signature application installed on a mobile phone. Section 2 describes the starting point (IT system subject to the agreement, parties involved, informal agreement between the parties and legal context); Section 3 presents our formal framework for the definition of liabilities. Section 4 provides a sketch of related work, and Section 5 identifies avenues for further research.
This paper presents the results of the LISE (Liability Issues in Software Engineering) project funded by ANR (Agence Nationale de la Recherche) under grant ANR-07-SESU007. A previous version of this paper was published in the Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, ICSE 2010.
2. STARTING POINT
Let us consider an electronic signature system allowing an e-commerce company (ECC) to send a document to be signed by an individual on his mobile phone. The signature of the document is subject to the individual's approval (and authentication) and all communications and signature operations are performed through his mobile phone. In a real situation, the activation of the signature system would be preceded by a request from the individual or by a negotiation with the ECC, but we do not consider this negotiation phase here. The mobile phone itself incorporates a smart card (for the verification of the Personal Identification Number
[PIN] of the user) and a signature application. We assume that the mobile phone provider (MPP), the signature application provider (SAP), and the smart card provider (SCP) want to execute an agreement to put such a mobile phone signature solution on the market. In order to reduce legal uncertainties, the parties wish to include in the agreement specific provisions to define as precisely as possible the share of liabilities between them for the main types of failures of the system. Their intention is to use these provisions to settle liability issues in an amicable way by the application of well-defined rules. At this stage, it may be the case that all the components (software and hardware) are already available and the only remaining task is their integration. It may also be the case that some or all the components still have to be developed. In general, no assumption can thus be made on the fact that software components can be designed or modified in a specific way to make the implementation of liabilities easier. The only assumptions made at this stage are:
• On the technical side: the availability of the functional architecture of the system (interfaces of the components and informal definition of their expected behavior)
• On the business side: an informal agreement between the parties with respect to the share of liabilities
The objective of the infrastructure described in this paper is to allow the parties to translate this informal agreement into a contract which is both valid in the legal sense and as precise as possible, in particular with respect to technical issues, in order to minimize legal uncertainties. In the remainder of this section, we describe the initial technical and legal situation: the IT system itself (Section 2.1), the actors involved (Section 2.2), the informal agreement between the parties signing the agreement (Section 2.3), and the legal context surrounding the agreement (Section 2.4).
2.1. IT system
At the start of the contractual phase, the IT system is usually defined in an informal way by its architecture: its components, their interfaces, expected behaviors, and interactions. In our case study, we assume that the electronic signature system is made of the following components:
• A Server (Serv)
• A Signature Application (SigApp)
• A Smart Card (Card)
• A Mobile Input/Output (IO) component which gathers the keyboard and the display of the mobile phone (including their drivers)
• An Operating System (OpSys)
All the components except Serv are embedded in the mobile phone. In this paper, we focus on liabilities related to the mobile phone system and do not consider liabilities related to Serv or the communication network. These liabilities could be handled in the same way by adding the ECC and the telecommunication operator as additional
parties. The only functionality of OpSys that we consider here is its role of medium for all communications between the mobile phone components (i.e., between SigApp, Card, and IO). The architecture of the system and its information flows are pictured in Figure 1. The protocol starts with the ECC requesting a signature for document D (message 1). The document is forwarded by Serv and SigApp, and presented to the owner of the mobile phone (OWN) by IO (messages 2, 3, and 4). If OWN refuses to sign, ECC is informed through IO, SigApp, and Serv (messages 5-n, 6-n, 7-n, and 8-n). If OWN agrees, the document and the PIN code entered by OWN are forwarded to Card by SigApp (messages 5-y, 6-y, and 7-y). Next, depending on whether Card authenticates the PIN code or not, the document and the signature produced by Card are sent to ECC via SigApp and Serv (messages 8-y-r, 9-y-r, and 10-y-r), or ECC is informed via SigApp and Serv of the authentication failure (messages 8-y-w, 9-y-w, and 10-y-w). Note that this scenario is a simplified version of a real protocol which is sufficient to illustrate our purpose: for example, in a real system, the PIN code would not be sent in clear for security reasons – it would be provided through a hash encoding.
2.2. Actors
We assume that the contract is to be executed by the three parties involved in the manufacture and distribution of the signature solution:
• The Mobile Phone Provider (MPP)
• The Signature Application Provider (SAP)
• The Smart Card Provider (SCP)
The owner of the mobile phone (OWN) and the ECC are supposed to execute different contracts with MPP which also plays the role of mobile phone operator. We are concerned only with the B2B contract between MPP, SAP, and SCP here. We come back in Section 2.4 to the legal consequences of including the owner of the mobile phone (OWN) among the parties. In the sequel, we shall use the word "party" for MPP, SAP, and SCP, and the word "user" for the end-users of the system (ECC and OWN). Each component in the system is provided by one of the parties. In our case, we assume that:
• The SigApp component is provided by SAP.
• The Card component is provided by SCP.
• The IO and OpSys components are provided by MPP.
2.3. Informal agreement
The parties wish to define as precisely as possible the share of liabilities between themselves in case of a claim from the owner of the mobile phone. In practice, claims will typically be addressed to MPP because MPP is the only party in direct contact (and contractual relationship) with the owner of the mobile phone (both as a MPP and operator).
Figure 1. Signature protocol.
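For readers following the prose rather than the figure, the message sequence described in Section 2.1 can also be written down as data. The listing below is only a restatement of that description (the sender/receiver pairs for the owner's response are inferred from the prose, which routes them through IO, SigApp, and Serv); it simply prints the flow.

# Each entry: (message id, sender, receiver, informal content); "q" is the session stamp.
SIGNATURE_PROTOCOL = [
    ("1",      "ECC",    "Serv",   "request signature of document D (stamp q)"),
    ("2",      "Serv",   "SigApp", "forward D"),
    ("3",      "SigApp", "IO",     "forward D"),
    ("4",      "IO",     "OWN",    "present D"),
    # Owner refuses:
    ("5-n",    "OWN",    "IO",     "refusal"),
    ("6-n",    "IO",     "SigApp", "refusal"),
    ("7-n",    "SigApp", "Serv",   "refusal"),
    ("8-n",    "Serv",   "ECC",    "refusal"),
    # Owner agrees and enters the PIN:
    ("5-y",    "OWN",    "IO",     "approval + PIN"),
    ("6-y",    "IO",     "SigApp", "D + PIN"),
    ("7-y",    "SigApp", "Card",   "D + PIN"),
    # Card authenticates the PIN:
    ("8-y-r",  "Card",   "SigApp", "D + signature"),
    ("9-y-r",  "SigApp", "Serv",   "D + signature"),
    ("10-y-r", "Serv",   "ECC",    "D + signature"),
    # Card rejects the PIN:
    ("8-y-w",  "Card",   "SigApp", "authentication failure"),
    ("9-y-w",  "SigApp", "Serv",   "authentication failure"),
    ("10-y-w", "Serv",   "ECC",    "authentication failure"),
]

for mid, src, dst, what in SIGNATURE_PROTOCOL:
    print(f"{mid:>6}: {src} -> {dst}: {what}")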
MPP will have to indemnify the owner of the mobile phone if his claim is valid and may in turn be indemnified by one (or several) of the other parties depending on the type of the claim, the available log files, and the liability share defined in the agreement. In the following, we assume that each document to be signed is originally stamped by ECC and this stamp q is (i) unique, (ii) always included in the messages of a given session, and (iii) never modified. This stamp q can be seen as a transaction number which makes it easier to distinguish messages pertaining to different signature sessions. Lifting or relaxing this assumption is possible, but at the expense of a more complex model. As an illustration, we consider two kinds of claims from the owner of the mobile phone, called DiffDoc and NotSigned, concerning the signature of an alleged document D stamped q:
1. DiffDoc: The plaintiff OWN claims that he has been presented a document D′ stamped q different from the alleged document D (stamped q). In the case of a purchase order, for example, D and D′ may differ with respect to the quantity or price of the ordered items.
2. NotSigned: The plaintiff OWN claims that he has never been presented any document stamped q.
We assume that the parties agree on the following informal share of liabilities for these two types of claims:
1. If OWN claims that he has been presented a document D′ stamped q different from the alleged document D (stamped q), then, if this claim is valid a. SAP shall be liable if SigApp has forwarded to OWN a document (stamped q) different from the document received from ECC. b. Otherwise MPP shall be liable.
2. If OWN claims that he has never been presented any
document stamped q, then, if this claim is valid a. If the smart card has wrongly validated a PIN for document D stamped q, then SCP shall be liable. b. Otherwise MPP shall be liable. We do not discuss the value or justifications for this informal agreement here and just take it as an example of a possible share of liabilities. It should be clear that this share of liabilities is the result of a negotiation between the parties, based on a combination of technical as well as business and legal arguments, and it does not have to (and usually cannot) be justified formally. For example, the above rules for the liability by default may be found acceptable by MPP because he takes a significant part of the revenues of the business at the expense of bearing the risk in connection with the customer. The point is that the formal framework should not impose any undue constraint on the share of liabilities but should provide means for the parties to express their wishes as precisely as possible. 2.4. Legal context Even though the intention of the parties is to settle liability issues in an amicable way, according to well-defined rules, it is obviously necessary to take into account the legal context pursuant to computer systems. Any misconception or overlooking of the legal constraints might lead to contractual clauses that could be invalidated in court, thus increasing rather than reducing legal risks. The two main categories of legal constraints to be considered here concern the two main phases of the process: (1) the formal definition of the share of liabilities among the parties and (2) the analysis of the log files to establish these liabilities after the fact. In the following, we examine these two categories of legal constraints in turn. Liability Limitations: The first criterion to be taken into account to assess the validity of contractual liability limitations is the qualification of the parties. For example, april 2 0 1 1 | vol. 5 4 | n o. 4 | c o m m u n ic at i o n s of t h e acm
research highlights most jurisdictions provide specific protections to consumers because they are considered as the weak party in a contract with a professional. Let us first consider contracts involving only professionals. Several cases of invalidity of liability limitation clauses are defined by law. The first obvious cases where the liability limitation would be considered null and void are when the party claiming the benefit of the clause has committed acts of intentional fault, willful misrepresentation, or gross negligence. Another case is the situation where the limitation would undermine an essential obligation of a party and would thus introduce an unacceptable imbalance in the contract. This situation is more difficult to assess though, and left to the appraisal of the judge who may either accept the limitation, consider it null, or even impose a different liability cap.a As far as consumers are concerned, the law offers a number of protections which severely restrict the applicability of liability limitation clauses. The philosophy of these rules is that the consumer is in a weak position in the contractual relationship and legal guarantees should be provided to maintain some form of balance in the contract. For example, professionals must provide to their consumers “nonconformance” and “hidden defects” warranties in French law and “implied warranty” (including “merchantability” and “fitness”) in the American Uniform Commercial Code. Any clause which would introduce a significant imbalance at the prejudice of the consumer would be considered unconscionable. Let us note that we have focused on contractual liability here (liability which is defined in the contract itself): Of course, strict liability (when a defect in a product causes personal or property damages) will always apply with respect to third parties (actors who are not parties to the contract). It is still possible though for professionals to define contractual rules specifying their respective share of indemnities due to a victim (third party) by one of the parties.b To conclude this subsection, let us mention other criteria that need to be taken into account to refine the legal analysis, in particular: the qualification of the contract itself (product or service agreement), in case of a product agreement, whether it is qualified as a purchase agreement or a license agreement, the nature of the software (dedicated or off-the-shelf software), the behavior of the actors, etc. Log Files as Evidence: The first observation concerning the contractual use of log files is that digital evidence
The “Faurecia case” illustrates the different interpretations of the notion of “essential contractual obligation” in France. In June 2010, the chamber of commerce of the final court of appeal (“Cour de cassation”) has rejected a referral of the case to the court of appeal which had itself declared that the limitation of liability was not in contradiction with the essential obligation of the software provider (Oracle) because the customer (Faurecia) could get a reasonable compensation. The philosophy of the decision is that the overall balance of the contract and the behavior of the parties should be considered to decide upon the validity of liability limitations. b In European laws, the victim of a defect caused by a product can sue any of the actors involved in the manufacturing or distribution of the product. a
is now put on par with traditional written evidence. In addition, as far as legal facts are concerned (as opposed to legal acts, such as contracts), the general rule is that no constraint is imposed on the means that can be used to provide evidence. As far as legal acts are concerned, the rules depend on the amount of the transaction: for example, no constraint is put on the means to establish evidence for contracts of value less than one thousand and five hundred Euros in France. The logs to be used in the context of our project concern the behavior of software components, which can be qualified as legal facts. Even though they would also be used to establish the existence and content of electronic contracts (as in our case study), we can consider at this stage that their value would be under the threshold imposed by law to require “written evidence” or that the evidence provided by the log files would be accepted as “written evidence” under the aforementioned equivalence principle. A potential obstacle to the use of log files in court could be the principle according to which “no one can form for himself his own evidence.” It seems increasingly admitted, however, that this general principle allows exceptions for evidence produced by computers. As an illustration, the printed list of an airline company showing the late arrival of a traveler at the boarding desk was accepted as evidence by the French Cour de cassation.c Another condition for the validity of log files as evidence is their fairness and legality. For example, a letter, message, or phone conversation recorded without the sender or receiver knowing it cannot be used against them. As far as our project is concerned, attention should be paid to the risk of recording personal data in log files: In certain cases, such recording might be judged unfair and make it impossible to use the log as evidence in court. Generally speaking, to ensure the strength of the logbased evidence provisions in the agreement, it is recommended to define precisely all the technical steps for the production of the log files, their storage, and the means used to ensure their authenticity and integrity. Last but not least, as in the previous subsection, the cases where consumers are involved deserve specific attention with respect to evidence: Any contractual clause limiting the possibilities of the consumer to defend his case by providing useful evidence is likely to be considered unconscionable in court. International Law: To conclude this section, let us mention the issue of applicable law. Needless to say, the information technology business is in essence international and, even though we have focused on European regulations in a first stage, more attention will be paid in the future to broaden the scope of the legal study and understand in which respect differences in laws and jurisdictions should be taken into account in the design of our framework. For example, certain types of liability limitations are more likely to be considered as valid by American courts which put greater emphasis on
c Cass. civ. 1re, July 13th. 2004: Bull. civ. 2004, I, n° 207.
contractual freedom.14
3. FORMAL SPECIFICATION OF LIABILITIES
The share of liabilities between the parties was expressed in Section 2.3 in a traditional, informal way. Texts in natural language, even in legal language, often conceal ambiguities and misleading representations. The situation is even worse when such statements refer to mechanisms which are as complex as software. Such ambiguities are sources of legal uncertainties for the parties executing the contract. The use of formal (mathematical) methods has long been studied and put into practice in the computer science community to define the specification of software systems (their expected behavior) and to prove their correctness or to detect errors in their implementations. For various reasons however (both technical and economical), it remains difficult to apply formal methods at a large scale to prove the correctness of a complete system. In contrast with previous work on formal methods, our goal here is not to apply them to the verification of the system itself (the mobile phone solution in our case study) but to define liabilities in case of malfunction and to build an analysis tool to establish these liabilities from the log files of the system. It should be clear however, as stated in Section 1, that our goal is not to provide a monolithic framework in which all liabilities would have to be expressed. The method proposed here can be used at the discretion of the parties involved and as much as necessary to express the liabilities concerning the features or potential failures deemed to represent the highest sources of risks for them. In this section, we present successively the parameters which are used to establish liabilities (Sections 3.1 and 3.2) before introducing the liability function in Section 3.3.
3.1. Trace model
Following the informal description in Section 2, the sets of components, parties, and users are defined as follows:
Components = {Serv, SigApp, Card, IO, OpSys}
Parties = {MPP, SAP, SCP}
Users = {OWN, ECC}
Q is the set of stamps and C the set of communicating entities (components and users). O and M denote, respectively, the set of communication operations and message contents. The distinction between send and receive events allows us to capture communication errors.d
d This feature is not illustrated in this paper.
We assume that signature sessions in traces are complete and the type (document, response, PIN code, and signature) of each element composing a message is implicitly associated with the element itself in order to avoid any ambiguity. We denote by Traces the set of all traces, a trace T being defined as a function associating a stamp with a list of items. Each item is defined by the communication operation (Send or Receive), the sender, the receiver, and the content of the message:
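Written out (the exact notation here is a reconstruction from the description just given, with $X^{*}$ standing for finite lists over $X$), this trace type is:

\[
\mathit{Items} \;=\; O \times C \times C \times M,
\qquad
\mathit{Traces} \;=\; Q \rightarrow \mathit{Items}^{*}
\]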
A first comment on the above definition is the fact that we use a functional type (from stamps to lists of items) to represent traces. This choice makes the manipulation of traces easier in the sequel because we are always interested in the items corresponding to a given session. Other representations could have been chosen as well, such as lists of items including the stamp information. Let us note that, in order to make the mathematical definitions and the reasoning simpler, the notions used in this section represent an abstraction of the real system. Therefore, we use the term "trace" here and keep the word "log" to denote the actual information recorded by the system. The link between traces and logs is described by Le Métayer et al.12
3.2. Trace properties
We present successively the two types of trace properties used in this paper: error properties and claim properties.
Error Properties: The most important parameter to determine the allocation of liabilities is the nature of the errors which can be detected in the log files of the system. Ideally, the framework should be general enough to reflect the wishes of the parties and to make it possible to explore the combinations of errors in a systematic way. One possible way to realize this exploration is to start with a specification of the key properties to be satisfied by the system and derive the cases which can lead to the negation of these properties. Our goal being to analyze log files, we characterize the expected properties of the system directly in terms of traces. For example, the fact that SigApp should send to IO the document D received from Serv (and only this document) can be expressed as follows:
Same-Doc(T,q) ≡
∀D ∈ Documents, (Receive, Serv, SigApp, [D]) ∈ T(q) ⇔ (Send, SigApp, IO, [D]) ∈ T(q)
In the scenario considered here, the systematic study of the cases of violation of this property leads to the following errors:
SigApp-Diff(T,q) ≡ ∃D, D′ ∈ Documents, D ≠ D′ ∧ (Receive, Serv, SigApp, [D]) ∈ T(q) ∧ (Send, SigApp, IO, [D′]) ∈ T(q)
SigApp-Not(T,q) ≡ ∃D ∈ Documents, (Receive, Serv, SigApp, [D]) ∈ T(q) ∧ ∀D′ ∈ Documents, (Send, SigApp, IO, [D′]) ∉ T(q)
SigApp-Un(T,q) ≡
∃D ∈ Documents, (Send, SigApp, IO, [D]) ∈ T(q) ∧ ∀D′ ∈ Documents, (Receive, Serv, SigApp, [D′]) ∉ T(q)
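To make the intent of these three patterns concrete, here is a small sketch in Python (illustrative only: the item encoding (operation, sender, receiver, message) and the helper names are choices made for this example, and this is not the analysis tool developed in the project) that checks them over the list of items of one session:

def docs(trace, op, src, dst):
    """Documents carried by items matching (op, src, dst); a document message is ("doc", D)."""
    return {msg[1] for (o, s, d, msg) in trace
            if (o, s, d) == (op, src, dst) and msg[0] == "doc"}

def sigapp_diff(trace):
    received = docs(trace, "Receive", "Serv", "SigApp")
    forwarded = docs(trace, "Send", "SigApp", "IO")
    return any(r != f for r in received for f in forwarded)

def sigapp_not(trace):
    return bool(docs(trace, "Receive", "Serv", "SigApp")) and not docs(trace, "Send", "SigApp", "IO")

def sigapp_un(trace):
    return bool(docs(trace, "Send", "SigApp", "IO")) and not docs(trace, "Receive", "Serv", "SigApp")

# A session in which SigApp forwarded a document different from the one it received:
session = [
    ("Receive", "Serv", "SigApp", ("doc", "D")),
    ("Send", "SigApp", "IO", ("doc", "D2")),
]
print(sigapp_diff(session), sigapp_not(session), sigapp_un(session))  # True False False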
Space considerations prevent us from presenting this systematic derivation here. It relies on a decomposition of the negation of the properties into disjunctive normal form and selective application of additional decomposition transformations for "nonexistence" properties. The terms of the above disjunction correspond to three typical types of errors:
1. The first term defines a case where a message is sent with content different from expected.
2. The second term is a case of expected message which is not sent.
3. The third term is a case where an unexpected message is sent.
Similarly, the negation of the property that Card returns a signature only when it has received OWN's PIN code POWN leads to several errors of the three aforementioned types, from which we assume that only two are deemed relevant by the parties. The first one, Card-WrongVal, describes a case where an approval and a signature are sent by Card even though it has not received a right PIN code:
Card-WrongVal(T,q) ≡ ∃D ∈ Documents, ∃S ∈ Signatures, (Send, Card, SigApp, [Yes; D; S]) ∈ T(q) ∧ (Receive, SigApp, Card, [D; POWN]) ∉ T(q)
The second one, Card-WrongInval, defines a case where Card refuses to sign a document even though it has received the correct PIN code POWN:
Card-WrongInval(T,q) ≡ ∃D ∈ Documents, (Send, Card, SigApp, [No; D]) ∈ T(q) ∧ (Receive, SigApp, Card, [D; POWN]) ∈ T(q)
Needless to say, errors can also be defined directly based on the parties' understanding of the potential sources of failure of the system and their desire to handle specific cases. The derivation method suggested here can be used when the parties wish to take a more systematic approach to minimize the risk of missing relevant errors. Last but not least, the language used to express properties for this case study is relatively simple as it does not account for the ordering of items in traces. In general, richer logics may be needed, for example, to express temporal properties. The choice of the language of properties does not have any impact on the overall process, but it may make some of the technical steps, such as the analysis of the log architecture, more or less difficult.13
Claim Properties: Claim properties represent the "grounds" for the claims of the users: They correspond to failures of the system as experienced by the users. In practice, such failures should cause damages to the user to give rise to liabilities, but damages are left out of the formal model. Claims can thus be expressed, like errors, as properties on traces. We consider two claim properties here, Claim-DiffDoc and Claim-NotSigned, which define the grounds for the claims introduced in Section 2.3:
Claim-DiffDoc(T,q) ≡ ∃D, D′ ∈ Documents, ∃S ∈ Signatures, D ≠ D′ ∧ (Send, SigApp, Serv, [Yes; D; S]) ∈ T(q) ∧ (Receive, SigApp, IO, [D′]) ∈ T(q)
Claim-NotSigned(T,q) ≡ ∃D ∈ Documents, ∃S ∈ Signatures, (Send, SigApp, Serv, [Yes; D; S]) ∈ T(q) ∧ ∀D′ ∈ Documents, (Receive, SigApp, IO, [D′]) ∉ T(q)
The first definition defines a claim corresponding to a case where OWN has been presented a document D′ with stamp q (as indicated by (Receive, SigApp, IO, [D′])) different from the document D sent by the signature application to the server (message (Send, SigApp, Serv, [Yes; D; S])). The second definition defines a claim corresponding to a case where the signature application has sent to the server a message (Send, SigApp, Serv, [Yes; D; S]) indicating that OWN has signed a document stamped q when OWN has never been presented any document stamped q.
3.3. Liability function
The formal specification of liabilities can be defined as a function mapping a claim, a trace, and a stamp onto a set of partiese:
Liability: Claims × Traces × Q → P(Parties)
e P(S) denotes the set of all subsets (powerset) of S.
where Claims = {DiffDoc, NotSigned}. As an illustration, the following function captures the share of liabilities introduced in Section 2.3:
Liability(C, T, q) ≡
If C = DiffDoc ∧ Claim-DiffDoc(T, q) then If SigApp-Diff(T, q) Then {SAP} Else {MPP}
If C = NotSigned ∧ Claim-NotSigned(T, q) then If Card-WrongVal(T, q) Then {SCP} Else {MPP}
Else ∅
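The definition above can be transcribed almost literally into executable form. The sketch below is only an illustration of the idea, not the analysis tool built in the project: the encoding of messages as tagged tuples, the stand-in constant OWN_PIN, and all function names are choices made for the example. A trace maps each stamp to a list of items (operation, sender, receiver, message).

OWN_PIN = "1234"  # stand-in for POWN, the owner's correct PIN

def msgs(trace, q, op, src, dst, tag):
    """Payloads of items (op, src, dst, (tag, ...)) recorded for session q."""
    return [m for (o, s, d, m) in trace.get(q, [])
            if (o, s, d) == (op, src, dst) and m[0] == tag]

def claim_diffdoc(trace, q):
    signed = {m[1] for m in msgs(trace, q, "Send", "SigApp", "Serv", "yes")}
    shown = {m[1] for m in msgs(trace, q, "Receive", "SigApp", "IO", "doc")}
    return any(a != b for a in signed for b in shown)

def claim_notsigned(trace, q):
    return bool(msgs(trace, q, "Send", "SigApp", "Serv", "yes")) and \
           not msgs(trace, q, "Receive", "SigApp", "IO", "doc")

def sigapp_diff(trace, q):
    received = {m[1] for m in msgs(trace, q, "Receive", "Serv", "SigApp", "doc")}
    forwarded = {m[1] for m in msgs(trace, q, "Send", "SigApp", "IO", "doc")}
    return any(a != b for a in received for b in forwarded)

def card_wrongval(trace, q):
    # Card approved some document D without having received [D; POWN].
    approved = {m[1] for m in msgs(trace, q, "Send", "Card", "SigApp", "yes")}
    pin_ok = {m[1] for m in msgs(trace, q, "Receive", "SigApp", "Card", "pin") if m[2] == OWN_PIN}
    return bool(approved - pin_ok)

def liability(claim, trace, q):
    """Returns the set of liable parties; an unconfirmed claim yields the empty set."""
    if claim == "DiffDoc" and claim_diffdoc(trace, q):
        return {"SAP"} if sigapp_diff(trace, q) else {"MPP"}
    if claim == "NotSigned" and claim_notsigned(trace, q):
        return {"SCP"} if card_wrongval(trace, q) else {"MPP"}
    return set()

# Session "q1": SigApp showed OWN a document D2 but reported D as signed.
trace = {"q1": [
    ("Receive", "Serv", "SigApp", ("doc", "D")),
    ("Send", "SigApp", "IO", ("doc", "D2")),
    ("Receive", "SigApp", "IO", ("doc", "D2")),
    ("Receive", "SigApp", "Card", ("pin", "D2", OWN_PIN)),
    ("Send", "Card", "SigApp", ("yes", "D2", "SIG")),
    ("Send", "SigApp", "Serv", ("yes", "D", "SIG")),
]}
print(liability("DiffDoc", trace, "q1"))    # {'SAP'}
print(liability("NotSigned", trace, "q1"))  # set(): this claim is not confirmed by the trace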
The two cases in Liability correspond to the two types of claims considered in Section 2.3. For each type of claim, the goal of the first test is to check the validity of the claim
raised by OWN. If OWN raises a claim which is not confirmed by the trace, then the result of Liability is the empty set because no party has to be made liable for an unjustified claim. Otherwise, if OWN claims to have been presented a document D′ different from the alleged document D, then SAP is liable if SigApp has forwarded to OWN a document (stamped q) different from the document received from ECC (SigApp-Diff(T, q)); otherwise, MPP is liable. Similarly, if OWN’s claim is that he has never been presented any document stamped q, then SCP is liable if the smart card has wrongly validated a PIN in session q (Card-WrongVal(T, q)); otherwise MPP is liable. 4. RELATED WORK The significance of liability, warranty, and accountability and their potential impact on software quality have already been emphasized by computer scientists as well as lawyers.1, 4, 21, 22 However, we are not aware of previous work on the application of formal methods to the definition of software liability. Earlier works on the specification of contracts mostly deal with obligations in a general sense6, 9, 19 or with specific types of contracts such as commercial contracts or privacy rules11, 18 but do not address liabilities related to software errors. Service Level Agreements also define contractual provisions but generally focus on Quality of Service rather than functional requirements and do not put emphasis on formal specifications. A notable exception is the SLAng language24, 25 which is endowed with a formal semantics and can be used to specify a variety of services such as Application Service Provision, Internet Service Provision, or Storage Service Provision. In addition, the monitorability and monitoring of SLAng services have been considered both from the formal and practical points of view.26 Several other connected areas share part of our objectives such as software dependability,3 model-based diagnosis,5, 16 intrusion detection,10 forensics,2, 20 and digital evidence.7, 27, 28 Needless to say, each of these areas is a useful source of inspiration for our project, but we believe that none of them, because of their different objectives, provides the answer to the key problem addressed in this paper, namely, the formal specification and instrumentation of liability. 5. CONCLUSION As stated in Section 1, the goal of this paper is to provide an overview of our approach through a case study. Because the presentation is targeted toward the case study rather than stated in general terms, a number of technical issues and options have not been described here. In this section, we suggest some generalizations and avenues for further research. The notions of traces and properties have been presented in a somewhat simplistic way in this paper. In general, the parties may wish to define liabilities in a more flexible way and establish a link between the erroneous execution of certain components and the failures of the
system (and resulting damages). To address this need, we have defined a concept of “logical causality” in Goesseler et al.8 Causality plays a key role in the legal characterization of liability. It has also been studied for a long time in computer science, but with quite different perspectives and goals. In the distributed systems community, causality is seen essentially as a temporal property. In Goesseler et al.,8 we have defined several variants of logical causality allowing us to express the fact that an event e2 (e.g., a failure) would not have occurred if another event e1 had not occurred (“necessary causality”) or the fact that e2 could not have been avoided as soon as e1 had occurred (“sufficient causality”). We have shown that these causality properties are decidable and proposed trace analysis procedures to establish them. These notions of causality can be integrated in the framework presented here to increase the expressive power of the language used for the definition of liability functions. Another aspect which requires further consideration is the nature of the evidence itself and the values stored in the logs. First, it may be the case that some values, e.g., for privacy or security reasons, must not be recorded in the logs. It may also be the case that not all the relevant information is included in the log files of the system. For example, in our case study, the fact that the owner of the mobile phone has declared the theft of his device or has signed an acknowledgment receipt for a product sent by the ECC can be useful information to analyze the situation (depending on the liability rules decided by the parties). Traces can thus be more than abstract versions of the log files and include other types of actions from all the actors involved. The simple case study used to illustrate this paper did not allow us to discuss the issues related to distributed systems. A key design choice in this context is the distribution of the log files themselves. Indeed, recording log entries on a device controlled by an actor who may be involved in a claim for which this log would be used as evidence may not be acceptable for the other parties. In Le Métayer et al.,13 we introduce a framework for the specification of log architectures and propose criteria to characterize “acceptable log architectures.” These criteria depend on the functional architecture of the system itself and the potential claims between the parties. They can be used to check if a log architecture is appropriate for a given set of potential claims and to suggest improvements to derive an acceptable log architecture from a non-acceptable log architecture. On the formal side, we have shown that, for a given threat model, the logs produced by acceptable log architectures can be trusted as evidence for the determination of liabilities: Technically speaking, any conclusive evaluation of a claim on these logs returns the same answer as the evaluation of the claim on the sequence of real events. As far as the log analysis itself is concerned, we have proposed a formal specification of the analyzer using the B method in Mazza et al.15 and we have shown the correctness of an incremental analysis process. This result makes it possible to build on the output of a first analysis to improve it by considering additional logs or further properties. april 2 0 1 1 | vol. 5 4 | n o. 4 | c o m m u n i c at i o n s of t h e acm
To conclude, we should stress that the methods and tools provided by our framework can be applied in an incremental way, depending on the wishes of the parties, the economic stakes, and the timing constraints for drafting the contract.
1. The first level of application is a systematic (but informal) definition of liabilities in the style of Section 2.3.
2. The second level is the formal definition of liabilities as presented in Section 3.3. This formal definition itself can be more or less detailed and encompass only a part of the liability rules defined informally. In addition, it does not require a complete specification of the software but only the properties relevant for the targeted liability rules.
3. The third level is the implementation of a log infrastructure or the enhancement of existing logging facilities to ensure that all the information required to establish liabilities will be available if a claim is raised. An example of implementation of our framework is described in Le Métayer et al.,12 which defines a log architecture for OSGi.
4. The fourth level is the implementation of a log analyzer to assist human experts in the otherwise tedious and error-prone log inspection.
5. A fifth level would be the verification of the correctness of the log analyzer with respect to the formal definition of liabilities (considering the correspondence between log files and traces). This level would bring an additional guarantee about the validity of the results produced by the system.
Each of these levels contributes to reducing further the uncertainties with respect to liabilities, and the parties can decide to choose the level commensurate with the risks involved with the potential failures of the system. Last but not least, we are currently working on two other key issues related to log files which have not been discussed here: their optimization in terms of storage (compaction, retention delay, etc.) using an index-based factorization method and techniques to ensure their authenticity and integrity,17, 23 including trusted serialization of log items.

Acknowledgments
We would like to thank all the members of the LISE project, who have, through many discussions and exchanges, contributed to the work described here, in particular Christophe Alleaume, Valérie-Laure Benabou, Denis Beras, Christophe Bidan, Gregor Goessler, Julien Le Clainche, Ludovic Mé, and Sylvain Steer.
References
1. Anderson, R., Moore, T. Information security economics—and beyond. Information Security Summit (IS2) (2009).
2. Arasteh, A.R., Debbabi, M., Sakha, A., Saleh, M. Analyzing multiple logs for forensic evidence. Dig. Invest. 4 (2007), 82–91.
3. Avizienis, A., Laprie, J.-C., Randell, B. Fundamental concepts of computer system dependability. In IARP/IEEE-RAS Workshop on Robot Dependability: Technological Challenges of Dependable Robots in Human Environments (2001).
4. Berry, D.M. Appliances and software: The importance of the buyer's warranty and the developer's liability in promoting the use of systematic quality assurance and formal methods. In Proceedings of the Workshop on Modeling Software System Structures in a Fastly Moving Scenario (Santa Margherita Ligure, Italy, 2000); http://www.montereyworkshop.org/PROCEEDINGS/BERRY/
5. Brandan-Briones, L., Lazovik, A., Dague, P. Optimal observability for diagnosability. In International Workshop on Principles of Diagnosis (2008).
6. Farrell, A.D.H., Sergot, M.J., Sallé, M., Bartolini, C. Using the event calculus for tracking the normative state of contracts. Int. J. Coop. Inform. Sys. (IJCIS) 14, 2–3 (2005), 99–129.
7. Gladyshev, P., Enbacka, A. Rigorous development of automated inconsistency checks for digital evidence using the B method. Int. J. Dig. Evidence (IJDE) 6, 2 (2007), 1–21.
8. Goessler, G., Raclet, J.-B., Le Métayer, D. Causality analysis in contract violation. In International Conference on Runtime Verification (RV 2010), LNCS 6418 (2010), 270–284.
9. Governatori, G., Milosevic, Z., Sadiq, S.W. Compliance checking between business processes and business contracts. In EDOC, IEEE Computer Society (2006), 221–232.
10. Jones, A.K., Sielken, R.S. Computer System Intrusion Detection: A Survey. Technical report, University of Virginia Computer Science Department, 1999.
11. Le Métayer, D. A formal privacy management framework. In Formal Aspects of Security and Trust (FAST), Springer Verlag, LNCS 5491 (2009), 162–176.
12. Le Métayer, D., Maarek, M., Mazza, E., Potet, M.-L., Frénot, S., Viet Triem Tong, V., Craipeau, N., Hardouin, R. Liability in software engineering—Overview of the LISE approach and illustration on a case study. In International Conference on Software Engineering, Volume 1, ACM/IEEE (2010), 135–144.
13. Le Métayer, D., Mazza, E., Potet, M.-L. Designing log architectures for legal evidence. In International Conference on Software Engineering and Formal Methods (SEFM 2010), IEEE (2010), 156–165.
14. Lipovetsky, S. Les clauses limitatives de responsabilité et de garantie dans les contrats informatiques. Approche comparative France/États-Unis. Quelles limitations. Expertises des systèmes d'information, n° 237 (May 2000), 143–148.
15. Mazza, E., Potet, M.-L., Le Métayer, D. A formal framework for specifying and analyzing logs as electronic evidence. In Brazilian Symposium on Formal Methods (SBMF 2010) (2010).
16. Papadopoulos, Y. Model-based system monitoring and diagnosis of failures using statecharts and fault trees. Reliab. Eng. Syst. Safety 81 (2003), 325–341.
17. Parrend, P., Frénot, S. Security benchmarks of OSGi platforms: Toward hardened OSGi. Softw. Pract. Exp. (SPE) 39, 5 (2009), 471–499.
18. Peyton Jones, S.L., Eber, J.-M. How to write a financial contract. In The Fun of Programming, Cornerstones of Computing, Chapter 6, 2003.
19. Prisacariu, C., Schneider, G. A formal language for electronic contracts. In FMOODS, Springer, LNCS 4468 (2007), 174–189.
20. Rekhis, S., Boudriga, N. A temporal logic-based model for forensic investigation in networked system security. Comput. Netw. Security 3685 (2005), 325–338.
21. Ryan, D.J. Two views on security and software liability: Let the legal system decide. IEEE Security Privacy (January–February 2003).
22. Schneider, F.B. Accountability for perfection. IEEE Security Privacy (March–April 2009).
23. Schneier, B., Kelsey, J. Secure audit logs to support computer forensics. ACM Trans. Inform. Syst. Security (TISSEC) 2, 2 (1999), 159–176.
24. Skene, J., Lamanna, D.D., Emmerich, W. Precise service level agreements. In ACM/IEEE International Conference on Software Engineering (ICSE), IEEE (2004), 179–188.
25. Skene, J., Raimondi, F., Emmerich, W. Service-level agreements for electronic services. IEEE Trans. Software Eng. (TSE) 36, 2 (2010), 288–304.
26. Skene, J., Skene, A., Crampton, J., Emmerich, W. The monitorability of service-level agreements for application-service provision. In International Workshop on Software and Performance (WOSP), ACM (2007), 3–14.
27. Solon, M., Harper, P. Preparing evidence for court. Digit. Invest. 1 (2004), 279–283.
28. Stephenson, P. Modeling of post-incident root cause analysis. Digit. Evidence 2, 2 (2003).
Daniel Le Métayer and Manuel Maarek, INRIA, Grenoble Rhône-Alpes, France.
Valérie Viet Triem Tong, SSIR, Supélec Rennes, France.
Eduardo Mazza and Marie-Laure Potet, VERIMAG, University of Grenoble, France.
Nicolas Craipeau, PrINT, University of Caen, France.
Stéphane Frénot, INRIA, Grenoble Rhône-Alpes, and INSA, Lyon, France.
Ronan Hardouin, DANTE, University of Versailles, Saint-Quentin, France.
© 2011 ACM 0001-0782/11/04 $10.00
research highlights doi:10.1145/1924421.1924445
Technical Perspective Patterns Hidden from Simple Algorithms By Madhu Sudan
Is the number 9021960864034418159813 random? Educated opinions might vary from "No! No single string can be random," to the more contemptuous "Come on! Those are just the 714th to 733rd digits of π." Yet, to my limited mind, the string did appear random. Is there a way to use some formal mathematics to justify my naïveté? The modern theory of pseudorandomness2,5 indeed manages to explain such phenomena, where strings appear random to simple minds. The key, this theory argues, is that randomness is really in the "eyes of the beholder," or rather in the computing power of the tester of randomness. More things appear random to simpler, or resource-limited, algorithms than to complex, powerful algorithms.

Why should we care? Because randomness is a key resource in computation. Monte Carlo simulations abound in the use of computation for prediction. On the theoretical side too, algorithms get simplified or sped up incredibly if they use randomness. Fundamental tasks in distributed computing (such as synchronization) can be solved only with randomness. And there is no hope for maintaining privacy and secrecy without randomness.

At the same time much of the randomness we deal with in reality is not "pure." It does not come as a collection of independent random bits, but is generated by other processes. Knowing how an algorithm, or a computational process, works in the presence of somewhat random strings becomes the essence of a recurring question. (For example, should you really trust nuclear power safety calculations made by a Monte Carlo algorithm using randomness from the C++ rand function?) Such questions get formalized in the theory of pseudo-randomness as follows: It considers some source of randomness X (formally given by some probability distribution over n-bit strings), and a class of Boolean
algorithms A (algorithms that decide Boolean questions), and asks whether some algorithm in A behaves very differently given access to a random string generated by X than it does on purely random strings. If every algorithm in A behaves roughly the same on X as on pure randomness, we say X is pseudo-random to A. In the example here, X = Xπ may be the source that picks a random integer i between 1 and 10000 and outputs π[i + 1], . . . , π[i + 20], where π[j] denotes the jth digit of π. Given some class of algorithms A one could now ask, does Xπ look pseudo-random to A?

Unfortunately answering such questions leads to a fundamental challenge in the theory of computing. Proving, say, that Xπ looks random to A involves showing that no algorithm in A can detect patterns shown by the digits of π. And proving that some class of algorithms can't do some task is a major challenge to theoretical computer science (the most notorious question being "Is P = NP?"). Given such major obstacles to analyzing the seeming randomness in strings, it is no surprise that the study of pseudo-randomness remains in its infancy.

The following paper by Mark Braverman sheds new light on this field. It illustrates that a broad class of sources, as long as they have sufficient local randomness, fools a broad, but relatively simple, class of algorithms—those that compute "AC0 functions." AC0 functions are those whose input-output behavior can be described by a Boolean logic circuit consisting of n input wires and polynomially many NOT gates and AND and OR gates of arbitrary fan-in. AC0 functions—which correspond to tasks computable in constant time on highly parallel machines—are at the forefront of the class of computational tasks that we do seem to understand. For instance, AC0 functions can compute the sum of two n/2-bit integers, but not their product (and so we do know things they can't do). In fact, we even
knew an explicit source of relatively small entropy that appeared pseudo-random to this class.4 Yet the essence of which sources fool this class of algorithms was not understood, leading Linial and Nisan3 to conjecture that a certain form of "local randomness" was sufficient to seem random to this class of algorithms. The local randomness they pointed to is widely termed "limited independence." A source X is said to be k-wise independent if any particular k bits of the random source are totally random and independent. Linial and Nisan conjectured that for every constant-depth circuit, some (log n)^{O(1)}-wise independence would look like pure n bits of randomness. For over two decades this conjecture remained unresolved. Only recently, Bazzi1 resolved a special case of the conjecture (for the case of AC0 functions corresponding to depth-2 circuits, or "DNF formulae"). Now Braverman's work resolves the conjecture affirmatively. In the process he reveals novel, elegant ways in which AC0 functions can be described by low-degree multivariate polynomials, showing some of the interplay between Boolean computation and more classical mathematics. We need to see many more such connections before we can hope to address the wide challenges of computational complexity (and P vs. NP). Indeed, even to address the question implied by the opening paragraph, "Are the digits of π pseudo-random to AC0 functions?," one needs to understand much more. Fortunately, Braverman's work may have reduced this question to a purely number-theoretic one: Are the digits of π locally random?

References
1. Bazzi, L.M.J. Polylogarithmic independence can fool DNF formulas. SIAM J. Comput. 38, 6 (2009), 2220–2272.
2. Blum, M. and Micali, S. How to generate cryptographically strong sequences of pseudorandom bits. SIAM J. on Computing 13, 4 (Nov. 1984), 850–864.
3. Linial, N. and Nisan, N. Approximate inclusion-exclusion. Combinatorica 10, 4 (1990), 349–365.
4. Nisan, N. Pseudorandom generators for space-bounded computation. Combinatorica 12, 4 (1992), 449–461.
5. Yao, A.C.-C. Theory and applications of trapdoor functions (extended abstract). In Proceedings of FOCS (1982), IEEE, 80–91.
Madhu Sudan ([email protected]) is a Principal Researcher at Microsoft Research New England and the Fujitsu Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. © 2011 ACM 0001-0782/11/04 $10.00
doi:10.1145/1924421.1924446
Poly-logarithmic Independence Fools Bounded-Depth Boolean Circuits By Mark Braverman
1. INTRODUCTION
The question of determining which (weak) forms of randomness "fool" (or seem totally random to) a given algorithm is a basic and fundamental question in the modern theory of computer science. In this work we report progress on this question by showing that any "k-wise independent" collection of random bits, for some k = (log n)^{O(1)}, fools algorithms computable by bounded-depth circuits. In the process we also present known tight connections between bounded-depth circuits and low-degree multivariate polynomials. We establish a new such connection to prove our main result. In the rest of this section, we introduce the basic concepts in greater detail so as to present a precise version of our main result.

1.1 Bounded depth circuits
A boolean circuit C is a circuit that is composed of boolean inputs, boolean outputs, and gates that perform operations on the intermediate values within the circuit. The circuit is not allowed to have loops; in other words, the gates of the circuit form a directed acyclic graph. A circuit C with n input bits and one output naturally gives rise to a boolean function F_C: {0, 1}^n → {0, 1}. Depending on the type of the circuit, various restrictions may be placed on the size of the circuit, its shape, and the types of gates that it may use. In this paper, we will focus on circuits where unbounded fan-in AND and OR gates, as well as NOT gates, are allowed.

Since any given circuit accepts a fixed number n of inputs, it can only compute a function over the set {0, 1}^n of strings of length n. If we want to discuss computing a function F: {0, 1}* → {0, 1} that takes strings of arbitrary length using circuits, we need to consider families of circuits parameterized by input size. A family of circuits {C_n}, n = 1, 2, . . . , computes the function F if each circuit C_n computes the restriction of F to strings of length n.

The two most important measures of a circuit's "power" are size and depth. Circuit size is the total number of gates used in constructing the circuit. For circuits with AND, OR, and NOT gates the depth is defined as the largest number of AND and OR gates a signal needs to traverse from any input to the output. The same measures can be applied to families of circuits. A family {C_n} is of polynomial size if the size of each C_n is bounded by n^c for some constant c. Circuit complexity studies the amount of resources (such as size and depth) required to compute various boolean functions.
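As a concrete (and purely illustrative) companion to these definitions, the short Python sketch below represents a circuit as a directed acyclic graph of unbounded fan-in AND/OR gates and NOT gates, evaluates it on an input assignment, and computes its size and depth as defined above. The gate encoding and helper names are ours, not part of the paper.

# Illustrative sketch: a circuit as a DAG of gates over named wires.
# Each gate is (kind, list_of_inputs); wires "x0", "x1", ... are circuit inputs.

CIRCUIT = {
    "g1": ("AND", ["x0", "x1", "x2"]),     # unbounded fan-in AND
    "g2": ("NOT", ["x2"]),
    "g3": ("OR",  ["g1", "g2", "x3"]),     # unbounded fan-in OR
    "out": ("OR", ["g3", "x0"]),
}

def evaluate(circuit, assignment, wire="out"):
    """Evaluate the boolean value of `wire` under a dict of input bits."""
    if wire in assignment:                 # an input wire
        return assignment[wire]
    kind, ins = circuit[wire]
    vals = [evaluate(circuit, assignment, w) for w in ins]
    if kind == "AND":
        return int(all(vals))
    if kind == "OR":
        return int(any(vals))
    return 1 - vals[0]                     # NOT

def size(circuit):
    """Size = total number of gates."""
    return len(circuit)

def depth(circuit, wire="out"):
    """Depth = largest number of AND/OR gates on any input-to-output path."""
    if wire not in circuit:                # an input wire
        return 0
    kind, ins = circuit[wire]
    d = max(depth(circuit, w) for w in ins)
    return d + (1 if kind in ("AND", "OR") else 0)   # NOT gates do not count

if __name__ == "__main__":
    x = {"x0": 1, "x1": 0, "x2": 1, "x3": 0}
    print(evaluate(CIRCUIT, x), size(CIRCUIT), depth(CIRCUIT))   # 1 4 3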
In this context, studying circuits of bounded depth—i.e., using at most a constant d number of layers—is of particular interest. The complexity class capturing functions computed by boolean circuits of bounded depth and polynomial size is denoted by AC0. Thus AC0 captures what can be computed by polynomial size circuits in constant time. The class AC0 has been studied extensively in the past three decades. There are several reasons why AC0 circuits have been studied so extensively. Firstly, AC0 is essentially the only class of circuits for which strong unconditional lower bounds have been proved: it is known, for example, that any AC0 circuit computing the parity function PARITY(x1, . . . , xn) = x1 + . . . + xn mod 2 must be of size exponential in n.5,6 Secondly, there is a very tight connection between computations performed by the polynomial hierarchy (PH) complexity class relative to an oracle and computations by AC0 circuits. Thus a better understanding of AC0 translates into a better understanding of the polynomial hierarchy. Finally, the class of AC0 functions, and its subclass of DNF formulas, has been the target of numerous efficient machine learning algorithms. It is actually because AC0 is a relatively weak class that functions in this class are amenable to efficient learning: it can be shown that learning functions from larger circuit complexity classes would allow one to break cryptographic primitives such as integer factoring.

(A previous version of this paper appeared in Proceedings of the IEEE Conference on Computational Complexity (2009) and the Journal of the ACM 57, 5 (2010).)

1.2 Pseudorandomness: fooling bounded depth circuits
Randomness is one of the most important computational resources. A randomized algorithm A may make use of a string of random bits r ∈ {0, 1}^n as part of the computation. The randomness may be used, for example, for Monte Carlo simulations, or for generating random hashes. Denote by U = U_{{0, 1}^n} the uniform distribution on boolean strings of length n. Then our randomized algorithm A is executed with r being sampled from U. In reality, truly random samples are virtually impossible to obtain, and we resort to pseudorandom distributions and functions that generate pseudorandom bits, such as the rand function in C. What makes a distribution μ over n-bit strings pseudorandom? It turns out that the answer to this question depends on the
algorithm A. The distribution μ is pseudorandom for A if the behavior of A on samples from μ is indistinguishable from its behavior on truly uniformly random r. In particular, if A was likely to work correctly with truly uniform samples, it will be likely to work correctly with samples drawn from μ. For simplicity, suppose that A outputs a single bit. For a distribution μ on length-n strings {0, 1}^n, we denote by 0 ≤ E_μ[A] ≤ 1 the expected value of A on inputs drawn according to μ. When the distribution under consideration is the uniform distribution U on {0, 1}^n, we suppress the subscript and let E[A] denote the expected value of A. A distribution μ is ε-pseudorandom, or ε-fools A, if the expected output when A is fed samples from μ is ε-close to the expected output when it is fed truly uniform samples: |E_μ[A] − E[A]| < ε. Thus, when we use the rand function in our code, we implicitly hope that its output ε-fools our program.

Similarly to fooling one algorithm, we can define fooling a class of functions. In particular, a distribution μ ε-fools AC0 functions if it ε-fools every function in the class. The smaller (and weaker) the class of functions is, the easier it is to fool. Since AC0 is a relatively weak class of functions, there is hope of fooling it using a distribution μ of relatively low entropy. In the late 1980s Nisan10 demonstrated a family of such distributions. In the present paper we will give a very general class of distributions that fool AC0 circuits, showing that all k-wise independent distributions with k = log^{O(1)} n fool AC0 circuits.

1.3 k-wise independence and the main result
A distribution μ on {0, 1}^n is k-independent if every restriction of μ to k coordinates is uniform on {0, 1}^k. Clearly, the uniform distribution on {0, 1}^n is n-independent. A simple example of a distribution that is (n − 1)-independent but not uniform is μ⊕ := (x1, x2, . . . , x_{n−1}, x1 ⊕ x2 ⊕ . . . ⊕ x_{n−1}), where the bits x1, . . . , x_{n−1} are selected uniformly at random and ⊕ is the XOR operator. Equivalently, μ⊕ selects a uniformly random string in {0, 1}^n subject to the condition that the number of 1's selected is even.

k-Wise independent distributions can sometimes be used as pseudorandom generators. If μ is a k-wise independent distribution, it may require fewer than n bits of true randomness to sample from. For example, the distribution μ⊕ only requires n − 1 truly random bits to sample from. Alon et al.,1 building on ideas from Joffe,7 give a simple construction of a k-wise independent distribution which only requires O(k log n) truly random bits, using univariate polynomials. Another way of looking at and constructing k-wise independent distributions is using coding theory. Simple linear codes give rise to a large family of k-wise independent distributions. For any binary code C with distance > k, a uniform distribution on the set
C⊥ = {x : x · y = 0 mod 2, ∀y ∈ C} gives rise to a k-wise independent distribution. The example μ⊕ above corresponds to the n-times repetition code C = {00 . . . 0, 11 . . . 1}, whose distance is n, again showing that μ⊕ is (n − 1)-wise independent.

For which k's does k-wise independence fool AC0 circuits? It is not hard to see that the problem of distinguishing μ⊕ from the uniform distribution is as hard as computing PARITY, and thus it is not surprising that μ⊕ fools AC0 circuits. If a distribution μ is k-wise independent, it means that "locally" the distribution looks uniform, and a circuit distinguishing μ from the uniform distribution needs to be able to "process" more than k input bits together. Thus, intuitively, k-wise independence should be able to fool AC0 circuits even for fairly small values of k. In 1990, Linial and Nisan9 conjectured that k = (log m)^{d−1} (i.e., poly-logarithmic in m) is sufficient to fool AC0 circuits of size m and depth d. The main object of this paper is to give a proof of the following theorem, which was first published in 2009:3

Theorem 1. r(m, d, ε)-independence ε-fools depth-d AC0 circuits of size m, where r(m, d, ε) = (log(m/ε))^{O(d²)}.

Theorem 1 shows that poly-logarithmic independence suffices to fool all AC0 circuits, thus settling the Linial–Nisan conjecture. A gap remains in the exact dependence of the exponent of the log on d. Prior to Theorem 1, the conjecture had been proved by Bazzi2 in 2007 for the special case of DNF formulas. Bazzi's original proof was quite involved, and was later greatly simplified by Razborov.12 No nontrivial bound on r(m, d, ε) was known for d > 2.

2. PROOF OVERVIEW
The main technical ingredient in our work is presenting a new connection between low-degree polynomials and AC0 circuits. Connections between polynomials and circuits, especially AC0 circuits, have been explored in the past in various contexts, and remain perhaps the most powerful tool for analyzing these functions. We begin by observing that k-wise independent distributions perfectly fool degree-k polynomials. Let μ be any k-wise independent distribution over {0, 1}^n. As a first step, let m be a degree-k monomial. m may depend on at most k different variables, on which μ (being k-wise independent) appears to be perfectly uniformly random. Thus E_μ[m] = E[m]. Now, if f is a degree-k polynomial over x1, . . . , xn, it can be written as a sum of degree-k monomials. Each monomial is completely fooled by μ and thus
E_μ[ f ] = E[ f ].    (1)
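Equation (1) can be checked directly by brute force on a small example. The illustrative Python sketch below verifies, for n = 5, that the (n − 1)-wise independent distribution μ⊕ of Section 1.3 gives exactly the uniform expectation on every monomial of degree at most n − 1, while the single degree-n monomial already distinguishes it from the uniform distribution.

# Illustrative check that a k-wise independent distribution fools degree-k monomials.
from itertools import combinations, product

n = 5

# Support of the uniform distribution U on {0,1}^n.
uniform = [bits for bits in product([0, 1], repeat=n)]

# Support of mu_xor: x_1,...,x_{n-1} uniform, x_n = x_1 xor ... xor x_{n-1}.
# This distribution is (n-1)-wise independent but not uniform.
mu_xor = [bits + (sum(bits) % 2,) for bits in product([0, 1], repeat=n - 1)]

def expectation(points, subset):
    """E[ prod_{i in subset} x_i ] under the uniform distribution on `points`."""
    return sum(all(p[i] for i in subset) for p in points) / len(points)

# Every monomial of degree <= n-1 is fooled exactly.
for k in range(n):
    for S in combinations(range(n), k):
        assert expectation(mu_xor, S) == expectation(uniform, S)

# The single degree-n monomial is not fooled.
full = tuple(range(n))
print(expectation(mu_xor, full), expectation(uniform, full))   # 0.0 vs 0.03125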
In light of (1), a reasonable attack strategy to prove Theorem 1 would be as follows. For a function F computed by an AC0 circuit, find a polynomial f that approximates F
such that:
1. E[F] ≈ E[ f ], i.e., f approximates F well on average;
2. E_μ[F] ≈ E_μ[ f ], i.e., f approximates F well on average, even when we restrict ourselves to inputs drawn according to μ.
Then use (1) to conclude that E_μ[F] ≈ E_μ[ f ] = E[ f ] ≈ E[F], i.e., that μ ε-fools F.

Of course, it is not clear at all that such a polynomial must exist (and we do not know whether it does). However, a carefully crafted attack along these lines actually does go through. The proof combines two previous constructions of approximating polynomials for AC0 circuits.

The first construction is due to Linial, Mansour, and Nisan8 and is discussed extensively in Section 3. It is an analytic construction that for each AC0 function F provides a low-degree polynomial f̃ that approximates it well on average. Such a polynomial would easily fulfill requirement 1 above. However, it would not be helpful in fulfilling requirement 2. Since the support of μ is quite small, being close on average does not rule out the possibility that F and f̃ deviate wildly from each other on the points of the support of the distribution μ. This type of uniform average-case approximation is inherently useless when we want F to be close to f̃ on samples drawn according to μ.

The second construction is a combinatorial one, and is due to Razborov11 and Smolensky.13 It is discussed in Section 4. For an AC0 function F, and any distribution of inputs ν, the Razborov–Smolensky construction gives a low-degree polynomial f such that F = f with high probability with respect to ν. In other words, ν[{x : f(x) ≠ F(x)}] < ε. Note that here f is actually equal to F most of the time (and is not just close to it). Since ν is allowed to be any distribution, we can take ν = ½ (μ + U_{{0,1}^n})
to be the hybrid between μ and the uniform distribution, thus guaranteeing that the probability that f(x) ≠ F(x) is small simultaneously with respect to both distributions. This seems to be a promising improvement over the Linial–Mansour–Nisan polynomials, one that may allow us to fulfill both requirements 1 and 2 above. The caveat here is that on the few locations where f(x) ≠ F(x) the deviation |f(x) − F(x)| may be huge. In particular, while f is close to F (more precisely, it is perfectly equal) almost everywhere, it is not necessarily the case that they are close on average. Thus, for example, E[ f ] may differ wildly from E[F].

We see that both polynomial approximations get us "part-way" toward the goals, but do not quite work. To make the proof work we need to combine the two approaches as described in Section 5. In a nutshell, we first take a Razborov–Smolensky polynomial f that agrees
with F almost everywhere. It turns out that an "error" function E(x) that "flags" the problematic locations, i.e., E(x) = 1 whenever f(x) ≠ F(x), can be computed by (a slightly bigger) AC0 circuit. Thus, if the function f*(x) = (1 − E(x)) · f(x) were a polynomial, we would be in great shape: f*(x) = F(x) most of the time, while even when they disagree, the difference |f*(x) − F(x)| ≤ 1, and thus in fact E[ f*] ≈ E[F] and E_μ[ f*] ≈ E_μ[F]. Of course, f* is not a polynomial, since E is not a polynomial. However, it turns out with a little bit of work that the polynomial f′ = (1 − Ẽ) · f, where Ẽ is the Linial–Mansour–Nisan approximation of E, actually allows us to prove the main theorem. Thus the proof comes out of combining the two approximation techniques in one polynomial.

3. LOW-DEPTH CIRCUITS AND POLYNOMIALS—THE ANALYTIC CONNECTION
Approximately a decade after the first AC0 lower bounds were published, Linial, Mansour, and Nisan proved the following remarkable lemma:

Lemma 2. If F: {0, 1}^n → {0, 1} is a boolean function computable by a depth-d circuit of size m, then for every t there is a degree-t polynomial f̃ with8 ‖F − f̃‖₂² ≤ 2m · 2^{−t^{1/d}/20}.
The lemma states that any low-depth circuit of size m can be approximated very well by a polynomial of degree (log m)^{O(d)}—a degree that is poly-logarithmic in the size of the circuit. The approximation error here is the average-square error: f̃ does not have to agree with F on any point, but the average value of |F(x) − f̃(x)|² taken over the Hamming cube is small. An illustration of the statement of the lemma can be found in Figure 1.

Figure 1. An illustration of the statement of Lemma 2. For convenience, the boolean cube is represented by the real line. The function F (in gray) is an AC0 boolean function. The function f̃ (in black) is a real-valued low-degree polynomial approximating F well on average.

To understand the context of Lemma 2 we need to consider one of the most fundamental tools in the analysis of boolean functions: the Fourier transform, which we will briefly introduce here. An introductory survey on the Fourier transform over the boolean cube can be found in de Wolf.4 For the remainder of this section we will represent the boolean function F: {0, 1}^n → {0, 1} using a function G: {−1, +1}^n → {−1, +1}, with 0 corresponding to −1 and 1 to +1. Thus G(x1, . . . , xn) := 2F((x1 + 1)/2, . . . , (xn + 1)/2) − 1. This will not affect any of the results, but will make all the
calculations much cleaner. Note that a degree-t polynomial approximation of F corresponds to a degree-t polynomial approximation for G and vice versa. Each function G: {−1, +1}^n → {−1, +1} can be viewed as a 2^n-dimensional vector in R^{2^n} specified by its truth table. Consider the special parity functions χ_S: {−1, +1}^n → {−1, +1}. For each set S ⊂ {1, . . . , n} of coordinates, let χ_S(x) := ∏_{i ∈ S} x_i.
In other words, χ_S is the parity of the x's that correspond to coordinates in the set S. There is a total of 2^n different functions χ_S—one corresponding to each subset S of coordinates. The function χ_∅ is the constant function χ_∅(x) = 1, while the function χ_{1, . . . , n}(x) outputs the parity of all the coordinates in x. Similarly to the function G, each function χ_S can also be viewed as a vector in R^{2^n} specified by the values it takes on all possible inputs. By abuse of notation we will denote these vectors by χ_S as well. Moreover, for each two sets S1 ≠ S2 the vectors χ_{S1} and χ_{S2} are orthogonal to each other. That is, ⟨χ_{S1}, χ_{S2}⟩ = Σ_{x ∈ {−1,+1}^n} χ_{S1}(x) · χ_{S2}(x) = 0.
Thus we have 2^n orthogonal vectors {χ_S} in R^{2^n}, and when properly normalized they yield an orthonormal basis of the space. In other words, each vector in R^{2^n} can be uniquely represented as a linear combination of the functions χ_S. In particular, the function G can be uniquely written as:
G = Σ_{S ⊆ {1, . . . , n}} Ĝ(S) · χ_S,    (2)
where the numerical coefficients Ĝ(S) ∈ R are given by the inner product Ĝ(S) = ⟨G, χ_S⟩ = E_{x ∈ {−1,+1}^n}[G(x) · χ_S(x)].
The representation of G given by (2) is called the Fourier transform of G. The transform converts 2^n numbers (the truth table of G) into 2^n numbers (the coefficients Ĝ(S)), thus preserving all the information about G. Another way to view the representation (2) is as a canonical way of representing G as a multilinear polynomial with real-valued coefficients:
G(x) = Σ_{S ⊆ {1, . . . , n}} Ĝ(S) · ∏_{i ∈ S} x_i.    (3)
Thus each function G can be uniquely represented as a degree-n multilinear polynomial (a simple fact that can be proved directly, without using the Fourier transform). More importantly, when we view the functions χ_S as vectors, the space H_t of polynomials of degree ≤ t, for any t, is spanned by the functions {χ_S: |S| ≤ t}. This means that to get the best low-degree approximation for G we should simply project it onto H_t, to obtain
g̃ := Σ_{S: |S| ≤ t} Ĝ(S) · χ_S.    (4)
Note that g̃ is no longer a boolean function, and may take arbitrary real values, even on inputs from {−1, +1}^n. The ℓ₂-error—which we need to bound to prove Lemma 2—is given by

‖G − g̃‖₂² = Σ_{S: |S| > t} Ĝ(S)².    (5)
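As a small illustration of the definitions above (not part of the paper), the following brute-force Python sketch computes the Fourier coefficients Ĝ(S) of a three-variable function and the squared ℓ₂-error (5) of its degree-t projection for each t. The example function and helper names are ours.

# Illustrative brute-force Fourier expansion over {-1,+1}^n and the error (5).
from itertools import combinations, product

n = 3

def G(x):
    # Example function: OR of three bits, in the {-1,+1} convention
    # (+1 plays the role of "true").
    return 1 if any(xi == 1 for xi in x) else -1

points = list(product([-1, 1], repeat=n))

def chi(S, x):
    """Parity character: product of the coordinates indexed by S."""
    out = 1
    for i in S:
        out *= x[i]
    return out

def coeff(S):
    """Fourier coefficient G_hat(S) = E_x[ G(x) * chi_S(x) ]."""
    return sum(G(x) * chi(S, x) for x in points) / len(points)

subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
ghat = {S: coeff(S) for S in subsets}

# Squared l2-error of the degree-t projection: sum of G_hat(S)^2 with |S| > t.
for t in range(n + 1):
    err = sum(c * c for S, c in ghat.items() if len(S) > t)
    print(t, err)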
To bound (5) one needs to show that all but a very small fraction of the weight in the Fourier representation of G is concentrated on low-degree coefficients. This is where the magic of Linial et al.8 happens, and this is where the fact that G is computed by an AC0 circuit is being used. At a very high level, the proof uses random restrictions. A random restriction r assigns the majority of G's inputs a random value 0 or 1, while leaving some inputs unset. A tool called the Switching Lemma is then used to show that when a random restriction r is applied to an AC0 function G, the resulting boolean function G|r is highly likely to be very "simple"—depending on only few of the inputs. On the other hand, a random restriction applied to χ_S where |S| is large is likely to leave some of the inputs in S unset. Thus if G had a lot of weight on the coefficients Ĝ(S) with large |S|, G|r would be likely to remain "complicated" even after the application of r.

Lemma 2 immediately implies a lower bound for an AC0 circuit computing PARITY, since the Fourier representation of PARITY has all of its weight on the single coefficient at S = [n]: that coefficient is 1, and all the others are 0. Thus PARITY cannot be approximated well by a polynomial even of degree t = n − 1.

One interesting application of Lemma 2 that is outside of the scope of the present article is for learning AC0 functions. Suppose that G is an unknown AC0 function of size m, and we are given access to examples of the form (x, G(x)), where x is chosen uniformly from {−1, +1}^n. Then we know that G has a good polynomial approximation of the form (4). Thus to (approximately) learn G, all we have to do is learn the coefficients Ĝ(S), where S is small. This can be done in time proportional to 2^{|S|}, immediately yielding an n^{(log m)^{O(d)}} algorithm for learning AC0 circuits.

4. LOW-DEPTH CIRCUITS AND POLYNOMIALS—THE ALGEBRAIC CONNECTION
It turns out that average-error approximations are not the only useful way in which AC0 circuits can be approximated by low-degree polynomials. We have seen in the previous section that the representation of any boolean function by a multilinear polynomial is unique, and thus given an AC0 function F we cannot hope to obtain a low-degree polynomial f that agrees with F everywhere. We will see, however, that it is possible to construct a low-degree polynomial f that agrees with F almost everywhere. Specifically, the following statement holds:

Lemma 3. Let ν be any probability distribution on {0, 1}^n. For a circuit of depth d and size m computing a function F, for any s, there is a degree r = (s · log m)^d polynomial f such that P_ν[f(x) ≠ F(x)] < 0.82^s · m.

Note that Lemma 3 promises an approximating polynomial, again of degree (log m)^{O(d)}, that has a low error against an arbitrary distribution ν on inputs. The statement of the lemma is illustrated in Figure 2(a), where the probability
Figure 2. An illustration of the statement of Lemma 3 (a), and Lemma 4 (b). Note that when the low degree polynomial f does disagree with F we have no good guarantee on the error. This means that f may be a good approximant almost everywhere but not on average.
Figure 3. An illustration of the inductive step where an approximating polynomial f is constructed from the approximating polynomials g1, g2, . . . , gk.
according to ν of the region of disagreement between f and F is small. Lemma 3 (or its slight variation) has been proved by Razborov11 and Smolensky13 in the late 1980s. The tools for the proof of the lemma have been developed in Valiant and Vazirani.14 Razborov and Smolensky used the lemma to give stronger lower bounds on bounded depth circuits. Let AC0[p] ⊃ AC0 be the class of functions computable using a bounded-depth circuit that in addition to the AND and OR gates is allowed to use the MODp gate: the gate outputs 1 if and only if the number of 1's among its inputs is divisible by p. We already know that PARITY ∉ AC0, and thus AC0[2] ⊋ AC0. Razborov and Smolensky showed that the MODp function cannot be computed efficiently by an AC0[q] circuit where q ≠ p. In particular, this means that PARITY ∉ AC0[3].

The proof of Lemma 3 is combinatorial, and is much simpler than the proof of Lemma 2. In fact, we will be able to present its entire proof below. To obtain the results in Section 5 we will need a slight strengthening of the lemma.

Lemma 4. Let ν be any probability distribution on {0, 1}^n. For a circuit of depth d and size m computing a function F, for any s, there is a degree r = (s · log m)^d polynomial f and a boolean function Eν computable by a circuit of depth ≤ d + 3 and size O(m²r) such that
• P_ν[Eν(x) = 1] < 0.82^s · m, and
• whenever Eν(x) = 0, f(x) = F(x).

Thus, not only does the polynomial from Lemma 3 exist, but there is a simple AC0 "error circuit" Eν that given an input can tell us whether f will make an error on the input. The function Eν is shown in Figure 2(b).

We will now prove Lemma 4. A curious property of the construction is that it gives a probabilistic algorithm for producing the low degree polynomial f that approximates F almost everywhere.

Proof. (of Lemma 4) We construct the polynomial f by induction on the depth d of F, and show that with high probability f = F. The function Eν follows from the construction. Note that we do not know anything about the measure ν and
thus cannot give an explicit construction for f. Instead, we will construct a distribution on polynomials f that succeeds with high probability on any given input. Thus the distribution is expected to have a low error with respect to ν, which implies that there is a specific f that has a small error with respect to ν.

We will show how to make a step when the output gate in f is an AND gate (see Figure 3). Since the whole construction is symmetric with respect to 0 and 1, the step also holds for an OR gate. Let F = G1 ∧ G2 ∧ . . . ∧ Gk, where k < m. For convenience, let us assume that k = 2^ℓ is a power of 2. We take a collection of t := s log m random subsets of {1, 2, . . . , k} where each element is included with probability p independently of the others: at least s subsets for each of the p = 2^{−1}, 2^{−2}, . . . , 2^{−ℓ} = 1/k. Denote the sets by S1, . . . , St—we ignore empty sets. In addition, we make sure to include {1, . . . , k} as one of the sets. Let g1, . . . , gk be the approximating polynomials for G1, . . . , Gk that are guaranteed by the induction hypothesis applied to G1, . . . , Gk with depth d − 1. We set f := ∏_{i=1}^{t} (1 − Σ_{j ∈ S_i} (1 − g_j)).
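The sampling step just described can be sketched in a few lines of illustrative Python; the helper names are ours, and the product form of the approximating polynomial follows the display above. The last helper flags the error event analyzed below: some input of the AND gate is 0, yet no sampled set hits exactly one zero.

# Illustrative sketch of the random-set step for one AND gate F = G1 /\ ... /\ Gk.
import math
import random

def sample_sets(k, s):
    """At least s random subsets of {0,...,k-1} for each density p = 2^-1,...,1/k,
    plus the full set, as described above (empty sets are ignored)."""
    ell = max(1, int(math.log2(k)))
    sets = [frozenset(range(k))]
    for a in range(1, ell + 1):
        p = 2.0 ** (-a)
        for _ in range(s):
            S = frozenset(j for j in range(k) if random.random() < p)
            if S:
                sets.append(S)
    return sets

def approx_and(sets, g):
    """Evaluate prod_i (1 - sum_{j in S_i} (1 - g_j)) on the input values g."""
    value = 1.0
    for S in sets:
        value *= 1.0 - sum(1 - g[j] for j in S)
    return value

def error_flag(sets, g):
    """Error event: some input is 0, yet no set hits exactly one zero."""
    zeros = {j for j, v in enumerate(g) if v == 0}
    return bool(zeros) and all(len(S & zeros) != 1 for S in sets)

if __name__ == "__main__":
    random.seed(0)
    k, s = 16, 5
    sets = sample_sets(k, s)
    g = [1] * k
    g[3] = 0                               # one zero: the AND should evaluate to 0
    print(approx_and(sets, g), error_flag(sets, g))   # 0.0 False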
By the induction assumption, the degrees of g_j are deg g ≤ (s · log m)^{d−1}, hence the degree of f is bounded by deg f ≤ t · deg g ≤ (s · log m)^d. Next we bound the error P[f ≠ F]. It consists of two terms:
P[f ≠ F] ≤ Σ_{j=1}^{k} P[g_j ≠ G_j] + P[f ≠ F | g_j = G_j for all j].    (6)
In other words, to make a mistake, either one of the inputs to the final AND gate has to be wrong, or the approximating function for the AND has to make a mistake. We will focus on the second term; the first term is bounded by the union bound. We fix a vector of specific values G1(x), . . . , Gk(x), and calculate the probability of an error over the possible choices of the random sets Si.
If all the Gj(x)’s are 1 then the value of F(x) = 1 is calculated correctly with probability 1. Suppose that F(x) = 0 (and thus at least one of the Gj(x)’s is 0). Let 1 £ z £ k be the number of zeros among G1(x), . . . , Gk(x), and a be such that 2a £ z < 2a+1. Our formula will work correctly if one of the sets Si hits exactly one 0 among the z zeros of G1(x), . . . , Gk(x). We will consider only the sets Si above that are likely to hit exactly one zero. Specifically, let S be a random set as above with p = 2− a −1. The probability of S hitting exactly one zero is exactly
Hence the probability of the formula being wrong after s such sets is bounded by 0.82^s. Since this is true for any value of x, we can find a collection of sets Si such that the probability of error as measured by ν is at most 0.82^s. By making the same probabilistic argument at every node and applying the union bound, we get that the condition "if the inputs are correct then the output is correct" is satisfied by all nodes except with probability < 0.82^s · m. Thus the error of the polynomial is < 0.82^s · m. Finally, if we know the sets Si at every node, it is easy to check whether there is a mistake by checking that no set contains exactly one 0, thus yielding the depth ≤ (d + 3) function Eν. The blowup in size is at most O(mr) since at each node we take a disjunction over all the possible pairs of (Si, Gj ∈ Si) of whether Gj is the only 0 in the set Si.

5. K-WISE INDEPENDENT DISTRIBUTIONS AND LOW-DEPTH CIRCUITS
Next we turn our attention to proving the main result, Theorem 1. The result will give another way in which AC0 circuits can be approximated using low-degree polynomials.

Theorem 1. r(m, d, ε)-independence ε-fools depth-d AC0 circuits of size m, where r(m, d, ε) = (log(m/ε))^{O(d²)}.
To prove Theorem 1 we will show:

Lemma 5. Let s ≥ log m be any parameter. Let F be a boolean function computed by a circuit of depth d and size m. Let μ be an r-independent distribution where r ≥ r(s, d) = 3 · 60^{d+3} · (log m)^{(d+1)(d+3)} · s^{d(d+3)}. Then

|E_μ[F] − E[F]| < ε(s, d),

where ε(s, d) = 0.82^s · (10m).

Theorem 1 follows from Lemma 5 by taking s = Θ(log(m/ε)), so that ε(s, d) = 0.82^s · (10m) ≤ ε.

As mentioned in the overview, one can prove that k-wise independence fools a function F by constructing an appropriate approximation of F with low degree polynomials. Bazzi2 has given the following equivalent characterization of fooling through polynomial approximations using linear programming duality:

Lemma 6. Let F: {0, 1}^n → {0, 1} be a boolean function, k ≥ 0 an integer, and ε > 0. Then the following are equivalent:2
1. Any k-wise independent distribution ε-fools F.
2. There exists a pair of "sandwiching polynomials" f_l, f_u: {0, 1}^n → R such that:
• low degree: deg(f_l), deg(f_u) ≤ k;
• sandwiching: f_l ≤ F ≤ f_u on {0, 1}^n;
• small error: E[F − f_l], E[f_u − F] ≤ ε, where the expectation is with respect to the uniform distribution on {0, 1}^n.

The sandwiching polynomials are illustrated in Figure 4(a). Since part (1) in Lemma 6 is what we need to prove, we will only be interested in the "(2) ⇒ (1)" direction of the theorem, which is the "easy" direction, and which we can prove here. Suppose (2) holds, and let μ be any k-wise independent distribution. The polynomial f_l is of degree ≤ k, which, as observed in Section 2, implies that E_μ[f_l] = E[f_l]. Thus,

E_μ[F] ≥ E_μ[f_l] = E[f_l] = E[F] − E[F − f_l] ≥ E[F] − ε.

The first inequality uses the sandwiching property, and the last inequality uses the small error property. Similarly, E_μ[F] ≤ E[F] + ε, implying that μ ε-fools F. Thus a problem about fooling AC0 circuits is actually another problem on approximating circuits with polynomials! Our actual proof of Lemma 5 will not produce a pair of sandwiching polynomials, but will "almost" produce such a pair. By the "(1) ⇒ (2)" direction of Lemma 6 we know that Theorem 1 implies that such a pair must exist for each AC0 function.

Figure 4. An illustration of sandwiching polynomials (a) and one-sided approximations (b)–(c): (a) F with an upper sandwiching polynomial f_u and a lower sandwiching polynomial f_l; (b) a one-sided approximation f0 of F; (c) the lower sandwiching polynomial f_l = 1 − (1 − f0)².
5.1 Proof of Lemma 5
The proof will combine the two types of polynomial approximations that we discussed in Sections 3 and 4. When we produced the almost-everywhere-correct polynomials in Section 4 we noted that when the polynomial f does disagree with F, the disagreement might be very large. It is still possible to give the following very crude bound on the maximum value ‖f‖∞ that |f| may attain on {0, 1}^n:

Claim 7. In Lemma 4, for s ≥ log m, ‖f‖∞ < (2m)^{deg(f)−2} = (2m)^{(s·log m)^d−2}.

Figure 5. The proof of Lemma 8: (a) the polynomial f approximating F almost everywhere; (b) the error indicator Eν; (c) F′ = F ∨ Eν; (d) 1 − Ẽν; (e) f′ = f · (1 − Ẽν).
The claim follows by an easy induction from the proof of Lemma 4. We omit the proof here. A low-degree polynomial f0 is a one-sided approximation of a boolean function F (see Figure 4(b)) if:
• f0 is a good approximation: ‖F − f0‖₂² is small;
• f0's error is one-sided: f0(x) = 0 whenever F(x) = 0.
If F had such an approximation, then the polynomial f_l := 1 − (1 − f0)² would be a (lower) sandwiching polynomial for F. f_l still has a low degree, and
E[F − f_l] = ‖F − f0‖₂² is small. This process is illustrated in Figure 4(c). Thus, being able to produce one-sided approximations (that combine characteristics from both the Section 3 and Section 4 approximations) would be enough. Unfortunately, we are unable to construct such polynomials. Instead, we show how to modify F just a little bit to obtain a boolean function F′. The change is slight enough with respect to both μ and the uniform measure so that fooling F′ and fooling F is almost equivalent. We then show that the modified F′ does have a one-sided approximation f′. The properties of F′ and f′ are summarized in the following lemma:

Lemma 8. Let F be computed by a circuit of depth d and size m. Let s1, s2 be two parameters with s1 ≥ log m. Let μ be any probability distribution on {0, 1}^n, and U_{{0, 1}^n} be the uniform distribution on {0, 1}^n. Set ν = ½ (μ + U_{{0, 1}^n}). Let Eν be the function from Lemma 4 with s = s1. Set F′ = F ∨ Eν. Then there is a polynomial f′ of degree r_f ≤ (s1 · log m)^d + s2, such that
• F ≤ F′ on {0, 1}^n;
• P_μ[F ≠ F′] < 2 · 0.82^{s1} · m;
• ‖F′ − f′‖₂² < 0.82^{s1} · (4m) + 2^{2.9(s1·log m)^d·log m − s2^{1/(d+3)}/20}; and
• f′(x) = 0 whenever F′(x) = 0.

Proof idea: The proof is illustrated in Figure 5. We start with a polynomial approximation f for F that works "almost everywhere." By Lemma 4 we know that there is an AC0 function Eν that "flags" the locations of all the errors. We fix F′ = F ∨ Eν so that f = 0 whenever F′ = 0. The problem is that the error |F′ − f|² may be large in the area where Eν = 1 (and thus f behaves poorly). If we could set f′ = f · (1 − Eν), we would be done, as this would "kill" all the locations where Eν is 1. Unfortunately, we cannot do this since Eν is not a low degree polynomial. Here we use the fact that Eν is itself an AC0 function, and thus can be approximated well by a polynomial Ẽν by the Linial–Mansour–Nisan approximation (Lemma 2). A brief calculation indeed shows that f′ = f · (1 − Ẽν) for an appropriately chosen Ẽν satisfies all the conditions.

Proof. The first property follows from the definition of F′. The second one follows from Lemma 4 directly, since P_μ[Eν = 1] ≤ 2 · P_ν[Eν = 1] < 2 · 0.82^{s1} · m.
Note also that P[Eν = 1] ≤ 2 · P_ν[Eν = 1] < 2 · 0.82^{s1} · m. Let f be the approximating polynomial for F from that lemma, so that F = F′ = f whenever Eν = 0, and thus f = 0 whenever F′ = 0. By Claim 7 we have ‖f‖∞ < (2m)^{(s1·log m)^d} < 2^{1.4(s1·log m)^d·log m}.
We let Ẽν be the low degree approximation of Eν of degree s2. By Linial et al.8 (Lemma 2 above), we have ‖Eν − Ẽν‖₂² < O(m³) · 2^{−s2^{1/(d+3)}/20}.
Let

f′ := f · (1 − Ẽν).

Then f′ = 0 whenever F′ = 0. It remains to estimate ‖F′ − f′‖₂²; combining the bounds above yields the third property of the lemma, which completes the proof.

We can now use Lemma 8 to give each AC0 function F a lower sandwiching polynomial, at least after a little "massaging":

Lemma 9. For every boolean circuit F of depth d and size m, any s ≥ log m, and any probability distribution μ on {0, 1}^n, there is a boolean function F′ and a polynomial f_l′ of degree less than r_f = 3 · 60^{d+3} · (log m)^{(d+1)(d+3)} · s^{d(d+3)} such that
• P_μ[F ≠ F′] < ε(s, d)/2,
• F ≤ F′ on {0, 1}^n,
• f_l′ ≤ F′ on {0, 1}^n, and
• E[F′ − f_l′] < ε(s, d)/2,
where ε(s, d) = 0.82^s · (10m).

Proof. Let F′ be the boolean function and let f′ be the polynomial from Lemma 8 with s1 = s and s2 ≈ 60^{d+3} · (log m)^{(d+1)(d+3)} · s^{d(d+3)}. The first two properties follow directly from the lemma. Set f_l′ := 1 − (1 − f′)². It is clear that f_l′ ≤ 1 and moreover f_l′ = 0 whenever F′ = 0, hence f_l′ ≤ F′. Finally, F′(x) − f_l′(x) = 0 when F′(x) = 0, and is equal to F′(x) − f_l′(x) = (1 − f′(x))² = (F′(x) − f′(x))² when F′(x) = 1, thus

E[F′ − f_l′] ≤ ‖F′ − f′‖₂² < 0.82^s · (5m) = ε(s, d)/2

by Lemma 8. To finish the proof we note that the degree of f_l′ is bounded by 2 · deg(f′) ≤ 2 · ((s1 · log m)^d + s2) < 2.5 · s2 < r_f.

Lemma 9 implies the following:

Lemma 10. Let s ≥ log m be any parameter. Let F be a boolean function computed by a circuit of depth d and size m. Let μ be an r-independent distribution where

r ≥ 3 · 60^{d+3} · (log m)^{(d+1)(d+3)} · s^{d(d+3)}.

Then E_μ[F] > E[F] − ε(s, d), where ε(s, d) = 0.82^s · (10m).

Proof. Let F′ be the boolean function and let f_l′ be the polynomial from Lemma 9. The degree of f_l′ is < r. We use the fact that since μ is r-independent, E_μ[f_l′] = E[f_l′] (since k-wise independence fools degree-k polynomials, as discussed above):

E_μ[F] ≥ E_μ[F′] − P_μ[F ≠ F′] > E_μ[f_l′] − ε(s, d)/2 = E[f_l′] − ε(s, d)/2 = E[F′] − E[F′ − f_l′] − ε(s, d)/2 ≥ E[F′] − ε(s, d) ≥ E[F] − ε(s, d).

The dual inequality to Lemma 10 follows immediately by applying the lemma to the negation F̄ = 1 − F of F. We have E_μ[F̄] > E[F̄] − ε(s, d), and thus

E_μ[F] = 1 − E_μ[F̄] < 1 − E[F̄] + ε(s, d) = E[F] + ε(s, d).

Together, these two statements yield Lemma 5.

References
1. Alon, N., Babai, L., Itai, A. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithm. 7, 4 (1986), 567–583.
2. Bazzi, L.M.J. Polylogarithmic independence can fool DNF formulas. SIAM J. Comput. (2009), to appear.
3. Braverman, M. Polylogarithmic independence fools AC0 circuits. J. ACM 57, 5 (2010), 1–10.
4. de Wolf, R. A Brief Introduction to Fourier Analysis on the Boolean Cube. Number 1 in Graduate Surveys. Theory of Computing Library, 2008.
5. Furst, M., Saxe, J., Sipser, M. Parity, circuits, and the polynomial-time hierarchy. Theor. Comput. Syst. 17, 1 (1984), 13–27.
6. Hastad, J. Almost optimal lower bounds for small depth circuits. In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing (1986), ACM, NY, 20.
7. Joffe, A. On a set of almost deterministic k-independent random variables. Ann. Probab. 2, 1 (1974), 161–162.
8. Linial, N., Mansour, Y., Nisan, N. Constant depth circuits, Fourier transform, and learnability. J. ACM 40, 3 (1993), 607–620.
9. Linial, N., Nisan, N. Approximate inclusion-exclusion. Combinatorica 10, 4 (1990), 349–365.
10. Nisan, N. Pseudorandom bits for constant depth circuits. Combinatorica 11, 1 (1991), 63–70.
11. Razborov, A.A. Lower bounds on the size of bounded-depth networks over a complete basis with logical addition. Math. Notes Acad. Sci. USSR 41, 4 (1987), 333–338.
12. Razborov, A.A. A simple proof of Bazzi's theorem. In Electronic Colloquium on Computational Complexity, Report TR08-081, 2008.
13. Smolensky, R. Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing (STOC'87) (1987), ACM, NY, 77–82.
14. Valiant, L.G., Vazirani, V.V. NP is as easy as detecting unique solutions. In Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing (STOC'85) (1985), ACM, NY, 458–463.

Mark Braverman, University of Toronto.

© 2011 ACM 0001-0782/11/04 $10.00
careers Aalto University School of Science Department of Media Technology Tenured or Tenure-Track Faculty Position Professor The Department of Media Technology at Aalto University in Espoo, Finland, invites applications for a tenured or tenure-track position. The research conducted at the Department covers digital media widely, themes varying from digital imaging and interaction to modeling of room acoustics to web technologies and semantic web. Application deadline May 2nd, 2011. The complete job description can be found at http://www. aalto.fi/en/current/jobs/professors/
Bilkent University Faculty of Engineering Department Chair Department of Computer Engineering The dean of engineering invites nominations and applications to the position of chair of Computer Engineering Department. The department has B.S., M.S., and PhD programs and emphasizes both high quality research and teaching. Currently, there are 21 full time faculty members and 11 instructors. The candidate must have an excellent track record in research, and must qualify for an appointment at the full professor level. A clear demonstration of leadership abilities is expected. Nominations and applications should describe the professional interests and goals of the candidate in leading the department. Each application should include a resume along with the names and contact information of five or more individuals who will provide letters of recommendation. Please send all nominations and applications to: Professor Levent Onural, Dean of Engineering Bilkent University Bilkent, Ankara TR-06800, Turkey Email: [email protected] Information about the university and the department can be found at http://www.cs.bilkent. edu.tr and http://www.bilkent.edu.tr .
Faculty of Engineering Computer Engineering Faculty Position Announcement The Computer Engineering Department at Bilkent University is in the process of hiring additional faculty members and seeks candidates for multiple positions. Appointments may be made at Assistant Professor, as well as at Associate Professor or Full Professor rank for candidates with commensurate experiences and accomplishments. A PhD degree in a related field is required. The department emphasizes both high quality research and teaching. Faculty duties include
Christopher Newport University Lecturer in Computer & Information Science Christopher Newport University invites applications for a Lecturer in Computer and Information Science in the Department of Physics, Computer Science & Engineering starting August 2011. Full details on job and application (Search #8184) at http://hr.cnu. edu/employment.htm. Deadline: 3/31/11. EOE Requirements for this renewable (non-tenuretrack) position include: Ph.D. in Computer Science or related field, effective communication skills, fluency in English, and ability to teach a variety of Computer Science lecture and laboratory courses. Collaborative research and involvement in department service expected. Experience in the sub-field of information systems, in particular databases, cybersecurity, cloud computing, and/or web technologies highly desirable. If successful candidate has not yet completed Ph.D. at the time of appointment, he/she will be hired at Instructor rank.
Dalhousie University
Dalhousie University is located in Halifax, Nova Scotia (http://www.halifaxinfo.com/), which is the largest city in Atlantic Canada and affords its residents outstanding quality of life. The Faculty welcomes applications from outstanding candidates in all areas of Computer Science. In our choice of candidate we will take into account our programs, research and strategic directions. An applicant should have a PhD in Computer Science or related area and be comfortable teaching any core computer science course. Evidence of a strong commitment to and aptitude for research and teaching is essential. Applications should include an application letter, curriculum vitae, a statement of research and teaching interests, sample publications, and the names, email addresses and physical addresses of three referees. The application can include the voluntary Equity Self-Identification form (see below). All documents are to be submitted to the email address below as PDF, Word or Postscript files. Applicants should provide their referees with the URL of this advertisement (see below), and request that they forward letters of reference by email to the same address. Applications will be accepted until April 30, 2011. All qualified candidates are encouraged to apply; however Canadian and permanent residents will be given priority. Dalhousie University is an Employment Equity/Affirmative Action Employer. The University encourages applications from qualified Aboriginal people, persons with a disability, racially visible persons and women. Submission Address for application documents and reference letters: [email protected] Location of this advertisement: www.cs.dal.ca Self-Identification form (PDF): http://hrehp.dal. ca/Files/Academic_Hiring_%28For/selfid02.pdf Self-Identification form (Word): http://hrehp. dal.ca/Files/Academic_Hiring_%28For/selfid02.doc
Edward Waters College Assistant Professor The Math & Science Dept seeks an Assistant Professor. Requires a Ph.D. in Computer Science or related field. Min 3 yrs exp. Details at www.ewc.edu/administration/human-resources/careers. EEO employer.
Northern Arizona University
Asst/Assoc Professor of Bioinformatics
The College of Engineering, Forestry and Natural Sciences invites applications for a tenure-track position in Bioinformatics at either the assistant or associate level, to begin August 2011. Applicants should be committed educators able to lead the development of an undergraduate training program in bioinformatics. Please see www.nau.edu/hr (Vacancy #558657) for the full position announcement. The position requires an earned doctorate in Computer Science or a Biological Science at either level. Additionally, at the associate level, candidates will be required to have teaching, research, and service experience consistent with COFS guidelines for promotion to Associate Professor. Please see http://home.nau.edu/provost/ for more information on COFS guidelines.
TOBB University of Economics and Technology (TOBB ETU)
Faculty Positions at the Department of Computer Engineering
The Computer Engineering Department invites applications for faculty positions at all levels. The department offers undergraduate, MS, and PhD programs. Candidates must have a PhD in Computer Science or a related field and demonstrate strong potential for research. The languages of instruction at TOBB ETU (www.etu.edu.tr) are Turkish and English. Please send a cover letter, detailed CV, research and teaching statements, and the names of five references as a single PDF file to [email protected].
University of Connecticut
Assistant/Associate/Full Professor, Faculty Initiative in Biomedical Informatics
Computer Science & Engineering Department, Storrs
The University of Connecticut (UConn) invites applications for two tenure-track faculty positions in the Computer Science & Engineering Department in the School of Engineering to join an interdisciplinary, integrated team in biomedical informatics (BMI) that spans both the UConn main Storrs and Health Center campuses. The team will comprise faculty appointed at all ranks; qualified candidates will be considered for tenure. This BMI team will conduct research, education, and outreach within the Biomedical Informatics Division (BMID) of the Connecticut Institute for Clinical and Translational Science (CICATS: http://cicats.uconn.edu/). Applicants should hold a PhD in computer science and/or engineering, with research interests in software engineering, including ultra-large-scale system design, operating systems, distributed systems, software architectures, security, databases, and programming languages, and with a background and/or interest in applying their research to a clinical, medical, public health, or related field. At this time, we are NOT seeking individuals with an interest in bio- or genome informatics. This position will include classroom teaching and engaging in research efforts in computer science and engineering with applications to medical and clinical informatics. Senior candidates with existing research programs who are interested in moving their work and research team to UConn are strongly encouraged to apply. The University is in the midst of a 20-year, state-funded $2.3 billion initiative to enhance the research mission. In addition to the Schools of Engineering, Medicine, and Dental Medicine and the graduate programs in biomedical sciences, there are several Practice-Based Research Networks whose access to electronic medical records creates outstanding research opportunities in informatics. Applicants will have a primary faculty appointment at the rank of Assistant, Associate, or Full Professor, commensurate with qualifications, in the Computer Science & Engineering (CSE) Department on UConn's main campus in Storrs. While the primary appointment will be in the CSE Department at Storrs, there is an expectation of regular work with colleagues at the UConn Health Center. Electronic applications are preferred and should include a curriculum vitae and the names and contact information for at least five references, sent in PDF format (all application materials in one file) to Noreen Wall at [email protected] or [email protected]. Applications may also be sent via mail to the Engineering Dean's Office, 261 Glenbrook Road, Unit 2237, Storrs, CT 06269-2237 (Search #2010023). Review of applications will begin immediately and will continue until the positions are filled. The University of Connecticut is an Equal Opportunity, Affirmative Action employer.
SYMPTOMS: Developer suffering from boring projects, dated technologies, and a stagnant career.
CURE: Start your new career at Berico.
If you are a skilled Software Engineer with passion and expertise in any of the following areas, we invite you to apply:
• Cloud Computing
• Web Development
• Application Development
• Mobile Application Development
To learn more about Berico and our career opportunities, please visit www.bericotechnologies.com or email your resume to [email protected].
USTC-Yale Joint Research Center on High-Confidence Software
Multiple Faculty and Post-Doc Positions
With permission of the University of Science and Technology of China (USTC), the USTC-Yale Joint Research Center on High-Confidence Software invites applicants for multiple tenure-track faculty positions at all levels and post-doc positions, in the areas of programming languages and compilers, formal methods, security, operating systems, computer architectures, and embedded systems. For more information about the positions, please see http://kyhcs.ustcsz.edu.cn/content/jobs.
KING ABDULAZIZ UNIVERSITY
College of Computing and Information Technology
Jeddah, Saudi Arabia
ASSISTANT, ASSOCIATE, FULL PROFESSOR
Computer Science, Information Technology, Information Systems
The departments of Computer Science (CS), Information Technology (IT), and Information Systems (IS) invite applications for multiple faculty positions beginning in Fall 2011 at the Assistant, Associate, and Full Professor levels from candidates working in the areas of Artificial Intelligence, Multimedia, Databases, Digital Security, Decision Support Systems, Data Mining, Internet Technologies, Bioinformatics, and Software Engineering; however, outstanding candidates in all areas will be considered. Successful applicants who join the college are expected to develop and maintain a productive program of research, attract and develop highly qualified graduate students, provide a stimulating learning environment for undergraduate and graduate students, participate in the teaching of required core courses, develop elective courses in the candidate's area of expertise, and contribute to the overall development of the college. A Ph.D. in CS, IT, IS, or an equivalent field is required, with evidence of excellence in teaching and research. Rank and salary will be commensurate with experience. The college, newly created in 2007, has more than 850 students. It offers three bachelor's degree programs and one master's degree program; a doctoral program will start within a year. The college is located in the coastal city of Jeddah, which enjoys a low cost of living and close proximity (70 km / 44 miles) to the holy city of Makkah. To apply, please send your C.V. and all supporting documents via e-mail to: [email protected]. University Website: http://www.kau.edu.sa/ College Website: http://computing.kau.edu.sa/
ACM TechNews Goes Mobile
iPhone & iPad Apps Now Available in the iTunes Store
ACM TechNews, ACM's popular thrice-weekly news briefing service, is now available as easy-to-use mobile apps downloadable from the Apple iTunes Store. These new apps allow nearly 100,000 ACM members to keep current with the news, trends, and timely information impacting the global IT and computing communities each day.
TechNews mobile app users will enjoy:
• Latest News: Concise summaries of the most relevant news impacting the computing world
• Original Sources: Links to the full-length articles published in over 3,000 news sources
• Archive Access: Access to the complete archive of TechNews issues dating back to the first issue published in December 1999
• Article Sharing: The ability to share news with friends and colleagues via email, text messaging, and popular social networking sites
• Touch Screen Navigation: Find news articles quickly and easily with a streamlined, fingertip scroll bar
• Search: Simple search of the entire TechNews archive by keyword, author, or title
• Save: One-click saving of the latest news or archived summaries in a personal binder for easy access
• Automatic Updates: By entering and saving your ACM Web Account login information, the apps will automatically update with the latest issues of TechNews, published every Monday, Wednesday, and Friday
The Apps are freely available to download from the Apple iTunes Store, but users must be registered individual members of ACM with valid Web Accounts to receive regularly updated content. http://www.apple.com/iphone/apps-for-iphone/ http://www.apple.com/ipad/apps-for-ipad/
last byte
DOI:10.1145/1924421.1924447
Leah Hoffmann
Q&A
The Chief Computer
Kelly Gotlieb recalls the early days of computer science in Canada.
Born in 1921 in Toronto, Calvin “Kelly” Gotlieb is known as the “Father of Computing” in Canada. After receiving his Ph.D. in physics from the University of Toronto in 1947, Gotlieb co-founded the university’s Computation Center the following year and worked on the first team in Canada to build computers. In 1950, he created the first university course on computer science in Canada and the first graduate course the following year. Currently Professor Emeritus in Computer Science at the University of Toronto, Gotlieb is also known for his work on computers and society.
Kelly Gotlieb in a conference room at ACM’s headquarters earlier this year. Photograph by Brian Greenberg.

You’re still involved with the ACM, most notably as co-chair of the Awards Committee.
I’ve been a continuous member of ACM since 1949. I was also in IFIP [International Federation for Information Processing] for many years. When IFIP was formed, I went to Paris as the Canadian representative. Isaac Auerbach asked if I would be president, but I didn’t want to be. I’ve always said administration is an occupation where the urgent preempts the important.

You used to be editor-in-chief of this magazine.
I was editor-in-chief first of Communications and then, later on, of the Journal. When Alan Perlis started Communications [in 1957], he asked me to be the editor of the business section. Then Perlis was elected president of ACM, and the editorial board asked me if I would be editor-in-chief. I think I was editor-in-chief for two years because the editor-in-chief of the Journal, Franz Alt, retired, and they asked me to switch over.

Let’s talk about your career. You completed your Ph.D. in physics at the University of Toronto.
I got my Ph.D. in 1947. My Ph.D. is classified because it was on secret war work. It’s never been declassified, but it’s really not a secret anymore. We put radios in shells.

Soon, though, you became involved with early efforts in computers.
Yes, they had formed a computing team in the university and there was an IBM installation with punched cards. Eventually, the university set up something called the Computation Center. I was too young to be the director, so they gave me the title of chief computer.

Chief computer?
They didn’t ask me, they just deposited this title on me.

In the 1950s, the Computation Center purchased a computer made by Ferranti in England, a copy of the Manchester Mark I.
We got the Ferut in 1951. It was the second computer ever sold in the world. UNIVAC was the first. Ferut was in a room about three times the size of this conference room [which measures 16’ x 16’]. It had about 10,000 vacuum tubes, and three or four would burn out every day. So, the engineers would take over at night and change the burned-out tubes, and we would run it. The machine was quite unreliable at first. For example, if you multiplied two numbers, you were advised to do it twice, and if you got the same answer, OK. If not, you did it a third time and took the best two out of three to get the product. I remember all these articles in the paper called it a “giant brain.” Well, it was giant all right, but it was about one-hundredth as powerful as your laptop, maybe one-thousandth. Eventually, we got the Ferut working pretty well. I had a graduate student who was the chess champion of Canada, so I decided we should see if the computer could play chess. But Ferut didn’t have enough storage to play a whole game, so we only played endgames: king and pawn endgames, which are very complicated. There’s something in king and pawn endgames called opposition in which, essentially, you can force your opponent’s king to retreat to a weaker position. So, we had the computer gain the opposition in king and pawn endgames. Another thing we did was a simulation of postal codes. They were adopted, and as a result Canada had its first postal strike, because there were all these questions about how computers would cause a loss of jobs.

You also got involved in timetables and scheduling.
I was publishing a newsletter on school timetables for a while, but it got to be so much work that I quit. One of my grad students later formed a business around it. The other thing he programmed was how to computerize and control traffic lights. Toronto was the first city in the world to have computerized traffic lights, though if you went to Toronto now and looked at the mess of traffic, you’d never know. We did the world’s first airline reservation system, too. We had the president of Canada come down to watch the demonstration. It was a pretty heady time.

You were teaching at this time?
I was teaching, and since there was no graduate department, I was teaching in the School of Continuing Studies. Most of the students were businesspeople: actuaries and bankers and insurance people and so on.

But eventually you did create a graduate program.
Yes. In 1962, we decided we wanted to form a graduate program. It didn’t have a budget, it only had appointments.

Presumably, since the people involved in computing also taught in other departments?
That’s right. The university said we had to get permission from mathematics, physics, and electrical engineering. But since I wanted to form a department that didn’t need a budget, they didn’t worry about it. Incidentally, our department is now ranked eighth in the world.

Are you still teaching?
I taught until three years ago. I was the oldest person teaching undergraduates.

For many years, you taught a course on computers and society.
“Computers and Society” always had a wonderful audience. It qualified as a science course for humanists and as a writing course for scientists, so I had a wonderful mix of students.

You’ve explored a number of different socioeconomic issues throughout your computing career.
When we started with databases, people got very worried about privacy, so Canada appointed a privacy commission and put me on it. As a result of our commission, the first Canadian privacy laws were all passed. But people don’t really give a damn about their privacy these days. They sell it very cheaply. Do you know what privacy is worth? About eight dollars. If I give somebody eight dollars, they’ll fill out the longest form: name, credit history. Or just offer them a discount card.

In the 1960s, you were involved in a United Nations commission on what computing could do for developing countries.
This was the age of big mainframes, so nobody imagined that developing nations could use them at all. I remember coming to the World Bank in Washington, D.C. Protocol required that we had to be met by the head, but in about three minutes we’d passed three levels down the hierarchy: one person would introduce us to his assistant, who would introduce us to his assistant, and so on. But I remember being greeted by the head of the World Bank, and he said, “Now tell me, what about this boondoggle of yours?”

The 1971 report that grew out of your United Nations research, The Application of Computer Technology for Development, was a bestseller.
We had a big meeting in Bucharest, and we had written a draft report that was going to be presented to the General Assembly. It was quite technical, and all the other people were from human resources and management, and they just tore into us at Bucharest. They said, “You say nothing about training and management.” We thought about it and said, “They’re right.” So, in six weeks we completely rewrote the report, and we sent it to the U.N., and it turned out to be a bestseller.

In fact, you’re cited in the Oxford English Dictionary.
[J.N.P. Hume] and I wrote a book in 1958 called High-Speed Data Processing, and the Oxford English Dictionary found 11 words in it, like loop, that had new meanings. So, we’re cited 11 times in the Oxford English Dictionary. A friend of mine said, “You’re immortal.” And you know, it’s a lot cheaper than a gravestone.

On a personal note, you were married to a writer for many years.
My wife Phyllis wrote 18 books of poetry and science fiction. She died about a year and a half ago. I’ve always said that she was a humanist interested in science, and I was a scientist interested in humanity.

Nowadays I’d guess people are more concerned with their Google results.
One day my wife and I were arguing about how long you cook fish in the microwave. She’s the cook, of course, but I’m the master of the microwave. So she goes to Google and comes back and says it’s 4.5 minutes per pound, end of discussion. Of course, when we were arguing, I didn’t go to Google. She went to Google!

I understand you’ll turn 90 on March 27.
My department is making a party for me, and they asked me to give a talk. I’m calling it “Chiefly About Computers.”

Leah Hoffmann is a Brooklyn, NY-based technology writer.

© 2011 ACM 0001-0782/11/04 $10.00
From the mind to the marketplace. Intel works with educators worldwide to revitalize how students think about technology. Because encouraging new ideas fuels innovation. Collaborating. Investing. Changing the world. Learn more at Intel.com/software/academicprograms