FEATURES
Preventing Piracy While Preserving Privacy 16 by Michael O. Rabin and Dennis E. Shasha
The security approach presented here is a privacy-preserving, flexible, antipiracy solution that does not suffer from “Break Once, Run Everywhere.”
Reestablishing Trust in the Web 28 by Amir Herzberg and Ahmad Jbara
The TrustBar browser extension provides improved security, identification, and trust indicators.
Extended Visual Cryptography Schemes 36 by Daniel Stoleru
Visual cryptography is a graphical form of information concealing.
Inside the SmartDongle USB Security Key 40 by Joel Gyllenskog
Joel lifts the hood on his USB security key.
Developing JSR-168 Portlets 44 by Ted O’Connor and Martin Snyder
The JSR-168 portlet specification defines APIs for building applications viewed inside portal frameworks.
The Eclipse Test and Performance Tools Platform 48 by Andy Kaylor
The Eclipse Test and Performance Tools Platform provides open Standards for interoperability.
The Mac’s Move to Intel 52 by Tom Thompson
Steve Jobs dropped a bombshell when he told software developers that the Macintosh will switch from PowerPC to Intel x86 processors.
Calling C Library DLLs from C# 58 by Shah Datardina
Need to utilize legacy software? Here are techniques for calling unmanaged code written in C from C#.
Removing Memory Errors from 64-Bit Platforms 63 by Rich Newman
It’s crucial to address potential memory errors before porting to 64-bit platforms.
Pointer Containers 68 by Thorsten Ottosen
Smart containers are useful and safe utilities that can lead to flawless object-oriented programming.
EMBEDDED SYSTEMS PROGRAMMING
Using Hardware Trace for Performance Analysis 71 by Michael Lindahl
Michael examines embedded-systems performance-analysis techniques, and discusses some of their inherent limitations.
COLUMNS
Programming Paradigms 75 by Michael Swaine
Ringtones are where the money is — for now anyway.
Embedded Space 78 by Ed Nisley
Large, complex embedded systems have more places for things to go wrong.
Chaos Manor 82 by Jerry Pournelle
Jerry looks back when inventing the future, and looks forward to the world of 64-bit computing.
Programmer's Bookshelf 85 by Michelle Levesque
Michelle examines Greg Wilson's Data Crunching: Solving Everyday Problems Using Java, Python, and More.
FORUM
EDITORIAL 8 by Jonathan Erickson
LETTERS 10 by you
DR. ECCO'S OMNIHEURIST CORNER 12 by Dennis E. Shasha
NEWS & VIEWS 14 by DDJ Staff
PRAGMATIC EXCEPTIONS 26 by Benjamin Booth
OF INTEREST 87 by DDJ Staff
SWAINE'S FLAMES 88 by Michael Swaine
NEXT MONTH: In November, we're all over the place when we cover distributed computing.
DR. DOBB'S ONLINE CONTENTS
ONLINE EXCLUSIVES
http://www.ddj.com/exclusives/
THE NEWS SHOW
http://thenewsshow.tv/
.NET—The Decompiler Will Get You
How can you avoid opening up your intellectual property to intruders?
Dual-Core Duel
Reverse Engineering: John Blattner on analyzing real-time embedded systems.
Windows Vista: The Developer Perspective. John Montgomery on what features Windows Vista offers developers.
WINDOWS/.NET
http://www.ddj.com/topics/windows/
The Seven Touchpoints of Secure Software: Security must be built in throughout the development lifecycle.
Windows Security: Investigating software and source-code theft.
Windows/.NET Q&A: How do you create or modify strings composed of several strings?
DOTNETJUNKIES
http://www.dotnetjunkies.com/
Selecting, Confirming, & Deleting Multiple Checkbox Items: Here's how to select and delete across pages in single batch deletes.
Web Hosting for ASP.NET 2.0 Beta 2: Microsoft-sanctioned web hosting providers that deliver beta services.
BYTE.COM
http://www.byte.com/
Media Lab: Autodesk, Adobe, and the art of the interface.
Developing for Cell Phones: Examining the Qualcomm BREW SDK, and exploring ringtone conversion.
THE PERL JOURNAL
http://www.tpj.com/
A Music Player Remote Control in Perl/Tk: This search facility hooks into WinAmp to automatically play songs.
Managing Documents Using a SOAP::Lite Daemon: This document-management system decouples interfaces and back-end logic.
RESOURCE CENTER
As a service to our readers, source code, related files, and author guidelines are available at http://www.ddj.com/. Letters to the editor, article proposals and submissions, and inquiries should be sent to [email protected]. For subscription questions, call 800-456-1215 (U.S. or Canada). For all other countries, call 902-563-4753 or fax 902-563-4807. E-mail subscription questions to [email protected], or write to Dr. Dobb's Journal, P.O. Box 56188, Boulder, CO 80322-6188. If you want to change the information you receive from CMP and others about products and services, go to http://www.cmp.com/feedback/permission.html or contact Customer Service at Dr. Dobb's Journal, P.O. Box 56188, Boulder, CO 80322-6188. Back issues may be purchased prepaid for $9.00 per copy (which includes shipping and handling). For issue availability, send e-mail to [email protected], fax to 785-838-7566, or call 800-444-4881 (U.S. and Canada) or 785-838-7500 (all other countries). Please send payment to Dr. Dobb's Journal, 4601 West 6th Street, Suite B, Lawrence, KS 66049-4189. Digital versions of back issues and individual articles can be purchased electronically at http://www.ddj.com/.
WEB SITE ACCOUNT ACTIVATION
Dr. Dobb's Journal subscriptions include full access to the CMP Developer Network web sites. To activate your account, register at http://www.ddj.com/registration/ using the web ALL ACCESS subscriber code located on your mailing label.
EDITORIAL MANAGING EDITOR Deirdre Blake SENIOR PRODUCTION EDITOR Monica E. Berg ASSOCIATE EDITOR Della Wyser ART DIRECTOR Margaret A. Anderson SENIOR CONTRIBUTING EDITOR Al Stevens CONTRIBUTING EDITORS Bruce Schneier, Ray Duncan, Jack Woehr, Jon Bentley, Tim Kientzle, Gregory V. Wilson, Mark Nelson, Ed Nisley, Jerry Pournelle, Dennis E. Shasha EDITOR-AT-LARGE Michael Swaine PRODUCTION MANAGER Stephanie Fung INTERNET OPERATIONS DIRECTOR Michael Calderon SENIOR WEB DEVELOPER Steve Goyette WEBMASTERS Sean Coady, Joe Lucca AUDIENCE DEVELOPMENT AUDIENCE DEVELOPMENT DIRECTOR Kevin Regan AUDIENCE DEVELOPMENT MANAGER Karina Medina AUDIENCE DEVELOPMENT ASSISTANT MANAGER Shomari Hines AUDIENCE DEVELOPMENT ASSISTANT Melani Benedetto-Valente MARKETING/ADVERTISING ASSOCIATE PUBLISHER Will Wise SENIOR MANAGERS, MEDIA PROGRAMS see page 86 Pauline Beall, Michael Beasley, Cassandra Clark, Ron Cordek, Mike Kelleher, Andrew Mintz MARKETING DIRECTOR Jessica Marty SENIOR ART DIRECTOR OF MARKETING Carey Perez DR. DOBB’S JOURNAL 2800 Campus Drive, San Mateo, CA 94403 650-513-4300. http://www.ddj.com/ CMP MEDIA LLC Gary Marshall President and CEO John Day Executive Vice President and CFO Steve Weitzner Executive Vice President and COO Jeff Patterson Executive Vice President, Corporate Sales & Marketing Leah Landro Executive Vice President, Human Resources Mike Mikos Chief Information Officer Bill Amstutz Senior Vice President, Operations Sandra Grayson Senior Vice President and General Counsel Alexandra Raine Senior Vice President, Communications Kate Spellman Senior Vice President, Corporate Marketing Mike Azzara Vice President, Group Director of Internet Business Robert Faletra President, Channel Group Tony Keefe President, CMP Entertainment Media Vicki Masseria President, CMP Healthcare Media Philip Chapnick Vice President, Group Publisher Applied Technologies Paul Miller Vice President, Group Publisher Electronics Fritz Nelson Vice President, Group Publisher Network Computing Enterprise Architecture Group Peter Westerman Vice President, Group Publisher Software Development Media Joseph Braue Vice President, Director of Custom Integrated Marketing Solutions Shannon Aronson Corporate Director, Audience Development Michael Zane Corporate Director, Audience Development Marie Myers Corporate Director, Publishing Services
American Business Press
EDITORIAL
Salary Surveys & Programmer Pay
Arnold Schwarzenegger, erstwhile movie star and current governor of California, has been giving me headaches. For starters, I've been diagnosed as suffering from PAS, short for "Post Arnold Syndrome," after downloading the Governator ringtone and seeing both Kindergarten Cop and Conan the Barbarian on the same day. Symptoms include nausea, nightmares, and an uncontrollable urge to pump serious iron. But the real grief Schwarzenegger has brought home is the revelation that he's been earning more than $1 million a year moonlighting as a magazine editor. When this news broke, the Dr. Dobb's Journal editorial staff descended on my office, with the equal-pay advocates jostling for head of the line. Not to be left out, Margaret Anderson tried to change her job title from "Art Director" to "Art Editor," while Senior Production Editor Monica Berg, who grew up in Germany, made her case because she sounds like Schwarzenegger. Jeez, that'll teach me to wander into the office.
However, the good news is that U.S. compensation for software developers seems to be on the rise, at least according to a recent survey conducted by Foote Partners (http://www.footepartners.com/). Granted, most programmers (or editors) won't be breathing that rarified Schwarzenegger ether anytime soon. Still, in the first six months of 2005, pay for noncertified skills inched up 2.1 percent for application developers; 4.3 percent for database developers/administrators; 5.1 percent for networking/internetworking; and 8.2 percent for operating systems experts. For categories in certified tech skills, salaries were up 3.8 percent for web/e-commerce development; 2.3 percent for application development/programming languages; and 0.7 percent for database administration.
In Foote Partners parlance, skills-related pay is typically paid as cash bonuses or embedded in base salary as an adjustment for the presence of a dominant vendor or technology skill critical to the job. For example, the salary for an Oracle database administrator, Linux systems administrator, or .NET developer can be different than what employers might provide for generic "systems administrator," "programmer," and "developer" job titles. According to Foote Partners head of research David Foote, the redefinition of IT jobs is currently so pervasive that traditional job titles are becoming increasingly meaningless. Instead of overhauling job titles, employers are finding it easier to differentiate workers with the same job titles by recognizing technical skills fundamental to their jobs, putting a market value on those skills, and adjusting base pay accordingly.
So what's going on? Are employers suddenly opting for kinder and gentler employment practices? Hardly. For one thing, Foote sees a return to hiring as the economy has strengthened. For another, many companies were stung by botched offshore outsourcing projects, particularly when they failed to keep key people who had both technical skills and an understanding of the business and industry. Consequently, says Foote, companies are trying to do a better job of hiring and retaining talent with specific technical skills and business and industry experience — and reinvesting in onshore application development. He adds that as Sarbanes-Oxley compliance-related work tapers off at many companies, the need for complex combinations of industry knowledge and technical skills is rising. "The shift is on innovation and new products," Foote says.
The survey, which queried 50,000 IT professionals, found that the hottest noncertified skills (that is, those that exhibited 25 percent or more growth in skills pay over the last 12 months) focused on SQL Server, WebSphere, Active Server Pages, SQL Windows, and .NET. Likewise, the highest paying noncertified skills involved project-level security, RAD/Extreme Programming, VoIP, Gigabit Ethernet, IBM WebSphere, Oracle database and applications, and SQL Windows.
The Foote Partners findings are more or less confirmed by Information Week's 2005 salary survey conducted earlier this year. (In the spirit of disclosure, Information Week is published by CMP Media, which also publishes Dr. Dobb's Journal.) Information Week (http://www.informationweek.com/) found that the highest salaries were commanded by web-security experts, followed by wireless infrastructure personnel. However, in the Information Week survey, which queried more than 12,000 IT professionals, salaries for application developers were more or less flat compared to 2004, although networking jobs were paying slightly more.
Within the next couple of months, Software Development magazine (another of DDJ's sister publications; http://www.sdmagazine.com/) plans on publishing the results of its annual developer salary survey. It will be interesting to see how that matches up with Foote Partners and Information Week. Until then, you can find me either lifting weights at the gym or in my neighborhood movie theater, setting a personal record of seeing Hercules in New York, Terminator 2: Judgment Day, and Conan the Destroyer — all in the same day.
LETTERS

Duff's Device
Dear DDJ,
Reading Ralf Holly's article "A Reusable Duff Device" (DDJ, August 2005) brought back memories of doing much the same thing around 1981. I was working on a proprietary system that had a limited instruction set and functionality. It's been many years, but my solution for unrolling a loop was to use a series of reentrant calls, with an instruction such as an output to an I/O device executed just before the primary return. Calls queued up like that allowed me to do 2, 4, 8, 16, and so on executions of the instruction that needed to be repeated. As I recall, we did not have an adder or increment instruction, so looping was a bit of a problem.
Gary G. Little
[email protected]

Optimal Queens
Dear DDJ,
I enjoyed the article "Optimal Queens" by Timothy Rolfe (DDJ, May 2005), but it seems like a good time for a food fight. I made a few mods to Timothy's Queens.c code to adapt it for a Macintosh CodeWarrior C console program that automates testing using each of the four optimizations listed below for 4–18 queens. Timothy must enjoy sitting around waiting for each test to finish so he can enter the next set of n-queens and optimization parameters [24 times for n-queens 12–18], then wait again. I globalized a few vars, paired allocs with frees inside nested for loops, and let the program handle the permutations. Because times start exceeding one second at 13–14 queens, my times can be directly compared with Timothy's without too much picking of nits or other amusements.
His Dell desktop computer with a 2-GHz Pentium 4, OS not specified: No Optimization, 20:59:51; Wirth's Validity Check, 08:51:50; Permutation Vector, 03:52:36; both optimizations, 01:54:54; Total Time 35:39:11 (128,351 seconds).
My Macintosh 2003 MDD Dual 1.25-GHz G4, OS X 10.3.9: No Optimization, 22:34:59; Wirth's Validity Check, 08:06:42; Permutation Vector, 03:49:08; both optimizations, 01:53:54; Total Time 36:24:45 (131,085 seconds). The Mac at ~63 percent MHz gets ~98 percent of the performance.
F.C. Kuechmann
[email protected]
Editor's Note: F.C.'s modified Queens.c code is available electronically; see "Resource Center," page 4.

Licensing Again
Dear DDJ,
Thanks to Jim Wiggins for his detailed and interesting note ("Letters," DDJ, March 2005). I do not object to licensing software professionals in one area: embedded computing. Many of the cases that are cited below are exactly that kind of development, and Ed Nisley does a fine job every month of describing exactly how different a world that is. Licensing for "embedded space" would need to include EE training as well as computer science, and some mechanical engineering and materials science couldn't hurt. I think this kind of training and education is well beyond that of the typical software engineer.
In fact, while I don't know about automotive engineering, much of the rest of the embedded industry does have special requirements levied on the way they do software. Some of the strongest requirements are imposed upon nuclear power stations, and I have some experience with those. They cannot use C++, Ada, Java, and there are restrictions on the use of C. (I can provide references but don't have the time right now.) Why? Because to assure predictability of the software, dynamic storage allocation is disallowed. There are also strictly technical benefits, such as it being easier to burn code into PROMs and stuff, and know what box it is in. Software and its algorithms must fit into a structured reliability discipline, and the reliability engineer makes the call on whether a change of means or algorithm is acceptable.
Indeed, this kind of really rigorous structure is absent even in NASA's work, and is certainly missing in most defense and FAA aerospace development. They need to get things to work, yes, but the reliability element is often hit or miss because (at the systems level at prime) contractors, quality control folks, and testers have little clout — [they are] seen as impediments to the company getting paid. FAA's government people have a strong hand, but they can get burnt if the company has good political ties. Nevertheless, NASA and FAA demand detailed, written specifications and formal
test plans and procedures, all under ruthless change control. Defense is supposed to work that way, but it depends upon the program and the character of the SPO running it. A lot of this is criticized in Congress and in the trade press, lamenting how long it takes them to get anything done in comparison to their apparently fleet-footed commercial brethren, along with much lathered-on opinion about the superiority of free-market versus government-run programs. I strongly suspect they are swift because they can afford to not be rigorous.
Could things be done smarter? Of course, they can be: Consider the Shuttle program versus the Mars Rovers. Alas, that's pitting NASA mainstream against Caltech-JPL. And don't mention SpaceShipOne: As admirable as that project is, and as supportive of that effort as I am, its scope is far more limited than what NASA must do and it builds upon a lot of work originally paid for by government. When someone needs to launch a satellite as part of a tsunami-warning network for the Indian Ocean, Scaled Composites can't do it — not soon, anyway.
There may be other niches in software development that admit comparable discipline. I should think that software running securities trading and monetary exchange arbitrage must be of necessity right on, considering the amounts of money that can be lost in a mistake. In principle, there ought to be a high-reliability version of Windows or at least Windows NT out there. There's a question of how to pay for it, however. High-reliability Linux? There ought to be that, too, and perhaps there will be. It's being used in quasi-embedded situations more and more.
As negative as I might sound, there are successes to tout. Relational database systems sell themselves to their customers primarily because of their design for data safety, reliability, and ability to recover from all kinds of misfortune, man- and nature-made.
I hope I conveyed what I think to be a collective frustration on the part of software users of all kinds with how long it takes to do anything in software. Some of that is, as I tried to express, part of the nature of the beast because it demands being very precise about things people normally aren't and don't have to be. But some of it is limitations of our own technology and smarts, stuff that, apart from important hardware assists, really hasn't changed since 1980. I simply do not see how licensing will get us to fix that.
Jan Galkowski
[email protected]
DDJ
DR. ECCO’S OMNIHEURIST CORNER
Calculation in the Narrows
Dennis E. Shasha
Ecco was invited to a submarine base, where he heard about a new naval computational technique called "parallel local computation" (PALC). He was told it was very useful for high-security applications, but they couldn't tell him which. Instead, they presented him with the following sanitized version of a problem commanders face:
A group of people have lined up in a long narrow corridor that allows only two people to be side-by-side. Each has been given a number between 1 and 10,000. After some number of rounds (described below), each person is to report a whole number that is no more than one above or one below the mean of the numbers. For example, if there are four people and they have been given the numbers 2, 2, 2, 3, then it is fine if some report 2 and some report 3 because the mean is 2.25. During the course of this calculation, no person should have to remember more than five significant digits in total, including those to the left and those to the right of the decimal point.

Figure 1: People moving in a corridor.
Start: P0 P1 P2 P3 P4
Next: P1 P2 P3 P4 P0
Next (so P1 and P2 can exchange information, as can P0 and P3): P2 P3 P4 P1 P0
Next: P3 P4 P2 P1 P0
Next: P4 P3 P2 P1 P0
End of round
Next: P4 P3 P2 P1 P0
Next (now P4 leads): P3 P2 P1 P0 P4
Next: P2 P1 P0 P3 P4
Next: P1 P0 P2 P3 P4
Next: P0 P1 P2 P3 P4
Next: P0 P1 P2 P3 P4
End of second round

Dennis, a professor of computer science at New York University, is the author of four puzzle books. He can be contacted at [email protected].
Because the corridor is so narrow, the people are going to move as shown in the example of Figure 1 for five people P0,…,P4. Only when two people are side-by-side can they exchange information. Note that in the first round, P0 encounters P1 and P3; P1 encounters P0, P2, and P4; and P2 encounters P1 and P3. In each pairwise encounter, people can exchange numbers and do any calculation they like with those numbers. However, the number that each person retains after an encounter may contain no more than five digits.
Warm-Up: Suppose that two people A and B meet and that initially A's number x is greater than B's number y. Suppose the mean of the entire collection, while unknown to A or B, is denoted M_all. Consider the initial error of A and B to be the maximum of |x – M_all| and |y – M_all|. What can A and B do that will reduce their error without preventing them from calculating the mean correctly?
Solution to Warm-Up: They calculate the mean of x and y, denoted M_xy. A substitutes M_xy for x, and B substitutes M_xy for y. If M_all is greater than M_xy, then |M_all – y| was the error before, so the error is reduced. Here is a proof: Suppose that x < M_all; then the order of elements is y < M_xy < x < M_all, so the conclusion follows. If M_all < x, then the order must be y < M_xy < M_all < x. Because M_xy is the mean of x and y, (M_xy – y) = (x – M_xy), which is greater than both x – M_all and M_all – M_xy. So the error is reduced and the sum of all numbers stays the same.
We have ignored rounding so far. If rounding is required, then allow A, because it initially has a higher value, to take a value above the mean, and B to take a value below the mean in such a way that A loses as much as B gains. This guarantees that the mean of the results equals the mean of the inputs. End of warm-up.
The warm-up suggests the following possible solution. Each exchange rounds to the nearest whole number (always keeping the sum of all numbers constant). For example, in an exchange between 53 and 42, the resulting numbers would be 48 and 47. So the initially greater number would be given the higher integer.
1. Might there be some configuration of a dozen people and their initial values that requires more than 20 rounds in this case? What is the maximum necessary? The answer to this might suggest that using a number including one decimal digit would help.
2. If one does use a decimal digit for all numbers below 10,000 and there are at least 12 people in the hallway, then will any initial configuration require more than 20 rounds? What is the maximum necessary?
3. Suppose there could be only three people in the hallway. (I use three because two would have a solution in one exchange.) Then will any initial configuration require more than 20 rounds?
4. The protocol now requires that the person having the higher initial value gets the higher value after rounding. What if the assignment to higher or lower occurs randomly (with equal probability for the initially higher and initially lower)? Would this change the answer to question 1?
Here is an open problem. Consider a protocol in which after every exchange (rather than after every round), the list reverses itself. So the protocol looks like Figure 2 in a typical round. Even though each round is more expensive, I cannot find a case involving 10 or more people in which the protocol takes more than six rounds provided one can use one digit to the right of the decimal point. Can you find a limit in terms of the number of rounds required?

Figure 2: List reversal.
Start: P0 P1 P2 P3 P4
Next: P1 P2 P3 P4 P0
Next (reverse): P4 P3 P2 P1 P0
Next: P2 P1 P0 P3 P4
Next: P0 P1 P2 P3 P4
Next: P3 P4 P2 P1 P0
Next: P4 P3 P2 P1 P0
Next: P0 P1 P2 P3 P4
End of round

For the solution to last month's puzzle, see page 80.
DDJ
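As an aside for readers who want to experiment with Dr. Ecco's questions, here is a minimal Python sketch of the warm-up rule: whenever two people meet, they replace their numbers with whole numbers that average them while keeping the total constant, the initially larger number receiving the larger integer. The odd-even pairing schedule and the sample inputs are illustrative assumptions (they are not the corridor choreography of Figure 1), and the five-significant-digit memory limit is not enforced.

from math import floor

def meet(a, b):
    """Replace a and b with integers that average them; the sum is preserved,
    and the initially larger number receives the larger integer."""
    lo = floor((a + b) / 2)
    hi = (a + b) - lo
    return (hi, lo) if a >= b else (lo, hi)

def simulate(values, max_rounds=100):
    """Apply the rule on an odd-even pairing schedule (an assumption) until
    everyone is within 1 of the true mean; return the round count and values."""
    vals = list(values)
    target = sum(vals) / len(vals)
    for rnd in range(1, max_rounds + 1):
        for start in (0, 1):
            for i in range(start, len(vals) - 1, 2):
                vals[i], vals[i + 1] = meet(vals[i], vals[i + 1])
        if all(abs(v - target) <= 1 for v in vals):
            return rnd, vals
    return None, vals

print(simulate([9731, 4, 17, 5002, 250, 8888]))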
Dr. Dobb's News & Views
Unified EFI Forum Established
The nonprofit Unified EFI Forum has been formed to manage the evolution and promotion of the Extensible Firmware Interface (EFI) specification (http://www.uefi.org/). The EFI spec defines an interface that hands off system control from the preboot environment to the operating system. In short, EFI is a modern replacement for the BIOS. (For more information on EFI, see "The Extensible Firmware Interface," by Craig Szydlowski; DDJ, September 2005.) Founding members of the EFI Forum include AMD, American Megatrends, Dell, HP, Intel, IBM, Insyde Software, Microsoft, and Phoenix Technologies. The Forum will publish the EFI 1.10 specification by the end of 2005. It will also make available test suites for the UEFI spec based on contributions from member companies.
Eiffel Standardized by ECMA
A Standard for the Eiffel programming language has been adopted by the General Assembly of ECMA International (http://www.ecma-international.org/publications/standards/ECMA-367.htm). ECMA's charter is to evaluate, develop, and ratify telecommunications and computing Standards. The Eiffel language, originally designed by Bertrand Meyer, is available through implementations by Meyer's Eiffel Software (http://www.eiffel.com/) and other providers. ECMA Standardization guarantees total, line-by-line compatibility between different implementations. The specification also has been submitted for ISO (International Standards Organization) approval as part of ECMA's "fast-track" ISO status.
Grid Security Examined
The Enterprise Grid Alliance, an open consortium focused on developing and promoting enterprise grid solutions, has released its Enterprise Grid Security Requirements document, which identifies a set of requirements for grid security (http://www.gridalliance.org/en/workgroups/GridSecurity.asp). Developed by the EGA Grid Security Working Group, the document builds on the previously released EGA Reference Model by identifying the
unique security requirements of commercial enterprise grid computing. It is intended as a guide for users, Standards organizations, and vendors.
IBM Launches Academic Licenses
IBM has launched a program that provides universities with free access to a range of emerging technologies developed in IBM's R&D labs (http://www.developer.ibm.com/university/scholars/). The goal of the "Academic License" program is to help train, educate, and accelerate development skills around open Standards-based technologies. University professors can use the technologies to build course curriculum. Professors will have access to more than 25 technologies, including games and simulations, to accelerate skills around IBM on-demand offerings, including open Standards technologies such as Java and Eclipse, and tools to enable grid computing. MIT and Harvard's Division of Engineering and Applied Sciences will be the first universities to participate in the program. The program is open to academic institutions participating in IBM's Academic Initiative.
Open Authentication Moves Forward
OATH, the initiative for Open Authentication, has released Version 1.0 of its OATH Reference Architecture, which provides a framework for open authentication (http://www.openauthentication.org/reg.asp). The document's client framework section addresses topics of authentication methods, innovation in authentication tokens for multifunction purposes or mobile devices, token interfaces for one-time password tokens, and authentication protocols. The validation framework covers interfaces for protocol and validation handlers, and protocols used by applications to authenticate user credentials. OATH will develop a framework to let vendors develop Standards-based provisioning protocols and evaluate the need to standardize on one or more provisioning protocols to target specific credential types. OATH is a collaboration of device, platform, and application companies, with the goal of fostering strong authentication across networks, devices, and applications.
Secure Voice-over-IP
Phil Zimmermann, developer of Pretty Good Privacy (PGP) e-mail encryption software, is now working towards building similar security tools for Internet-based Voice-over-IP (VoIP). Codenamed "zFone," the prototype Zimmermann demonstrated at the Black Hat Briefings security conference scrambles information until it reaches its destination. To unscramble the data, recipients must be running a program that uses the same protocols. According to Zimmermann, zFone interoperates with any standard SIP phone. Zimmermann's prototype is based on Shtoom, a VoIP client written in Python. For more information, see http://www.philzimmermann.com/EN/zfone/index.html.
UC Berkeley and Yahoo Partner for Research
In a first-of-its-kind partnership between a public university and a private Internet company, Yahoo Research Labs and the University of California at Berkeley are launching a joint lab to explore Internet search technology, social media (photos, video, music, audio, and text obtained from personal, public, or community sources, then shared, referenced, or remixed in ways that help foster social relations), and mobile media. Most intellectual property developed at the lab will be shared jointly between UC Berkeley and Yahoo. The founding director of Yahoo Research Labs-Berkeley is Marc Davis, an assistant professor at UC Berkeley's School of Information Management and Systems.
IBM Steps Up Open-Source Java Efforts
IBM has stepped up its efforts to see an open-source, compatible, and independent implementation of the Java 2 Platform Standard Edition 5 (J2SE 5) by participating in (and eventually contributing code to) the Apache open-source Harmony project. Among other goals, the Harmony project was launched to create an open-source modular runtime (virtual machine and class library) architecture to allow independent implementations to share runtime components.
Preventing Piracy While Preserving Privacy
A flexible antipiracy solution
MICHAEL O. RABIN AND DENNIS E. SHASHA
In the battle between pirates and content providers, the pirates are winning. Movies appear on bootlegged DVDs and on peer-to-peer networks even before they appear in theatres. Expensive software can be obtained at rock-bottom prices without royalties flowing to the authors. Pricey technical countermeasures are easily defeated. In 2002, a multimillion-dollar CD-based antipiracy scheme developed by Sony was defeated by writing on the outer rim of protected CDs with a magic marker. License servers are routinely cracked. Total losses, while hard to calculate exactly, may amount to tens of billions of dollars per year.
Content vendor reactions vary from hand-wringing to threats of lawsuits to hope for yet a better protected medium. Platform vendors such as Intel, Microsoft, Apple, and Panasonic are more ambivalent. If one platform prevents piracy, will consumers choose another? This proposition has not been tested, but platform vendors have been cautious so far. Some content vendors even view piracy as a kind of loss leader. A few years ago, a scientist from a leading vendor, for example, announced to an expert panel (in substance): "Piracy doesn't worry us. The best thing that can happen to us is that someone buys our software, next that someone steals it, and the worst that someone buys our competitors' software." More recently, however, a scientist from the same company said to one of us: "We can no longer afford to sell just one copy in country X and see the rest stolen."
Frustrated with platform vendor inactivity, content vendors have chosen to use law enforcement and the courts to stop piracy—261 lawsuits were filed on just one day in 2003, for example. This has met with some success, but only in some countries and in a few cases. Even then, there is something distasteful about prosecuting librarians and 12-year-old children. There must be a better way.
Look beyond computer software and beyond movies to driving behavior. When faced with speed bumps, you slow down. You don't need police to tell you to. Your butt or your passengers' discomfort will ensure you don't speed. The underlying philosophy behind our solution is to implement a software speed bump to combat piracy. Our solution requires no police and preserves the privacy of everyone, even pirates.

Michael is a professor at Harvard University and a recipient of the Turing Award. Dennis is a professor at New York University and is the puzzle columnist for DDJ. They can be contacted at [email protected] and [email protected], respectively.
As a matter of terminology, we use the term "software" (or simply "content") to indicate any digital content, such as computer programs, computer games, audio and video, and so on.

Starting Points
We start with two assumptions, the first moral and the second technical.
"The underlying philosophy behind our solution is to implement a software speed bump to combat piracy."

The moral assumption is that stealing is wrong, even if it's easy. A screenplay writer friend once said she doesn't condone stealing except of computer software. It weighs nothing. The computer copies it. Some big corporation suffers. What could be wrong? How would she feel if someone stole her screenplay? Oh, that's different. But it isn't, big corporation or not. She wouldn't feel it's okay to steal a car from an automobile factory.
On the other hand, the punishment should fit the crime. Ideally, software pirates should get no benefits from pirating, but there should be no jail sentences or onerous fines. The more successful we are at preventing piracy by technical means, the less the need for law enforcement and high penalties. (As a matter of legal principle, if the odds of getting caught are 1 in 1000, then the penalty should be 1000 times the profit to render piracy unattractive. We avoid high penalties by reducing the profit from piracy to virtually zero.) Think speed bump again.
The technical assumption is that User Devices (computers or other software-playing devices) have a secure clock and software called the "Supervising Program" that cannot be changed and that is given a periodic time-slice when it can run. "Secure" means that even the owner of the device cannot alter the progress of
the clock, alter the Supervising Program, or intervene with its actions. This assumption lies within the technical state of the art:
• In a paper written in 1992, Lampson, Abadi, Burrows, and Wobber [1] suggested a way to load an operating-system kernel reliably using a bootstrapping method based on a single cryptographic key. Continuous checking of the integrity of the kernel or of our Supervising Program can be achieved by similar means. In these days of inexpensive hardware, there are other possibilities — IBM, HP, and Dell already ship computers that include a so-called "Trusted Platform Module," a coprocessor providing a feature called "remote attestation" [2]. The Trusted Platform Module can guarantee (and even promise to other devices) that a certain operating system and a certain BIOS are running. Similar techniques can be used for the Supervising Program. Only the Supervising Program needs to be secure, not the software that is later going to be protected.
• Hardware vendors must provide a clock that advances continuously and uniformly (for example, one that keeps in step with Greenwich Mean Time so is unaffected by time zone or daylight-savings time). The Trusted Platform Module already provides a counter that is guaranteed to increase over time.
• The operating system must interact with the Supervising Program by ensuring that it runs periodically.
The Shield system does the rest, ensuring piracy prevention while preserving privacy. Before we discuss it, however, we briefly examine the main existing approaches to prevent piracy.

Current Approaches to Combat Piracy
Many companies offer piracy prevention or, more generally, digital-rights-management software. The main distinction between the two is that digital-rights-management software may also include linguistic constructs to describe usage possibilities, a prominent example being ContentGuard's XrML language (http://www.contentguard.com/xrml.asp). In this article, however, we concentrate on piracy prevention, because that is the fundamental technology upon which all else rests.
The best current approach is to encapsulate software inside hardware. Video cameras do this, but in the computer software world, such software comes on hardware attachments, such as so-called "dongles," like those from MicroWorks (http://www.mw-inc.com/) and SafeNet (http://www.safenet-inc.com/products/tokens/ikey1000.asp). This solution is feasible if the dongle can be rendered tamperproof and by running impractical-to-reconstruct parts of the software program on the dongle. The dongle approach is vulnerable to a reverse-engineering attack of that "impractical-to-reconstruct" software. Even when the dongle approach works technically, however, the hardware approach makes it difficult to use several unrelated but protected software items at once and is, in general, cumbersome.
A part-hardware approach is to ship software out on "copyproof" CDs. Again, extremely low-tech attacks (scribbling on CD rims) have defeated such solutions in the past. But even if the CD is truly copyproof, what happens if the content ends up on a web site from which it can be downloaded? This attack, dubbed "Break-Once, Run Everywhere" (BORE), can render an entire factory's work a waste of time and effort.
A software-imitates-hardware approach is to encrypt the content and ship the key to the client site, which can then execute the software only if it has the proper keys. This solution suffers from the BORE problem as well: If the content can ever be constructed in the clear through either an attack on the encryption, an attack on even one User Device where the software has been running, or an insider leak by an employee of the software author, it can be used everywhere.
License servers combat piracy by requiring licensed software to get permission to continue running from time to time. This scheme can be attacked if a would-be pirate can simulate the license server's responses, or change the software not to query the license server. If either happens, there is a BORE problem. In addition, this solution requires the software author to modify the software by introducing the (hopefully nonremovable) calls to the license server. Even if not, the notion of having to report usage to an outside license server inherently infringes on privacy.
There are approaches that don't try to prevent piracy but try to track and/or punish the pirates. The "watermarking" approach is to write some unique undetectable digital message on each instance of the software. If that digital message is found on many instances of software in the field, then the original purchaser of that watermarked copy is the source of those copies. The problems with this scheme range from the theoretical (it doesn't seem possible to create an undetectable watermark) to the practical (how does one track down copies and test them for watermarks). Further, there is the problem of legal punishment. Trials are expensive, time-consuming affairs. Finally, the technique depends fundamentally on violating privacy, because it requires identifying the "criminal."
A second form of punishment is to put "poisoned apples" in places where pirates are likely to look. The idea is to punish pirates by giving them something that looks good but isn't — conceivably a virus but more commonly a broken piece of content. Two years ago, a pirate downloading a Madonna song from a site might instead find a furious Madonna piping out expletives. Since then, poisoning peer-to-peer networks has become a thriving cottage industry.
For certain kinds of software, notably movies and music, the aforementioned solutions do not prevent a would-be pirate from digitally recording the content while watching or listening and then later redistributing the recording. Copying and redistributing content in this way is known as the "Analog Hole" attack.
All existing solutions (other than wrapping the software inside a hardware device) suffer from a BORE attack. Most of these solutions infringe on privacy, sometimes by design. A better solution should avoid BORE, avoid courts, and preserve privacy.

Figure 1: Piracy prevention flowchart. No information leaves the User Device. (Is the software freeware? If yes, it runs normally. If not, does the User Device have rights to run the software? If yes, it runs normally; if no, hinder use.)
Towards a New Approach
Our approach to protection is simple: As Figure 1 illustrates, periodically during the execution of software on the User Device, our Supervising Program checks whether the software is freeware or not. If not, the Supervising Program identifies the software and checks whether this User Device has the rights to run this software. If so, the software continues to run; if not, the software is either stopped or markedly slowed down. No information leaves the device. The punishment is to hinder use.
To realize this approach, we have to specify how rights are transported to the User Device, how rights can be transferred
between User Devices for purposes of fair use and upgrades, and how the Supervising Program can determine which software is running. At each step, we show how privacy is preserved.
The basic data flow of the Shield system is in Figure 2. Briefly, privacy-preserving purchases are shown on the left side of the User Device, content-identifying information enters from the Superfingerprint server depicted on the upper right, and privacy-preserving rights information is exchanged with the Guardian Center. One important point: The indicated interactions with the Software Vendor, the Superfingerprint Server, and the Guardian Center are infrequent (on the order of once per week) and need little bandwidth. People who like to work mostly offline can continue to do so.

Figure 2: System architecture. (The Content Author sends content-identifying information to the Superfingerprints (SPFs) Server, which sends SPFs to the User Device; the User Device exchanges Purchase Orders, signed Purchase Orders, and content with the Content Vendor and Content Author; Call-ups (TTIDs) and Continuation Messages are exchanged with the Guardian Center.)

Figure 3: User Device. (OS and Supervising Program (SP), Secure Clock, Superfingerprints, Content, and Tag Tables (TTs) TTID1, TTID2, …, TTIDk, each holding Tags.)

Privacy-Preserving Purchase
Our ability to preserve privacy while preventing piracy is based on the fact that rights, as embodied in "Tags," are stored on the User Device in data structures called "Tag Tables"; see Figure 3. The relationship between the Tag Table Identifier (TTID) and the Tag is an internal affair of the User Device. At purchase time, Tag-related information flows between the User Device and the Vendor/Author, but the Vendor/Author does not know for which TTID. At rights-management time, TTIDs flow between the User Device and the Guardian Center but the Guardian Center does not know for which Tags. So even if the Vendor, Author, and Guardian Center all collude, they cannot determine which sets of Tags belong to the same User Device, much less which particular User Device owns any particular Tag.
When the owner of a User Device wishes to purchase digital content (including digital content that has been preloaded on the User Device or installed from a CD), the Supervising Program on that device creates a structure identifying the software and its associated Tag Table Identifier:
S = (Name(C), TTID, Hash(C), UsagePolicy, NONCE)
Name(C) is the name of the content. TTID is the identifier of the Tag Table into which the Tag will eventually go. Hash(C) is the hash value of the content. UsagePolicy is some kind of policy such as perpetual use or three-month use. NONCE is a number that is randomly chosen from a large number space (for instance, from 128-bit numbers) and that is never used again. We use the NONCE to hide the value of TTID even should the Vendor collude with the Guardian Center. A Purchase Order consists of: (Hash(S), Name(C), Hash(C), UsagePolicy)
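To make these two structures concrete, here is a minimal Python sketch of how a Supervising Program might assemble S and the corresponding Purchase Order. The byte encoding, the use of SHA-1, and the helper names are illustrative assumptions; the article prescribes only the shape of the two tuples.

import hashlib
import secrets

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def make_purchase_order(name: str, ttid: str, content: bytes, usage_policy: str):
    nonce = secrets.token_hex(16)             # random 128-bit NONCE, never reused
    content_hash = sha1_hex(content)          # Hash(C)
    s = (name, ttid, content_hash, usage_policy, nonce)
    # Hash(S) hides the TTID; the NONCE defeats guess-and-check attacks.
    s_hash = sha1_hex("|".join(s).encode())
    purchase_order = (s_hash, name, content_hash, usage_policy)
    return s, purchase_order                  # S stays on the device; the order goes to the Vendor

s, order = make_purchase_order("GameX 1.0", secrets.token_hex(16), b"content bytes", "perpetual")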
The hash function is a one-way hash function (see the accompanying text box entitled "Crypto Technologies") such as SHA-1 (or any of its improved versions), so no outsider can compute the TTID by inverting the function and no outsider can guess-and-check the TTID because of the NONCE component of S. The Purchase Order may be sent to the Vendor/Author over an anonymizing network, to make the source unknown to the sender [3]. The purchase may be in digital cash. Thus, the Vendor/Author can be prevented from knowing the identity of the purchaser, but can verify that the purchase amount corresponds to the correct price. If so, the Author digitally signs the Purchase Order, Sign_Author(PurchaseOrder), and sends it to the User Device via the Vendor. (By signing, the Author guarantees that it is paid for every purchase. If Vendor signatures were sufficient, then a rogue Vendor could start selling content on its own. As a practical matter, the Author may devolve signing privileges to select Vendors.)
The Supervising Program then verifies that the Author's signature is correct. This is possible because the User Device has previously downloaded from the Superfingerprint Server authenticated (digitally signed) data including a list of the Authors' public signature-verification keys. If the signature is verified to be that of the author of the content C, and S is consistent with the signed Purchase Order, the Supervising Program installs the triple (Author name, S, signed Purchase Order) into the Tag Table having identifier TTID. That triple is the Tag; see Figure 4.
If a user pays with anonymous digital cash (or even a one-use credit card) and sends orders over an anonymizing network (see, for example, http://tor.eff.org/), the Vendor/Author will not know who made the purchase. Further, the Vendor/Author will not know which TTID is associated with this purchase.

Superfingerprint Information
The Superfingerprint Server (upper right corner of Figure 1) periodically sends several kinds of information updates to the User Device. All User Devices receive the same information and must be reasonably up-to-date (for instance, this information must not be more than one week old, so the User Device must receive the Superfingerprints once a week).
• Content-Identifying Information. This data associates with the name Name(C) of each content C that is protected by the system, data enabling the Supervising Program to identify C when it runs. What running or executing means depends on the type of digital content. In the case of a computer program, running means the execution of the program, and identification information can then be derived from sequences of machine instructions executed by the program at runtime and from functionalities of the program. Alternatively, the content could be music, in which case the identification information could be derived from frequency components of the melody. The Content-Identifying Information for a content C typically fits
in about 1/1000 of the number of bytes of C (significantly less for movies). Each Author wishing to protect a content C runs a program (or asks a professional organization to run a program) that generates relevant Content-Identifying Information. That information is distributed to Superfingerprint Servers. These in turn send the additional Content-Identifying Information to User Devices during the next Superfingerprint broadcast. A Vendor/Author need not change the content C in any way to enable this protection. As a consequence, the antipiracy protection can be deployed after distribution of C.
• Content-Identifying Algorithms. The Supervising Program initially includes a suite of Content-Identifying Algorithms (which employ the Content-Identifying Information) to identify protected content. The algorithms are tailored to the type of content; for example, one class of algorithms for computer programs, another for music or video, and so on. But the algorithms apply to all examples of content in each class. One attack on the combination of Content-Identifying Information and Content-Identifying Algorithms consists of obfuscating the code or music or other content so it has the same effect to end users but looks different to our detection system. Experimentation has shown that detection algorithms can be made robust against a wide range of obfuscation attacks. (Compressing the content does not hinder our detection because detection occurs primarily at runtime.) The framework counters further obfuscation attacks by requiring the User Device to obtain periodic updates (weekly, for instance) of Content-Identifying Information and algorithms from the Superfingerprint Server. As obfuscations improve, so can our detection.
• Lists of pairs: Signature-verification key, Author name. This information lets the User Device verify whether a given Author's signature corresponds to an Author. In addition, there will be pairs relating the hashes of content to Author names. Together, these ensure that the signature of an Author as found in a Tag in fact constitutes sufficient authority to allow the use of software. This combats the attack where author A creates content X but author B signs Purchase Orders for content X without having the right to do so.
All communication with the Superfingerprint Server is one-way — from Superfingerprint Server to User Device, again possibly through an anonymizing network. Consequently, no information leaves the User Device.

Figure 4: Privacy-preserving purchase. Identity of user hidden by anonymizing network and digital cash. Tag Table Identifier is embedded into Purchase Order using a one-way function. (User Device: prepares the Purchase Order and pays with digital cash or a one-time credit-card number; later verifies the returned signature and installs the Tag into a Tag Table. Vendor: verifies purchase conditions/money and passes the signed Purchase Order through. Author: signs the Purchase Order, knowing what has been purchased but not by whom. Messages travel over an anonymizing network.)

Transfers Without Promiscuity
Finally, there is the question of managing rights. Fair rights laws and tradition require the ability to make backups.
Crypto Technologies
Whereas our approach never encrypts content, it makes substantial use of three cryptographic technologies — one-way functions to hide Tag Table Identifiers and User Device Descriptive Values, digital signatures to establish the identity of sites on the network, and Secure Sockets Layer (SSL) to ensure private communication of TTIDs.
Intuitively, a function f is one-way if, given x, it is easy to compute f(x), whereas given y, it is hard to find an x such that y = f(x). The hash function SHA-1 is one example (among many) of a one-way function.
The purpose of a digital signature is the same as of a written one — to establish the identity of the signer of a message. When you sign a contract, the holder of that contract can go to court and assert your agreement to the contract. Ideal written signatures are unforgeable but recognizable: only X can produce X's signature, but anyone can recognize that signature. So, only one person can sign, but anyone can verify (at any time or place). Digital signatures work the same way: An agent (say, the Guardian Center) in our protocol uses a private key to sign a document, but that agent's signature-verification key is well known (say, is in the Supervising Program of every User Device). Therefore, if a message arrives purporting to be from that agent, then any User Device can test whether the message is in fact from that agent.
The Secure Sockets Layer (SSL) protocol is a client-server protocol offering asymmetric authentication and private communication. SSL assures the client (in our protocols, the User Device) that the server has a particular identity (in our Call-Up protocol, that the server really is the Guardian Center). SSL also enables the client and server to agree on a private key, which can be used in subsequent communication. The net effect is that the client knows the identity of the server (but not the other way around) and that the content of the exchange between client and server remains hidden from anyone else.
— M.R. and D.S.
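As an illustration of the sign/verify pattern described in this sidebar, here is a brief Python sketch using the third-party cryptography package, with Ed25519 standing in for whatever signature scheme a deployment would choose (an assumption; the article does not name one). The Author signs a Purchase Order, and any User Device holding the Author's public verification key can check it.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

author_key = Ed25519PrivateKey.generate()      # the Author keeps this private key
verify_key = author_key.public_key()           # distributed to User Devices (e.g., via Superfingerprints)

purchase_order = b"Hash(S)|GameX 1.0|Hash(C)|perpetual"   # illustrative serialization
signature = author_key.sign(purchase_order)               # Sign_Author(PurchaseOrder)

try:
    verify_key.verify(signature, purchase_order)           # the Supervising Program's check
    print("Signature verifies: install the Tag.")
except InvalidSignature:
    print("Bad signature: reject the purchase.")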
Our technology allows any number of backups to be made of everything — the Tags, Tag Tables, and content. Further, we want to allow transfers of rights, so Tag Tables may be moved from one User Device to another, provided the Tag Table is disabled on the first device. On the other hand, we don't want the same Tag Table to appear on millions of devices. We reconcile these two goals through communication between each User Device and the Guardian Center. The basic purpose of this communication is to determine whether a Tag Table having some Tag Table Identifier is on several devices.
Let us back up for a moment. TTIDs come about by randomly generating an identifier from a large (128-bit numbers) space, perhaps based on time, typing characteristics, or a special random process. The chances of collisions in such a case are, for all practical purposes, negligible until the number of TTIDs is extremely large (for instance, a billion billion for 128-bit TTIDs). So when first created, every Tag Table has a globally unique TTID.
To ensure that only one User Device contains a particular TTID at a given time, each User Device performs a "Call-up" between some minimum and maximum time, say every five to seven days. As shown in Figure 5, a Call-up from device U consists of a message to the Guardian Center where the message contains a list of all enabled TTIDs of User Device U, a timestamp, and the hash of a "User Device Descriptive Value" of U appended to a NONCE. The User Device Descriptive Value contains some slowly changing property of the device that only a small number of devices have (for example, a processor ID, if available, or something about the number of files or structure of directories on the device). The use of the one-way hash function prevents any knowledge of this value from leaving the device. The Call-up is sent using a well-known secure protocol such as SSL (see "Crypto Technologies"), so no third party can see which TTIDs are being sent.
The Guardian Center checks each TTID x in the list of TTIDs to see whether an overly recent Call-up contained x. If so, the Guardian Center either records the fact for future reference or, if this has happened more than some threshold number of times, the Guardian Center invalidates that TTID. After this analysis, the Guardian Center responds to the Call-up with a signed "Continuation Message" listing valid Tag Table Identifiers:
Sign_GuardianCenter(timestamp, Hash(User Device Descriptive Value, NONCE), TTID1, TTID3,…)
The timestamp ensures that the device cannot simply replay an old Continuation Message. The hash, together with the NONCE, prevents the Guardian Center from learning the User Device Descriptive Value. The User Device Descriptive Value permits the Supervising Program on User Device U to ensure that the Continuation Message was meant for U. This prevents a single Continuation Message from being used by many shadow User Devices.
The User Device associates the most recent Continuation Message and its associated User Device Descriptive Value with each Tag Table. If the User Device Descriptive Value no longer matches the relevant properties of the User Device (perhaps due to a transfer of a Tag Table to this device), the Supervising Program on the User Device performs a new Call-up for just that Tag Table. On the User Device, the Supervising Program disables Tag Tables whose TTIDs have not been included in the most recent Continuation Message. There is a grace period policy, however, allowing devices to use the software associated with Tag Tables even if out-of-date, provided this doesn't happen too often.
A user transfers content by disabling its associated Tag Table x on the source device and sending it to a destination device. After doing a Call-up for Tag Table x, the destination device can now use all the software items whose Tags involve the transferred TTID.
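To make the Guardian Center's bookkeeping concrete, here is a minimal Python sketch of Call-up handling as described above. The five-day window, the strike threshold, the in-memory tables, and the omitted signature step are all illustrative assumptions.

import time

MIN_CALLUP_INTERVAL = 5 * 24 * 3600      # five days, matching the example interval in the text
STRIKE_THRESHOLD = 3                     # assumed policy for invalidating a TTID

last_seen = {}                           # TTID -> time of the last Call-up containing it
strikes = {}                             # TTID -> count of overly early Call-ups

def handle_callup(ttids, device_hash, now=None):
    """Record a Call-up and return the TTIDs that remain valid.
    A real Guardian Center would wrap the result, a timestamp, and device_hash
    in a signed Continuation Message; signing is omitted in this sketch."""
    now = time.time() if now is None else now
    valid = []
    for t in ttids:
        prev = last_seen.get(t)
        if prev is not None and now - prev < MIN_CALLUP_INTERVAL:
            strikes[t] = strikes.get(t, 0) + 1   # overly recent Call-up: record a strike
        last_seen[t] = now
        if strikes.get(t, 0) < STRIKE_THRESHOLD:
            valid.append(t)                      # still considered unique to one device
    return valid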
Failure to disable Tag Table x and its TTID on the source device will soon thereafter lead to overly frequent Call-ups for that TTID being sent to the Guardian Center. Call-ups must be done over a secure channel (such as SSL) to prevent malicious users from faking Call-ups with a given TTID y just to deny the real owner of the Tag Table having TTID y the use of that Tag Table.
Figure 5: Privacy-preserving Call-ups. User knows that it is talking to the Guardian Center but not vice versa (an option of SSL). TTIDs do not reveal the associated Tags. The one-way hash function associated with the NONCE prevents any revelation of the User Device Descriptive Value, so even processor identifiers can be used without fear of privacy breach.
Note also that the Guardian Center need not be a single device. Guardian Center data may be replicated and any one of several Guardian Center nodes can handle a given Call-up request, or data may be partitioned based on TTID. (The Guardian Center data consists of information about TTIDs: time of last Call-up and a history of any overly early Call-ups.) In any case, the Guardian Center workload scales easily.

Putting It All Together
Here is a quick overview of the whole system. Every User Device includes a Supervising Program. When software C is being used (for example, executed) on the User Device, the Supervising Program attempts to identify C by use of Content-Identifying Information and Algorithms present on the User Device. If unsuccessful, then C is deemed to be freeware and use proceeds. If identified as software named N, then the Supervising Program searches for a Tag for N in a Tag Table having a valid TTID. If found, then the Supervising Program verifies that the current usage is in accordance with the Usage Policy for that instance of C included in the Tag for C. If everything checks out, then use of C is allowed; otherwise use is stopped or hindered.

The Supervising Program is run at regular periods, checking the running queue of the User Device. It can be designed to consume less than 2–3 percent of the computing resources. In our experiments, its impact on the performance of even compute-intensive workloads, such as computer games, is unnoticeable. The Supervising Program performs the protected software installation task. The actual software purchase can be done outside of the User Device, for example, by an organization’s purchasing department.

The Supervising Program periodically downloads authenticated (that is, timestamped and digitally signed) updates of the Content-Identifying Information, Content-Identifying Algorithms, and lists of (Author name, content-hash) pairs and (Author name, signature-verification key) pairs from the Superfingerprint Server. To revalidate its Tag Table identifiers, the Supervising Program periodically calls up a Guardian Center. The Call-ups are infrequent and require little bandwidth. Transfers entail movements of Tag Tables from one User Device to another. Back-ups are unlimited. Every reasonable model of fair use is easy to implement. For example, it’s possible to lend your software to your friend (two transfers), to allow short-term use (Tags having short-term Usage Policies), and to offer family packs (a single purchase yields the privilege to obtain multiple Tags).
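The runtime check can be summarized in a short sketch. The class and function names below are hypothetical stand-ins, not the authors' code; identification by Superfingerprints is elided and represented by the name argument.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UsagePolicy:
    expires_at: Optional[float] = None       # None models a perpetual license; a value models short-term use

    def permits(self, now: float) -> bool:
        return self.expires_at is None or now <= self.expires_at

@dataclass
class Tag:
    content_name: str
    policy: UsagePolicy

@dataclass
class TagTable:
    ttid: str
    enabled: bool                             # False once the table has been transferred away
    ttid_valid: bool                          # True if the TTID appeared in the latest Continuation Message
    tags: List[Tag] = field(default_factory=list)

def supervise_use(name: Optional[str], tag_tables: List[TagTable], now: float) -> bool:
    """name is the content-identification result, or None if the content was not recognized."""
    if name is None:
        return True                           # unidentified content is treated as freeware
    for table in tag_tables:
        if not (table.enabled and table.ttid_valid):
            continue
        for tag in table.tags:
            if tag.content_name == name and tag.policy.permits(now):
                return True                   # identified, tagged, and within the Usage Policy
    return False                              # otherwise use is stopped or hindered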
Frequently Asked Questions
When we talk about this framework, we hear several questions:

Q: How can we claim that we preserve privacy when we have Call-ups?
A: The Call-ups send information that identifies neither the user nor the software nor the Tags on the User Device, because TTIDs are sent rather than Tags. The protocol can be verified by third parties. Alternatively, you could avoid Call-ups by linking Tags to machine IDs, but then transfers would become more complicated and purchases as well as transfers might potentially infringe on privacy.

Q: Why don’t we suffer from BORE? Superfingerprints detect use of software rather than mere possession. Can’t one subvert your detection?
A: Maybe, but it is possible to do a very good job of detecting functional equivalents of software. Also, Superfingerprints can be improved with each download to counter new attacks.

Q: What happens when you catch someone stealing?
A: The Supervising Program on the device stops or slows down the use of that software. No information leaves the User Device. This is the functional equivalent of a speed bump: Behave, because you get car-sick if you don’t.

Q: So, if this is so great, why isn’t this adopted?
A: For this architecture to take hold, the hardware and operating-system Vendors must cooperate. The enabling technology for the protection system essentially exists, so it is a question of willingness. Platform Vendor incentives aren’t so clear. If one platform Vendor provides piracy prevention and another doesn’t, consumers may prefer the one that doesn’t. If our solution is used, the only reason consumers will have to dislike the piracy-prevention system is that it prevents the ability to steal. It is possible that legislation will be necessary to ensure that no platform vendor benefits by making a platform that makes stealing easier. There is precedent for this: When catalytic converters to reduce automobile pollution emissions first came on the market, many consumers resisted their introduction because they made both acceleration and gas mileage suffer, besides raising the price of the car. Their introduction has greatly reduced air pollution, however, so it constituted a societal good. Legislation was necessary to avoid having consumers punish vendors who advanced that societal good. The same may happen here.

Further, whereas our architecture imposes negligible penalties on performance, it permits many new usage models such as paying for use only when needed (pay for tax software only at tax time), the preloading of software, and digital distribution of software. The saved costs from cheaper distribution and vastly reduced piracy run into tens of billions of dollars, enough to benefit all players — authors, consumers, and platform vendors. Again, there is precedent for the situation where taking on a burden ultimately enhances profit. When credit-card companies cap payments by
consumers due to fraudulent uses of their cards, consumers feel more confident about using their credit cards. Similarly, when platform vendors support this framework, this will allow many new creative and inexpensive uses of and distribution of content, enhancing the value of platforms everywhere and ultimately reducing the price of software to all consumers. Indeed, we foresee an alliance between (enlightened) consumers, platform vendors, and authors supporting this framework, because it is in everyone’s economic and artistic interest.

Conclusion
The Shield Approach is a flexible, privacy-preserving, antipiracy solution that does not suffer from “Break Once, Run Everywhere.” It protects privacy in a strong sense: It can be configured so that no one knows what you buy, what you use, or even whether you cheat. Because the content is obtainable separately from the Tag, preloading the content is possible. Transfers and fair use are straightforward. Finally, the solution is technology friendly. We embrace peer-to-peer networks, video-on-demand, superdistribution, and free software. Content Vendors will feel free to distribute content over the Internet, reducing distribution costs and material waste. Lawsuits will be reduced. Isn’t it time for technology to solve this problem?

Acknowledgments
Warm thanks to our principal coworkers in this effort: Yossi Beinart, Carl Bosley, Ramon Caceres, Aaron Ingram, Timir Karia, David Molnar, and Sean Rollinson.

References
[1] “Authentication in Distributed Systems: Theory and Practice.” Butler Lampson, Martín Abadi, Michael Burrows, and Edward Wobber. ACM Transactions on Computer Systems, Volume 10, Number 4 (November 1992), pp. 265–310.
[2] For information about trusted coprocessors, see https://www.trustedcomputinggroup.org/home/.
[3] For information about anonymizing networks, see http://tor.eff.org/.

DDJ
Pragmatic Exceptions
Tip #2: Refactor for Exceptional Clarity
Tip #2 is the logical follow-on from Tip #1, “If In Doubt, Throw It Out” (DDJ, September 2005). In deciding if a local exception should be caught or pitched out, well-factored and highly focused code is ideal. Refactoring is fundamental to good programming practice in so many ways, an important one being that it helps you understand the finer points of the method contract. This removes doubt and leads to better decisions regarding exceptions. The smaller your functions, the easier it is to tell whether what just happened was normal. Because each function can clearly specify exactly what should be expected, knowing whether to throw an exception becomes obvious. Large functions obscure or even obliterate the contract they’re supposed to fulfill.
— Benjamin Booth
[email protected]
Reestablishing Trust in the Web
A browser extension identifying sites
AMIR HERZBERG AND AHMAD JBARA
Electronic commerce is growing rapidly. Unfortunately, electronic fraud is growing just as fast. Among the most acute current security threats are web spoofing and “phishing.” Web spoofing is the creation of fake web sites, typically a fake e-banking login page designed to harvest user passwords. Phishing attacks, on the other hand, involve fake e-mail messages, typically directing recipients to spoofed sites on some pretext. Alas, many users fall victim to such attacks. Studies estimate millions of victim users and stolen accounts (“phish”), and damages on the order of $1 billion in 2003 alone. Furthermore, both the rate and sophistication of the attacks are accelerating, attracting more and more criminal elements. Amazingly, current browsers fail to protect users by helping them tell the difference
Amir is a professor of computer science at Bar Ilan University and Ahmad an instructor at Netanya Academic College. Ahmad is also a graduate student in the computer science department at Bar Ilan University. They can be contacted at http://AmirHerzberg.com/ and [email protected], respectively.
between known, trustworthy sites and spoofed ones. We created TrustBar (http://AmirHerzberg.com/TrustBar/) to address this problem. TrustBar is a browser extension that provides improved security, identification, and trust indicators. TrustBar is sufficiently visible to draw the attention of even naive users upon entering a spoofed site. In this article, we examine the current browser UI for identifying sites and its weaknesses, and explain the extended UI provided by TrustBar.

Browser Identification Mechanisms
Granted, browser UIs include areas that should be examined by users to authenticate web sites. For instance, the location bar contains the location (URL) of the current web page, while the status bar contains a closed padlock icon in pages protected by the SSL/TLS protocol. However, these elements are not sufficient to protect most users. For one thing, the location bar and padlock are not sufficiently visible, and many naive users are not even aware of their existence, let alone importance or meaning. In particular, the location is usually given as a URL, and most users do not know which part of it identifies the domain. Indeed, many users ignore it or completely remove it. Furthermore, spoofed sites may remove it, possibly replacing it by a fake look-alike image and/or script. Nor, for that matter, is the padlock highly visible, and most users are not aware that a padlock is meaningful only in the status bar and not inside the web page. In any event, the padlock merely indicates whether the site has invoked SSL/TLS
with a public key certificate from one of the (hundred or so) Certificate Authorities (CAs) trusted by the browser. These CAs differ extensively in their requirements,
procedures, and costs. Specifically, some certificates only involve validation of ownership of the domain by e-mail/phone to the contact, while others involve validation of corporate documents. Most users are not aware of the identity of most CAs, not to mention that these (unknown) entities are responsible for validating the identities of owners of (protected) web sites. In reality, most users rely on the content of web sites as a means to identify the site and whether it is protected. Unfortunately, it is trivial for attackers to
mimic the appearance of victim sites. This situation is made worse because some of the most important web sites request passwords and even indicate a padlock and claim to use security in unprotected web pages. Most of these sites invoke SSL/TLS to encrypt the password in transit, but users have no way to know this in advance, and therefore, are unlikely to detect a spoofed version that sends the passwords to a cracker. Amazingly, this trivial-to-fix, yet fatal, vulnerability exists in many sensitive sites, including online banks (Chase, PayPal, Wells Fargo, MidFirst, TD Waterhouse, Bank of America); merchants (Amazon); and even security services (Microsoft Passport and Equifax); see Figure 1 and http://AmirHerzberg.com/Shame.htm for an updated list. The combined vulnerability of the browser’s UI, users’ naïveté, and the irresponsibility of site designers encourages web spoofing and phishing attacks.

TrustBar is an open-source browser extension we built for the Mozilla and Mozilla Firefox browsers (http://trustbar.mozdev.org/). The purpose of TrustBar is to provide highly visible, preferably graphical, indicators for the identification of sites. Specifically, TrustBar
presents the identity of the site as a name or, preferably, a logo, rather than a URL, and lets users select their own name or logo (“My Bank,” for instance). Furthermore, TrustBar also presents the identity of the Certificate Authority (CA), which is the entity that validated the identity of the site. Some CAs have multiple certificate products, typically for different levels of identity validation; TrustBar lets the CA display a different logo for each such product or class of certificate. Because we designed TrustBar as an integral part of the browser UI, attackers have no control over its display and cannot remove or clone it. Furthermore, it contains clear graphical and/or textual indicators so that users can distinguish between original and cloned sites — identification for the site, identification for the CA, and indication whether the site is protected. Users can edit both text and graphical identifications. We decided to locate our bar at the top of the browser window, above all other toolbars. It is fixed and beyond the control of web sites. As such, it appears under all conditions. These properties were implemented via the XUL language that was used to build the Mozilla UI. We are currently investigating ways to make
TrustBar more like other bars, while still protecting it from removal or cloning by rogue web pages.

Mozilla Firefox Extensions
Using extensions, you can enhance Mozilla’s Firefox browser functionality. The TrustBar extension is a collection of files developed as an independent package and overlaid on Mozilla Firefox. This package consists of XUL, CSS, JavaScript, and image files. All files must be zipped in one XPI file using a ZIP utility. The XUL file contains a description of the extension UI. If the UI is complicated, then its description can span several XUL files. Each XUL file can be overlaid with the original Firefox XUL file to affect the browser UI. The CSS files describe the attributes of the UI elements defined in the XUL files, and the JavaScript files are the code controlling the system. To create an installable extension, all these files are zipped into a JAR file and that, together with the install.rdf file, must be packaged in an XPI file (a short packaging sketch follows the file list below).

The TrustBar Package
The TrustBar package is provided in the installation file TrustBar.xpi created by using ZIP. This file contains:

• Install.rdf, a mandatory file in any Firefox extension. This is an XML-like file that defines properties such as GUID, name, version, target application, and JAR files.
• TrustBarOverlay.xul, which defines the UI of the main bar of TrustBar.
• TrustBar.js, a JavaScript file that contains code for supporting the UI defined in TrustBarOverlay.xul.
• TrustBarDlg.xul, which defines the TrustBar dialog UI. This file also includes JavaScript code for supporting this UI.
• TrustBarGlobal.js, which defines functionality used globally.
• TrustBar.jar, a zipped file that contains the XUL and JavaScript files, except for the install.rdf file.
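For instance, the two packaging steps could be scripted roughly as follows. This is a sketch only: the file names come from the list above, but the script itself and the flat archive layout are assumptions (they are not part of TrustBar), and the listed files are expected to sit in the current directory.

import zipfile

CHROME_FILES = ["TrustBarOverlay.xul", "TrustBar.js", "TrustBarDlg.xul", "TrustBarGlobal.js"]

def build_xpi(xpi_name="TrustBar.xpi"):
    # Step 1: the XUL/JavaScript files go into a JAR (which is just a ZIP archive).
    with zipfile.ZipFile("TrustBar.jar", "w", zipfile.ZIP_DEFLATED) as jar:
        for name in CHROME_FILES:
            jar.write(name)
    # Step 2: the JAR plus install.rdf are packaged into the installable XPI (also a ZIP archive).
    with zipfile.ZipFile(xpi_name, "w", zipfile.ZIP_DEFLATED) as xpi:
        xpi.write("install.rdf")
        xpi.write("TrustBar.jar")

if __name__ == "__main__":
    build_xpi()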
Figure 1: Unprotected login.
The TrustBar identification mechanism is always displayed, regardless of the security level of the loaded site. When the site reached is unprotected, TrustBar displays a message (such as Figure 2) and two buttons:

• Suspect Fraud Button, which lets users report suspected sites.
• What Does It Mean Button, which presents users with explanations about the meaning of entering an unprotected site and suggests a potential secure alternative.

Figure 2: TrustBar within an unprotected site versus the browser within an unprotected site.
In secure sites (as shown in Figure 3), TrustBar lets users modify the identification
details by clicking the “show TrustBar dialog” (Figure 4). Within this dialog, users can edit the site/CA organization name, attach a new logo to the site/CA, and change the trust extent for the current CA. To do it easily and quickly, TrustBar lets users replace current site logos with new ones by right-clicking the mouse over any image within the site contents and making that image the active logo (Figure 3).

TrustBar UI
The TrustBar UI is merged into the Mozilla Firefox UI using Listing One. This is an XUL file that defines the TrustBar UI and is overlaid with the Firefox original XUL file, called browser.xul. Line 1 defines file elements that are embedded with the browser’s original main window and become a part of its UI. The TrustBar’s main element is the horizontal TrustBarBox defined in line 13. This element functions as a container of
any number of elements such as logos, text, and buttons. We verified that the mainCommandSet element defined in the Firefox browser.xul file is always above all other elements. We use the insertbefore attribute to place TrustBarBox above it. The context attribute defines the context menu activated from inside the TrustBarBox. The elements contained in our main box are defined in lines 14 to 21. These elements represent either the site and CA logos or their text. The whatId button lets users reach the TrustBar dialog for further information and modifications, and the fraudId button lets users report suspected sites. The TrustBar dialog pops up whenever an unrecognized secure site is reached or by clicking the whatId button within a secure site. The XUL file describing the TrustBar dialog UI is available electronically; see “Resource Center,” page 4.

TrustBar in Action
Once the TrustBar UI is merged into the Mozilla Firefox UI, the JavaScript code in it is activated (see Listing Two). The main function that initiates TrustBar is Init. This function is called when the main window of the browser is loaded. This is done using this XUL statement:
<window id="main-window" onload="Init()" >
This statement overrides the onload event of the main window of the browser by the TrustBar Init function. The Init function in line 3 initially calls the initialization function of the main window of the browser and then initializes TrustBar. The initialization tasks of TrustBar include:

• Creating local directories for saving information about sites and CAs. This is done by calling the createLogoDir function (line 7).
• Initializing a listener to the browser so TrustBar can get all types of notifications from the browser. Line 5 adds the user-defined listener TrustBarProgressListener (Listing Three) to the browser. The listener definition includes all notifications that have to be implemented.

One of the main notifications to which TrustBar responds is the onSecurityChange notification (line 10). This notification is received whenever a switch occurs between unprotected and secure sites and vice versa. We look into whether the browser switched to a secure or unprotected site in line 13. The switch statement checks the State variable against two constant values in lines 15 and 26. If the browser security level changes to a secure site, TrustBar responds in line 15; if it changes to an unprotected site, TrustBar responds in line 26. The response consists of two functions — one for destroying the current secure UI, and the other for constructing the unprotected UI to conform to the new state. These two functions are straightforward. Basically, they hide some elements and make others visible. In the secure case, we initially verify the SSL certificate by calling verifySSLCertificate. If this check fails, TrustBar automatically switches to the unprotected state. If this verification passes, we make some UI initializations. In line 21, we call the updateTrustBarDB function, which checks whether the CA and the site are known to TrustBar and updates TrustBar’s local database accordingly. Based on the database, TrustBar decides whether to present its dialog.
Figure 3: TrustBar within a secure site.
DDJ (Listings begin on page 34.)
Figure 4: TrustBar dialog.
Extended Visual Cryptography Schemes
Hiding information right out in the open
DANIEL STOLERU

Visual cryptography is a graphical method of concealing information. The information you want to hide can contain graphics, hand- or machine-written text, spreadsheet calculations, and the like. A visual cryptography scheme is based on the fact that each pixel of an image is divided into a certain number m of subpixels. The number m is called the “pixel expansion” of the scheme (because there is no technical way to divide a pixel, you go the other way and expand any pixel to a matrix of m pixels). The basic model consists of several transparency sheets. On each transparency, a ciphertext is printed, which is indistinguishable from random noise. The hidden message is reconstructed by stacking a certain set of transparencies and viewing them. The system can be used by anyone without any cryptography knowledge and without performing any cryptographic computations. For more information on visual cryptography, see “Visual Cryptography & Threshold Schemes” by Doug Stinson (DDJ, April 1998).

There are many schemes for implementing visual cryptography. Moni Naor and Adi Shamir developed the Visual Secret Sharing Scheme (VSSS) (http://citeseer.ist.psu.edu/naor95visual.html). As a further generalization of a visual
cryptography scheme, the very existence of a secret image can be concealed by displaying a different image on each transparency. Naor and Shamir solved this problem in the case of binary (black and white) images for a (2,2) threshold scheme. The problem was also considered for a general access structure. Stefan Droste offered a higher generalization: By stacking the transparency of each participant in the scheme together, a secret image is recovered and there is only this one way to recover it (http://ls2-www.cs.uni-dortmund.de/~droste/). However, the participants of any arbitrary subset of the entire set of participants share a secret, too. Hence, you actually have a multitude of more or less secret images. M. Nakajima and Y. Yamaguchi presented a two-out-of-two Extended Visual Cryptography Scheme for “natural” (continuous tone) images (http://wscg.zcu.cz/wscg2002/Papers_2002/A73.pdf). They suggested a theoretical framework for realizing the Gray-level Extended Visual Cryptography Scheme (GEVCS) and presented some results and methods aimed at improving the contrast of the processed images.

In this article, I use the framework described by Nakajima and Yamaguchi to develop a Python-based application that embodies the proposed model. I call the application “PyEvcs.” While my dithering and image-processing techniques are original, the idea of using Python to implement a visual cryptography scheme is not new. Frank Stajano’s Visual Cryptography Kit (VCK) (http://www-lce.eng.cam.ac.uk/~fms27/vck/) and Thomas M. Thomson’s Visual Cryptography Project (http://citeseer.ist.psu.edu/thompson00rit.html) are both implemented in Python.
Daniel is a software developer for an investment banking company in Germany. He can be contacted at [email protected].
The Model
Figure 1 illustrates the model I implement. PyEvcs uses three images as input data:
the secret image (information) I want to conceal (The Cameraman), and two images representing information to be shared, Dorian and Lena. This model mainly consists of two phases — halftoning and encryption.
First, you need to transform the gray-level images into simple black-and-white images in such a way that they are still meaningful; that is, the obtained black-and-white replications (called “Intermediate Images”) mimic the aspects of the continuous-tone ones. PyEvcs uses an ordered dither algorithm, similar to the dithering techniques used for newspapers. To illustrate, I have defined two different dithering masks, but any viable mask can be easily inserted into the application and can be tested. But simply dithering the three images is not enough to obtain the proposed GEVCS. You also need to further process (encrypt) the images. This process is the most interesting part of the model. Again, the simplest visual cryptography scheme is a secret image “split” into two shared images. The shared images are printed onto separate transparencies and handed to the two participants in the scheme. To decrypt, the participants simply stack
their transparencies and are able to visually recognize the recomposed secret message. All three images involved in the scheme need to have the same dimensions. Furthermore, you must have corresponding pixels — two pixels in separate images with the same coordinates in the horizontal plane. In other words, if you stack the two images, a pair of corresponding pixels will perfectly superimpose. During the dithering process at the pixel level, any pixel in the original — continuous tone — image is expanded to a matrix of black and white subpixels, with the number of black or white subpixels determined by the gray level of the original pixel. You denote the number of white subpixels in a pixel’s expanded (halftoned) version by pixel transparency. In Figure 2, three different recomposition cases can be analyzed. The encryption process applies pixel-by-pixel to the three halftoned images, controlling the transparencies of the shared pixels such that the required transparency of the target pixel is obtained. The matrix obtained by expanding a pixel is similar to a binary matrix where the “1” elements represent black subpixels and the “0” elements represent white subpixels. The operations at the pixel level are binary operations in the OR semigroup. In the Figure 2, T1, T2, and Tt are the pixel transparencies in the share1, share2, and target images, respectively. In Figures 2(a) and 2(b), reconstruction is possible. It is merely a question of arranging the black and white subpixels inside the given matrix. The problem is solved by the encryption module of the application, which builds proper matrix collections for every required gray level. Why collections? Because the security of the scheme is also important. Therefore, at any encryption step, a matrix is randomly selected from the proper collection. However, in Figure 2(c), you can no longer obtain the required transparency for the corresponding pixel in the target image, no matter how you rearrange the subpixels inside the matrices. So how can we solve such a possible situation? Assume you define a 3D space having the transparencies of the pixels in the three images involved in our problem as axes — the x-axis represents the transparencies of the pixels in share1, the y-axis represents the transparencies of the pixels in share2, and correspondingly, the z-axis represents the transparencies of the pixels in the target image. Any point in the defined space is characterized by three values representing transparencies in the three mentioned images: p(T1, T2, Tt). You first determine the volume containing the points for which the reconstruction is possible, http://www.ddj.com
as in Figures 2(a) and 2(b). Afterwards, you analyze every point outside this volume, as in Figure 2(c). For instance, say you have a point (outside the “possible zone”) p'(T'1, T'2, T't) and you still want to be able to encrypt that pixel. However, you don’t want the initial image to deteriorate. Consequently, you have to accept a compromise — the application determines the closest point to p' situated in the “possible area,” say p"(T"1, T"2, T"t), and will replace p'
with p". That is, the transparencies of the corresponding points in share1, share2, and target become T"1, T"2, and T"t, respectively. Nevertheless, you should check to ensure that the new transparencies are not disturbing your images so much that the original image will no longer be recognizable. The security of the scheme is another matter. While modifying the shared images, you are not allowed to “betray” the secret image. Thus, the problem slowly becomes more and more complex.
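The feasibility test behind the okzone can be stated in a few lines. The following sketch is an illustration of the constraint, using the white-subpixel-count convention defined earlier and an OR-of-black stacking model; the numeric examples are made up rather than taken from Figure 2.

def feasible_target_transparencies(t1, t2, m):
    """Range of target transparencies reachable by rearranging subpixels only.

    t1, t2: number of white subpixels (transparency) in the two share blocks;
    m: pixel expansion (total subpixels per block). Stacking is an OR of black,
    so a target subpixel is white only where both shares are white."""
    low = max(0, t1 + t2 - m)    # unavoidable white-over-white overlap when t1 + t2 exceeds m
    high = min(t1, t2)           # a white target subpixel needs white in both shares
    return low, high

# Example with a 5x5 expansion (m = 25), echoing the three cases of Figure 2:
for t1, t2, tt in [(15, 15, 10), (20, 18, 14), (6, 7, 7)]:
    low, high = feasible_target_transparencies(t1, t2, 25)
    print(t1, t2, tt, "feasible" if low <= tt <= high else "needs adjustment")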
Figure 1: The representation of the proposed Gray-level Visual Cryptography System.
Figure 2: Three examples of pixel arrangements.
Luckily, Python makes handling the problem much easier.

The Implementation
PyEvcs uses the core functionality of Python, the Python Imaging Library (PIL), and the “numarray” packages. For some GUI functionality, I also used Tcl/Tk. There are a variety of dithering techniques that can be implemented. I chose an ordered dither algorithm and designed two representative dithering masks declared as global variables; see Listing One (any other dithering mask can be inserted into the code). The dithering function takes a corresponding dithering mask as a parameter and returns the determined binary matrix. The dithering function also calculates the transparency (number of white subpixels) for every pixel in the gray-level (original) image, and maps the coordinates of the pixel with the corresponding transparency value; see Listing Two.

As for the encryption process, Listing Three determines the okzone containing the points for which reconstruction is possible without modifying the images; see Figures 2(a) and 2(b). Afterwards, I define a “distance” function. For the points outside the okzone, it is necessary to calculate the shortest distance between the point in question and the okzone; see Listing Four. To properly encrypt the three dithered images, you first rearrange the subpixels in the obtained pixel matrices (see Figure 2). You then obtain a 2×m binary matrix in which the proper number of [1 1], [1 0], [0 1], and [0 0] columns can be calculated. If you find a point outside the okzone, you just determine the closest candidate in the okzone and correspondingly replace the transparency values; see Listing Five. Now that the 2×m matrix is known, you need only construct the proper square
matrix used to generate the matrix collection; see Listing Six. The matrix collection is generated by randomly permuting the columns of the base matrix; see Listing Seven.
Now you are ready to apply the algorithm for obtaining the two shared images of the Gray-level Visual Cryptography Scheme. All you need to do is to put everything together in the proper order; see Listing Eight. At the end of this function, you use the psprint procedure in conjunction with the v1 and v2 views. Again, in visual cryptography, the decryption is processed directly by the human user’s vision. The secret image is revealed by properly stacking the two shared images. PyEvcs can generate all the required images (also a couple of intermediate images for testing purposes) in PostScript format. To test the results quickly, PyEvcs also simulates the superimposition of the two shared images. In such cases, the function is not very complicated; see Listing Nine.

Conclusion
Extended visual cryptography schemes let you construct visual secret sharing schemes in which the shared images are meaningful. So again, why Python? Image processing means intensive calculations, reading/writing image files in different formats, Boolean transformations at the pixel level, and many other operations. I have more than 10 years of experience in developing complex C++ and Java applications, and I’ve also developed diverse programs using Perl, Tcl, Visual Basic, and shell scripting. All programming languages and development environments have their strengths and weaknesses, but until Python, no language has provided such intuitiveness and effectiveness.

Visual cryptography represents a technique for information concealment with major advantages — it is very low-tech because all that you need to use it is a couple of transparencies and a printer. Additionally, the decryption process is immediate and does not require any cryptographic knowledge from users. To get the secret information, participants of the scheme need only properly stack the transparencies. This presents a high level of security (some schemes are absolutely secure), proven in the theoretical studies. That said, visual cryptography schemes suffer mainly because of pixel expansion (any pixel in the original image is expanded during the encryption process to an m×m matrix of pixels) and loss of contrast (you have seen that the contrast of both shared images and the target image can suffer deterioration). Combining the theoretical research in visual cryptography with more developed testing applications in Python makes a major contribution toward finding some solid practical applications for visual cryptography.
DDJ
Listing One
Listing Two

def encrypt(self, mask = _boxMask):
    """ Perform the actual dithering algorithm. For any pixel in the original
    image the function fills a square with black and white pixels in resulting
    image. If the grey level of the pixel in original image is greater than the
    one specified in the mask, the corresponding subpixel in resulting image
    will be black. Otherwise, the subpixel in the resulting image remains white.
    Precondition: the image must have a specified size."""
    length = len(mask)
    result = bitmap((length*maxX, length*maxY))
    for x in range(maxX):
        for y in range(maxY):
            level = self.__img.getpixel((x,y))
            transparency = 0
            for i in range(length):
                for j in range(length):
                    if level >= mask[i][j]:
                        #print a black pixel
                        result.set(i + x*length, j + y*length, 0)
                    else:
                        #print a white pixel
                        result.set(i + x*length, j + y*length)
                        transparency = transparency + 1
            self.__data[(x, y)] = transparency
    return result
Listing Three

def fillokzone(self, x, y):
    st = self.__secret.transparency(x, y)
    s1 = self.__share1.transparency(x, y)
    s2 = self.__share2.transparency(x, y)
    m = self.__secret.expansion()  #the total number of columns
    """st must be in the range [max(0, s1 + s2 -1), min(s1, s2)]. """
    Lbound = max(s1, s2)
    tmp = s1 + s2
    Ubound = min(25, tmp)
    if st >= Lbound and st <= Ubound:
        self.__okzone.append([s1, s2, st])
Listing Four

def findmindist(self, x, y, z):
    dist = 0
    distmap = {}
    if len(self.__okzone) <= 0:
        raise "The images cannot be encrypted. okzone empty!"
    x0, y0, z0 = self.__okzone[0]
    min = sqrt((x - x0)*(x - x0) + (y - y0)*(y - y0) + (z - z0)*(z - z0))
    for i in range(len(self.__okzone)):
        xp, yp, zp = self.__okzone[i]
        dist = sqrt((x - xp)*(x - xp) + (y - yp)*(y - yp) + (z - zp)*(z - zp))
        distmap[dist] = [xp, yp, zp]
        if dist < min:
            min = dist
    return distmap[min]
Listing Five

def numberofcolumns(self, x, y):
    """ For every pixel we need to determine an encoding matrix. The function
    will return a tuple containing the number of 11, 10, 01 and 00 columns.
    The values are determined based on the transparencies of the
    corresponding 3 pixels."""
    #set the transparency factors
    stp = self.__secret.transparency(x, y)
    s1p = self.__share1.transparency(x, y)
    s2p = self.__share2.transparency(x, y)
    m = self.__secret.expansion()  #the total number of columns
    """st must be in the range [max(0, s1 + s2 -1), min(s1, s2)]. If the
    condition doesn't hold, we'll try to adjust the transparency of
    the target - st."""
    Lbound = max(s1p, s2p)
    tmp = s1p + s2p
    Ubound = min(25, tmp)
    if stp < Lbound or stp > Ubound:
        s1, s2, st = self.findmindist(s1p, s2p, stp)
    else:
        s1, s2, st = s1p, s2p, stp
    p00 =
    p01 =
    p10 =
    p11 =

Listing Six

def buildbasematrix(self, p11, p10, p01, p00):
    exp = self.__expansion
    matrix = zeros((2, exp*exp))
    stop = p11
    for i in range(stop):
        matrix[0, i] = 1
        matrix[1, i] = 1
    stop = p11 + p10
    for j in range(p11, stop):
        matrix[0, j] = 1
        matrix[1, j] = 0
    stop = p11 + p10 + p01
    for k in range(p11 + p10, stop):
        matrix[0, k] = 0
        matrix[1, k] = 1
    stop = p11 + p10 + p01 + p00
    for l in range(p11 + p10 + p01, stop):
        matrix[0, l] = 0
        matrix[1, l] = 0
    return matrix

Listing Seven

def randompermutation(self, matrix):
    new_matrix = []
    for x in range(len(matrix)):
        new_matrix.append([])
    numcols = range(len(matrix[0]))
    for y in range(len(matrix[0])):
        choice = random.choice(numcols)
        for x in range(len(new_matrix)):
            new_matrix[x].append(matrix[x][choice])
        numcols.remove(choice)
    return new_matrix

Listing Eight

def encodeimages(root, file1, file2, file3):
    pm = pixelmatrix(file1, file2, file3)
    maxX, maxY = pm.size()
    expandorder = pm.expansion()
    share1 = dithering1.bitmap((expandorder*maxX, expandorder*maxY))
    share2 = dithering1.bitmap((expandorder*maxX, expandorder*maxY))
    for x in range(maxX):
        for y in range(maxY):
            pm.fillokzone(x, y)
    for x in range(maxX):
        for y in range(maxY):
            p11, p10, p01, p00 = pm.numberofcolumns(x, y)
            basismatrix = pm.buildbasematrix(p11, p10, p01, p00)
            permutedmatrix = pm.randompermutation(basismatrix)
            ps1 = pm.pixelonshare(permutedmatrix[0])
            length = len(ps1)
            for i in range(length):
                for j in range(length):
                    if ps1[i][j] == 0:
                        #write a white pixel on share1
                        share1.set(i + x*length, j + y*length, 0)
                    else:
                        #write a black pixel on share1
                        share1.set(i + x*length, j + y*length)
            ps2 = pm.pixelonshare(permutedmatrix[1])
            length = len(ps2)
            for i in range(length):
                for j in range(length):
                    if ps2[i][j] == 0:
                        #print a white pixel on share2
                        share2.set(i + x*length, j + y*length, 0)
                    else:
                        #print a black pixel on share2
                        share2.set(i + x*length, j + y*length)
    v1 = share1.view(root, "First Share")
    v1.psprint("FirstShare.ps")
    v2 = share2.view(root, "Second Share")
    v2.psprint("SecondShare.ps")
    decryptedImage = dithering1.decrypt(share1, share2)
    v3 = decryptedImage.view(root, "Decrypted Image")
    v3.psprint("DecryptedImage.ps")
    if showAllImages != 0:
        v4, v5, v6 = pm.showimages(root)
        v4.psprint("FirstImgDithered.ps")
        v5.psprint("SecondImgDithered.ps")
        v6.psprint("TargetImgDithered.ps")
        return v1, v2, v3, v4, v5, v6
    else:
        return v1, v2, v3

Listing Nine

def decrypt(share1, share2):
    """In visual cryptography the decryption should be done only by
    superimposing the two shares. Here we just simulate the process."""
Inside the SmartDongle USB Security Key
Hardware support for security
JOEL GYLLENSKOG
On Christmas Eve 1997, MicroWorks (my company) entered into a joint venture agreement to write a point-of-sale software package. We would do the programming, and our joint venture partner would do the sales and marketing. As often happens, we underestimated the time and effort required to complete the task, but a year and a half later, a product emerged ready for shipment. We put a lot of blood, sweat, and tears into our software and were eager to find a niche in the marketplace. Most of the sales were to small specialty stores that wanted to track inventory. Sales were primarily through dealer networks. One dealer convinced us that if we made a few modifications to the system, he could place a copy in every business on one of the islands in the Caribbean. We gleefully tailored the code to meet his requirements. After some time passed, however, we noticed that this particular dealer only licensed a single copy. To this day, we think this one copy was cloned and made its way into many businesses where intellectual property rights are not respected.

Time passed and we looked for new niches and created more software. Still, the memory of what happened on this island, and the realization that software could be “borrowed” and not purchased, lingered in our minds. Thus, being engineers, we decided to create a device that would prohibit the easy “copying” of software. We knew that dongles were commercially available, but their prices seemed prohibitively high for what we desired.

Joel is president and senior engineer at MicroWorks. He can be contacted at [email protected].
Previously, we had done some engineering work for another company that produced USB devices and had good manufacturing connections. Through this connection and our experience in the field of USB drivers and devices, we created a robust USB security key at a modest price point.

For some people, cracking security has a monetary incentive. For others, it is an intellectual challenge. With today’s debuggers and decompilers, it is a simple task to step through a running program and watch as each instruction is executed. It is easy enough to change a conditional branch to a no-op or to an unconditional branch. We decided to create a system that could not be cracked by such means. To that end, we designed a security key so that decision-making is done on the dongle and not by the application program. By putting the process in the dongle, we can control what would-be crackers can see. Another design choice was to create a dongle with a generous amount of memory. This lets application programs store information required for the successful execution of the program. This approach lets the dongle be used in many different ways. We are not limited to simple yes/no operations. Programs can be written so that they read/write data on the dongle. With this exchange of data, it is harder for crackers to compromise the system.

Design Choices
For the reasons just mentioned, we selected as the “brains” of the security key a part from Cypress Semiconductor that consisted of a processor, RAM, and ROM on the same chip. The part is designed so that the code burned onto the ROM is “execute” only. Even with hardware tools, nobody can read the program. The dongle also has 32 KB of flash memory for placing critical program information on the dongle. To make sure the information on the dongle is secure, we use AES 128 encryption. Included in the “execute-only” portion of the dongle are the keys required to decrypt the data. In a lab environment, someone might cut a dongle open and read the flash memory, but without knowing the keys,
the data would be scrambled beyond recognition. One reason for choosing a 128-bit encryption scheme is that it is the strongest key that the U.S. Government allows for export. There are a few countries where we cannot do business, but we can live with that.

As for the name, “SmartDongle USB Security Key,” one important consideration in coming up with a name was its availability in the realm of the Internet. We had no trouble acquiring the “smartdongle.com” domain name.
Establishing Communication
The process by which communication is established between the SmartDongle and the computer works like this: The SmartDongle has a free-running counter. From the time power is supplied to the chip, the firmware on the dongle starts adding one to a counter. The chip runs at 4 MHz, so the counter changes rapidly. When the application program attempts to make contact with the SmartDongle, the number is quite large. Not only is it large, but it is unpredictable. When the application program signals that it wants to communicate with the dongle, the dongle takes that large, unpredictable number, which I call “L1,” and finds the next value in its linear congruential sequence, which I call “L2.” The value of L2 now is randomly and uniformly spread over the range from 0 to 2^64 – 1. The dongle sends L2 back to the PC and waits. The PC has the ability to generate the same linear congruential sequence. It finds the next value, which I call “L3,” and sends it back to the dongle. The dongle compares the value it receives from the PC with the value it calculated. If they are identical, communication is established. If they differ, the dongle requires that the process start again.
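In outline, the challenge/response looks like the following Python sketch. The multiplier and increment shown are arbitrary placeholders for illustration; the real a and c are secrets shared only by the dongle firmware and the PC-side library.

M = 2**64                    # modulus: arithmetic is done on unsigned 64-bit numbers
A = 6364136223846793005     # placeholder multiplier (the real a is kept secret in the firmware)
C = 1442695040888963407     # placeholder increment (likewise secret)

def lcg_next(value):
    return (A * value + C) % M

# Dongle side: L1 is the free-running counter value, L2 the challenge sent to the PC.
def dongle_challenge(counter_l1):
    return lcg_next(counter_l1)

# PC side: advances the same sequence one more step and answers with L3.
def pc_response(l2):
    return lcg_next(l2)

# Dongle side: communication is established only if the PC knew a and c.
def dongle_verify(l2, l3_from_pc):
    return l3_from_pc == lcg_next(l2)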
Linear Congruential Sequences
The linear congruential sequence is a popular and useful method for generating pseudorandom numbers. Pseudorandom numbers are values generated by an algorithm that appear to be random, but can be recreated at will. The process was introduced by D.H. Lehmer (see “Mathematical Methods in Large-scale Computing Units,” Proceedings of the Second Symposium on Large-Scale Digital Calculating Machinery, 1951) and enhanced by W.E. Thomson (see “A Modified Congruence Method of Generating Pseudo-random Numbers,” Computer Journal, 1958). It is clearly taught in Donald Knuth’s The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Second Edition (Addison-Wesley, 1981). The algorithm works like this:

    L_{n+1} ← (a * L_n + c) mod m
where a is the multiplier, c is the increment, and m is the modulus. The initial value (L_0) is called the seed. Choosing the “right” values for a, c, and m is crucial. If the wrong values are chosen, then the sequence repeats quickly. To make life easy for everyone, it makes sense to use a modulus that works with the arithmetic instructions of the processor at hand. For the SmartDongle, we chose a modulus of 2^64. The little processor in our Cypress chip does its arithmetic 8 bits at a time. We use simple loops in the firmware to do the arithmetic on the unsigned 64-bit numbers.

The Linear Congruential Sequence
The values chosen for a and c can yield widely different results. We would like to have as many different numbers appear as possible. With a modulus of 2^64, there are 2^64 possible numbers that can occur in the sequence. Attempting to keep track of which numbers have occurred in the sequence sounds like a daunting task. If we were to attempt to create a bit array and keep track of which numbers appear in the sequence, it would take more RAM than exists in all computers that have ever been built on Earth (about 2 million terabytes). Fortunately, we have other options. The following algorithm stops the first time a number is repeated in a sequence. The function f can be any function:
    count ← 0
    X ← Y ← seed
    do
    {
        count ← count + 1
        X ← f(X)
        Y ← f(f(Y))
    } until X = Y
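In Python, the same cycle check might read as follows. This is an illustrative sketch: the modulus is kept deliberately tiny so the demonstration terminates quickly (exhausting 2^64 values is impractical, which is exactly the point made above), and the a, c pairs are arbitrary.

def cycle_length(f, seed):
    """Floyd's tortoise-and-hare: count steps until the sequence first repeats."""
    count = 0
    x = y = seed
    while True:
        count += 1
        x = f(x)
        y = f(f(y))
        if x == y:
            return count

m = 2**16
for a, c in [(5, 3), (3, 3)]:
    step = lambda v, a=a, c=c: (a * v + c) % m
    print(a, c, cycle_length(step, seed=12345))
# With m a power of two, the full period m is reached only when c is odd and
# a % 4 == 1 (Hull-Dobell theorem); a = 3 here gives a shorter cycle than a = 5.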
This algorithm certainly satisfies my idea of a robust algorithm. It is simple and yet effective. It uses only a trivial amount of memory, performs a modest number of calculations, and yet it works.

Choosing “Good” Values for a and c
Not all of the potential values for a and c yield sequences that are even close to being random. As a trivial example, if a is chosen to be zero, then the sequence quickly deteriorates. Regardless of the seed supplied, after the first value, all subsequent values have the value of c. This is not a very useful sequence. Similarly, if a is chosen to have a value of 1, the linear sequence is way too predictable. “Good” choices for a and c are prime numbers that are relatively prime to the modulus. Integers are stored on a PC as binary numbers. The modulus for binary numbers is a power of 2. To be relatively prime to the modulus, the values of a and c must be odd. Some quick experimenting with values of a and c using the foregoing algorithm shows that whenever the numbers are prime, the period of the linear sequence is maximized. Stated another way, as long as a and c are relatively prime to the modulus, the values in the sequence L1, L2, L3, L4… do not repeat until all possible values have appeared exactly once. These sequences work well in our SmartDongle application. In other applications, the sequences may fail miserably. For instance, using the sequences to simulate a coin toss could be done by testing to see if the numbers are even or odd. The problem is that since a and c are both odd, the values in the linear congruential sequence alternate between even and odd numbers. In this example, selecting a bit other than the low-order bit works much better.
Prime Numbers
There are lots of prime numbers that can be represented in 64 bits. We need to have a method for selecting from this rich set. This can be done without requiring a large amount of computation. For starters, we know that all of the values we want to use for a and c must be odd. I use t as the value to be tested to see if it is prime and d as the divisor. The variable q will be the quotient.

    d ← 3
    do
    {
        q ← t / d
        if ((q * d) = t) then
            t is not prime
        d ← d + 2
    } until q < d

If the program drops out of the loop, it means that t is a prime number. We can get away with stopping when q is less than d because multiplication is commutative. Remembering back to the days of algebra, we learned that q times d is the same as d times q. That means that the comparison to see if q*d is equal to t need only be made for divisors d less than or equal to the square root of t. Rather than use a separate calculation to find the square root of t, we can accomplish the same objective by just comparing the divisor and the quotient.

Finding Prime Numbers
The final step in finding values for a and c is pretty simple. We select an arbitrary odd number and test to see if it is prime. If it isn’t, we add 2 and try again. It doesn’t take many iterations before a prime is found:

    t ← an odd number we pull out of the air
    d ← 1
    do
    {
        d ← d + 2
        q ← t / d
        if ((q * d) = t) then
        {
            t is not prime
            t ← t + 2
            d ← 1
        }
    } until q < d
When we exit the loop, t will be a prime number. We use this method to find a and c, we use 2^64 as our modulus, and we use the value in our free-running counter as the seed value for the sequence. The result is a query that the SmartDongle puts to the PC that is difficult to answer without knowing the values of a and c.

Conclusion
We believe the procedures outlined here provide a reasonable level of security. Part of that comes from reversing the roles in the interrogation process. Part of it comes from the randomness introduced by the amount of time that elapses between the moment that power is delivered to the SmartDongle and the instant that the application program attempts to start the communication process. Part of it comes from the pseudorandom nature of linear congruential sequences. Together, these processes make it more difficult to compromise the system. Had we used a SmartDongle in our point-of-sale software, I’m pretty sure that retailers on an island in the Caribbean wouldn’t have thought it was worth their time and effort to attempt to crack the code.
DDJ
PROGRAMMER’S TOOLCHEST
Developing JSR-168 Portlets
Aggregating web contents
TED O’CONNOR AND MARTIN SNYDER
Portlets are applications viewed inside a portal framework. With the increased adoption of Java portals, both as intranets and public sites, comes the need to separate portlet functionality from the portal to both maximize code reuse and limit your dependency on specific vendors. It used to be that if you wanted to migrate portlet functionality from one portal to another, you would have to rewrite massive amounts of code. If you were lucky and planned for this potential scenario in the beginning, you might have been able to limit your changes to the JSP and minimize the changes to the supporting classes. Now, as portal vendors begin to support the Java Specification Request #168 (http://www.jcp.org/en/jsr/detail?id=168), this is becoming less of an issue for portlet developers. Writing JSR-168 portlets lets you become portal agnostic and lets vendors support a wider spectrum of platforms with little to no code changes. In this article, we examine a portlet that Wingspan Technology (where we work) developed and contributed to the open-source community. This “Published Folder” portlet is used to expose the contents of a particular folder within EMC’s Documentum eContent Server content repository (http://www.documentum.com/). We then get into the specifics of how and why this was done.
Ted and Martin are senior technologists at Wingspan Technology. They can be reached at [email protected].
A portal is a web application whose primary purpose is to aggregate web content in portlets. While a portal may also provide user management and authentication, personalization, and other functionality across portlets, the role we focus on is the portlet container. Some of the more popular commercial Java portals are Vignette Application Portal, IBM WebSphere Portal, BEA WebLogic Portal, Oracle Portal, and Plumtree Corporate Portal. With the introduction of the JSR-168 specification, open-source portals are starting to take large bites out of the portal market. The leading open-source JSR-168 portals currently are eXo, Liferay, JetSpeed, and uPortal. One of the most common public examples of a portal site is my.yahoo.com. Table 1 presents the results of an informal poll of portal popularity that was conducted by Punit Pandey (http://portlets.blogspot.com/).

Portlets are the meat within the portal. They provide the real content that the end user is looking for. A portlet could be as simple as a design element composed of images or text, or it could have more personalized content such as a local weather forecast. There are no limits to the complexity of a portlet. For example, a single portlet could act as the entire web interface for some legacy system. In some portals, like eXo, the portlets are even responsible for the portal navigation. The JSR-168 portlet specification defines a set of APIs to address portlet aggregation, personalization, presentation, and security. Its main goals were that the portlets be simple, client agnostic, and secure. They would also support localization and internationalization, hot deployment, and remote execution.

Enterprise Content Management (ECM), sometimes referred to as “Enterprise Document Management,” is a type of application that administers the storage, organization, classification, and retrieval of company data. This is typically in the form of a document repository. Additional functionality such as workflow, search, ver
sioning, and access control are also standard. Some examples of ECM solutions are Documentum, FileNet, and SharePoint. The Requirements The requirements of this project can be broken down into two areas — characteristics and functionality. People working
“Portlets provide the real content that the end user is looking for” on projects with similar required characteristics would be wise to consider developing JSR-168 portlets as all or part of their solution. For our purposes, the requirements were: • The ability to run on a variety of portal applications. • Simple to install and configure. • Limit additional libraries that must be shipped with the application. The key functional requirements were to: • Allow any number of users to access content in Documentum without specifying Documentum credentials. • Provide Administrators with a mechanism to specify the folder to display and connection information for Documentum. The Challenges Most of the challenges we encountered were a result of JSR-168 being in its infancy. Because of this, the different portals have varying levels of support. For example, some of the portals support custom http://www.ddj.com
window modes and states while others only support a handful of their own predefined custom states. The installation procedures can also vary greatly between the portal applications. Few of the portals support direct import of the JSR-168 portlet Web Archive (WAR) file. Many of them require the WAR to be run through a preparation tool or some other multistep deployment process. In addition to the lack of support, there are also serious bugs in some of the portal and application servers. One common problem is with versions of Apache Tomcat prior to 5.5. Any portals using the older versions of Tomcat, including Pluto (the reference implementation), have problems using the application session across web contexts. This means that the portlets cannot share the session with servlets in the same WAR distribution. This is an important consideration if the client accesses servlets outside of the portlet and you need to share data between the two (http://issues.apache.org/jira/browse/PLUTO-53/).

Another challenge is the differences in portal architectures. Some portals require user authentication or at least have a built-in mechanism for logging in users. Others, however, make that optional or leave it as a function that a custom portlet must implement. This presents a problem when you want to provide administrative functionality through a special view or page. Several portals, such as eXo, Vignette, and IBM, support a custom "config" portlet mode to accomplish this very thing. But because this is not a standard mode, we avoid using it. We ultimately decided to avoid the user authorization problem entirely by just storing our configuration settings in a properties file. The downside to this solution is that every instance of the portlet within the portal shares the configuration settings.

The Architecture

There are three main sections to the DocWay Published Folder portlet in terms of architecture:

• JSR-168 Portlet Class.
• Display JSP and Class.
• Content-Retrieval Classes.

The JSR-168 portlet class is PublishedFolder.java (Listing One). This is the main entry point for the portlet and is the only class that the portal framework is really aware of for this project. The portlet class loads the Java Server Page (JSP) file that is, in turn, responsible for calling the content classes and displaying the folder contents. When users click on a link, the content-retrieval classes stream the content to the browser.
The Code

The PublishedFolder class implements the interface that the portal uses to communicate with the portlet, and controls the activation of the JSP file. In this case, we extend the javax.portlet.GenericPortlet abstract class. We could have alternatively implemented the javax.portlet.Portlet and javax.portlet.PortletConfig interfaces. However, because this portlet is so simple, it was much easier to extend the generic version and override the doView method. Typically, the portlet class is more complex, as most portlets have multiple views (view, edit, help, and so on) and also have to handle user actions. The key line in this class is where the PortletRequestDispatcher is created by a call to getRequestDispatcher on the PortletContext. This is important because this is how our view.jsp file gets loaded. For details on deployment descriptors and other information, see "Introduction to JSR 168: The Java Portlet Specification" (http://developers.sun.com/prodtech/portalserver/reference/techart/jsr168/pb_whitepaper.pdf).

The view.jsp file (Listing Two) that gets loaded by the portlet class retrieves the document listing from the content source and renders the results using the HTMLRenderer class. There are a couple of points to make about this JSP. First, there is no business logic. The JSP just gets and displays results. It also is the last line of error handling before the exception or error propagates up to the portal level. This is why we catch Throwable instead of Exception. Because this portlet could be on a page with countless other portlets, we need to ensure that any problems we encounter do not bubble up and crash the entire page or possibly the entire portal server. This is not just a good idea out of consideration for the other portlets on the page, but it can greatly ease debugging efforts. The different portals handle page-level exceptions in varying ways. Some provide the actual exception details (message, source, and stack trace); however, some just report a portlet-level exception as an internal server error with an HTTP 500 message. In the latter case, not only are you prevented from determining the root cause of the error, but if the 500 error prevents access to administrative links, you may not be able to even remove the portlet from the page without uninstalling it.

The abstract ContentSource class (Listing Three) is the start of the content-retrieval process. This simple class acts as the broker for the actual content-source implementations. By abstracting the repository-specific code, we are able to keep the display and portlet classes unchanged, regardless of the source of the documents. Classes that extend ContentSource reside in the
com.docway.publishedfolder.content.impl package and must implement the getContentList and getContentDocument methods. The static ContentSource.getContentSource method currently just returns a new instance of DocumentumSource. However, this could be enhanced to use a class loader to instantiate the implementing class based on a property file setting. This way, there might be SharePointSource, JdbcSource, and the like, and the administrator could change the source without any recompiling. The DocumentumSource class is not shown, but all it does is issue DQL queries to return a list of documents or a single document for the abstract ContentSource methods.

Because the notion of a "document" is dependent on the data source, we encapsulate this in the ContentDocument class (Listing Four). The ContentDocument class assumes that all documents, regardless of the source, have at least some standard properties; namely, ID, name, content, and size. Other custom or nonstandard attributes are handled using an ArrayList of key/value pairs. We implemented this as an ArrayList instead of a Map because we wanted to be able to use the order in which the attributes are added as the order in which they are rendered in the display. A Map would have required a separate mechanism for ordering the keys.

When the list is rendered in the HTML, the document's name will be a link to the document's content. The transmission of the document contents to the client machine is handled through a servlet, ContentServlet (Listing Five). The HTML link makes the call to the servlet, passing the selected document's ID as a query string parameter. This is then handed off to the ContentSource to retrieve the document from the configured source. The servlet then copies the InputStream from the ContentDocument to the HttpServletResponse OutputStream so the data is sent directly to the client.

Content-source configuration, as mentioned previously, is achieved with a properties file. The published_folder.properties file (Listing Six) lists the user credentials to impersonate and the location of the folder to publish. The username, password,
and domain properties make up the credentials and are self-explanatory. Every client that loads the portlet accesses the content source by impersonating the user represented by these credentials. The repository and location properties are used in conjunction to identify where the content is located. In the case of Documentum, the repository would be the name of the docbase and the location would be the r_object_id of the cabinet or folder to be published. If SharePoint were the source, these properties might be the site name and document library name, respectively. For a JDBC or filesystem source, they could identify the database and table or server and folder. Generally, the repository/location combination is enough to locate a "folder" in any content source. The document's ID could then be relative to one or both of these values depending on how identifiers are managed in the source. For example, in Documentum, the location property is not needed to retrieve the document contents because the repository (docbase) and document ID (r_object_id) are enough to uniquely identify the requested object.

Enhancements

There are several areas that are good candidates for future enhancements. Because the code was developed to make the ContentSource class easy to extend, and therefore to provide implementations for sources other than Documentum, this is the most obvious area for further development. Additional sources could be another ECM system such as SharePoint or FileNet, a JDBC data source, or even the file system. These could be done with as little as one class that extends the abstract ContentSource class. Another enhancement would be to provide a caching mechanism at the portal level. This would limit the frequency of the queries against the content source. This could be a major performance gain if the query is time intensive, because such queries currently are reexecuted with each page refresh. A third possible enhancement deals with the configuration challenge mentioned previously. A way to set the credential and repository information per portlet instance instead of per portal would make the solution more flexible and usable.

This portlet has been contributed to the open-source community under the GNU General Public License (GPL). It is now hosted on SourceForge, so feel free to download the code to try it out or contribute your own enhancements. For more information on the DocWay Published Folder, see http://sourceforge.net/projects/docway/ and http://www.wingspantech.com/.

DDJ
Listing One

package com.docway.publishedfolder.portlet;

import java.io.IOException;
import javax.portlet.*;

/** Main portlet class. Handles render requests from portal framework. */
public class PublishedFolder extends GenericPortlet {
    public void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        response.setContentType("text/html");
        final PortletContext context = getPortletContext();
        final PortletRequestDispatcher rd = context.getRequestDispatcher("/view.jsp");
        rd.include(request, response);
    }
}

Listing Two

<%@ page import="com.docway.publishedfolder.content.ContentSource,
                 com.docway.publishedfolder.content.ContentDocument,
                 com.docway.publishedfolder.HTMLRenderer" %>
<%
    String html;
    try {
        // retrieve document listing from content source
        final ContentSource source = ContentSource.getContentSource();
        final ContentDocument[] documents = source.getContentList();
        // convert the results to HTML
        html = HTMLRenderer.render(documents);
    }
    catch (Throwable t) {
        // convert the exception or error to HTML
        html = HTMLRenderer.render(t);
    }
    // display the HTML
    out.println(html);
%>

Listing Three

package com.docway.publishedfolder.content;

import com.docway.publishedfolder.content.impl.*;

/** Class used to retrieve ContentDocuments and abstract the source
 *  specific functionality and packages. */
public abstract class ContentSource {
    /** Gets subclass that implements any content source specific functions.
     *  @return ContentSource implementation */
    public static ContentSource getContentSource() {
        // For now this just returns the Documentum implementation. This method
        // could be easily modified to determine the implementation to use based
        // on the properties file or some other mechanism.
        return new DocumentumSource();
    }

    /** Retrieves an array of all documents that are located in location in
     *  the repository that was set in the application's resource bundle. This
     *  should only be used to display the list of documents and their
     *  attributes. Since document contents are not streamed at this point the
     *  ContentDocuments that are returned may not have that property populated
     *  even though content might exist.
     *  @return Array of ContentDocuments that are contained in the repository
     *          location or null if none found.
     *  @throws Exception */
    public abstract ContentDocument[] getContentList() throws Exception;

    /** Retrieves an individual document by ID that is located in location in
     *  the repository that was set in the application's resource bundle. This
     *  should only be used to retrieve document contents. Since attributes are
     *  not streamed the ContentDocument that is returned may not have its
     *  attribute map populated even though additional attributes might exist.
     *  @param contentID Unique ID of document to retrieve.
     *  @return ContentDocument with requested ID or null if not found.
     *  @throws Exception */
    public abstract ContentDocument getContentDocument(String contentID)
        throws Exception;
}

Listing Four

package com.docway.publishedfolder.content;

import java.io.InputStream;
import java.util.*;

/** Class represents a document returned from a ContentSource. There are
 *  several standard properties like id, name, content, and size. Any other
 *  custom attributes can be stored in the attributes ArrayList. */
public class ContentDocument {
    private String id;
    private String name;
    private InputStream content;
    private long size;
    private ArrayList attributes = new ArrayList();

    /** @param id
     *  @param name */
    public ContentDocument(String id, String name) {
        this(id, name, null, -1);
    }
    /** @param id
     *  @param name
     *  @param content
     *  @param size */
    public ContentDocument(String id, String name, InputStream content, long size) {
        super();
        this.id = id;
        this.name = name;
        this.content = content;
        this.size = size;
    }
    public void addAttribute(String key, String value) {
        attributes.add(new Attribute(key, value));
    }
    public Attribute[] getAttributes() {
        return (Attribute[]) attributes.toArray(new Attribute[attributes.size()]);
    }
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public long getSize() { return size; }
    public void setSize(long size) { this.size = size; }
    public InputStream getContent() { return content; }
    public void setContent(InputStream content) { this.content = content; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public static class Attribute {
        private String name;
        private String value;
        public Attribute(String name, String value) {
            this.name = name;
            this.value = value;
        }
        public String getName() { return name; }
        public String getValue() { return value; }
    }
}
Listing Five

package com.docway.publishedfolder.content;

import java.io.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;

public class ContentServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;
    private static final int CHUNK_SIZE = 2048;

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        try {
            final String contentID = request.getParameter("content_id");
            final ContentSource source = ContentSource.getContentSource();
            final ContentDocument document = source.getContentDocument(contentID);
            response.setStatus(200);
            response.setHeader("Content-Length", Long.toString(document.getSize()));
            response.setHeader("Content-Disposition",
                "attachment; filename=\"" + document.getName() + "\"");
            copyStream(document.getContent(), response.getOutputStream());
        } catch (Exception e) {
            throw new ServletException(e);
        }
    }
    /** Copies an InputStream to an OutputStream.
     *  @param input
     *  @param output
     *  @throws IOException */
    private static void copyStream(final InputStream input, final OutputStream output)
            throws IOException {
        final byte[] chunk = new byte[CHUNK_SIZE];
        while (true) {
            final int i = input.read(chunk);
            if (i < 0) // eof
            {
                break;
            }
            output.write(chunk, 0, i);
        }
        output.flush();
    }
}
Listing Six

### Sample Properties file for DocWay Published Folder portlet
### The domain property is optional. All other properties are required
username=johndoe
password=jsr168rules
#domain=
repository=MyDocbase
location=0c00000180000112
The Eclipse Test and Performance Tools Platform

An open framework for tool builders

ANDY KAYLOR
Software developers suffer from a peculiar delusion — we each think that our application is the center of the universe, that the only reason people buy computers is to run our applications. Oh sure, in the abstract, we know this isn't true, but when we're working on applications, well, that's the way we think. As you might expect, solutions have been developed to accommodate this way of thinking. Operating systems let users run numerous applications side-by-side, while letting each one blissfully pretend that it is the only program running. But users know something the applications don't — several of these programs are solving related problems. Things such as integrated development environments (IDEs) simply put these related tools together in the same place. But this doesn't really solve the problem — it only moves it. Now we have tools running together, but for the most part, unaware of one another. What programmers really want is for all of these tools to work together and leverage each other's results and capabilities. So we've come to a fork in the road. Down one path, developers can lock into a single-vendor solution, relying on proprietary standards for interoperability. Down the other path, they can choose open standards for interoperability and be able to pick and choose tools designed to work with that standard. The Eclipse Test and Performance Tools Platform (TPTP) Project (http://www.eclipse.org/tptp/) is an example of the second path. The advantages that open standards and platforms provide include more options in selecting
Andy is a senior software engineer for Intel Corp. He can be reached at andrew [email protected].
the best tool for every job and more possibilities as to how tools can interoperate. What may not be obvious is that this path also holds advantages for tool builders. In this article, I examine the Eclipse Test and Performance Tools Platform Project (formerly known as "Hyades"), of which I'm a team member. I also explore ways in which tool developers can take advantage of the capabilities that TPTP provides to develop interoperable distributed test and performance tools.

What Is TPTP?

TPTP is an open platform that supplies frameworks, services, and data models to enable the development of integrated testing, tracing, profiling, and monitoring tools. Just as the Eclipse workbench provides a broad and extensible starting point for tool development in general, TPTP provides a starting point specifically for developing test and performance tools. Although TPTP includes a number of fully functional exemplary tools that have been developed on top of this platform, the primary goal of the platform is to provide a common base upon which additional commercial, open-source, or in-house test and performance tools will be built. The platform consists of a stack of components that includes data collectors, a communications framework, data models, analyzers, and viewers (see Figure 1). It is structured in such a way that proprietary tools can plug in at any point and take advantage of the capabilities offered by the rest of the stack.

The idea behind the TPTP project is to move the common components of test and performance software into a shared, open-source code base. Consequently, tool developers can focus their attention on features, thereby producing better products, which (as an added benefit) plug in to a common environment, paving the way for interoperability and symbiosis between tools. Tool vendors get the benefit of not having to develop and maintain the code infrastructure that is necessary for their product but really secondary to its actual functionality. End users get the benefit of better tools that know how to work together. Everybody wins.
Sounds great, right? But how do we get there? Naturally, a common user experience begins with a shared user interface. Just plugging in to the Eclipse workbench is a start toward a common look-and-feel, as well as a set of components on which to build UIs. Beyond that, the TPTP project provides a set of viewers and editors specifically geared toward test and performance analysis.
But viewers aren't useful unless they have something to view. The heart of the TPTP user experience is a common set of data models. TPTP defines four EMF-based data models:

• Test model.
• Statistical model.
• Logging model.
• Trace model.

By feeding data into these models, tools open up possibilities for interaction with analysis engines and viewers that are designed around these models. The data models are supported by a common framework for accessing data collection and test-execution agents. TPTP provides an extensible set of classes to discover and manage these agents and the data they provide. If an agent produces data in one of the formats TPTP recognizes, a standard data loader reads the data and populates the data model. Otherwise, the agent developer can provide a custom data loader to achieve the same end. Finally, TPTP includes a flexible framework to connect agents with the workbench.
In the rest of this article, I explore the agent framework, explaining the concepts it is based on and exploring how to develop agents to plug in to the TPTP project.

Communications

Everyone who has developed a software tool with distributed execution has had to address the problem of how to connect the remote components with the local components. Companies with multiple tools or tools with long histories have probably solved this problem several times and are quite possibly considering doing it again. No one ever seems to be happy with their remoting solution. But the absurdity is that this communications piece generally has nothing to do with the primary functionality that the tool is intended to provide. It's just something that has to be done along the way. It makes sense to adopt an open solution. General solutions such as the Adaptive Communication Environment (ACE; http://www.cs.wustl.edu/~schmidt/ACE.html) tend to be heavier than you might like. The TPTP communication framework is specifically designed to support testing and the performance analysis engine, and
provides a streamlined connection to the rest of TPTP. Figure 2 shows the basic architecture of the TPTP communications framework. TPTP provides a client library that manages communications with an Agent Controller that can be running locally or on a remote system. This Agent Controller manages access to and control of agents running on the target system. In addition, TPTP provides template code for agent development to handle boilerplate tasks such as registering with the Agent Controller and parsing incoming commands.

The actual communications between components are managed by dynamically loaded transport layers. TPTP provides transport layers to handle socket-based communications, and SSL and HTTP solutions are under development. Additional mechanisms can be supported through custom libraries written to the TPTP transport layer interface. The TPTP architecture is designed to isolate these layers from the rest of the system so that clients and agents can be written independent of the transport layer they use to communicate and transfer data. The Agent Controller uses a standard but extensible protocol for sending asynchronous commands back and forth between agents and clients. TPTP defines this XML-based protocol with a command envelope and basic set of interfaces and commands, but agent developers can extend the protocol by defining their own interfaces and commands.

Agents and Interfaces

In the context of TPTP, an "agent" is defined as a logical object that exposes services through the TPTP Agent Controller. In TPTP, these agents run in a separate process from the Agent Controller (though multiple agents can exist within a single process).
Figure 1: TPTP platform.

Figure 2: Basic architecture of the TPTP communications framework.
The Agent Controller recognizes external components through the connections they establish. When something establishes a connection with the Agent Controller through a transport layer, the Agent Controller assigns it an ID and begins seeing it as a logical object. If that object then sends the Agent Controller a registerAgent command, it is an agent. Data collectors are an important type of agent, but the agent concept is meant to be general within TPTP. Other agents can provide services such as file transfer or system information. Clients can request access to a specific agent by name, but to support the goal of general interoperability, clients can also locate agents by querying based on the command interfaces they support.

Interfaces and commands are the basic building blocks of the protocol TPTP uses for communications between components. Again, the problem of common terms comes in. TPTP understands an interface as a contract regarding a set of commands. Although there is no programmatic enforcement (such as compilation or linking errors), TPTP relies on the assumption that if a component says it supports an interface, then it handles all the commands in that interface. These interfaces are simply an agreement between components. Commands in TPTP are based on XML fragments. TPTP defines a command element with attributes such as source, destination, and context that it uses for routing these commands, but a subelement within the command fragment defines command-specific information that is only read by the component receiving the command. TPTP defines a standard set of interfaces for general concepts such as agent, collector, and event provider, but the protocol is designed to be extended using custom interfaces.

Agent Development

How can you put TPTP into your application? Listing One is an elementary piece of client code that locates an agent and sends it a command. Listing Two is code for the agent invoked in Listing One. If it seems like these two samples don't contain any meaningful code, that's by design. Remember that the goal of TPTP is to provide all of the infrastructure code, so that tool developers can focus on adding their particular functionality. Because this sample doesn't have any particular purpose other than demonstrating how to use TPTP to connect a client to an agent, all of the work is done in the libraries provided by TPTP. The client code interacts with an Agent Controller to obtain an instance of the agent it wants to use. It uses the TPTP
client library to connect to an Agent Controller that may be running on the same machine as the client or on a remote machine. From this point on, the client will not make any calls that are dependent on the location of the target system. That will be abstracted by TPTP. Next the client creates an object to act as a delegate for the agent it wants to control and passes that object to the local delegate of the Agent Controller. Behind the scenes, the TPTP client library interacts with the real Agent Controller to request that it provide an instance of this agent. The Agent Controller checks to see if there is already an instance of this agent running and available. If not, it launches the agent and manages its subsequent lifecycle. In either case, the client is put in contact with a running agent. (TPTP also provides ways to handle special cases such as when an agent must be launched in conjunction with the application it is monitoring.) Alternatively, if the client did not have a specific agent in mind, it could have queried the Agent Controller for a list of agents that supported a given interface, then obtained further metadata describing each of these agents before deciding which agent to request access to. The agent itself runs as a standalone process. The main routine for the process
simply creates an instance of the agent object, which derives from the BaseCollectorImpl class provided by TPTP, registers the agent, and waits for the agent to be terminated. The registerAgent method manages all of the common work of locating the Agent Controller, establishing communications, starting a thread to listen for incoming messages, and sending the registration command. The base class implementation further handles the work of parsing incoming commands and calling the agent's implementation of these commands. For standard interfaces, such as the collector and agent interfaces, the agent itself only needs to provide an implementation for those commands for which it wants to override the default implementation. If the agent wants to provide a custom interface, it also needs to override the processCommand method and interpret the command block for its custom commands.

In this example, I have presented both the client and the agent implemented in C++. Where does Eclipse come into the picture? For the agent itself, Eclipse really stays out of the picture. The agent always communicates through the Agent Controller and never needs to be specifically aware that it is talking to an Eclipse-based client on the other end of the line.
TPTP provides libraries to enable agent development in C, C++, and Java. On the client side, TPTP provides a C++ version of the client library that isn't integrated with the Eclipse workbench to provide a low entry point for tools that are currently implemented in C++ (though it is our hope that such tools will eventually proceed to integration with the Eclipse workbench and the rest of the TPTP client environment). TPTP also offers a Java-based client library that is fully integrated with the Eclipse workbench and the TPTP client framework. The C++ and Java client libraries are very similar, and all of the concepts I've described here apply to both.

Conclusion

If, after you've evaluated TPTP, you think it's a good solution but it doesn't quite do what you need, get in touch with us (http://www.eclipse.org/tptp/). We've designed TPTP to be flexible and extensible, but there's always room for improvement. TPTP is an open-source platform, and the TPTP project team is serious about listening to input, accepting new people into the conversation, and keeping our tools useful to the testing and performance tools community.

DDJ

Listing One

#include "tptp/client/INode.h"
#include "tptp/client/NodeFactory.h"
#include "tptp/client/Agent.h"
#include "tptp/client/Collector.h"
#include "tptp/client/IDataProcessor.h"

using namespace TPTP::Client;

class MyDataProcessor: public IDataProcessor {
public:
    MyDataProcessor() {}
    ~MyDataProcessor() {}
    // Handle the data coming
    virtual void incomingData(char buffer[], int length, DIME_HEADER_PTR_T dimeHeader)
    {
        // TODO: Do something with the data
    }
    virtual void invalidDataType(char data[], int length) {}
    virtual void waitingForData() {}
};

int main(int argc, char* argv[])
{
    // Create a Node that represents the Target Machine
    INode* SampleNode = NodeFactory::createNode("localhost");
    // Get the Agent Controller on the Node
    AgentController* agentCtrlr = SampleNode->connect(10002);
    // Request an instance of our agent
    Collector* myCollector = new Collector("com.ddj.tptp.sample.myCollector");
    agentCtrlr->getAgent(myCollector, TPTP_CONTROLLER_ACCESS);
    // Establish Data Path and Data Listener
    MyDataProcessor* dataProcessor = new MyDataProcessor();
    int dataConnectionID = myCollector->addDataListener(dataProcessor);
    // Starts the collector
    myCollector->run();
    Sleep( 5000 );
    // Stop the collector
    myCollector->stop();
    return 0;
}

Listing Two

#include "tptp/agents/BaseCollectorImpl.h"

class MyCollector : public BaseCollectorImpl {
public:
    MyCollector() {}
    ~MyCollector() {}
    virtual int run(CmdBlock* cmd)
    {
        char data[] = "<mySampleData> \
                       <mood>happy</mood> \
                       </mySampleData>";
        // TODO: Do whatever it is that starts data collection
        // Send the replay command
        int destID = cmd->getDestID();
        int sourceID = cmd->getSourceID();
        int contextID = cmd->getContextID();
        char commandFormat[] = "";
        char command[1024];
        sprintf( command, commandFormat, destID, sourceID, contextID );
        sendCommand( command );
        // Simulated activity for sample purposes only
        Sleep( 500 );
        sendData(sourceID, data, sizeof(data) );
        return 0;
    }
    virtual int stop(CmdBlock* cmd)
    {
        // TODO: Do whatever it is that stops data collection
        // Send the replay command
        int destID = cmd->getDestID();
        int sourceID = cmd->getSourceID();
        int contextID = cmd->getContextID();
        char commandFormat[] = "";
        char command[1024];
        sprintf( command, commandFormat, destID, sourceID, contextID );
        sendCommand( command );
        return 0;
    }
    // This sample doesn't expect to receive data
    virtual int receiveData(int sourceID, char buffer[], int bytesRead,
                            DIME_HEADER_PTR_T dimeHeader)
    {
        return 0;
    }
};

int main(int argc, char* argv[])
{
    MyCollector* collector = new MyCollector();
    collector->registerAgent("com.ddj.tptp.sample.myCollector", "MyCollector");
    collector->waitForTermination();
    return 0;
}
The Mac's Move to Intel

Migrating applications to the new hardware platform

TOM THOMPSON
Apple Computer's CEO Steve Jobs clearly dropped a bombshell when he told software developers at Apple's World Wide Developer's Conference (WWDC) that the Macintosh computer platform was going to switch from PowerPC to Intel x86 processors. He added, of course, that there would be time to adapt and manage the change, because Macs with Intel processors would be phased in over a two-year period. Mac old-timers will recall that Apple accomplished a similar major processor transition in 1994. Back then, the switch was from Motorola's 68K processors to Motorola/IBM's PowerPC. For software developers, the transition's difficulty depended upon whether their programs used high-level code exclusively, or accessed lower-level system services. In the latter case, it required that developers make fundamental changes in how they wrote their code; some of these changes I helped document in the book The Power Macintosh Programmer's Guide (Hayden, 1994). As a consequence, Apple has ample prior experience to guide its migration to the x86 processor.

Tom is a technical writer providing support for J2ME wireless technologies at KPI Consulting. He can be contacted at [email protected].

Pundits and others have already debated the wisdom and reasons for making the processor switch. The decision has been made; it's not worth repeating those arguments here. Seasoned Mac programmers are determining how painful and how expensive the transition is going to be. In this article, I examine the migration plan, and describe pitfalls you should be aware of.

Infrastructure Support

Migrating an application to the new hardware platform requires that there be a certain amount of infrastructure in place to support its development and execution. There are several key technical pillars that comprise this infrastructure. There must be:

• A native version of the operating system available to provide the needed system services.
• A mechanism to package an application's code for distribution and execution on disparate processors during the transition period. This scheme must be transparent to less tech-savvy users, or else you frustrate them when the application won't run because it's on the wrong platform.
• Tools that translate the application's source code into the platform's native code.

At the WWDC announcement, Jobs revealed that, since its introduction in 2001, every PowerPC release of Mac OS X has had an x86-based Doppelgänger — a separate Intel version of the OS was quietly developed and maintained (see the accompanying text box entitled "A Portable Operating System" for how Mac OS X's design allowed this). Mac OS X 10.4 (aka Tiger), which was released this April for the
PowerPC Macs, is slated to be the preliminary x86-based OS release. In short, the first infrastructure pillar is therefore already in place and has been tested for years. The core of the Mac x86 platform’s distribution mechanism is the universal binary file for Mac OS X applications. A universal binary carries two versions of the
application — a version implemented as PowerPC machine code, and a version implemented in x86 machine code — stored in a single, larger executable file. However, the application's GUI elements — TIFF images of buttons or other visual controls — can be shared between the two versions. Sharing these GUI elements, known as "resources," helps keep the universal binary application's size manageable. The universal binary scheme is an extension of Mac OS X's Mach-O executable files (see http://developer.apple.com/documentation/DeveloperTools/Conceptual/MachORuntime/FileStructure/chapter_2.1_section_7.html#//apple_ref/doc/uid/TP40000895-//apple_ref/doc/uid/20001298-154889-CJBCFJGH). Universal binary files consist of a simple archive that contains multiple Mach-O files; see Figure 1. Each Mach-O file contains contiguous
bytes of an application's code and data for one processor type. Special file headers enable the platform's runtime to quickly locate the appropriate Mach-O image in the file. Listing One shows how this is done. The "fat" header identifies the file as a universal binary and specifies the number of "fat" architecture structures that follow it. Immediately past the fat header, each fat architecture data structure references the code for a different processor type in the file. These architecture structures supply the runtime with the CPU type, an offset to the start of the embedded Mach-O file within this file, and its size. When a universal binary application is launched, the operating system uses the headers to automatically locate, load, and execute the Mach-O file of the application that matches the Mac's processor type.

Universal binaries thus form the second support pillar, the distribution mechanism. Typical users are unaware of the dual sets of code packaged in the file, and they can copy the application by simply dragging and dropping it. When they launch the application, the Mac OS X runtime performs a sleight-of-hand that loads the appropriate application code from the universal binary file, then executes it. No matter what Mac it's installed on, a universal binary application thus executes at native speeds on either PowerPC- or x86-based systems.
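The fat header and fat architecture structures are stored in big-endian byte order regardless of the host, so a tool reading them on an x86 machine must byte-swap the fields. As a rough, hypothetical sketch (not code from the article or from Apple's tools), walking the architecture entries with the structures from Listing One might look like this:

/* Hypothetical sketch: enumerate the architectures in a universal binary.
   Assumes the fat_header/fat_arch definitions from Listing One. The fields
   are big-endian on disk, so ntohl() converts them on a little-endian host. */
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* ntohl() */

static void list_architectures(FILE *f)
{
    struct fat_header fh;
    if (fread(&fh, sizeof fh, 1, f) != 1)
        return;
    if (ntohl(fh.magic) != FAT_MAGIC)    /* not a universal binary */
        return;
    uint32_t count = ntohl(fh.nfat_arch);
    for (uint32_t i = 0; i < count; i++) {
        struct fat_arch fa;
        if (fread(&fa, sizeof fa, 1, f) != 1)
            break;
        /* each entry tells the runtime where one Mach-O image lives */
        printf("arch %u: cputype %d, offset %u, size %u\n",
               i, (int)ntohl((uint32_t)fa.cputype),
               ntohl(fa.offset), ntohl(fa.size));
    }
}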
Mac old-timers will recognize the universal binary scheme as an echo of the “fat” binary file format that handled software distribution during the Mac platform’s transition to the PowerPC. Fat binaries consisted of two versions of the application —68K and PowerPC— and the Mac OS determined which version to load and run. The fat binary distribution scheme worked very well, and based on its success, I have high expectations that the universal binary scheme will work, too. The third pillar, the development tools necessary to generate x86 code for the x86 Mac platform, is represented by Apple’s Xcode 2.1 development tool set. Xcode consists of an IDE with GUI-based tools such as a source-code editor and debugger. It uses GCC 4 compilers to generate x86 machine code from C, C++, Objective-C, and Objective-C++ source code. Source-level debugging is possible through the use of the standard GDB tool. For x86 development, you’ll need to install Xcode 2.1, along with the 10.4 Universal SDK. This SDK contains the APIs and header files that enable you to generate PowerPC code, x86 code, and universal binaries. Generating a universal binary becomes just a matter of selecting both PPC and Intel processors for code generation in Xcode’s controls, and building the program. Metrowerks, whose CodeWarrior toolset helped Apple get through the 68K/PowerPC transition, will not be participating in this transition. The company has sold its x86 compiler and linker to a third party, and thus, the CodeWarrior toolset can’t generate universal binaries. For a limited time, Apple offered a Developer Transition Kit that contained the Xcode tools and universal SDK. For actual testing on the target platform, the kit also had a preliminary x86 hardware platform with a 3.6-GHz Pentium 4 processor, running a preview release of Mac OS X 4.1 for Intel.
Figure 1: Universal binary files.
Code Casualties

Apple has laid a solid foundation for making the migration possible. However, any programmer who's done a code port regards this plan with healthy skepticism, because the infrastructure is still preliminary in some areas. More important, not all applications will be easy to port, and some applications will be left behind, due to design issues and costs. Let's see if we can't draw up a triage list of which applications are most likely to survive the transition. First and foremost, any application ported to the x86 Mac platform must be a Mac OS X application. Fortunately, Mac OS X provides a wealth of different APIs for writing and migrating applications — there's Carbon, Cocoa, Java, and BSD
UNIX. Table 1 provides a brief summary of the APIs that Mac OS X offers. To start the triage list, it should be obvious that if you're writing a kernel extension, driver, or other low-level system service that requires intimate knowledge of the kernel plumbing or processor architecture, you've got a code rewrite ahead of you, no matter what API you use. Mac OS 8/9 applications won't survive the transition unless they're ported to the Carbon API. Furthermore, existing Carbon apps that use the PowerPC-based Preferred Executable Format (PEF) will have to be rebuilt with Xcode 2.1 for conversion to the Mach-O executable format. The reason is that Mac OS X uses the dyld runtime, which is the native execution environment for both PowerPC and Intel Mac platforms. The dyld runtime uses the Mach-O format for its executable files, and as we've already learned, universal binaries rely on the Mach-O format to package PowerPC and x86 binary images. Applications that use common system services should port easily. However, caveats abound. For example, how the two processors store data in memory can cause all sorts of problems even for simple applications.

Architecture's Impact

High-level application frameworks hide the gritty hardware details from developers to improve code portability and stability. When the platform's processor changes, fundamental differences in hardware behavior can ripple up through these frameworks and hurt you. Let's take a look at two of these differences and see how they affect porting a PowerPC application to the x86 platform. One such problem is known as the "Endian issue" and occurs because of how the PowerPC and Intel processors arrange the byte order of data in memory. The PowerPC processor is big-endian in that it stores a data value's MSB in the lowest byte address, while the Intel is little-endian because it places the value's LSB in the lowest byte address. Normally, the Endian issue doesn't rear its ugly head unless you access multibyte variables through overlapping unions, use a constant as an offset into a data structure, or use bitfields larger than a byte. In these situations, the position of bytes within the variable matters. Although the program's code executes flawlessly, the retrieved data is garbled due to where the processor placed the bytes in memory, and spurious results occur. To fix this problem, reference data in structures using field names and not offsets, and be prepared to swap bytes if necessary. The Endian issue manifests itself another way when an application accesses
data piecewise as bytes and then reassembles it into larger data variables or structures. This occurs when an application reads data from a file or through a network. The bad news is that any application that performs disk I/O on binary data (such as a 16-bit audio stream), or uses network communications (such as e-mail or a web browser), can be plagued by this problem. The good news is that each Mac OS X API provides a slew of methods that can perform the byte swapping for you. Consult the Universal Binary Programming Guidelines (http://developer.apple.com/documentation/MacOSX/Conceptual/universal_binary/universal_binary.pdf) from Apple for details. Another potential trap manifested by the Endian issue is if your Mac application uses custom resources. Mac OS X understands the structure of its own resources and will automatically perform any byte-swapping if required. However, for a custom resource whose contents are unknown to the OS, you will have to write a byte-swapping routine for it. Those applications that use CodeWarrior's PowerPlant framework require a byte-swapping routine to swap the custom PPob resources that this framework uses. Appendix E in the Universal Binary Programming Guidelines document has some example code that shows how to swap PPob resources, and this code serves as a guideline on how to write other byte-swapping routines.
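To make the hazard concrete, here is a small hypothetical illustration (not from the article or from Apple's guidelines): the same four bytes read from a big-endian file produce different 32-bit values on the two processors unless the code reassembles them explicitly.

/* Hypothetical example of the Endian issue when reassembling bytes read
   from a big-endian file or network stream into a 32-bit value. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Portable: build the value byte by byte, independent of host byte order. */
static uint32_t read_u32_big_endian(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

int main(void)
{
    const uint8_t bytes[4] = { 0x12, 0x34, 0x56, 0x78 };  /* as stored in the file */

    uint32_t naive;
    memcpy(&naive, bytes, sizeof naive);   /* 0x12345678 on PowerPC, 0x78563412 on x86 */

    uint32_t portable = read_u32_big_endian(bytes);  /* 0x12345678 on both */

    printf("naive: 0x%08X  portable: 0x%08X\n", (unsigned)naive, (unsigned)portable);
    return 0;
}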
It's In the Vector

Another major processor architectural issue is for those applications that make heavy use of the PowerPC's AltiVec instructions for scientific computing and video editing — some of the Mac's bread-and-butter applications. (AltiVec is a floating-point and integer SIMD instruction set referred to as "AltiVec" by Motorola, "VMX" by IBM, and "Velocity Engine" by Apple; see http://developer.apple.com/hardware/ve/.) AltiVec consists of more than 160 special-purpose instructions that operate on vectors held in 32 128-bit data registers. A 128-bit vector may be composed of multiple elements, such as four 32-bit integers, eight 16-bit integers, or four 32-bit floats. The AltiVec instructions can perform a variety of Single Instruction Multiple Data (SIMD) arithmetic operations (multiply, multiply-add, and others) on these elements in parallel, yielding high-throughput data processing.
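As a rough illustration of why AltiVec code cannot simply be recompiled for x86, here is a hypothetical sketch (not from the article) of the same four-float addition written with each vendor's C intrinsics; the compiler guard macros and 16-byte alignment requirement are assumptions about a typical GCC build.

/* Hypothetical sketch: one 4-float vector add, AltiVec vs. SSE intrinsics.
   Both versions assume 16-byte-aligned input and output arrays. */

#ifdef __ALTIVEC__
#include <altivec.h>

void vadd4_altivec(const float *x, const float *y, float *out)
{
    vector float vx = vec_ld(0, x);          /* load four floats */
    vector float vy = vec_ld(0, y);
    vec_st(vec_add(vx, vy), 0, out);         /* add and store */
}
#endif

#ifdef __SSE__
#include <xmmintrin.h>

void vadd4_sse(const float *x, const float *y, float *out)
{
    __m128 vx = _mm_load_ps(x);              /* load four floats */
    __m128 vy = _mm_load_ps(y);
    _mm_store_ps(out, _mm_add_ps(vx, vy));   /* add and store */
}
#endif

Even in this trivial case, the headers, vector types, and intrinsic names all differ, which is why the Accelerate Framework, which hides the instruction set entirely, is the path of least resistance.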
A Portable Operating System

Some developers weren't surprised that Mac OS X was operating on the x86 platform for two reasons. First, Mac OS X began life as NeXTSTEP in 1989, complete with an extensive suite of object-oriented classes. These classes ultimately became Cocoa and the other intermediate-level frameworks in Mac OS X. In 1993, NeXTSTEP 3.1 was ported to several platforms (Intel, Sparc, HP), where the code achieved a large measure of hardware independence. It has also been extensively field tested, so its classes are known to be stable. Second, as Figure 2 shows, Mac OS X is a layered OS. The lowest layer, Darwin, consists of an open-source version of the Mach 3.0 kernel, device drivers, and a bevy of Berkeley System Distribution (BSD) interfaces, libraries, and tools. An Intel-based version of Darwin has been maintained over the years, so the lowest layer of Mac OS X that sits next to the hardware was Intel-ready should the need arise. The higher layers of the OS consist of frameworks written in Objective-C and other high-level languages, so porting them was just a matter of recompiling the frameworks and tuning the code.

—T.T.

Figure 2: Mac OS X is a layered OS.
Carbon
  Description: Interfaces used for Mac OS 8/9 applications.
  Preferred programming language: C, with C++ supported.
  Pros: Familiar APIs; allows reuse of Mac application code or program code written on other procedural systems.
  Cons: Not every API call is available; older Carbon apps must be converted to Mach-O format.

Cocoa
  Description: OO interfaces/classes for Mac OS X; traces its roots to the NeXTSTEP framework.
  Preferred programming language: Objective-C and Objective-C++, although C and C++ can be integrated into the code.
  Pros: The NeXTSTEP framework, field-tested for over a decade, offers a rich set of services.
  Cons: You're starting from scratch adapting code to this API.

Java
  Description: Uses J2SE 1.4.2.
  Preferred programming language: Java.
  Pros: Can reuse Java programs written on other platforms with little modification.
  Cons: Won't have full access to all of the OS features as do the other APIs; performance may be an issue.

BSD UNIX
  Description: Based on the System V IA-32 Application Binary Interface (ABI).
  Preferred programming language: C.
  Pros: Good for writing CLI tools or low-level services such as daemons or root processes.
  Cons: The Mac version of the IA-32 ABI is still in preliminary form, and there are some minor differences in it from the Intel version.
Table 1: Summary of Mac OS X APIs.

Applications relying on AltiVec instructions must be rewritten to use Intel's SIMD instructions, either its Multimedia Extensions (MMX) instructions or its Streaming SIMD Extensions (SSE) instructions. There are several flavors of SSE instructions (SSE, SSE2, and SSE3), and they work on eight 128-bit data registers. The ideal solution for this problem is to use Cocoa's Accelerate Framework. It implements vector operations independently of the underlying hardware. An application that uses the Accelerate Framework can operate without modification on both Mac platforms. This framework provides a ready-made set of optimized algorithms for image processing, along with DSP algorithms for video processing, medical imaging, and other data-manipulation functions. If you must port your AltiVec code to the x86 SSE instructions, on the plus side, Intel provides a high-level C interface that simplifies the use of these instructions. Another major plus is that you've already "vectorized" your high-level code for use with AltiVec, and these modifications apply to using SSE instructions as well. That is, you should have unrolled code loops and modified the data structures they reference to take advantage of the SIMD instructions' parallel processing capabilities. The big negative to porting to SSE is that the rest of your code will need to be heavily revised due to the differences between the AltiVec and SSE instructions. For example, there's no direct correspondence
in behavior between the AltiVec and x86 permute instructions. The magnitude of the shift performed by the AltiVec permute operation can be changed at runtime, while the x86 permute requires that the magnitude be set at compile time. This makes it difficult for the x86 permute to clean up misaligned data accesses, especially for use with the SSE instructions themselves. In general, AltiVec instructions that execute on the vector complex integer unit (such as the multiply-accumulate operations) have no direct counterparts in the SSE instruction set, and these portions of the vector code will need the most work.

Returning to the triage list, even applications written in Cocoa and Carbon aren't immune to certain processor issues. Applications that do any file or network I/O will have to be examined and modified, due to the Endian issue. Even mundane applications that make use of special data structures will need to be checked carefully. Those applications that make use of AltiVec will have to be completely rewritten, either to the Accelerate Framework or to SSE3 instructions. Whether they survive the transition depends on how much it will cost to correct these issues.

A Real-World Example

How well will this transition go? Some early developer reports pegged the initial porting process at taking anywhere from 15 minutes to 24 hours, depending upon how well Apple's guidelines were followed
when the application was written. Those developers whose applications were written in Cocoa usually experienced the least difficulty, which shouldn't come as a surprise because Cocoa was engineered from the ground up as an application framework for NeXTSTEP, which became Mac OS X. Bare Bones Software's port of BBEdit, its industrial-strength programming editor, offers an interesting glimpse of the process (http://www.barebones.com/products/bbedit/). Portions of BBEdit were written in C++ and use the Carbon API, while other portions were written in Objective-C and use the Cocoa API. It only took 24 hours to get BBEdit running on the Mac x86 platform. It helped that the files BBEdit works with — ASCII text files that consist of byte data — were Endian neutral. However, BBEdit's developers emphasize that although they got the program running quickly, getting every feature to work reliably took another several weeks of work, especially for testing to ensure that the features worked reliably in the new environment. Still, considering that the application was executing on a completely different platform in a short amount of time and without requiring a total code rewrite, this augurs well for many Mac OS X applications making the transition. In the end, time and developers will show us how well Apple managed the transition to the Intel x86 processor.

DDJ

Listing One

#define FAT_MAGIC 0xcafebabe
#define FAT_CIGAM 0xbebafeca  /* NXSwapLong(FAT_MAGIC) */

struct fat_header {
    uint32_t magic;      /* FAT_MAGIC */
    uint32_t nfat_arch;  /* number of structs that follow */
};

struct fat_arch {
    cpu_type_t    cputype;     /* cpu specifier (int) */
    cpu_subtype_t cpusubtype;  /* machine specifier (int) */
    uint32_t      offset;      /* file offset to this object file */
    uint32_t      size;        /* size of this object file */
    uint32_t      align;       /* alignment as a power of 2 */
};
WINDOWS/.NET DEVELOPER
Calling C Library DLLs from C#

Utilizing legacy software

SHAH DATARDINA
The .NET Framework was designed to be the "lingua franca" for Windows development, with the expectation that it will set a new standard for building integrated software for Windows. However, it is inevitable that there is a time lag before .NET is fully adopted and existing applications are recoded. In particular, there is a large body of legacy code that will likely never be rewritten in .NET. To address this situation, Microsoft provides attributes, assembly, and marshaling. At the Numerical Algorithms Group (where I work), our particular interest in using these techniques is to utilize numerical software developed in C from within the .NET environment. Because C# is the premier .NET language, the examples I present here are in C#. While I use an example of data types that are current in the NAG C Library, the techniques I present are general enough for calling unmanaged code written in C from C# directly.

Shah is a senior technical consultant for the Numerical Algorithms Group. He can be contacted at [email protected].

The NAG C Library uses the following data types as parameters:

• Scalars of type double, int, and Complex. These are passed either by value
or by reference (as pointers to the particular type).
• enum types.
• Arrays of type double, int, and Complex.
• A large number of structures, generally passed as pointers.
• A few instances of arrays that are allocated within NAG routines and have to be freed by users (these have type double**).
• Function parameters (also known as "callbacks"). These are pointers to functions with particular signatures.

For convenience, I include a C DLL containing definitions of all the functions being called from C#. This DLL is available electronically from DDJ (see "Resource Center," page 4) and NAG (http://www.nag.com/public/ddj.asp). For instance, take the example of a C function that takes two parameters of the type double and double*; that is, the first parameter is input (pass by value) and the second is output (pass by reference in non-C parlance). The corresponding C# signature for the C function is then double, ref double. Listing One presents the definition of the C function and its call from C#. In C#, you have to provide the DLL import attribute (line 5), specifying how the C signature maps to C#. Also, the qualifier ref has to be used twice, in the declaration of the C function and in its call. Finally, note the use of the assembly directive, System.Runtime.InteropServices (line 3), which is important because it is the classes defined within the InteropServices that provide the mapping between managed code and unmanaged code.

Arrays are the bedrock of numerical programming. By definition, arrays are passed by reference in both C and C#. A
C function having a one-dimensional array as a parameter with the prototype: void OneDArray(double AnArray[]);
has the C# signature given by: public static extern void OneDArray(double [] AnArray);
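To make the mapping concrete, here is a minimal sketch of the import declaration and a call. It follows the prototype used in Listing Two, which (unlike the simplified prototype above) also passes the array length explicitly; the DLL name cmarshaldll mirrors the listings, and the surrounding class and output statements are mine, added only for illustration:

    using System;
    using System.Runtime.InteropServices;

    class OneDArrayDemo
    {
        // C prototype (Listing Two): void OneDArray(int n, double anArray[]);
        // A managed double[] of a blittable type is passed as a pointer to its
        // first element, so the C code can both read and write the elements.
        [DllImport("cmarshaldll")]
        public static extern void OneDArray(int n, double[] anArray);

        static void Main()
        {
            double[] anArray = new double[2];
            OneDArray(anArray.Length, anArray);
            for (int i = 0; i < anArray.Length; i++)
                Console.WriteLine("anArray[{0}] = {1}", i, anArray[i]);
        }
    }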
With this proviso, the call is straightforward; see Listing Two. Two-dimensional arrays are more interesting. In C, dynamically sized two-dimensional arrays are of the type pointer to pointer. For example, a two-dimensional array of doubles would have the prototype double **. However, these are rarely used in practice because they imply noncontiguous storage, whereas numerics are best carried out using contiguous storage. Numerical C code using "notional" two-dimensional arrays frequently stores data in row-major order. For example, to read a two-dimensional array of size m*n (m rows and n columns) into contiguous storage, you might have the following:
for (i=0; i<m; i++)
    for (j=0; j<n; j++)
        scanf("%lf", &a[i*tda + j]);   /* read element (i,j) of the m*n array */
where tda is the second dimension of a; in this case, tda=n. In Fortran, the equivalent construction would be:
      DO 10 I=1,M
      DO 10 J=1,N
      READ(5,*) A(I,J)
   10 CONTINUE
Here the A(I,J) construction is equivalent to (using C notation): a[(J-1)*lda + I-1]
where lda refers to the first dimension of a. This implies column major storage starting with indices starting at 1 rather than 0. The point to note is that the A(I,J) notation in Fortran squirrels away the complexity of array element access. If we had such a notation in C, it might have been A[i,j]. This is precisely what you have in C#. Hence, the notional two-dimensional array in C, double fred[ ], is represented in C# as double [,] fred. Listing Three shows an example of the use of a C function using a two-dimensional array from C#. It is worth noting how well the “notional” two-dimensional C array dovetails with the C# double type. The C# array is a proper class, with members available to provide us with information on the dimensions of the array; hence, the C# member function callTwoDarray needs to have just the one parameter. Struct is a major type in C, but in C# (being a value type), it is but a poor cousin to the class type. However, it can be mapped to a struct type in C#. The simplest and the most ubiquitous structure in numerics is the complex type, being defined as a pair of reals. In Listing Four, the struct type is defined in C and its equivalent in C#. In particular, you have to tell the C# compiler that the structure members are laid out sequentially in memory by the use of the attribute, [StructLayout(LayoutKind.Sequential)]. Given this information, you can treat the complex type as a regular object, passing it by value, reference, or as an array. Listing Four shows how you can access information from a C function of the type Complex, which has three parameters, inputVar passed by value, OutputVar passed by reference, and an array of the Complex type. There is one further point to note here: As structures are of the type value in C#, you have to tell the compiler whether arrays are read or write. You do this by providing the attribute '[In, Out]' to the array parameter. Structures can get very complicated indeed. Structure members can be scalars, arrays, pointers to other structures, and http://www.ddj.com
the like. Pointers, being taboo (or at least highly undesirable) in C#, can be represented as the IntPtr type. Listing Five presents a C and C# example showing the use of a Marshal class method to print the elements of an array that has been allocated internally within a C function. In this case, the memory has to be freed explicitly by the unmanaged code. Function parameters, also known as "callbacks," play a central role in numerical software. They are required whenever code that carries out some problem-specific task has to be supplied to a library function. This occurs, for example, in optimization software, where the value of the objective function or its first derivatives has to be computed on a problem-specific basis. The difficulty with callbacks is mainly that
they imply a reversal of the situation I have been looking at so far. Up to now, managed code (in C#) has been calling unmanaged code; with a callback, the unmanaged C code calls back into managed code. C# provides the delegate type to cater for this situation. You declare a delegate type with the appropriate callback signature and create an instance of it using a construction such as:
NAG_E04CCC_FUN objfun = new NAG_E04CCC_FUN(funct);
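As a quick sketch of the whole pattern, here is a condensed version of Listing Six (the NAG_D01_FUN delegate, the CallBack export, and the cmarshaldll name all come from that listing; the wrapping class and Console output are mine): the delegate type describes the callback signature, a managed method is wrapped in a delegate instance, and that instance is passed to the imported C function.

    using System;
    using System.Runtime.InteropServices;

    class CallbackSketch
    {
        // Matches the C typedef: void (NAG_CALL *NAG_D01_FUN)(double *)
        public delegate void NAG_D01_FUN(ref double output);

        [DllImport("cmarshaldll")]
        public static extern void CallBack(NAG_D01_FUN f, ref double output);

        // The managed method that the unmanaged code calls back into.
        public static void f(ref double output) { output = 100.0; }

        static void Main()
        {
            double result = 0.0;
            NAG_D01_FUN F = new NAG_D01_FUN(f);
            CallBack(F, ref result);    // C invokes f through the delegate
            Console.WriteLine(result);  // prints 100
        }
    }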
Listing Six presents an example of a simple callback. This is a simple mechanism when the callback has simple types, but it gets more interesting when we have parameters of the callback that are arrays and structures and that may have to carry
information back to the unmanaged code. In this case, you have to use both marshaling techniques and attributes on the structure. This is illustrated in Listing Seven, where I show how to handle arrays and structures within callbacks. The delegate in this case has an array parameter. If you use the following signature (which appears to be quite a reasonable signature at first sight), you find that when the delegate is called from C, the array appears to be of length 1. This presumably has to do with the fact that C pointers do not carry any information about the length of the array.
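The signature referred to here is not reproduced in this copy of the article. Based on the surrounding discussion and on Listing Seven (which defines Nag_Comm and the working delegate), the "reasonable but wrong" version would presumably declare the array parameter as a managed array; treat that first declaration as a hypothetical reconstruction, while the second is the one actually used in Listing Seven:

    // Naive signature (hypothetical reconstruction). When C invokes the delegate,
    // the marshaled array appears to have length 1, because the underlying
    // double* carries no length information:
    //   public delegate void NAG_E04UCC_FUN(int array_length, double[] a, ref Nag_Comm comm);

    // Working signature, as declared in Listing Seven: take the raw pointer as
    // IntPtr and copy the elements explicitly with Marshal.Copy inside the callback.
    public delegate void NAG_E04UCC_FUN(int array_length, IntPtr a, ref Nag_Comm comm);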
The trick is to specify the array parameter as the C# type IntPtr and subsequently copy data to and from the IntPtr parameter. There are two more data types that occur in C worth mentioning. Enum types are integral types that map one to one between C and C#. Listing Eight illustrates how enum values may be passed from C# to C. The final type to consider is the C# string type, which maps to the char* type in C. When a string is defined in C# and passed to C, the interop services provide the conversion to ASCII by default. I use the StringBuilder type
because this is a reference type and can grow and shrink as required. Listing Eight illustrates a C function modifying a string. For users of the NAG C Library, we provide an assembly of structures, functions, and delegate signatures within a Nag namespace (see "Wrapping C with C++ in .NET," by George Levy, C/C++ Users Journal, January 2004). We also provide examples of using this assembly from C# for some widely used NAG routines. DDJ
Listing One
Listing Three
/*************************************************************** * C Function Scalars * ***************************************************************/ #define NAG_CALL __stdcall #define NAG_DLL_EXPIMP __declspec(dllexport)
/*************************************************************** * C Function TwoDArray * ***************************************************************/
NAG_DLL_EXPIMP void NAG_CALL Scalars(double, double*); NAG_DLL_EXPIMP void NAG_CALL Scalars(double in, double *out) { *out = in; } /*************************************************************** * C# Class * ***************************************************************/ using System; using System.Runtime.InteropServices; namespace DDJexamples { public class ExerciseScalars { [DllImport("cmarshaldll")] public static extern void Scalars(double invar, ref double outvar); public static void CallScalars(double invar, ref double outvar) { Scalars(invar, ref outvar); } public static void Main() { double invar = 5.0; double outvar = 0.0; CallScalars(invar, ref outvar); Console.WriteLine("invar = {0}, outvar = {1}", invar, outvar); } } }
Listing Two /*************************************************************** * C Function OneDArray * ***************************************************************/ #define NAG_CALL __stdcall #define NAG_DLL_EXPIMP __declspec(dllexport) NAG_DLL_EXPIMP void NAG_CALL OneDArray(int n, double []); NAG_DLL_EXPIMP void NAG_CALL OneDArray(int n, double anArray[]) { int i; for (i=0; i
#define NAG_CALL __stdcall #define NAG_DLL_EXPIMP __declspec(dllexport) NAG_DLL_EXPIMP void NAG_CALL TwoDArray(int m, int n, double [], int tda); #define A(I,J) a2dArray[I*tda+J] NAG_DLL_EXPIMP void NAG_CALL TwoDArray(int m,int n,double a2dArray[],int tda) { int i, j, k = 0; tda = n; for (i=0; i<m; i++) for (j=0; j
Listing Four /*************************************************************** * C Function TryComplex * ***************************************************************/ #define NAG_CALL __stdcall #define NAG_DLL_EXPIMP __declspec(dllexport) typedef struct { /* NAG Complex Data Type */ double re,im; } Complex; NAG_DLL_EXPIMP void NAG_CALL TryComplex(Complex inputVar, Complex *outputVar, int n, Complex array[]); NAG_DLL_EXPIMP void NAG_CALL TryComplex(Complex inputVar, Complex *outputVar, int n, Complex array[]) { outputVar->re = ++inputVar.re; outputVar->im = ++inputVar.im;
public static void Main() { int n=2; double [] anArray = new double [n]; CallOneDArray(anArray); for (int i=0; i
array[0].re = 99.0;
array[0].im = 98.0;
array[1].re = 97.0;
array[1].im = 96.0;
} /*************************************************************** * C# Class * ***************************************************************/
using System; using System.Runtime.InteropServices; namespace DDJexamples { // Nag Complex type [StructLayout(LayoutKind.Sequential)] public struct Complex { public double re; public double im; }; public class ExerciseTryComplex { [DllImport("cmarshaldll")] public static extern void TryComplex(Complex inputVar, ref Complex outputVar, int n, [In, Out] Complex [] array); public static void CallTryComplex(Complex inputVar, ref Complex outputVar, Complex [] array) { int n = 2; TryComplex(inputVar, ref outputVar, n, array); } public static void Main() { int n=2; Complex inputVar = new Complex(); Complex outputVar = new Complex(); Complex [] array = new Complex[n]; inputVar.re = 1.0; inputVar.im = 2.0; CallTryComplex(inputVar, ref outputVar, array); Console.WriteLine("outputVar = ({0},{1})", outputVar.re, outputVar.im); Console.WriteLine("Array on output"); for (int i = 0; i<array.GetLength(0); i++) Console.WriteLine("{0} {1}", array[i].re, array[i].im); } } }
Listing Five /*************************************************************** * C Function MarshalStructC * ***************************************************************/ #define NAG_CALL __stdcall #define NAG_DLL_EXPIMP __declspec(dllexport) #include <stdlib.h> typedef struct { int array_length; double *array;
} marshalStruct; NAG_DLL_EXPIMP void NAG_CALL MarshalStructC(marshalStruct *pointerinStruct); NAG_DLL_EXPIMP void NAG_CALL FreeMarshalStructptr(marshalStruct *pointerinStruct); /**/ NAG_DLL_EXPIMP void NAG_CALL MarshalStructC(marshalStruct *pointerinStruct) { int i; pointerinStruct->array = (double *)malloc((size_t)(sizeof(double)*pointerinStruct->array_length)); for (i = 0; i <pointerinStruct->array_length; i++) { pointerinStruct->array[i] = (double)(i) + 1.0; } } NAG_DLL_EXPIMP void NAG_CALL FreeMarshalStructptr(marshalStruct *pointerinStruct) { free(pointerinStruct->array); pointerinStruct->array = 0; } /*************************************************************** * C# Class * ***************************************************************/ using System; using System.Runtime.InteropServices; namespace DDJexamples { [StructLayout(LayoutKind.Sequential)] public struct marshalStruct { public int array_length; public IntPtr array; }; public class ExerciseMarshalStructC { [DllImport("cmarshaldll")] public static extern void MarshalStructC( ref marshalStruct pointerinStruct); [DllImport("cmarshaldll")] public static extern void FreeMarshalStructptr( ref marshalStruct pointerinStruct); public static void CallMarshalStructC(ref marshalStruct pointerinStruct) { MarshalStructC(ref pointerinStruct); } public static void Main() { marshalStruct pointerinStruct = new marshalStruct() ; pointerinStruct.array_length = 5; CallMarshalStructC(ref pointerinStruct);
double [] x = new double[pointerinStruct.array_length]; Marshal.Copy( pointerinStruct.array, x, 0, pointerinStruct.array_length ); for (int i = 0; i < pointerinStruct.array_length; i++) Console.WriteLine("x[{0}] = {1}", i, x[i]); FreeMarshalStructptr(ref pointerinStruct); } } }
Listing Six /*************************************************************** * C Function CallBack * ***************************************************************/ #define NAG_CALL __stdcall #define NAG_DLL_EXPIMP __declspec(dllexport) typedef void (NAG_CALL * NAG_D01_FUN)(double *); NAG_DLL_EXPIMP void NAG_CALL CallBack(NAG_D01_FUN f , double *output); NAG_DLL_EXPIMP void NAG_CALL f(double *x); /* */ NAG_DLL_EXPIMP void NAG_CALL CallBack(NAG_D01_FUN f , double *output) { (*f)(output); } NAG_DLL_EXPIMP void NAG_CALL f(double *x) { *x = 100.0; } /*************************************************************** * C# Class * ***************************************************************/ using System; using System.Runtime.InteropServices; namespace DDJexamples { // delegate public delegate void NAG_D01_FUN (ref double output); public class ExerciseSimpleCallback { [DllImport("cmarshaldll")] public static extern void CallBack(NAG_D01_FUN f , ref double output); public static void CallCallBack(NAG_D01_FUN f, ref double output) { CallBack(f, ref output); } public static void Main() { double output = 0.0; NAG_D01_FUN F = new NAG_D01_FUN (f); CallCallBack(F, ref output); Console.WriteLine("Ouput = {0}", output); } public static void f(ref double output) { output = 100.0; } } }
Listing Seven /*************************************************************** * C Function CallbackWithStruct * ***************************************************************/ #define NAG_CALL __stdcall #define NAG_DLL_EXPIMP __declspec(dllexport) typedef struct { int flag; } Nag_Comm; typedef void (NAG_CALL * NAG_E04UCC_FUN)(int, double *, Nag_Comm *); extern NAG_DLL_EXPIMP void NAG_CALL CallbackWithStruct(NAG_E04UCC_FUN funct, int array_length, double *a, Nag_Comm *user_comm); void NAG_CALL funct(int n, double *x, Nag_Comm *user_comm); /* */ NAG_DLL_EXPIMP void NAG_CALL CallbackWithStruct(NAG_E04UCC_FUN funct , int n, double *a, Nag_Comm *user_comm) { (*funct)(n, a, user_comm); if (user_comm->flag == 99) { a[0] = 99.0; } } void NAG_CALL funct(int n, double *x, Nag_Comm *user_comm) { int i; for (i=0; iflag = 99; } } /*************************************************************** * C# Class * ***************************************************************/ using System;
using System.Runtime.InteropServices; namespace DDJexamples { [StructLayout(LayoutKind.Sequential)] public struct Nag_Comm { public int flag; }; // delegate public delegate void NAG_E04UCC_FUN (int array_length, IntPtr a, ref Nag_Comm comm); public class ExerciseCallBackWithStruct { [DllImport("cmarshaldll")] public static extern void CallbackWithStruct(NAG_E04UCC_FUN f , int array_length, double [] a, ref Nag_Comm user_commt); public static void CallCallbackWithStruct(NAG_E04UCC_FUN f, int array_length, double [] a, ref Nag_Comm user_comm) { CallbackWithStruct(f, array_length, a, ref user_comm); } public static void Main() { double [] a = {1.0, 2.0, 3.0, 4.0, 5.0}; int array_length = a.GetLength(0); Nag_Comm user_comm = new Nag_Comm(); NAG_E04UCC_FUN F = new NAG_E04UCC_FUN (funct); CallCallbackWithStruct(F, array_length, a, ref user_comm); Console.WriteLine("user_comm.flag = {0}", user_comm.flag); Console.WriteLine("a[0] altered further as a result of user_comm.flag, a[0] = {0}", a[0]); } public static void funct(int n, IntPtr xptr, ref Nag_Comm user_comm) { double [] x = new double[n]; Marshal.Copy( xptr, x, 0, n ); int i; for (i=0; i
Listing Eight /*************************************************************** * C Function EnumString * ***************************************************************/ #include <stdlib.h> #include <string.h> #include <stdio.h> #define NAG_CALL __stdcall #define NAG_DLL_EXPIMP __declspec(dllexport) typedef enum { red=101, green, blue,black } colour; NAG_DLL_EXPIMP void NAG_CALL EnumString(colour rainbow, char *rainbowcolour); NAG_DLL_EXPIMP void NAG_CALL EnumString(colour rainbow, char *rainbowcolour) { if (rainbow == black ) { strcpy(rainbowcolour, "Black is not a rainbow colour"); } else { strcpy(rainbowcolour, "This is a rainbow colour"); } } /*************************************************************** * C# Class * ***************************************************************/ using System; using System.Runtime.InteropServices; using System.Text; namespace DDJexamples { public enum colour { red=101, green, blue,black }; public class ExerciseEnumString { [DllImport("cmarshaldll")] public static extern void EnumString(colour rainbow, StringBuilder rainbowcolour); public static void CallEnumString(colour rainbow, StringBuilder rainbowcolour) { EnumString(rainbow, rainbowcolour); } public static void Main() { StringBuilder colourstring= new StringBuilder("once upon a time ... "); colour somecolour = colour.black; CallEnumString(somecolour, colourstring); Console.WriteLine("{0}", colourstring); } } }
Removing Memory Errors from 64-Bit Platforms
Memory problems can multiply on new platforms
RICH NEWMAN
It is crucial that 64-bit platforms be stable and reliable as development teams create applications for them. Any memory errors or memory corruptions can cause them to fail. The great challenge with memory errors is that they are elusive problems that are extremely difficult and time consuming to find. Memory errors do not expose themselves during typical means of testing. Because of their detrimental crashing potential, it is imperative that you remove memory problems from code before it goes into production.
There are powerful memory-error-detection tools available that identify the cause of threaded memory errors in dual-core applications. Such detection tools let you find and fix elusive, crash-causing memory errors that traditional testing techniques fail to uncover. Memory-error-detection tools help you find and fix C/C++ memory errors prior to release. By fixing these problems before porting, you can improve the quality of applications on new platforms and architectures, streamline the porting process, and make the original application more robust and reliable.
Rich is the lead engineer for Insure++ at Parasoft. He can be contacted at [email protected].
Why Is Porting So Difficult?
Most developers responsible for porting C/C++ code to 64-bit processors — or to any new hardware, for that matter — find that memory problems seem to multiply when they reach the new platform or architecture. The fundamental problem with the transition to 64-bit architectures is that assumptions about the size in bits of the various integral and pointer types are no longer true. The potentially problematic coding constructs are implicit narrowing conversions from long to int by assignment, and explicit casts. The former most likely cause compilers to issue warnings; the latter, on the other hand, are accepted silently, leading to all sorts of problems that will not surface until runtime. Another issue is that integer constants that are not explicitly sized are assumed to be ints. This is of some concern when mixing signed and unsigned constants; proper use of the relevant suffixes should alleviate such problems. Other major sources of trouble are the various kinds of pointer incompatibilities. For example, on most 64-bit architectures, a pointer no longer fits inside an int, and code that stores pointer values inside int variables no longer functions properly.
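As a concrete illustration of that pointer-size problem, here is a minimal sketch (mine, not from the article) of code that behaves on a 32-bit ILP32 platform but silently truncates pointers on a typical 64-bit LP64 platform, where int stays 32 bits wide while pointers grow to 64 bits:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        double value = 3.14;
        double *p = &value;

        /* Common 32-bit idiom: stash a pointer in an int. On LP64 the upper
           32 bits of the address are silently discarded by the cast. */
        int cookie = (int)(intptr_t)p;              /* truncating conversion */
        double *back = (double *)(intptr_t)cookie;  /* may no longer equal p */

        printf("original %p, recovered %p\n", (void *)p, (void *)back);

        /* Portable alternative: use an integer type wide enough for a pointer. */
        intptr_t safe = (intptr_t)p;
        double *ok = (double *)safe;
        printf("safe round trip %p\n", (void *)ok);
        return 0;
    }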
These and other problems are typically exposed during porting because porting is essentially a type of mutation testing. When you port code, you essentially create what can be called an "equivalent mutant" — a version of the original code with minor changes that should not affect the outcomes of your test cases. You can expose many strange errors by creating and running these equivalent mutants. In C++, this process of creating and running equivalent mutants can uncover:
• Lack of copy constructors or bad copy constructors.
• Missing or incorrect constructors.
• Wrong order of initialization of code.
• Problems with operations of pointers.
• Dependence on undefined behavior, such as order of evaluation.
Porting Preparation
There are several steps involved in preparing your applications for porting.
Step 1. Before you start porting, you should rid the original code of problems such as memory corruption and memory leaks that will plague you on 64-bit processors. One of the most effective ways to expose the kinds of pointer and integer problems that cause trouble on 64-bit processors is to leverage mutation testing for runtime error detection.
Mutation testing was introduced as an attempt to solve the problem of not being able to measure the accuracy of test suites. In mutation testing, you are in some sense trying to solve this problem by inverting the scenario. The thinking goes like this: Assume that you have a perfect test suite, one that covers all possible cases. Also assume that you have a perfect program that passes this test suite. If you change the
code of the program (this process is called “mutating”) and run the mutated program (“mutant”) against the test suite, you will have two possible scenarios: 1. The results of the program were affected by the code change and the test suite detects it. It was assumed that the test suite is perfect, which means that it must detect the change. If this happens, the mutant is called a “killed mutant.” 2. The results of the program are not changed and the test suite does not detect the mutation. The mutant is called an “equivalent mutant.” If you take the ratio of killed mutants to all the mutants that were created, you get a number that is smaller than 1. This number measures how sensitive the program is to the code changes. In reality, neither the perfect program nor the perfect test suite exists, which means that one more scenario can exist. The results of the program are different, but the test suite does not detect it because it does not have the right test case. If you take the ratio of all the killed mutants to all the mutants generated, you get a number smaller than 1 that also contains information about accuracy of the test suite. In practice, there is no way to separate the effect that is related to test-suite in-
accuracy and the effect that is related to equivalent mutants. In the absence of other possibilities, you can accept the ratio of killed mutants to all the mutants as the measure of the test suite accuracy. Example 1 (test1.c) illustrates the ideas just described (this code, and all subsequent code presented here, compiles and runs under Linux). The test1.c program can be compiled with the command: cc -o test1 test1.c.
The program reads its arguments and prints messages accordingly. Now assume that you have this test suite that tests the program:
Test Case 1: input 2 4, output: Got less than 3
Test Case 2: input 4 4, output: Got more than 3
Test Case 3: input 4 6, output: Got more than 3
Test Case 4: input 2 6, output: Got less than 3
Test Case 5: input 4, output: Got more than 3
This test suite is representative of test suites in the industry: It contains only positive tests, which means it checks that the program reports correct values for correct inputs. It completely neglects illegal inputs to the program. The test1 program fully passes the test suite; however, it has serious hidden errors.

main(argc, argv)                        /* line 1 */
int argc;                               /* line 2 */
char *argv[];                           /* line 3 */
{                                       /* line 4 */
    int c=0;                            /* line 5 */
                                        /* line 6 */
    if(atoi(argv[1]) < 3){              /* line 7 */
        printf("Got less than 3\n");    /* line 8 */
        if(atoi(argv[2]) > 5)           /* line 9 */
            c = 2;                      /* line 10 */
    }                                   /* line 11 */
    else                                /* line 12 */
        printf("Got more than 3\n");    /* line 13 */
    exit(0);                            /* line 14 */
}                                       /* line 15 */

Example 1: The test1.c program.

Now, mutate the program. You can start with these simple changes:
Mutant 1: Change line 9 to the form if(atoi(argv[2]) >= 5).
Mutant 2: Change line 7 to the form if(atoi(argv[1]) >= 3){.
Mutant 3: Change line 5 so that c is initialized to a different value.
If you run this modified program against the test suite, for Mutants 1 and 3 the program completely passes the test suite. For Mutant 2, the program fails all test cases. Mutants 1 and 3 do not change the output of the program, and are thus equivalent mutants. The test suite does not detect them. Mutant 2, however, is not an equivalent mutant. Test Cases 1–4 will detect it through wrong output from the program. Test Case 5 may have different behavior on different machines. It may show up as bad output from the program, but at the same time, it may be visible as a program crash.
Switching gears for a moment, if you calculate the statistics, you see that you created three mutants and only one was killed. This tells you that the number that measures the quality of the test suite is 1/3. As you can see, the number 1/3 is low. It is low because you generated two equivalent mutants. This number should serve as a warning that you are not testing enough. In fact, the program has two serious errors that should be detected by the test suite.
Returning to Mutant 2, run it against Test Case 5. If the program crashes, then the mutation testing that you performed not only measured the quality of the test suite, but also detected a serious error in the code. This is how mutation testing can find errors.

main(argc, argv)
int argc;
char *argv[];
{
    int c=0;
    int a, b;
    a = atoi(argv[1]);
    b = atoi(argv[2]);
    if(a < 3){
        printf("Got less than 3\n");
        if(b > 5)
            c = 2;
    }
    else
        printf("Got more than 3\n");
    exit(0);
}

Example 2: Equivalent mutant.

Consider the equivalent mutant (Mutant 4) in Example 2. The difference between Mutant 4 and the previous mutants is that Mutant 4 was created in an attempt to make an equivalent mutant. This means that when it was constructed, an effort was made to build a program that should execute exactly the same as the original program. If you run Mutant 4 against the test suite, Test Case 5 will probably fail — the program will crash. As you can see, by creating an equivalent mutant, you actually increased the detection power of the test suite. The conclusion that you can draw here is that you can increase the accuracy of the test suite in two ways:
1. Increase the number of test cases in the test suite.
2. Run equivalent mutants against the test suite. These conclusions are important; the second conclusion is especially important because it reveals that mutants result in more effective tests. In the examples, you created each mutant by manually making a single change to a program. The process of generating mutants is difficult and time consuming, but it is possible to generate equivalent mutants automatically. Example 3 helps illustrate how this is done. This program has no input and only one output. In principle, it only needs one test case: Test Case 1: input none output 12
The interesting thing about this program is that it can give the answer 13 or 12 depending on the behavior of the compiler. Suppose that you were given the task of creating this program and making sure that it runs on two different platforms. If the platforms have compilers that exhibit different behavior, you will discover the difference when running the program, triggering the question, “What is wrong?” This will probably result in the program’s problem being fixed. Suppose that you create the equivalent mutant in Example 4. The result of this program does not depend on the compiler and is, in fact, exactly predictable — it is 13. If you run the mutant against the test suite, you will discover the error. The most amazing thing about mutation testing is that it can discover errors that normally are almost impossible to detect. Frequently, when these errors are uncovered, they manifest themselves as the program is crashing. Often, programmers do not understand that. The equivalent mutant is an opportunity to discover errors, not a headache. Typically, programmers expect equivalent mutants to behave the same as the original program. If this were true all the time, mutation testing would be completely useless.
Step 2. After you clean the most critical errors, use a static-analysis tool to identify code that is likely to cause trouble when it is ported to the new platform/architecture. There are two main tasks to focus on while performing static analysis: 1. Identify and fix code that is likely to result in an error on any platform or architecture. 2. Identify and fix code that might not port well. First, check industry-respected C/C++ coding standards that identify coding constructs, which are likely to lead to problems on any platform or architecture. By ensuring that code complies with these coding standards, you prevent errors. This translates to less debugging on the new platform or architecture and reduces the chance of having bugs that elude testing and make their way into the release. Some coding standards to check include: • Never return a reference to a local object or a dereferenced pointer initialized by “new” within the function. Returning a reference to a local object might cause stack corruption. Returning a dereferenced pointer initialized by “new” within the function might cause a memory leak. • Never convert a const to nonconst. This can undermine the data integrity by allowing values to change that are assumed to be constant. This practice also reduces the readability of the code because you cannot assume const variables to be constant. • If a class has any virtual functions, it shall have a virtual destructor. This standard prevents memory leaks in derived classes. A class that has any virtual functions is intended to be used as a base class, so it should have a virtual destructor to guarantee that the destructor is called when the derived object is referenced through a pointer to the base class.
• Public member functions shall return const handles to member data. When you provide nonconst handles to member data, you undermine encapsulation by allowing callers to modify member data outside of member functions.
• A pointer to a class shall not be converted to a pointer of a second class unless it inherits from the second. This "invalid" downcasting can result in wild pointers, data corruption problems, and other errors.
• Do not directly access global data from a constructor. The order of initialization of static objects defined in different compilation units is not defined in the C++ language definition. Therefore, accessing global data from a constructor might result in reading from uninitialized objects.
(For more rules, see the works of Scott Meyers, Martin Klaus, and Herb Sutter.)
After you locate and repair this error-prone code, start looking for code that works fine on your current platform/architecture, but that might not port well. Some rules that are applicable to most 64-bit porting projects include:
• Use standard types whenever applicable. Consider using size_t rather than int, for example. Use uint64_t if you want a 64-bit unsigned integer. Not only will this practice help identify and prevent current bugs in the code, it will also help with the porting effort in the future when the code is ported to 128-bit processors.
• Review all existing uses of long data types in the source code. If the values to be held in such variables, fields, and parameters fit in the range of 2Gig–1 to –2Gig, or 4Gig to 0, then it is probably best to use int32_t or uint32_t, respectively.
• Examine all instances of narrowing assignment. Avoid such assignments because the assignment of a long value to an int results in truncation of the 64-bit value.
• Find narrowing casts. Use narrowing casts on expressions, not operands.

int doublew(x)
int x;
{ return x*2; }

int triple( y)
int y;
{ return y*3; }

main()
{
    int i = 2;
    printf("Got %d \n", doublew(i++)+ triple(i++));
}

Example 3: Automatically generating mutants.

int doublew(x)
int x;
{ return x*2; }

int triple( y)
int y;
{ return y*3; }

main()
{
    int i = 2;
    int a, b;
    a = doublew(i++);
    b = triple(i++);
    printf("Got %d \n", a+b);
}

Example 4: A mutant.
• Find casts from long* to int*. In 32-bit environments, these might have been used interchangeably. Examine all instances of incompatible pointer assignments.
• Find casts from int* to long*. In 32-bit environments, these might have been used interchangeably. Examine all instances of incompatible pointer assignments.
• Find uses of multiplicative expressions not containing a long in either operand. To have integral expressions produce 64-bit results, at least one of the operands must have a data type of long or unsigned long (see the short example after this list).
• Find long values that are initialized with int literals. Avoid such initializations because integral constants might be represented as 32-bit types even when used in expressions with 64-bit types.
• Locate int literals in binary operations for which the result is assigned to a long value. 64-bit multiplication is desired if the result is a 64-bit value.
• Find int constants used in 64-bit expressions. Use 64-bit values in 64-bit expressions.
• Find all pointers cast to int values. Code involving conversions of pointers from or to integral values should be reviewed.
• Find and review any in-line assembly. This probably will not port well.
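The multiplication rule above is the one that most often bites silently, so here is a minimal sketch (mine, not the author's) of the difference on an LP64 system where int is 32 bits and long is 64 bits:

    #include <stdio.h>

    int main(void)
    {
        int rows = 70000;
        int cols = 70000;

        /* Both operands are int, so the multiplication is performed in 32 bits
           and overflows before the (already wrong) result is widened to long. */
        long bad = rows * cols;

        /* Promoting one operand first makes the whole expression 64-bit. */
        long good = (long)rows * cols;

        printf("bad  = %ld\n", bad);   /* some overflowed value */
        printf("good = %ld\n", good);  /* 4900000000 */
        return 0;
    }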
Step 3. Repeat runtime error detection to verify that the modifications you made while fixing coding standard violations did not introduce any runtime errors.
Step 4. At this point, you have the option of performing one more step to ensure that your code is as clean as possible before you port it. This additional step is unit testing. Unit testing is traditionally used to find errors as soon as each application unit is completed. It can also be beneficial later in the development process because, at the unit level, it is easier to design inputs that reach all of the functions. In turn, this helps you find errors faster and expose errors that you might not uncover with application-level testing.
Identifying Problems on the 64-Bit Processor
Of course, there may be problems with the 64-bit processor itself. If so, follow these guidelines:
Step 1. Recompile your application on the 64-bit processor. If you have trouble compiling it, work out all of the quirks related to compiler variations. You might also want to create coding standard rules that will automatically identify the code associated with these quirks so that you can prevent these compilation problems from occurring in the future.
Step 2. Once you recompile the code, perform code inspection again to check if the new code complies with all appropriate coding standards. At this point, every change that should have been made but that was not made is an error. Fix these errors immediately! You don't want to look for these errors as the application is running.
Step 3. Link your application and try to build it.
Step 4. At this point, you should try to run your code. If you have a problem getting code running on the 64-bit processor, use a unit-testing framework to run the code function by function; that way, you can flush out exactly what in the code is not portable. Test the function, and then fix the problem. Continue this process until the entire application is running.
Step 5. Repeat runtime error detection. Once the application is running, you'll want to repeat runtime error detection because the porting process is likely to cause some new problems that could not be detected before the port (for example, new memory corruption problems or different behaviors). If the runtime error detection exposes problems, fix every bug related to porting.
Conclusion
Following the guidelines proposed in this article, you will find and fix C/C++ memory errors prior to release, saving weeks of debugging time and preventing costly crashes from affecting your customers. DDJ
Pointer Containers
All the benefits of smart pointers, but none of the baggage
THORSTEN OTTOSEN
Object-oriented programming in C++ has always been a bit awkward for me because of the need to manage memory when filling containers with pointers. Not only must we manage memory, but the memory-management strategy implies lots of extra syntax and typedefs that unnecessarily clutter the code. Ignoring garbage collection, there are two approaches to memory management:
• Smart pointers, as in Listing One(a).
• Making your own implementation, such as Listing One(b).
The good thing about smart pointers is that they are safe and fast to implement (just a typedef). On the downside, they are inefficient (in space and time) and have a clumsy syntax. Consequently, in this article, I describe the design and implementation of the second approach, which I call "pointer containers," and which has all of the benefits of smart pointers but without the downside. The complete source code that implements this technique is available electronically; see "Resource Center," page 4.
Thorsten is a co-owner of Dezide (http://www.dezide.com/), which specializes in troubleshooting programs based on Bayesian-network technology. He is currently writing a second major in computer science in the area of expert systems. He can be contacted at [email protected] or [email protected].
Exception Safety
When dealing with memory, exception safety is always a concern. Basically, the
concept can be summarized as follows (from stronger to weaker): • The “nothrow” guarantee means code will not throw. • The “strong” guarantee means code has roll-back semantics in case of exceptions. • The “basic” guarantee means code will only preserve invariants in case of exceptions— in all cases, no resources leak. I start by extending the interface of the container, as in Listing Two(a). Granted, it is easy to implement push_back( ), as in Listing Two(b), but vector::push_back( ) might throw an exception if a new buffer cannot be allocated (leading to a leak of it). Consequently, you have to change the implementation, as in Listing Two(c). Now, push_back( ) has the strong exceptionsafety guarantee. Although this seems simple, even experienced programmers seem to forget it. Listing Three(a) is a naïve implementation of the iterator range insert( ). You must heap allocate copies because the container takes ownership of its pointers. This is not only a naïve implementation, but it is also horribly flawed. First, the elements are inserted in reverse order — if they ever get inserted! Second, before might be invalidated by vector::insert( ), leading to illegal code. Third, there are potential memory leaks (as with push_back( )). And fourth, it is potentially slow because you might cause several reallocations of a larger buffer. Listing Three(b) is better. That’s it, right? Well, yes and no. Yes, because I have removed all the stupid errors; and no, because you might still get several reallocations. So why not just call: vec_.reserve(distance(first,last)+ vec_.size() );
as the first line in insert( )? The answer is that reserve( ) can invalidate before. (As a caveat, the iterator type I might be an InputIterator, which means you cannot determine the distance between them, so you would need to do your own tagging. I will ignore that issue for now, although Dr. Dobb’s Journal, October 2005
the implementation must not.) Furthermore, if you decided to implement several pointer containers in one generic wrapper, you cannot call container-specific operations, such as reserve( ), without extra traits. Still, you have only achieved the basic exception-safety guarantee (which would be acceptable), you’re left with a clumsy implementation, and inserting pointers one-at-a-time means you’re not taking advantage of the fact that you know that N pointers must be inserted.
What you really would like to do is to use the iterator range version of insert( ) in vector. One idea would be to make an iterator wrapper whose operator*( ) would return a newly allocated copy. Iterators that do something extra are sometimes called “smart iterators” (see “Smart Iterators and STL,” by Thomas Becker, C/C++ Users Journal, September 1998). This would again lead to basic exception safety; however, the C++ Standard does not exactly specify how and when insertions take place and I could imagine prohibited memory errors occurring. In particular, the Standard does not guarantee that one element is inserted before the next iterator indirection happens. Therefore, I decided to use a different strategy and go for the strong exceptionsafety guarantee. The only thing it requires is a temporary exception- safe, fixed-sized buffer to hold the new pointers while they are being allocated. This http://www.ddj.com
is my scoped_deleter class and its role is similar to auto_ ptr in the implementation of push_back( ): It holds pointers and deletes them if an exception occurs. Once it is created, the implementation looks like Listing Four. There you have it — a generic, strongly exception-safe, and elegant implementation. Because copying a T* cannot throw, vec_.insert( ) is guaranteed to have the strong exception guarantee; hence, you never insert pointers that will also be deleted by the scoped_deleter if an exception occurs. The price you pay is one extra heap allocation for the internal buffer in scoped_deleter. An extra heap allocation is normally expensive, but it is acceptable here because you make N heap allocations while copying the pointers (and you could optimize this by using stack space for small buffers). You can then continue in this fashion to ensure that the insert( ), assign( ), erase( ), and clear( ) interfaces from std::vector are reimplemented in ptr_vector in an exception-safe manner. Iterator Design You might have wondered why new(T(*first )) would compile; after all, if the iterator is defined as the iterator of std::vector, you would certainly need two indirections (unless the constructor took a pointer argument). The reason is that you want normal iterators to be indirected by default, and the “old” style iterators to be available if needed. Therefore, you have Listing Five in ptr_vector, along with similar versions for const iterators. The indirect_iterator adaptor from Boost (http://www.boost.org/) will most likely be part of the new Standard. The adaptor does all the hard work for you and has several good consequences: • First of all, it promotes pointerless programming and gives code a clean look. For example, compare Listing Six(a) to Listing Six(b). If you allowed direct pointer assignment, you would at least have a memory leak and (most likely) a pending crash if the same pointer is deleted twice. • The second good consequence is that it allows for seamless integration between ptr_vector and (for example) std::vector. This can be useful if we deal with copyable, nonpolymorphic objects. In that case, you could say something like Listing Six(c). Of course, it will also work the other way around from v2 to v. • The third benefit is that it is just safer to use standard algorithms. • The fourth benefit is that you can use normal functors directly without speculating with wrappers that do the indihttp://www.ddj.com
rection. However, there are also situations where you want to use the old ptr_iterator. These situations are characterized by using mutating algorithms from the Standard Library, which need to copy objects. Copying the object is expensive (compared to copying a pointer) or simply prohibited (as in most polymorphic classes). So, if copying is cheap and possible, then you can just stick to the normal, safer iterators. In short, iterators promote safety and interoperability while not restricting access to the nonindirected iterators when needed. The Clonable Concept Even though we have a solid iterator range insert( ), we still demand that new T( *first ) is a valid expression. There are several reasons why this is not ideal. The first example I can think of involves objects that are created by object factories. A second example involves polymorphic objects that are not copyable — a polymorphic object should very rarely be copyable. In both cases, you cannot use the call new T( *first ), so you need a hook of some kind. The required indirection is given by the “Clonable” concept; let t be an object of type T, then T is clonable if new T( t ) is a valid expression or if allocate_clone( ) has been overloaded or explicitly specialized for T; see Listing Seven. What the implementation of ptr_vector now has to ensure is that the right version of allocate_clone is actually called. There are several pitfalls here and they all resemble the problems that the Standard has with calling std::swap( ). One solution is simply to call allocate_clone( ) unqualified within ptr_vector and to define the overloaded version in T ’s namespace. This way, you rely on argument-dependent lookup and replace the function template specialization with a simpler overloaded function. (Another possibility would be to add another template parameter to ptr_vector— the parameter could then be a type with allocate_clone( ) as a static function.) Domain-Specific Functions Because the pointer container manages resources, there is suddenly a whole new interface that makes sense — an interface that deals with releasing, acquiring, and transferring ownership. Consider first the possibility in Listing Eight(a) to release a single object, which makes it simple and safe to hand over objects in the container. Should the caller forget to save the return value, you still have no leaks. While all standard containers have copy semantics, providing the same for ptr_vector would be a bit strange — after all, the Dr. Dobb’s Journal, October 2005
objects you insert into the ptr_vector are only required to be clonable. The fact that copying (or cloning) a whole ptr_vector can be expensive also suggests that you need something different. Hence, you add the two functions in Listing Eight(b). The first makes a deep copy of the whole container using allocate_clone( ) and is easily made strongly exception safe. The second simply releases ownership of the whole container and it is also strongly exception safe. Notice that it cannot have the “nothrow” guarantee because you must allocate a new ptr_vector for the returned auto_ ptr. The new constructor is then used to take ownership of the result of clone( ) or release( ), and it gives the “nothrow” guarantee. What is really elegant about these functions is that they let you return whole containers as results from functions in an efficient and exception-safe manner. Recall that the iterators prohibited direct pointer assignment. This is certainly a good thing, but you lack the ability to reset a pointer within the container. Therefore, you add two functions to ptr_vector; see Listing Eight(c). The first is a rather general version that makes sense on other containers, too, whereas the last only makes sense on random-access containers such as ptr_vector. Now you can add these functions to your container; see Listing Nine. The idea behind these functions is that they let you transfer objects between different pointer containers efficiently without messing with the pointers directly. In a way, they are similar to splice( ) in std::list and can easily fulfill the strong exception-safety guarantee. Caveats There are still several issues that must be dealt with — support for custom deleters, for instance. Because you can specify how allocation is performed with allocate_clone( ), you should also be able to specify deallocation with deallocate_clone( ). The introduction will affect the implementation greatly because you need to use it whenever deletion is done; this means that you must scrap std::auto_ ptr because it cannot support a custom deleter, and change scoped_deleter similarly. There are certain cases where you still want to use the old-fashioned ptr_iterators with mutating algorithms. The good news is that some mutating algorithms can be used with an indirecting functor; that is, a functor that compares the objects instead of the pointers. The bad news is that “some” is not “all,” and algorithms such as remove( ) and unique( ) just copy elements instead of swapping them. This leads to both memory leaks and undefined behavior (see “The Standard Librarian: 69
Containers of Pointers,” by Matt Austern, http://www.cuj.com/documents/s=7990/ cujcexp1910austern/). There is a workaround for remove( ) and its cousin remove_if( ), but it is not very practical (see “A remove_if for vector,” by Harold Nowak, C/C++ Users Journal, July 2001). Though I have not yet decided how to deal with this, I am leaning toward implementing these few error-prone functions as member functions. While I’ve focused on the ptr_vector class, the same rules apply to all the standard containers. Thus, you have a wide range of pointer containers that suit almost any special situation. For sequences, the default choice should (as usual) be ptr_vector. A ptr_list should be re-
served for the rare cases where you have a large container and insertions/deletions are done at places other than the two ends. By a “large container,” I mean one that holds more than 100 elements (but this is only a rough estimate and the guideline differs from platform to platform). You have to remember that a list node often (but not always with special allocators) has to be on the heap each time an insertion takes place. Such an allocation can easily cost the same as moving 100 pointers in a ptr_vector. Conclusion Clearly, implementing your own pointer containers is not a walk in the park. How-
ever, once done, you have a useful and safe utility that enables flawless object-oriented programming. If you consider all the extra work that you have to do to make containers exception safe, it should not be surprising that garbage collection can be just as fast as manual memory management. The downside to garbage collectors, of course, is that they waste more memory (for example, a compacting garbage collector will probably double the memory used). It is ironic that with the use of smart pointers and pointer containers, we’re close to not needing garbage collection at all. DDJ
(b)
template< class T >
class ptr_vector
{
    std::vector<T*> vec_;
public:
    ~ptr_vector(); // delete objects
    // ... much more
    typedef <something> iterator;
};
(c)
void push_back( T* t )
{
    std::auto_ptr<T> p( t );
    vec_.push_back( t );
    p.release();
}
Listing Three
(a)
template< class I >
void insert( iterator before, I first, I last )
{
    while( first != last )
    {
        vec_.insert( before, new T( *first ) );
        ++first;
    }
}
(b)
template< class I >
void insert( iterator before, I first, I last )
{
    while( first != last )
    {
        auto_ptr<T> p( new T( *first ) );
        before = vec_.insert( before, p.get() );
        ++before; // to preserve order
        ++first;
        p.release();
    }
}
Listing Four
void insert( iterator before, I first, I last )
{
    size_t n = distance( first, last );
    scoped_deleter sd( n );
    for( ; first != last; ++first )
        sd.add( new T( *first ) );
Listing Seven
// primary template:
template< class T >
T* allocate_clone( const T& t )
{ return new T( t ); }

// overloaded function template
template< class T >
X* allocate_clone( const X& t )
{ return factory::clone( t ); }

// function template specialization
template<>
Polymorphic* allocate_clone( const Polymorphic& p )
{ return p.clone(); }
(c) void replace( iterator where, T* x ); void replace( size_type idx, T* x );
Listing Nine template< class I > void transfer( iterator before, I first, I last, ptr_vector& from ); void transfer( iterator before, I i, ptr_vector& from );
EMBEDDED SYSTEMS
Using Hardware Trace for Performance Analysis
Easily gathering accurate information about the execution of embedded systems
MICHAEL LINDAHL
Whether you want to create a faster printer, squeeze extra features into your cell-phone design, or minimize the cost of your embedded device by lowering your computation and processor requirements, performance analysis is a vital component of developing high-quality embedded systems. Thus, the consequences of inefficient code are very real when making embedded devices.
There are several traditional tools and techniques that you can use to diagnose and help fix performance problems in embedded systems. However, many of these tools have limitations because they require modification to the code running on the system. This means that the tool interferes with the execution of the software, which can lead to inaccurate results or other problems.
One class of tools that does not have any impact on your running system, yet makes performance analysis easy, is based on hardware trace technology. Hardware trace essentially gives you a complete history of the instructions executed by your microprocessor. This information can be collected without modifying the running code or altering its runtime characteristics — it is collected with zero intrusion on your system. Trace-analysis tools can then convert this information into meaningful and powerful performance-analysis data that lets you analyze your system (without altering its runtime behavior) and optimize your system for maximum performance.
Michael is a senior software engineer at Green Hills Software. He can be reached at [email protected].
In this article, I examine some traditional performance-analysis techniques as they apply to embedded systems and discuss some of their inherent limitations. Next, I explore hardware trace by investigating what it is and how it can be used to effectively and efficiently debug your embedded software. Finally, I walk through a real-world example to illustrate some of the unique benefits of using hardware trace for system optimization. Performance Analysis and Optimization Performance analysis is an important part of developing a complete embedded product. Whatever your reasons for needing higher performance from your software, you almost certainly could use code that runs faster. However, performance analysis is generally a difficult task because it requires careful measurement and analysis of your system. In addition, after gaining visibility into what your software is doing, you must find areas where the code is performing unnecessary computations or other actions. Oftentimes, system performance can be greatly improved with relatively simple changes. However, finding the places that can be easily optimized is often a challenging task. There are many techniques that have been traditionally used to analyze software performance. The most common techniques involve various forms of profiling, which give you information about how often each part of your program is running. The most popular forms of profiling provide statistical information about what code is running. This data can be collected by either periodically sampling the program counter or by instrumenting the object code running on your system. Regardless of the method, profiling methods generally maintain an array of the program counter locations in your program. Profiling methods then increment a counter for each program counter location as it is encountered, letting you see how much time is spent at each instruction in your program. A debugger or other tool then correlates this raw information to the time spent in each task, function, and source code line in your system. Dr. Dobb’s Journal, October 2005
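To make the sampling idea concrete, here is a minimal sketch (mine, not from the article) of the counter-array bookkeeping such a profiler might use; the code-region addresses, the bucket size, and the timer-interrupt hook are all assumptions made only for illustration:

    #include <stdint.h>

    #define CODE_START  0x08000000u   /* assumed start of the code region */
    #define CODE_SIZE   0x00100000u   /* assumed 1 MB of code */
    #define BUCKET_SIZE 4u            /* one counter per 4-byte instruction slot */

    static uint32_t samples[CODE_SIZE / BUCKET_SIZE];

    /* Called from a periodic timer interrupt with the interrupted program
       counter. Each hit bumps the counter for that address; a host tool later
       maps the counters back to tasks, functions, and source lines. */
    void profile_sample(uintptr_t pc)
    {
        if (pc >= CODE_START && pc < CODE_START + CODE_SIZE)
            samples[(pc - CODE_START) / BUCKET_SIZE]++;
    }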
Statistical profiling lets you see the percentage of time spent at every location in your program. For example, Figure 1 is output from a profiling tool for the program in Listing One. However, this information comes at a steep cost, especially for real-time systems. By either halting your system periodically to sample the program counter or modifying your executable, standard profiling methods change the behavior of your system, often in dramatic ways.
“Performance analysis is an important part of developing a complete embedded product” It is possible that these methods could change the timing behavior of your system, potentially causing your software to miss deadlines or encounter other serious problems. Also, storing the profiling information on embedded systems is often a challenge because either additional memory or a way to transmit this information as the system runs is required. Finally, instrumentation solutions may increase the size of your code, potentially making it too large to fit into the limited memory available on many embedded systems. Hardware Trace Hardware trace is a technology available on many embedded microprocessors that allows you to effectively analyze your system without any of the drawbacks previously mentioned. In addition, hardware trace offers some unique performanceanalysis benefits that are unavailable with standard profiling methods. Hardware trace provides a complete history of what your microprocessor is doing as your software runs. This lets you 71
examine the various interactions in your software in great detail. By collecting information about what instructions are executed, when interrupts and context switches occur, and sometimes even what data addresses and values are loaded and stored, hardware trace provides nearly complete visibility into the workings of your software. In addition, because hardware trace uses dedicated logic on your processor and specialized collection devices, it does not impact the execution of your software, and you can collect detailed information about your production code while it runs at full speed. These capabilities let you visualize your software in unprecedented ways, making it easier to debug and optimize.

Hardware trace data is output over a dedicated port on a trace-enabled microprocessor. Several embedded microprocessors available today can output trace data, including the
Figure 1: Standard performance-analysis tool that displays the amount of time spent in each function.

Figure 2: Block diagram of hardware trace collection: a trace-enabled microprocessor feeds its trace port into a trace-port analyzer, which uploads the captured data over USB, Ethernet, or a similar link to a host PC.
Figure 3: SuperTrace Probe ready to collect trace data from an ARM target.
ARM7, ARM9, and ARM11; the IBM PowerPC 405 and 440; and the PowerPC 5500 and MAC7100 families from Freescale. In addition, more and more microprocessors are adding support for trace as the benefits of debugging and analyzing embedded systems with hardware trace become more widely recognized.

Special hardware called a "trace probe" or "trace-port analyzer" connects directly to a microprocessor's trace port and captures the trace data. This data is then uploaded to a host system, such as a PC running Windows or Linux, to be decompressed and analyzed. Figure 2 is a block diagram of a trace-collection system using an embedded microprocessor.

Trace probes also offer features that are particularly important when optimizing the performance of an embedded system. The most important of these is the ability to collect time stamps along with each piece of trace information. This lets you accurately measure the time elapsed between two events and precisely analyze where your program is spending its time. In addition, some processors let you collect information about how many cycles each instruction takes, which allows you to investigate the effects your caches are having on your system and perform other detailed kinds of analysis for improving software performance. Once the trace data is uploaded, host tools let you analyze various parts of your system's execution, including debugging incorrect behavior and searching for performance problems. A number of trace probes and trace-analysis software packages are available, including the SuperTrace Probe and MULTI TimeMachine Suite from Green Hills Software (where I work; http://www.ghs.com/).

Hardware Trace Applied to Performance Analysis

By providing a wealth of information about the execution of your system without changing its behavior, hardware trace data enables unique, advanced performance-analysis techniques. Because it lets you collect timing and cycle information, you can very accurately measure important system metrics such as interrupt latency and context-switch times. And once you can measure these numbers accurately, you can find the parts of your system that are running too slowly and that can easily be sped up. The first and most important step of performance optimization is to find the "hot spots" where your system is spending a large amount of time and where the code can be improved to yield the greatest
gains in performance. These are often inner loops and other parts of your code that are executed many times, although they could also be inappropriate algorithms or other slow code. Locating these hot spots is the primary purpose of nearly all performance-analysis tools, including those that let you visualize your program using trace data.

One major advantage of trace data when searching for hot spots is that it lets you locate anomalous execution patterns in your software. This may be a function that takes an unusually long time with certain parameters; a loop that takes too long on one or more specific iterations; or an interrupt service routine that is fast most of the time but occasionally takes too long. For instance, if a function executes 100 times, taking 100 cycles on 99 calls and 10,100 cycles on the final call, it consumes 20,000 cycles in total, for an average of 20,000/100 = 200 cycles per call. A statistical profiling tool is unable to distinguish this case from a function that takes 200 cycles every time it is called. The performance improvements you would look for differ between the two cases: In the first, you may need to debug just one slow path through the function; in the second, you may want a more general overhaul of the function or algorithm if its performance needs to improve. Trace-analysis tools, however, can distinguish between these two cases by giving you a list of each call to the function and its duration. Once you have identified the functions that consume most of your execution time, you can then examine the paths through each function that cause it to run slowly and eliminate those inefficiencies.

Many trace tools also let you visualize your system over time so that you can graphically see which functions take most of the time in your system. By letting you visualize this data rather than pore over lists of numbers, these tools give you a solid understanding of how your software spends its time, making it easier to make your program more efficient. In addition, some trace tools even let you determine the parameters passed to a function on a slow call so that you can immediately debug it and improve system performance.

An Example

Listing Two is a simple program that sorts an array a large number of times. It uses an implementation of the quicksort algorithm, so the sorting itself should perform reasonably well. Each time through the main loop,
I read an array, sort it, and verify that it is sorted correctly. If the sort ever fails, the program bails out of the main loop and prints an error message. When running this program, you discover that it runs slower than expected.

If you were using an ordinary performance-analysis tool, such as a statistical profiler, you would be able to determine that the sort function takes too long, and even which source lines cause the delay. However, it would be difficult to distinguish between a sorting algorithm that is too slow on average and one that is too slow during a single iteration of the loop. You could insert additional code into the loop to measure the execution time of each iteration and see whether anything interesting turns up, then dig deeper once you discovered the iteration that was taking too long. In short, you would have to do a lot of work and probably recompile and rerun the program several times before solving the problem. In addition, many embedded systems have strict timing deadlines, so inserting instrumentation code may cause incorrect behavior or lead to other negative outcomes. While this additional code may not affect this simple example, it is easy to imagine a scenario where tight deadlines are missed once extra instrumentation code is inserted.

If, instead of using traditional performance-analysis techniques, you collect trace data for this example, you can easily get information about each function that executes and how long it takes to run. You can even precisely measure the amount of time, and the number of cycles, that each iteration of the loop takes. This information is invaluable in locating hot spots quickly so that you can maximize your efficiency when tuning your system. Moreover, this data lets you focus on the code that runs slowly by precisely accounting for all elements of your system, including cache hits and misses, interrupt latencies, and others.

Returning to the example, I run the program on an ARM processor and collect trace data as the program runs. Figure 3 shows a photo of this setup and Figure 4
Figure 4: Output from PathAnalyzer shows a graphical call stack of your program over time.
shows the resulting output from a TimeMachine tool called the PathAnalyzer, which displays a graphical call stack over time. You can immediately see that the quick_sort() function takes up the majority of the running time and that a single iteration takes longer than the rest. To investigate the quick_sort() function in more detail, you can easily look at all of the calls to quick_sort() and their durations. Browsing all of the calls results in Figure 5, which lists the 10 calls to quick_sort() and the duration of each call in processor clock cycles. You can quickly see that the seventh call to quick_sort() takes almost four times longer than the rest. And you are able to collect and display this information while the system is running at full speed, without modifying its behavior in any way.

At this point, you know that you should investigate this specific call to quick_sort(). By examining and debugging the code in this example, you discover that the array on the seventh iteration is already sorted. As you may recall, a naive quicksort degenerates into an O(N²) algorithm when run on an already sorted array, which is the cause of the performance problem in this example. There are several ways to fix this by modifying the quicksort implementation; this is left as an exercise for the reader, though one possible direction is sketched below. For more information on the quicksort algorithm, see a book on algorithms such as Algorithms in C by Robert Sedgewick (Addison-Wesley, 1998).
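As a sketch of one common fix (not the article's code; Listing Two's quick_sort_r() and its partition routine are not reproduced in full here), choosing the pivot as the median of the first, middle, and last elements sidesteps the worst case on already-sorted input:

/* Hypothetical illustration of one way to avoid quicksort's O(N^2) behavior
 * on already-sorted input: use a median-of-three pivot instead of always
 * pivoting on an end element. A generic sketch, not Listing Two's code. */
static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

static void quick_sort_median3(int *array, int lo, int hi)
{
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;

        /* Order array[lo], array[mid], array[hi]; the middle value becomes
         * the pivot, so a presorted run no longer picks its extreme. */
        if (array[mid] < array[lo]) swap_int(&array[mid], &array[lo]);
        if (array[hi]  < array[lo]) swap_int(&array[hi],  &array[lo]);
        if (array[hi]  < array[mid]) swap_int(&array[hi],  &array[mid]);
        int pivot = array[mid];

        int i = lo, j = hi;
        while (i <= j) {                  /* Hoare-style partition */
            while (array[i] < pivot) i++;
            while (array[j] > pivot) j--;
            if (i <= j) { swap_int(&array[i], &array[j]); i++; j--; }
        }
        /* Recurse on the smaller half and loop on the larger one to bound
         * stack depth, which matters on memory-constrained targets. */
        if (j - lo < hi - i) {
            quick_sort_median3(array, lo, j);
            lo = i;
        } else {
            quick_sort_median3(array, i, hi);
            hi = j;
        }
    }
}

Randomizing the pivot, switching to insertion sort for tiny partitions, or detecting presorted input are other standard options.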
Figure 5: Duration in processor clock cycles of each call to the quick_sort() function from Listing Two. You can see that the highlighted call takes almost four times as long as the others.
In addition, if you simply wanted to characterize this sorting algorithm, you could collect trace data from the implementation running on a variety of input arrays and collate the very accurate timing results into a useful profile of the algorithm's behavior. While this type of analysis can be done in other ways for simple sorting algorithms, if you are collecting data from sensors on your embedded system and want to analyze the cost of reading and sorting that data over long periods of time, hardware trace is an ideal solution: It does not modify the runtime behavior of your system, yet it provides detailed and accurate information about the system's execution. While this was a fairly simple example, you can see how trace tools make it painless to find hot spots and perform
advanced analysis in an embedded system so that performance optimization can be done more efficiently and effectively.

Conclusion

Hardware trace provides a mechanism through which embedded developers can easily gather very accurate information about the execution of their systems without having to modify their software or impact its runtime behavior. This lets hardware trace be used in nearly any embedded system that runs on a trace-enabled processor, regardless of whether the application has strict timing requirements. Also, because hardware trace does not modify the behavior of your system, you can analyze your software in real-world conditions while it is running
Listing One

/* This code implements a simple bubble sort algorithm. Figure 1 shows the
 * output from a traditional profiling tool for this code. You can see that
 * roughly 28% of the time running this program consists of copying memory
 * using the memset function. From looking at the code, this is a non-obvious
 * result. */
#include <stdio.h>
#include <string.h>    /* added here for memcpy() */

#define N 10

int array_to_sort[N] = {15, 6, 25, 36, 92, 0, 2, 67, 82, 91};

void bubble_sort(int *array, int size)
{
    int i, j;
    for (i = 0; i < size-1; i++) {
        for (j = 0; j < size-i-1; j++) {
            if (array[j] > array[j+1]) {
                int tmp = array[j];
                array[j] = array[j+1];
                array[j+1] = tmp;
            }
        }
    }
}

void print_array(int *array, int size)
{
    int i;
    for (i = 0; i < size; i++) {
        printf("%d\n", array[i]);
    }
}

int main(int argc, char *argv[])
{
    int tmp_array[N];
    memcpy(tmp_array, array_to_sort, sizeof(array_to_sort));
    bubble_sort(tmp_array, N);
    print_array(tmp_array, N);
    return 0;
}
Listing Two

/* This code contains an implementation of the basic quicksort algorithm.
 * The program's main loop reads an array, sorts it, and then verifies that
 * the array is properly sorted. By tracing this program, we can easily
 * identify the slow call to quick_sort() and determine the causes of this
 * slowdown. */
int array1[N];
int array2[N];

#define swap(a, b) { int tmp = a; a = b; b = tmp; }

void init()
{
    int i;
    for (i = 0; i
    /* ... Listing Two is truncated here in this reproduction: the rest of
     * init(), the read_array() helper, and the recursive quick_sort_r()
     * are missing, and the listing resumes below after the intervening
     * column text. A hedged sketch of the missing pieces follows the rest
     * of the listing. */
on actual data collected from the rest of the system. These characteristics make trace a promising solution for many embedded developers. Whether you want to increase the performance of your application to free up CPU cycles for extra features, to lower the clock frequency of your processor to save power, or to achieve higher performance for some key metric of your system, optimizing embedded systems is an important task that hardware trace can make easier. By providing zero intrusion and enabling very accurate measurements, hardware trace can help you achieve your performance-optimization goals quickly and efficiently.
}

void quick_sort(int *array, int size)
{
    quick_sort_r(array, 0, size-1);
}

int verify_array(int *array, int size)
{
    int i;
    int last_val = 0;
    for (i = 0; i < size; i++) {
        if (array[i] < last_val)
            return 0;
        last_val = array[i];
    }
    return 1;
}

int main(int argc, char *argv[])
{
    int i;
    int tmp_array[N];
    int size;

    init();
    for (i = 0; i < 10; i++) {
        // Let's read in an array and sort it
        read_array(tmp_array, &size, i);
        quick_sort(tmp_array, size);
        // Now let's verify that our array is sorted properly.
        if (verify_array(tmp_array, size) == 0) {
            printf("Array not sorted correctly\n");
            break;
        }
    }
    return 0;
}
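The reproduction of Listing Two above omits the top of the file, the body of init(), read_array(), and the recursive quick_sort_r(). The following is only a guess at what those pieces might look like, written to stay consistent with how the reproduced code and the article use them (it relies on the array1, array2, and swap() definitions shown in the listing). The value of N, the contents of the two arrays, the choice of iteration 6 for the already-sorted input, and the last-element pivot are all assumptions, not the author's code.

/* Hypothetical reconstruction of the pieces missing from the reproduction
 * of Listing Two. Everything here is an assumption made for illustration. */

/* These would precede the global declarations shown in the listing. */
#include <stdio.h>
#include <stdlib.h>
#define N 1000

/* Completion of init(): fill one unsorted and one already-sorted array. */
void init(void)
{
    int i;
    for (i = 0; i < N; i++) {
        array1[i] = rand();          /* arbitrary unsorted values         */
        array2[i] = i;               /* strictly increasing, i.e., sorted */
    }
}

/* Copy one of the source arrays into dest. Here the seventh pass
 * (iteration == 6) is assumed to receive the already-sorted array,
 * matching the slow call observed in the article. */
void read_array(int *dest, int *size, int iteration)
{
    int i;
    int *src = (iteration == 6) ? array2 : array1;
    for (i = 0; i < N; i++)
        dest[i] = src[i];
    *size = N;
}

/* A basic quicksort that always pivots on the last element (using the swap
 * macro from the listing): exactly the kind of implementation that
 * degenerates to O(N^2) on sorted input. */
void quick_sort_r(int *array, int lo, int hi)
{
    int pivot, i, j;
    if (lo >= hi)
        return;
    pivot = array[hi];
    i = lo - 1;
    for (j = lo; j < hi; j++) {
        if (array[j] <= pivot) {
            i++;
            swap(array[i], array[j]);
        }
    }
    swap(array[i + 1], array[hi]);
    quick_sort_r(array, lo, i);
    quick_sort_r(array, i + 2, hi);
}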
DDJ
PROGRAMMING PARADIGMS
Spawn of Crazy Frog
by Michael Swaine
Michael is editor-at-large for DDJ. He can be contacted at [email protected].

It's not really all Francisco Tarrega's fault. Tarrega, a 19th-century Spanish classical guitarist known as the father of the modern classical guitar, died in 1909, but he became, posthumously and for a time, the father of the world's most annoying ringtone — at least according to Britain's Daily Mirror. Tarrega earned that honor a few years back when everybody finally got fed up with a composition of his called "Gran Vals." It wasn't such an annoying tune in itself, but in 1991 the Finnish cell phone maker Nokia pulled 13 notes out of the middle of "Gran Vals" to create one of the first cell phone ringtones. Other annoying ringtones followed, but this one was pretty clearly the most annoying ringtone of them all. Of course, there was less competition back then. "Gran Vals" was not intended to be annoying; it got that way after so many people were forced to listen to those 13 notes endlessly repeated in theaters and subways and during meals and meetings. That's a recipe that can turn any tune from great to grating. But some modern ringtones were actually designed to be annoying.

The Annoying Thing

According to Wikipedia, the ultimate source of the current champion ringtone annoyance is Swede Daniel Malmedahl, a teen-aged internal-combustion-engine impressionist (I'm just reporting), who was trying to nail the sound of a two-stroke moped engine and came up with a screwy vocal sound effect that enjoyed a certain vogue on the Internet before capturing the attention of Erik Wernquist, who, in turn, came up with The Annoying Thing, an animation featuring an anatomically correct anthropomorphic frog in goggles
and a helmet and not much else, who performs hand movements simulating the twisting of the handgrips on a motorcycle while making appropriate noises appropriated from Malmedahl's moped-engine impression. And the rest is what passes for history in an ADD world: Wernquist's Annoying Thing begat the annoying ringtone, now called "Crazy Frog" and on its way to taking over the world.

In short order, Crazy Frog earned 14-million pounds for Jamba, its publisher ("ringtone publisher": Add that to the list of lucrative job categories that didn't exist when you were choosing a major), making it the most commercially successful ringtone ever. To say that Jamba promoted Crazy Frog heavily would be putting it lightly. Crazy Frog pretty much took over British ITV in May, and the resident of the United Kingdom who hadn't been exposed to Crazy Frog probably hadn't been exposed to the common cold, either. Then it became a hit single: A Crazy Frog dance single based on Malmedahl's dopey moped impression outsold the nearest competitor by four to one and rocketed to the top ten in many European markets. There's a Crazy Frog album out, there's a video game in the works, there's a computer virus that masquerades as Crazy Frog, and at this moment someone is probably optioning the movie rights with one hand while getting Jim Carrey's agent on the cell with the other.

The Ringtone Phenomenon

What the heck is up with this bizarre ringtone phenomenon? You may well ask. Here's what's up with that. In the U.S., ringtones were a $300-million business last year and will be double that this year. In Chicago, you can buy ringtones in McDonald's. But the U.S. trails many other countries in cell phone adoption, and therefore, in ringtone mania. The worldwide market is
at least ten times the size of the U.S. market, and is already about a tenth of the total global music market. In Britain last year, ringtone sales surpassed CD sales. That's in pounds, not just units. The ringtone market has attracted talents as diverse as Boy George and Andrew Lloyd Webber. Last year, Dutch R&B singer/songwriter Alain Clark released "Ringtone," a song commemorating the phenomenon.

Naturally, I Googled ringtones. The word scored 11,200,000 hits; for comparison, Harry Potter got me 26,200,000 and Karl Rove got 5,700,000. So in salience, ringtones fall geometrically halfway between a warlock and a popular fictional character.

I looked for an explanation from some of the bloggers who are following the ringtone phenomenon. "I cannot for the life of me explain the ringtone phenomenon," one said. Another: "I'm a bit flabbergasted by wireless consumers' attraction to ringtones." And another: "It's OK to say you don't understand the ringtones business." It is okay, because the phenomenon defies economics, self-interest, and arithmetic. People are paying more for snippets of music than they pay for full songs. There are sources of free ringtones just as there are sources of free songs, but the average price paid for ringtones these days is a buck, which is the going price for a full song from Apple's iTunes Music Store, and you'll pay Nokia several times that for a high-quality True Tones ringtone. Not to mention the fact that the ringtone phenomenon is built on one of the most socially obnoxious aspects of the cell phone: its invasion of others' auditory space. I won't even get into what the cacophony of ringtones is doing to the sex lives of songbirds.

It's worth pointing out that one motivation for using ringtones really is to annoy people. Crazy Frog is evidence of this,
but then there are the ringtones from RudeTones, featuring farts, burps, and sneezes. And ringtones themselves are the tip of a larger iceberg of annoyance. There are also ringbacks, which analysts expect to be bigger than ringtones. Ringback tones are heard by the person calling you, rather than by you and those unfortunate enough to be in your immediate vicinity when you get a call. Two reasons why ringbacks might indeed surpass ringtones are that they are operator owned, so there's an opportunity for the operators to make money, and that businesses can put pitches (sales, not musical) in their ringbacks, for a new advertising channel. Now that's annoying.

There are bird calls on ringtone, and the porn industry has spun the ringtone into the moantone. The (sound) quality is improving: You can Google real tones or true tones or polyphonic ringtones for the skinny on that. You can also download complete songs to your phone. Robbie Williams was the first artist to release an album on a cell phone memory card. This is a different phenomenon from ringtones, but it's clearly on some sort of convergence or collision course with the ringtone phenomenon.

My Ringtone, My Self

There have to be other reasons for the ringtone phenomenon besides the urge to annoy strangers. Many of the people who have thought about this question point to the issue of personalization. A cell phone is a generic consumer product, but it becomes your link to your friends, and therefore, a very personal device. The argument is that this strong social aspect of the device creates an equally strong need to personalize it, to make it represent you because it is, in some sense, your avatar in your social world.

There is an apparent contradiction in this, because ringtones are very much about being part of the crowd, fitting in. The more popular a ringtone is, the less useful it is in defining your particular quirky uniqueness. I think maybe the way to cut through the apparent contradiction is to think about the audience. Musical taste is an individualizing property: You can assert your individuality by flaunting your musical taste. But teenagers, who are the prime market for cell phones and ringtones, are in the process of discovering or creating their identities and their tastes, as opposed to demonstrating them. So their strategy is: "Try out what others like to see what you might like."

So I do think that the idea is correct that ringtones and ringback tones are
about making this very personal device, the cell phone, a more accurate projection of yourself. Let's assume that's right. What are the implications? What's the opportunity? What if, beyond the early stage of exploration and trying on other people's tastes, the ultimate goal is to have your own personal ringtone or ringback tone, unique to you and reflecting your taste, designed by you without requiring you to be a composer? How would you produce that? Who would produce it? How would anybody make money off it?

Enter Stephen Wolfram

Maybe Stephen Wolfram and his colleagues know. Wolfram is the genius behind the massive mathematics program and language Mathematica. For many years, he has been concentrating on the complex structures that can be generated using the kinds of simple rules found in Cellular Automata (CA). If you know all about CAs, you can skip ahead, but for those who aren't up to speed on this fascinating realm of programming, here's a quick glimpse.

A CA operates in a Cartesian digital world of n spatial and one temporal dimensions; n=1 is complicated enough for the current discussion. A particular CA requires two things: an initial state and a transformation rule. The initial state specifies the values of all cells in the CA's one-dimensional row of cells. These cells may be defined to be a single bit in depth, or they may be deeper. A CA with 1-bit cells can be represented as a row of black and white boxes (ON and OFF), and a CA whose cells are deeper can be represented as a row of colored boxes. Often, the initial state of a CA will be a single ON cell, all others OFF. The rule specifies how you get to the next state from the current state. Subsequent states are also called "generations," and the rule specifies how the CA evolves from generation to generation. CA rules often embody a principle of locality, so that cells are only influenced by adjacent or nearby cells. An example of a simple rule for a CA with binary cells might be: A cell will be ON in the next generation if, and only if, exactly one of [the cell, its left neighbor, and its right neighbor] is ON in the current generation, or if the cell and its right neighbor, but not its left neighbor, are ON in the current generation.
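In code, the rule just described is only a few lines. Here is a minimal sketch of it evolving from a single ON cell; the width, generation count, wraparound edges, and character output are illustration choices of mine, not anything from Wolfram's materials.

/* Minimal sketch of the one-dimensional, binary, nearest-neighbor CA rule
 * quoted above, run from a single ON cell. Edges wrap around, and the
 * width, generation count, and '#'/'.' output are arbitrary choices. */
#include <stdio.h>
#include <string.h>

#define WIDTH 64
#define GENERATIONS 24

int main(void)
{
    unsigned char cur[WIDTH] = {0}, next[WIDTH];
    cur[WIDTH / 2] = 1;                      /* single ON cell in the middle */

    for (int g = 0; g < GENERATIONS; g++) {
        for (int i = 0; i < WIDTH; i++)
            putchar(cur[i] ? '#' : '.');
        putchar('\n');

        for (int i = 0; i < WIDTH; i++) {
            int left   = cur[(i + WIDTH - 1) % WIDTH];
            int center = cur[i];
            int right  = cur[(i + 1) % WIDTH];
            int on_count = left + center + right;
            /* ON next generation if exactly one of the three is ON, or if
             * the cell and its right neighbor (but not its left) are ON. */
            next[i] = (on_count == 1) || (center && right && !left);
        }
        memcpy(cur, next, sizeof(cur));
    }
    return 0;
}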
This is the so-called Rule 30, one of the 256 possible rules for this variety of CA, with one spatial dimension, binary cells, and nearest-neighbor locality. When you run it out for a few generations, you begin to see something curious. The sequence of cells expands and
seems to get increasingly more complex. It doesn't just get more ornate, it seems to really increase in complexity. It is, apparently, chaotic. The central 1-bit-wide slice of the expansion of Rule 30 gives a series of bits that are, for practical purposes, random; in fact, Mathematica uses Rule 30 as a random-number generator.

Simply Deep

Wolfram believes that there is something of great profundity in the fact that some of these simple rules with even simpler input can produce output of such great complexity. So complex and unpredictable, in fact, that it may be logically impossible to characterize the output of the nth iteration of such a rule in any way shorter than executing the rule n times and observing the result. In his fat book A New Kind of Science, Wolfram explores some of the implications of this. Along the way, he has something to say about everything from nanotechnology to consciousness. More to the point, Wolfram is convinced, and I think he's convincing, that these small programs have something to say about all these areas. Some such simple program, he hints, or at least I infer, could even be the engine that runs the universe, that makes the universe of this instant transform into the universe of the next instant. Wolfram continues to explore the implications of small programs, applying them to many problems in different areas. And now he's applied them to the problem (if you accept that this is a problem requiring a solution) of producing new cell phone ringtones.

Mass Customization

In July, I got what looked like a color swatch book in the mail. Art directors and interior decorators know what I mean: 1-3/4×8-inch cards, bolted at one corner to fan out, er, like a fan, with a different color on each card. But these cards had color patterns: Each displayed a narrow horizontal rectangle broken up into maybe 1500 squares of different colors, with runs of color, repetitions, patterns showing up in the limited palette chosen for each or in other features, but still basically random looking. On the back of the swatch was a URL, a guest username, and a password. Ah, a mystery. So I went to the site, WolframTones at http://tones.wolfram.com/. Sheet music for the 21st century, the pitch went. Beginning in October, WolframTones would, for a small fee, let anyone with web access create their own music in a matter of seconds.
This music-creation service is just one aspect of a trend that Wolfram has identified and wants to ride: mass customization. That is, providing made-to-order products and services on a mass-market basis at one-size-fits-all pricing.

Creating your own custom ringtone at WolframTones requires no knowledge of chords or scales or other music theory or practice. You generate a composition by clicking checkboxes and moving sliders to make selections on variables of musical style and tempo and scale, and assigning roles to multiple instruments. You don't have to know what these variables mean, because you can experiment with them to find something you like. You also specify the parameters of the mathematical rule underlying the composition: one of several billion different 5-neighbor cellular-automaton rules, its seed (or input value), plus a couple of other parameters. WolframTones uses the rule to generate a tune by grabbing an n-cell-wide swath down the center of the visual representation of the running rule and then rotating it so that the time axis is horizontal, as you'd expect in musical notation. It then maps aspects of the generated pattern to musical properties to produce the melodic sequence. WolframTones produces compositions with
multiple instruments, and each instrument can be mapped to different aspects of the pattern. Some sequences will be repetitive, but some will noodle on in endless variations within tight musical parameters.

Wolfram Science hopes to make money on this (and keep it all — no royalties) by charging for downloading ringtones via your carrier. And they intend to charge somewhere around $2. Hmmm. Well, the market has shown that it has no money sense, so the price may not be a problem. But will WolframTones really catch on in Ringtonia?

I Just Called to Feel Your Vibe

I guess I'm skeptical. The technology behind WolframTones is fascinating, and I'd like to see it employed, for example, in online games, where every time you get in a particular situation you hear some mathematical variation on a piece of music. Because the music is different each time, the game seems deep, but because it is recognizable as the same theme, it defines your locus in the game. Something like that. But ringtones? These WolframTones ringtones meet the individualization requirement brilliantly, because the tone patterns (melodies?) are essentially unique. As for the musicality
part, I'm not so sure. Does the marginally musical nature of WolframTones make it more or less annoying than real tunes? More or less memorable? I dunno. But there is still the group cohesion aspect, and on that measure, I'd say that Wolfram's unique ringtones fail precisely because of their uniqueness. And I suspect that the group cohesion thing is important. This is, after all, a communication device, and we can only communicate about shared experience. But I could be wrong.

Here's something else I could be wrong about: I can't help but think that the whole billion-dollar ringtone edifice might fall if creative minds focused on the vibrating option rather than the sound option. Cell phones that offer a vibrating notification option instead of the noise option raise the possibility of custom vibration patterns. And there's no need to stop with replacing ringtones: How about replacing ringback tones with vibration patterns, too? So, if you got a particularly nice vibeback when you called a particular number, might you call that number more often? And hope that they did not pick up too soon? I'm just thinking.

DDJ
EMBEDDED SPACE
Strong Language
by Ed Nisley
Ed's an EE, PE, and author in Poughkeepsie, NY. Contact him at ed.nisley@ieee.org with "Dr Dobbs" in the subject to avoid spam filters.

Desktop computer operating systems must be optimized for mass-market distribution to folks who have neither the interest nor the skill for customization. As a result, any given user may take advantage of only a fraction of the available programs, but every user will find a suitable fraction. For example, Windows versions differ largely in their go-fast stripes. The top bullet item for Windows Media Center boasts, "A clean, new look for menus, taskbars, and a host of new themes and screensavers," which tells you what's really important to their marketing folks. While the exterior may look different, it has much the same assortment of stuff under the hood.

Early GNU/Linux distributions took this to an extreme by installing not only the kitchen sink, but every sink created since primordial UNIX crawled ashore, plus a broad assortment of plumbing and sculpting tools should you become desirous of building your own sink. Recent distros have reined in this tendency, if only because a program that's not present can't become a security exposure. A secondary benefit accrues to new users suffering a case of "the crazy you get from too much choice" when confronted by a dozen editors, half a dozen browsers, and four CD burners. That's less of an issue with stock Windows boxes, which are downright feature-poor by comparison.

Classic embedded systems take this to the other extreme, sometimes consisting of a single program in ROM. Perhaps a decade ago, the need for an operating system became obvious in larger projects, and five years ago, Do-It-Yourself (DIY) operating-system designs reached the far limits of typical DIY ability. By now, old-line embedded OS and RTOS vendors are tarting up their offerings to attract DIY designers who'd otherwise pick Linux, embedded Linux vendors are adding real-time features to entice escapees from per-unit RTOS royalties, and doughty DIY folks are scratch-building Linux kernels.

One thing is clear, however: Larger and more complex embedded systems have
more places for things to go wrong. Data points from the desktop world can therefore help us predict problems in the embedded realm, if only we look closely at the trouble. While the sight may not be pretty, averting our eyes won't help.

High Warble

My desktop PCs run SuSE Linux, but I keep a token Windows laptop around for those few vital programs without a GNU/Linux equivalent. Therefore, I subscribe to both the SuSE Security Announcement and the Microsoft Security Bulletin e-mail lists, which provide regular doses of problems and fixes. Each problem includes a number that uniquely identifies it in the MITRE Corporation's Common Vulnerabilities and Exposures (CVE) list. Because the same problem may occur across many different versions and types of operating systems, the CVE number (oddly, with a CAN- prefix) helps reduce the inevitable confusion about which bug you're discussing. First, a pair of broadsides.

CAN-2005-1206: Buffer overflow in the Server Message Block (SMB) functionality for Microsoft Windows 2000, XP SP1, and SP2, and Server 2003 and SP1 allows remote attackers to execute arbitrary code via unknown vectors, aka the "Server Message Block Vulnerability."

CAN-2004-1137: Multiple vulnerabilities in the IGMP functionality for Linux kernel 2.4.22 to 2.4.28, and 2.6.x to 2.6.9, allow local and remote attackers to cause a denial of service or execute arbitrary code via (1) the ip_mc_source function, which decrements a counter to -1, or (2) the igmp_marksources function, which does not properly validate IGMP message parameters and performs an out-of-bounds read.
These errors permit a remote attacker, one not sitting at your keyboard, to execute arbitrary code, the current euphemism for "do a hostile takeover." Because any embedded system complex enough to require an operating system will also be networked, these are particularly severe errors. Direct network-to-kernel attacks are fairly rare, however, because kernel code receives a fair amount of scrutiny. Generally, an attacker must compromise another networked program, which is particularly easy on the desktop: Folks really like their music.
CAN-2004-0258: Multiple buffer overflows in RealOne Player, RealOne Player 2.0, RealOne Enterprise Desktop, and RealPlayer Enterprise allow remote attackers to execute arbitrary code via malformed (1) .RP, (2) .RT, (3) .RAM, (4) .RPM or (5) .SMIL files.
If you can imagine a large-scale embedded system running a music player, it's equally easy to see why it might run unattended. If not music, then surely the system uses pictures?

CAN-2005-1211: Buffer overflow in the PNG image rendering component of Microsoft Internet Explorer allows remote attackers to execute arbitrary code via a crafted PNG file.
Now we're talking! Internet Explorer is less a separate program than a thin GUI film atop a seething mass of Windows DLLs. Because DLLs were intended for reuse, I suspect any program dealing with PNGs will smack into that error. In any event, if an attacker can force-feed a bad PNG into that DLL, your system's control goes with it.

CAN-2004-0981: Buffer overflow in the EXIF parsing routine in ImageMagick before 6.1.0 allows remote attackers to execute arbitrary code via a certain image file.
That trick requires more effort than with IE, but I can see an embedded system doing image processing for other boxes. Notice any similarities so far?

Programmed Injection

Many help-desk sites use NNTP news servers for support discussions. If you can imagine reading news from an embedded system, here's an attack path.

CAN-2005-1213: Stack-based buffer overflow in the news reader for Microsoft Outlook Express (MSOE.DLL) 5.5 SP2, 6, and 6 SP1 allows remote malicious NNTP servers to execute arbitrary code via a LIST response with a long second field.
Do you prefer instant messaging?

CAN-2005-1261: Stack-based buffer overflow in the URL parsing function in Gaim before 1.3.0 allows remote attackers to execute arbitrary code via an instant message (IM) with a large URL.
Both of these seem to require a successful server compromise to put an attacker on the far end, or at least a deliberate conversation with a Black Hat. Alas, even if an embedded system has fixed URLs for its upstream servers so that it will http://www.ddj.com
fetch and store data only in machines you control, there's another route into your system. When you use the Internet and OPC, you're dependent on everything working correctly, even stuff you didn't know existed.

CAN-2000-1218: The default configuration for the domain name resolver for Microsoft Windows 98, NT 4.0, 2000, and XP sets the QueryIpMatching parameter to 0, which causes Windows to accept DNS updates from hosts that it did not query, which allows remote attackers to poison the DNS cache.
Similar flaws in UNIX-flavored DNS server code permitted an exploit at the nameserver, allowing an attacker to redirect your packets to a hostile server without the inconvenience of first cracking your box. Given control of your data, the rest is easy. It's even easier getting in when the door isn't locked.

CVE-2002-0676: SoftwareUpdate for MacOS 10.1.x does not use authentication when downloading a software update, which could allow remote attackers to execute arbitrary code by posing as the Apple update
server via techniques such as DNS spoofing or cache poisoning, and supplying Trojan Horse updates.
Both of these sound like errors of omission, rather than coding errors, but I remain amazed that nobody's pulled off a similar stunt with Windows Update or Symantec Liveupdate. Should you plan to keep your embedded systems patched using a similar mechanism, remember that strong crypto forms a necessary, but not sufficient, foundation.

Exploitation

Common wisdom has it that UNIX systems will suffer less damage from an attack than a Windows box. Although Windows distinguishes between "Administrators" and "Users," in actual practice, you can't get much done without being an Administrator and some programs simply won't run for ordinary Users. As a result, Windows users generally have full Administrator privileges that translate any rogue code into a total system compromise. UNIX-style users have severely limited access to the system's gizzard and, because that's the way it's supposed to be, users can run all the programs they should
Dr. Ecco Solution

Solution to "Maximum Lottery," DDJ, September 2005.

Following the idea of the sultan's daughters problem, we will express our protocols by three numbers in increasing order: x, y, and z. The idea is to reject the first x, then choose the next one better than those first x. Call the position of that one p1 (so p1>x). If p1>y, then choose the next one better than any seen before (at a position we'll call p2). Otherwise, reject until position y and then pick the next one better than any seen before (at a position we'll still call p2). If p2>z, then choose the next one better than any seen before (at a position called p3). Otherwise, reject until position z and then pick the next one better than any seen before (at a position we'll call p3). For 100 balls in total and three keeps, you win roughly 68 percent of the time if you set x, y, and z to be 14, 32, and 64, respectively (a simulation sketch appears at the end of this sidebar). Here is one example win for the following sequence:

3 78 80 90 25 95 51 27 57 40 65 48 55 72 26 73 54 31 15 2 89 61 97 98 8 50 38 18 88 52 4 42 68 16 62 9 94 99 20 28 56 58 76 93 10 96 63 35 81 91 66 11 30 5 0 24
The p1 position value in this case would be 23, where the value 97 is found, because 97 is the first value larger than 95, which is the largest of the first 14 numbers. The p2 value would be 38, where 99 is found, and the third keep is then irrelevant.

Reader Improvements Concerning "Treasure Arrow"

Alan Dragoo pointed out that the delta is 15 centimeters, so the arrow and plank each weigh 75 kilograms. Carl Smotricz noted that because the arrow is not quite horizontal, the horizontal distance between the two ends of the arrow is less than 10 meters. In fact, it is √(10² – (0.6)²) = 9.98. So, the arrowhead has dipped 0.6 meters for each 9.98 meters of horizontal distance. Continuing this line (Carl used trigonometry, but let's do this in a more elementary way), there would be a further dip of some amount x for the next 10 meters of horizontal distance. So, x = 0.6×(10/9.98). This gives an extra 0.601 meters. So the arrow, in fact, points to 2.201 meters below the ceiling.

DDJ
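For readers who want to check the roughly-68-percent figure, here is a minimal Monte Carlo sketch of one reading of the protocol. Only the thresholds 14, 32, and 64 come from the solution above; the win condition (one of the three keeps is the overall maximum), the trial count, and the use of rand() are assumptions made for illustration.

/* Hypothetical Monte Carlo check of the three-keep protocol described above.
 * This encodes one reading of the rules; it is not Dr. Ecco's own program. */
#include <stdio.h>
#include <stdlib.h>

#define BALLS  100
#define TRIALS 200000

static void shuffle(int *a, int n)
{
    for (int i = n - 1; i > 0; i--) {      /* Fisher-Yates shuffle */
        int j = rand() % (i + 1);
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

int main(void)
{
    const int X = 14, Y = 32, Z = 64;      /* reject-thresholds from the solution */
    int balls[BALLS], wins = 0;

    for (int i = 0; i < BALLS; i++)
        balls[i] = i;                      /* BALLS-1 is the maximum value */

    for (int t = 0; t < TRIALS; t++) {
        shuffle(balls, BALLS);
        int best_seen = -1, keeps = 0, kept_max = 0;
        int wait = X;                      /* positions 1..wait are rejected */

        for (int pos = 1; pos <= BALLS && keeps < 3; pos++) {
            int v = balls[pos - 1];
            if (v > best_seen) {
                if (pos > wait) {          /* keep this record */
                    keeps++;
                    if (v == BALLS - 1)
                        kept_max = 1;
                    /* The next keep waits until Y (then Z) unless we are
                     * already past that threshold. */
                    if (keeps == 1)
                        wait = (pos > Y) ? pos : Y;
                    else if (keeps == 2)
                        wait = (pos > Z) ? pos : Z;
                }
                best_seen = v;             /* records update even when rejected */
            }
        }
        wins += kept_max;
    }
    printf("estimated win rate: %.3f\n", (double)wins / TRIALS);
    return 0;
}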
and none of the rest. Even so, if a user has write access to files used by others, an attacker can cause considerable damage without killing the system. Unfortunately, even on Linux boxes, once an attacker gains any access to the system there's little standing in the path to root-dom. If all else fails, a carefully scripted crash will work.

CAN-2005-1263: The elf_core_dump function in binfmt_elf.c for Linux kernel 2.x.x to 2.2.27-rc2, 2.4.x to 2.4.31-pre1, and 2.6.x to 2.6.12-rc4 allows local users to execute arbitrary code via an ELF binary that, in certain conditions involving the create_elf_tables function, causes a negative length argument to pass a signed integer comparison, leading to a buffer overflow.
Common Causes

If your similarity sense is tingling by now, there's a good reason. With the possible exception of those DNS and Mac errors, the attacks come down to buffer overflows in C programs. It's easy to blame this on poor coding practices, but the real problem lies elsewhere.

Ideally, the language you pick for your next application would be the one that simplifies the solution so that your code becomes obviously correct. Yes, you can write a GUI app in Assembler or a business application in APL, but that's going about it the hard way. A quick check shows that something like 2500-odd programming languages have come and gone in the last half-century. Although old languages rarely die or completely fade away, face it, APL just isn't in your future. Assembler, now, maybe that's a different story. Anyhow, Hobson's Choice dictates using either C++ (C# for folks in the dot-Net gulag) or Java. Deep embedded stuff still happens in C, with C++ gaining traction for big projects. You'll find a zillion other languages complete with zealous partisans, some of whom I'll hear from within the next week, but to a good first approximation, it's a C-style language or nothing.

And there's the problem. As Kernighan and Ritchie put it in The C Programming Language, "C is a relatively 'low-level' language." Before you can tackle large-scale problems, you must first acquire the scaffolding required to support more abstract concepts. Unfortunately, while you're concentrating on the abstractions, the low-level details still matter. C++ slathers a (massive) layer of abstraction atop the same low-level language substrate to provide the worst of both worlds: the arrogance of high-level semantics deployed on fragile syntax. Even a trivial mistake can clobber the entire system, generally providing good war-story fodder.
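To see how little it takes, here is a deliberately generic sketch of the pattern behind most of the advisories quoted above (a fixed-size buffer filled from externally supplied data with no length check), together with one bounded alternative. It is a teaching illustration, not code from any product named in this column.

/* Generic illustration of the classic stack buffer overflow, plus a bounded
 * alternative. This is a sketch, not code from any product mentioned here. */
#include <stdio.h>
#include <string.h>

/* BROKEN: if 'field' (say, a field parsed from a network response) is longer
 * than 15 characters plus the terminator, strcpy() writes past 'name' and
 * tramples whatever follows it on the stack, including the saved return
 * address. That is the setup for "execute arbitrary code." */
void record_name_unsafe(const char *field)
{
    char name[16];
    strcpy(name, field);              /* no length check at all */
    printf("hello, %s\n", name);
}

/* Safer: refuse oversized input and always terminate the buffer. */
void record_name_bounded(const char *field)
{
    char name[16];
    if (strlen(field) >= sizeof(name)) {
        fprintf(stderr, "field too long, dropping it\n");
        return;
    }
    strncpy(name, field, sizeof(name) - 1);
    name[sizeof(name) - 1] = '\0';    /* strncpy() does not guarantee this */
    printf("hello, %s\n", name);
}

Compilers with warnings cranked up, Lint, and similar static checkers flag exactly this pattern, which is part of the point of the ground rules that follow.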
CAN-2005-1974: Unknown vulnerability in Java 2 Platform, Standard Edition (J2SE) 5.0 and 5.0 Update 1 and J2SE 1.4.2 up to 1.4.2_07 allows applications to assign permissions to themselves and gain privileges.
Sun’s description adds a few more details to that spare note. A vulnerability in the Java Runtime Environment may allow an untrusted applet to elevate its privileges. For example, an applet may grant itself permissions to read and write local files or execute local applications that are accessible to the user running the untrusted applet.
Even if you eschew C, it can still clobber you.

Ground Rules

While it's easy to blow off desktop problems as irrelevant to embedded systems, that's simply ignoring some conspicuous evidence of trouble ahead. None of the problems in the MITRE list got there because somebody thought it'd be cool to leave that error in the code, okay? If you're using C, which is highly likely for a typical embedded system, you must assume that the language is working against you. Establish a style guide, turn on all the compiler's checks, use Lint, shake your heap, do everything you can to flush out errors before they hit the field. Keep in mind that all those errors came from folks who are at least as smart as you are and who thought they knew what they were doing, too.

If an attacker can gain sufficient access to your system to run a program, even a crash can compromise the kernel. An unattended box that automatically restarts after a crash may remove any evidence of the compromise, while leaving the attacker in control.

If your embedded system controls a valuable device or provides a critical service, it will attract more highly motivated attackers than a typical desktop box. Worse, they may value the system for reasons that have nothing to do with its code or data, as demonstrated by the myriad compromised Windows boxes on the Internet. Indeed, a clever attacker may leave the normal functions running, while siphoning off CPU cycles and a little network bandwidth. Perhaps the recent decrease in zombie attacks simply means a million PCs are quietly cracking somebody's crypto instead?

If your device depends on an Internet-facing server for critical functions or updates, your security strategy must succeed despite a complete compromise of the other end of the conversation. In fact, you cannot assume the Internet infrastructure has your best interests in mind.

Even if you're using, say, Oberon (possible, but so weird that you'll certainly confuse the attackers, not to mention any bystanders), keep in mind that
any system has flaws. You must provide an in-depth defense, so that a single catastrophic failure doesn't provide a direct route to the kernel. That'll make for a good story, too.

Reentry Checklist

Windows Media Center features appear at http://www.microsoft.com/windowsxp/mediacenter/evaluation/features.mspx. That quote's from Joni Mitchell in "Barandgrill" on her For the Roses album. The Microsoft TechNet Security Center is at http://www.microsoft.com/technet/Security/default.mspx and SuSE's security announcements are at http://www.novell.com/linux/security/securitysupport.html. Mitre's Common Vulnerabilities and Exposures list is at http://cve.mitre.org/. The
CAN-prefixed numbers I use will change to CVE-prefixed numbers at about the time you read this, so adjust your searches accordingly. The first four digits of each CVE number indicate when the number was allocated, so it's entirely possible to have an old-looking CVE record holding a new error.

More than you want to know about APL starts at http://burks.bton.ac.uk/burks/language/apl/. You can find your favorite language at http://people.ku.edu/~nkinners/LangList/Extras/langlist.htm and download a genealogy poster from http://www.oreilly.com/pub/a/oreilly/news/languageposter_0504.html. The Oberon home page is at http://www.oberon.ethz.ch/.

DDJ
CHAOS MANOR
Deep Impact
by Jerry Pournelle
NASA's Project Deep Impact (http://deepimpact.jpl.nasa.gov/), which provided us a first look inside a comet, was a triumph and reminiscent of the old glory days when we expected everything and got more. NASA's Project Deep Impact wasn't quite like Deep Impact, the movie, in which Robert Duvall brilliantly played a thinly disguised Pete Conrad, but it was exciting enough, and the fountain of matter spewing from the comet should tell us a lot about what comets are made of. At the moment, there's so much dust, it's a bit early to tell, but "dirty snowball" (or Hot Fudge Sundae) still seems reasonable. Whatever the answer, we have asked the question properly, and NASA and JPL deserve plenty of credit. We can also congratulate them on Pathfinder and Sojourner, which continue to operate long past their planned useful lives.

Perhaps the awful times are over. No more sending up probes with part of the team calculating in English units while others insist on Metric, and neither bothering to tell the other. Or the probe that locked its landing gear in plenty of time for landing, while another team used the "landing gear locked" as a signal to shut off the engines, thus dropping a Mars Lander from 50,000 feet (or 15,240 meters) onto the surface. I mention these embarrassments not to detract from the latest triumphs, but to remind everyone that without some adult supervision, even the smartest people can do some awfully silly things. Just like computers.

Also, I confess some gratification when both scientists and science writers talked about Lucifer's Hammer, the novel I wrote with Larry Niven. Alas, it is also clear that NASA thinks it will take a very long time and a lot of money to return to the Moon, much less to go to Mars. As one of my readers put it, every man who has walked on the Moon may well be dead before another goes there. And yet, I remember when we had never been there at all, and I was thought mad to say I would live until we got there.
Jerry is a science-fiction writer and senior contributing editor to BYTE.com. You can contact him at [email protected].

The Future of Weapons of Mass Destruction

Deep Impact the Project wasn't the only science-fiction-like event in the past few
weeks. In the summer, I traveled to England to take part in a conference sponsored jointly by the Defense Departments of the United States and the United Kingdom on the "Future of Weapons of Mass Destruction." The speakers included not only senior police, military, and civil-service experts involved with the problem, but also, notably, my colleagues Orson Scott Card, Wil McCarthy, Allen Steele, and Vernor Vinge. The conference took place in the Wilton Park Conference Center, which was founded by Winston Churchill, who viewed it as a tool for reaching international cooperation, and conferees came from many parts of the world. House rules forbid me to quote any of the conferees. There will be a conference report written by the Center Director, Dr. Richard Latter, and I'll let you know when it's available.

The conference was as much for the education and exchange of views of the participants as anything else; while science-fiction writers don't have much responsibility for dealing with the future other than speculating about it, most of the participants do. There were senior police, diplomatic, arms control, and compliance officers responsible for being sure that the military doesn't violate international conventions, and other such officials. I think some of them were stunned by the list of potential threats my colleagues and I were able to generate, but everyone said it had been a good investment of their time. I had been home less than a week when the bombs went off in London. Had that happened during the conference, it would probably have changed the nature of the discussions. Then perhaps not. They were pretty good without that stimulus.

Inventing the Future

I can't talk about what the others said, but I can give you a summary of part of my presentation. Long ago, Dandridge Cole said we cannot predict the future, but we can invent it. It's a good point. The best way to get where we would like to be is to see where we are, decide where we want to be, and invest in the technologies that will get us there. This seems like a good thing to do. We may also get places we had not intended or even dreamed of.

In 1964, I was the General Editor of a classified USAF Systems Command study called "Project 75." This was conducted by the San
Bernardino campus of the Aerospace Corporation and was done by the Ballistic Systems Division in conjunction with the Air Systems Division's companion study, Project Forecast, directed by Col. Francis X. Kane. As to why San Bernardino, General Schriever wanted this study group as far from the Pentagon as you could get and still be in the continental U.S. To get to San Bernardino, you had to go to Los Angeles, then drive back a hundred miles into the low desert. It was also a place few wanted to go.

Project 75 was ambitious. The goal was to look at everything we knew about ballistic missiles, including what we knew about the Soviet programs, then project that to the year 1975. We would then look at what we'd need in 1975 to fulfill the USAF missions, and that would help determine what technologies we should begin developing in 1965 so we would have in 1975 the future — at least, in the realm of ballistic missiles — we wanted. The study was large and highly classified, and many of those who worked on it, including me, were not authorized to see the end product (even though, in my case, I had written just about every word in it — which makes more sense than is apparent at first sight).

There were a number of conclusions and recommendations for new technology development, but one stood out — we needed better accuracies at intercontinental range. If you want to hit the other guy's weapons and minimize damage to his cities, you want to use small accurate birds rather than monster nukes. One way to get that accuracy was to develop better inertial platforms with smaller and more accurate gyros so that the missile knew where it was at all times. We were already working on that, moving inertial gyros from basketball-sized with mechanical coupling to grapefruit-sized with laser data acquisition. The next step was to make use of that better position information, and the only way to do that was with onboard guidance computers. (A moment's thought will show that you don't dare allow an ICBM to accept midcourse corrections from ground bases.) We recommended development of onboard guidance computers. That required computers that were much smaller and lighter than any then in existence. This, in turn, required Large-Scale Integrated
Circuits. Accordingly, the Department of Defense directed investments into LSIC technologies. The result was better onboard guidance, and thus, far more accurate missiles, which was the future we were trying to invent — but there were other results. The work led to the 4004 chip, then the 8080 by way of the 8008. That was good enough to power small general-purpose computers, such as the Altair, Sphere, Imsai — all now on display in the Smithsonian, right next to Ezekiel, my old friend who happened to be a computer. The personal computer was born. So was BYTE magazine and Dr. Dobb's Journal and this column.

In 1980, I predicted that as a result of the small computer revolution, "by the Year 2000, everyone in Western Civilization will be able to get the answer to any question that has an answer." That happened pretty well on schedule. But while Project 75 led, pretty directly, to the computer revolution that produced the Internet and World Wide Web and had enormous impact on the lives and occupations of most of the inhabitants of Western Civilization, that wasn't what we set out to do. All we were trying to do was make our missiles more accurate. The conclusion is obvious: You can invent the future, but you can't predict it even as you are inventing it.

Vernor Vinge and Wil McCarthy said much the same things using different examples. McCarthy is president of a nanotechnology company, so naturally he raised the question of threats like "gray goo," in which nanobots seek to convert everything they can find into copies of themselves. That got me thinking along the lines of Fred Saberhagen's berserkers — war robots that seek to eliminate all life in the universe — and what they might do with nanobots. That led to wondering if "gray goo" could evolve once it had destroyed everything else. The resulting discussion was continued at dinner and was pretty terrifying, but was probably more interesting to the science-fiction writers than the police and military policy people.

Installing XP64: A Journey Worth Taking

We've been testing out a Tyan Tiger K8WE-based system and a Hewlett-Packard xw9300 workstation, both dual Opteron systems. Both are based on the same nVidia nForce Professional chipset, and in fact, the motherboards were developed in parallel by Tyan and HP. Until recently, these systems didn't have comprehensive 64-bit XP drivers, so all that spacious RAM (4 and 8 GB!) was going to waste. Once the 64-bit drivers were available, we decided to experiment with the Tyan system, setting it up for dual boot
The story has a happy ending, but like so much around Chaos Manor, it took some detours to get there.

The Tyan system (Atlas, by name) is built in an Antec Server Style tower case with a 550-watt TruePower ATX12V 2.0 power supply. The power supply was a bit of a challenge, and it is necessary to run dual video cards. Atlas has an nVidia PCI-E Quadro FX 4400 and a PNY Quadro FX 540 (also nVidia-based) card, both of which require a 6-pin PCI-Express power connector. This is not the familiar +5/+12V disk-drive connector, but rather a new squarish 6-pin connector that looks just like a P4 motherboard power connector. (This connector can be seen at http://images.anandtech.com/reviews/shows/computex/2004/nv45update/powerconnector.jpg.)

Both systems have been set up with 32-bit Windows XP for several months, and have proven to be very stable performers. They are not remarkable merely for their greater memory, faster CPU, bigger disks, or stratospheric video speed, but rather for the entire combination; while the lulling incrementalism of the past few years of computing improvements seems boring, occasionally we realize that, no, this batch of machines really is more useful than those of two years back, particularly for high-performance tasks. Still, the increase hasn't been particularly big; we've been waiting for the 64-bit revolution, which is only now making its presence felt.

Alex Pournelle decided to see if he could make Atlas dual-boot; for now, keeping the 32-bit Windows install is essential, because David Em uses this machine as his secondary creative station. Fortunately, we had the luxury of both Hitachi Serial ATA and Seagate SCSI-320 drives connected to the motherboard. The S-ATA drives were the boot and data drives, with the SCSI drives as secondary data drives. We thought to use one of the SCSI drives as the 64-bit boot drive.

64-bit Windows XP installs identically to 32-bit; the entire first-pass procedure runs almost indistinguishably, then the system reboots and tries to start up. In this case, "tries" is the operative word. While the 32-bit partition started fine, trying the 64-bit one just crashed the machine. A bit of sober reflection revealed the reason: boot block limitations. With the primary boot drive being Serial ATA, the boot block couldn't properly address a drive of a different technology type (SCSI), and got confused trying. So back to square one.

We cleared off the second S-ATA drive, copying its contents to the SCSI drive instead, rebooted with the install CD again, and reinstalled to the second S-ATA drive.
The first-pass install finished; reboot again. Second-pass installation proceeded for a while, but then the Windows installer suddenly couldn't find some files that were obviously on the CD. Worse, it couldn't find the CD drive at all! Dan Spisak took over and diagnosed the problem in about five minutes: The Seagate external FireWire drive was connected to this computer, and it was confusing the Windows installer because the drives had been relettered after startup. (Remember that, during installation, there isn't a complete Windows, and the installer has only limited intelligence for such problems.) Note also that this problem is related to the difficulty RAID 1 machines have booting up when there are USB or FireWire external drives connected. BIOS writers and Microsoft need to get together to fix this once and for all.

Moral of this story: Remove all removable drives and media during installation, and save yourself the hassle. Ignore the temptation to connect everything at once, and wait until after Windows is stable before adding anything. This applies to all "new" installations, and XP64 counts as a new installation, even if it's on a computer that already has XP on it. The "new installation" rule applies to applications, as well.

Winding Down

The computer book of the month is Robert and Barbara Thompson's Astronomy Hacks (O'Reilly & Associates, 2005). I'm not an amateur astronomer, although I've often thought I'd like to try it. For those in the same situation, this book will tell you whether it's worth your while, and if so, not only how to get started but how to be pretty good at it. If you're already deep into amateur astronomy, I'm not competent to judge whether you need this book because I don't know how much you know; but having known Thompson for a while, I'd be astonished if you didn't find out things you never knew in every chapter.

I asked him to pick out a representative hack, and he chose #29 as his favorite: "Plan and Prepare for a Messier Marathon: Locate, observe, and log all 110 Messier Objects in one night." I know just enough to know what Messier Objects are, but I'm certain I have never seen all 110 of them. I probably could have when I did my turn on the board of the Lowell Observatory, but I was too busy getting Shoemaker a computer. Ah, well. As with all the books Bob Thompson does with his wife, it's both technically competent and very readable.

DDJ
PROGRAMMER’S BOOKSHELF
Crunchin' That Data
by Michelle Levesque
Data Crunching: Solving Everyday Problems Using Java, Python, and More
Greg Wilson
Pragmatic Bookshelf, 2005
188 pp., $29.95
ISBN 0974514071

A few weeks ago, a customer handed me some files that I had requested. We had previously agreed upon a convention for storing date information in these files and had decided that every file would include the date as its first line in the format YY-MM-DD, such as 05-06-23. When the files were given to me, however, the first line was in the form DD-Month-YYYY, such as 23-June-2005, and sometimes the file contained a blank line or two before the date. There were hundreds of files, so fixing this problem was too big a job to do by hand, but it wasn't a task that I planned on repeating several times, so it didn't warrant a lengthy design and development cycle. (A short script for exactly this sort of fix appears at the end of this review.) This type of small data manipulation task falls right into the realm of a recent addition to the Pragmatic Bookshelf, Greg Wilson's Data Crunching.

The book promises a pragmatic look at some of the most useful data-crunching techniques, and its delivery on this promise is stellar. From cover to cover, Data Crunching provides an exceptionally practical look at how to save time and effort when it comes to doing that "other stuff" that seems to creep up on every project. (In the spirit of full disclosure, I need to mention that Greg Wilson is an Adjunct Professor in Computer Science at the University of Toronto, where I am an undergraduate student. Wilson is also a DDJ contributing editor.)

Wilson's clear, concise (and often humorous) writing makes it easy to linearly consume its 188 pages, but the book's examples and structured layout make it equally valuable as a reference text. Wilson dedicates a chapter to each of the most common aspects of data crunching: text files, regular expressions, XML, binary files, and relational databases. There's also a chapter on unit testing, dates and times, encoding, and other "horseshoe nails" that are described as "apparently trivial things that can bring the whole system crashing down when they go wrong."

Though all of the important data-crunching techniques and idioms are included in this remarkably succinct book, its real strength comes from the fact that it never leaves the real world behind. Real-world programmers have to work on multiple platforms, and so the examples are appropriately platform independent. Real-world programmers code in dozens of different languages, and though the book's examples are mostly Java and Python, the wisdom behind them transcends any one specific programming language. And, most importantly, real-world data is messy, and so Data Crunching continuously reminds you that users will add in capital letters where you didn't expect them, and edge cases can't be forgotten. Incomplete and unexpected data are realities of data crunching, and rather than avoid the issue, the author jumps right into it and begins to explain how to deal with it.

Each of the book's topics is taught through clear, practical examples. The examples and code are simple enough to be understandable upon first read but never feel trivial. Each one is powerful enough to be directly reused or altered slightly to solve some future problem. The author does a remarkable job of justifying his decisions at each point along his examples' narratives, and always favors the practical approach over one that might be found in other data-manipulation texts. For example, he covers not only unit testing, but also some simpler alternatives for when an entire testing infrastructure would be overkill.

The listed skill range for this book is beginner to intermediate, but I would argue that it's appropriate for every developer, whether as a first-time instructional tool or a reference guide for the seasoned professional. Wilson's text and examples have been woven together into 188 succinct pages of wisdom and pragmatic advice for programmers of all levels. My copy of Data Crunching lives on top of my computer monitor, where it's within arm's length at all times. Regardless of what computing field you're in, you'll find this book to be valuable. Data manipulation tasks won't ever go away, but this book provides the strategy and mindset necessary to spend less time on data crunching and more time on the rest of your programming.

Michelle is a computer-science student at the University of Toronto. She can be contacted at [email protected].

DDJ
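To make the opening anecdote concrete, here is a minimal sketch of the kind of throwaway script the book is about, written in Python (one of the book's two main languages) but not taken from it. The incoming/*.txt file pattern, and the assumption that everything after the date line should pass through unchanged, are illustrative guesses rather than details from the review.

# normalize_dates.py -- hypothetical cleanup script; the file pattern is assumed.
# Rewrites the first nonblank line of each file from DD-Month-YYYY
# (for example, 23-June-2005) to YY-MM-DD (for example, 05-06-23),
# dropping any stray blank lines that precede the date.
import glob
from datetime import datetime

for path in glob.glob("incoming/*.txt"):
    with open(path) as f:
        lines = f.readlines()

    # Find the first nonblank line; that is where the date should be.
    for i, line in enumerate(lines):
        if line.strip():
            date = datetime.strptime(line.strip(), "%d-%B-%Y")
            lines[i] = date.strftime("%y-%m-%d") + "\n"
            break
    else:
        continue  # file is entirely blank; leave it alone

    with open(path, "w") as f:
        f.writelines(lines[i:])  # lines[i:] drops the leading blank lines

A file whose first nonblank line isn't in the expected form raises a ValueError rather than being silently rewritten, which for a one-off job like this is usually what you want.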
OF INTEREST
Wibu-Systems AG has launched Wibu-Key 5.0 and CodeMeter 2.10, updated versions of its antipiracy tools. Wibu-Key offers copy protection and license management using hardware-based encryption. Wibu-Key 5.0 supports "Hot Plugging," which lets you plug Wibu-Boxes in and out at any time. The CodeMeter 2.10 Digital Rights Management system includes a daemon mode that can be used for special system services. At the heart of CodeMeter is the CM-Stick, a USB-based encryption and storage device. The CM-Stick is available without a flash disk or with memory sizes starting at 128 MB. Wibu-Key and CodeMeter are available for Windows, Mac, and Linux.
Wibu-Systems AG, Rueppurrer Strasse 52–54, 76137 Karlsruhe, Germany; +49-721-93172-0; http://www.wibu.com/
HiT Software has released Allora 4.1, its XML-to-database integration middleware. Allora lets you export data from any relational database to XML, or map and transform XML data to relational data. Among other features, Allora 4.1 introduces a multiple SELECT feature, which lets you work with multiple submaps that are then joined in real time with XSL. The multiple SQL query capability also lets Allora compete in the heterogeneous database migration market: Rather than using database-specific SQL dumps or flat files that cannot contain table relationships or constraints, the complete database structure and data can be persisted into XML for easy access or transport, and recreated on any other database platform at minimal cost. Other enhancements in the Allora 4.1 Mapper include support for namespace definitions, complex database expressions, NetBeans 4.1, and stored procedures in Oracle packages.
HiT Software Inc., 4020 Moorpark Avenue, Suite 100, San Jose, CA 95117; +1-408-345-4001; http://www.hitsw.com/

Red Gate Software has released SQL Log Rescue, which enables undo and redo of individual SQL Server database transactions. SQL Log Rescue examines both backup files and live transaction logs to ensure full data recovery.
Red Gate Software, St John's Innovation Centre, Cowley Road, Cambridge CB4 0WS, United Kingdom; +1-866-733-4283; http://www.red-gate.com/

SplineTech has released JavaScript HTML Debugger, which lets you edit and debug JavaScript and VBScript inside HTML pages without inserting additional lines of code to handle the debugging process. Client-side JavaScript, JScript, and VBScript are fully supported for simple and complex HTML and DHTML debugging scenarios.
Spline Technologies Corp., 801–110 rue de La Barre, Longueuil (G. Montreal area), Quebec, Canada J4K 1A3; +1-514-907-1677; http://www.RemoteDebugger.com/

PrismTech has announced the launch of Spectra Power Tools, a productivity suite that addresses the software development and deployment lifecycle for Software Defined Radio (SDR) developers. Spectra Modeling Tools provide a visual approach to waveform and radio platform development. They support modular development, the outsourcing of relevant functionality (if desired), and a common reference methodology for developers and systems engineers. Furthermore, Spectra Power Tools inherently facilitate reconfiguration and reuse, thus independently extending the life of both the radio platform and software applications (waveforms).
PrismTech Corp., 6 Lincoln Knoll Lane, Suite 100, Burlington, MA 01803; +1-781-270-1177; http://www.prismtech.com/

OC Systems has released Hitchhiker for Eclipse, an Eclipse plug-in and runtime engine that provides tracing, profiling, memory-leak tracking, and function coverage tools for C/C++ applications. Hitchhiker collects performance and control flow data by automatically inserting machine code instrumentation into an application. Data collection is configured using the Eclipse Workbench. The application can be launched either manually on the target machine or from the Eclipse Workbench; in either case, Hitchhiker detects and instruments the target application. Hitchhiker is nonintrusive; the application can be traced and profiled at near full execution speed. Collected data is sent back to the Eclipse Workbench and viewed in real time using a variety of data visualization schemes.
OC Systems Inc., 9990 Lee Highway, Suite 270, Fairfax, VA 22030; +1-703-359-8160; http://www.ocsystems.com/eclipse/

SearchBlackBox Software has released SearchBlackBox SDK 1.0, a library that lets you add full-text search capabilities to .NET applications in only a few lines of code. SearchBlackBox SDK is a C#-based native .NET assembly useful in a broad range of applications such as web site search fields, online documentation, document management systems, content search solutions, and more. SearchBlackBox SDK does not require any preinstalled ActiveX or COM objects and is packaged in one DLL file.
SearchBlackBox Software, Generala Antonova 4-2-166, 117279 Moscow, Russia; http://www.searchblackbox.com/

Catalyst Systems has released Version 6.4 of its Openmake tool. Openmake 6.4 supports IBM- and Eclipse-based software development and Perl environments. Openmake replaces make and Ant/XML scripts with generated Build Control Files that follow comprehensive construction rules. Openmake is designed to locally and remotely build components destined for a variety of deployment platforms, including embedded devices, handhelds, workstations, and servers.
Catalyst Systems Corp., 213 West Institute Place, #404, Chicago, IL 60610; +1-800-359-8049; http://www.openmake.com/

DDJ

Dr. Dobb's Software Tools Newsletter
What's the fastest way of keeping up with new developer products and version updates? Dr. Dobb's Software Tools e-mail newsletter, delivered once a month to your mailbox. This unique newsletter keeps you up-to-date on the latest in SDKs, libraries, components, compilers, and the like. To sign up now for this free service, go to http://www.ddj.com/maillists/.
SWAINE’S FLAMES
Advice to Merlin
I have been following with considerable interest the challenge that bloggers present to the mainstream media (MSM, to use the blogosphere acronym). On important news stories, political or technical, I now find that I need to track both the MSM and certain key bloggers to get anything like a true picture of events. I also find that my list of prime news sources changes from day to day and issue by issue. In all of this, I pretend to myself that I am a bemused observer rather than a confused participant.

I'd love to see the MSM respond to the challenge of blogs by getting better at its job and welcoming the bloggers as a force to keep it on its toes and honest. The optimist in me believes that this is possible. The cynic in me says that all the MSM seems to be picking up from the blogosphere is an independent attitude toward spelling and fact-checking. That same cynic notes that while the MSM gives us more channels of news today than in the 1950s, those channels are all controlled by a small clique of large corporations increasingly willing to shape the news to their partisan interests, and that they seem to be sliding down a steep decline in professionalism from the bright days of Edward R. Murrow to the present dark night of Bill O'Reilly. And the cynic thanks whatever cynics thank for Markos Moulitsas of the Daily Kos.

So naturally, when I heard reports that Apple was including Intel's Trusted Platform Module (TPM) security chip in its Intel-based Macintoshes, I skimmed the whole range of technology news and Apple-watcher sources for reactions. What I got was a correspondingly wide range of views, from "Let's wait and see" to "I'm having my Apple tattoo removed." The TPM chip contains a serial number that lets the OS check that it is running on particular hardware, and some of the news sources, such as the U.K.-based VNUnet, figure that this is Apple's reason for including the chip: to keep anyone from (too easily) installing Mac OS X on a non-Macintosh computer. Then, too, Microsoft is a member of the Trusted Computing Group that is behind the chip, and Open for Business suggests that Apple might be motivated by the belief that you will need a TCG-friendly chip in your computer to talk to Microsoft platforms in the future.

Neither of these scenarios is in itself particularly frightening. But paranoia about this chip and the organization behind it is entirely appropriate. This Trusted Computing business looks to me to be all about industry control over computer-user behavior, including the possibility of prohibiting perfectly legal behavior. And it opens doors best left closed, doors to government and corporate monitoring of individuals and remote censorship (Microsoft seems particularly interested in this capability). I won't try to summarize the concerns that have been raised about Trusted Computing here, nor will I prejudice your own research by telling you what sources to read; I will just suggest that you google "Trusted Computing."

Trusted Computing seems to me to be part of a larger threat, one of useful technology being subverted to the illegitimate power grabs of governments and corporations. Maybe your paranoia is of a different flavor than mine, and you'd point to the danger of white-collar criminals and international terrorists misusing technology. In any case, technology is power today, more than ever in history, and new technologies are emerging at a rapid pace.
And by and large, these new technologies are being released into the wild without thought for how they might be misused. If that was ever morally justifiable, I suggest that it isn't now. Those who build and deploy new technologies need to think about how they might be used. And a little paranoia is entirely appropriate when imagining the possible misuses of your technology.

When the news gets me down, I turn to fiction. I've just been rereading Roger Zelazny's ten-volume Amber series, and I came across this exchange between a software engineer/sorcerer and his mentor:

Mandor: "You designed a remarkable machine, and it never occurred to you it might also become a potent weapon."

Merlin: "You're right. I was more concerned with solving technical problems. I didn't think through all the consequences."

Me: You live in the world, Merlin. Think through the consequences.