Dr. Dobbs J O U R N A L
#364 SEPTEMBER 2004
SOFTWARE TOOLS FOR THE PROFESSIONAL PROGRAMMER
http://www.ddj.com
DISTRIBUTED COMPUTING Grid Computing & The Linda Programming Model
Debugging Distributed Systems SCTP: The Next-Generation TCP?
Bonus Eclipse Coverage
Continuous Integration & .NET
Eclipse & General Purpose Applications Writing Plug-Ins In C/C++ for Eclipse CDT
Version Control, Builds, & Testing
RFID: You Make the Call Al Stevens on
Band-In-A-Box, Finale, & MusicXML Michael Abrash on
Performance Optimization The New Qt Designer IDE Bluetooth & Remote UIs Improving .NET Events Building MFC Dialogs at Runtime
Erich Gamma & Kent Beck on
Contributing To Eclipse
FEATURES
C O N T E N T S
SEPTEMBER 2004 VOLUME 29, ISSUE 9
Grid Computing & the Linda Programming Model 16 by Rob Bjornson and Andrew Sherman
Compared to web services, the Linda programming model provides a number of advantages for building grid apps.
Debugging, Message-Oriented Middleware, & Distributed Systems 26 by Paul Pazandak and Steve Ford
If you’ve built distributed messaging systems, you know about problems with tracking down and isolating bugs.
Stream-Control Transmission Protocol 32 by Ian Barile
The Stream Control Transmission Protocol is a new transport layer protocol that offers an alternative to TCP.
Continuous Integration & .NET: Part II 37 by Thomas Beck
Thomas introduces a complete Continuous Integration solution.
RFID Blocker Tags 42 by Burt Kaliski
Blocker tags let you choose when, where, and what RFID devices are tracking you.
Optimizing Pixomatic for x86 Processors: Part II 46 by Michael Abrash
Michael discusses his greatest performance challenge ever— optimizing an x86 3D software rasterizer.
Band-In-A-Box, Finale, & MusicXML 52 by Al Stevens
Al converts Band-In-A-Box file formats into Finale notation files using MusicXML as a porting medium.
C#, COM Objects, & Interop Services S1 by Shehrzad Qureshi
Shehrzad implements an ActiveX control in both C++ and C#.
Improving .NET Events S6 by Richard Grimes
.NET provides facilities for writing your own event mechanisms — if you know how.
Building MFC Dialogs at Runtime S11 by Adrian Hill
Adrian presents a class for defining MFC-based dialogs.
The Qt Designer IDE 57 by Dave Berton
With Qt Designer 3.3.1, you have a feature-rich IDE for designing and coding GUI applications.
Eclipse & General-Purpose Applications 66
by Todd E. Williams and Marc R. Erickson
Eclipse provides the framework for combining disparate tools into a single integrated application.
Writing Plug-Ins in C/C++ for Eclipse CDT 70
by Doug Schaefer and Sebastien Marineau-Mes
The Eclipse CDT Project delivers a fully functional C/C++ IDE for the Eclipse platform.
Contributing to Eclipse 74
by Kent Beck and Erich Gamma
Eclipse’s plug-in architecture means that every programmer is potentially a toolsmith.
Tools for Domain-Specific Modeling 79 by Steven Kelly
The Eclipse Modeling Framework and Graphical Editor Framework provide a domain-specific modeling solution.
EMBEDDED SYSTEMS Bluetooth & Remote Device User Interfaces 61 by Richard Hoptroff
The FlexiPanel Bluetooth Protocol is a remote UI service for computers, electrical appliances, and other devices.
FORUM EDITORIAL 8 by Jonathan Erickson LETTERS 10 by you DR. ECCO’S OMNIHEURIST CORNER 12 by Dennis E. Shasha NEWS & VIEWS 14 by Shannon Cochran OF INTEREST 94 by Shannon Cochran SWAINE’S FLAMES 96 by Michael Swaine
RESOURCE CENTER As a service to our readers, source code, related files, and author guidelines are available at http:// www.ddj.com/. Letters to the editor, article proposals and submissions, and inquiries can be sent to
[email protected], faxed to 650-513-4618, or mailed to Dr. Dobb’s Journal, 2800 Campus Drive, San Mateo CA 94403. For subscription questions, call 800-456-1215 (U.S. or Canada). For all other countries, call 902-563-4753 or fax 902-563-4807. E-mail subscription questions to ddj@neodata .com or write to Dr. Dobb’s Journal, P.O. Box 56188, Boulder, CO 803226188. If you want to change the information you receive from CMP and others about products and services, go to http://www.cmp .com/feedback/permission.html or contact Customer Service at the address/number noted on this page. Back issues may be purchased for $9.00 per copy (which includes shipping and handling). For issue availability, send e-mail to
[email protected], fax to 785838-7566, or call 800-444-4881 (U.S. and Canada) or 785-8387500 (all other countries). Back issue orders must be prepaid. Please send payment to Dr. Dobb’s Journal, 4601 West 6th Street, Suite B, Lawrence, KS 66049-4189. Individual back articles may be purchased electronically at http://www.ddj.com/.
COLUMNS
Programming Paradigms 83 by Michael Swaine
Embedded Space 86 by Ed Nisley
Chaos Manor 89 by Jerry Pournelle
Programmer's Bookshelf 92 by Lynne Greer Jolitz
NEXT MONTH: Intelligent systems are our focus in October.
DR. DOBB’S JOURNAL (ISSN 1044-789X) is published monthly by CMP Media LLC., 600 Harrison Street, San Francisco, CA 94017; 415-905-2200. Periodicals Postage Paid at San Francisco and at additional mailing offices. SUBSCRIPTION: $34.95 for 1 year; $69.90 for 2 years. International orders must be prepaid. Payment may be made via Mastercard, Visa, or American Express; or via U.S. funds drawn on a U.S. bank. Canada and Mexico: $45.00 per year. All other foreign: $70.00 per year. U.K. subscribers contact Jill Sutcliffe at Parkway Gordon 01-49-1875-386. POSTMASTER: Send address changes to Dr. Dobb’s Journal, P.O. Box 56188, Boulder, CO 80328-6188. Registered for GST as CMP Media LLC, GST #13288078, Customer #2116057, Agreement #40011901. INTERNATIONAL NEWSSTAND DISTRIBUTOR: Worldwide Media Service Inc., 30 Montgomery St., Jersey City, NJ 07302; 212-332-7100. Entire contents © 2004 CMP Media LLC. Dr. Dobb’s Journal is a registered trademark of CMP Media LLC. All rights reserved.
Dr.Dobbs J O U R N A L
PUBLISHER Timothy Trickett
SOFTWARE TOOLS FOR THE PROFESSIONAL PROGRAMMER
EDITOR-IN-CHIEF Jonathan Erickson
EDITORIAL MANAGING EDITOR Deirdre Blake MANAGING EDITOR, DIGITAL MEDIA Kevin Carlson SENIOR PRODUCTION EDITOR Monica E. Berg NEWS EDITOR Shannon Cochran ASSOCIATE EDITOR Della Song ART DIRECTOR Margaret A. Anderson SENIOR CONTRIBUTING EDITOR Al Stevens CONTRIBUTING EDITORS Bruce Schneier, Ray Duncan, Jack Woehr, Jon Bentley, Tim Kientzle, Gregory V. Wilson, Mark Nelson, Ed Nisley, Jerry Pournelle, Dennis E. Shasha EDITOR-AT-LARGE Michael Swaine PRODUCTION MANAGER Douglas Ausejo INTERNET OPERATIONS DIRECTOR Michael Calderon SENIOR WEB DEVELOPER Steve Goyette WEBMASTERS Sean Coady, Joe Lucca CIRCULATION SENIOR CIRCULATION MANAGER Cherilyn Olmsted ASSISTANT CIRCULATION MANAGER Shannon Weaver MARKETING/ADVERTISING ASSOCIATE PUBLISHER Brenner Fuller MARKETING DIRECTOR Jessica Hamilton AUDIENCE DEVELOPMENT DIRECTOR Ron Cordek ACCOUNT MANAGERS see page 95 Michael Beasley, Randy Byers, Andrew Mintz, Kristy Mittelholtz SENIOR ART DIRECTOR OF MARKETING Carey Perez DR. DOBB’S JOURNAL 2800 Campus Drive, San Mateo, CA 94403 650-513-4300. http://www.ddj.com/ CMP MEDIA LLC Gary Marshall President and CEO John Day Executive Vice President and CFO Steve Weitzner Executive Vice President and COO Jeff Patterson Executive Vice President, Corporate Sales & Marketing Mike Mikos Chief Information Officer William Amstutz Senior Vice President, Operations Leah Landro Senior Vice President, Human Resources Sandra Grayson Vice President & General Counsel Robert Faletra President, Group Publisher Technology Solutions Vicki Masseria President, Group Publisher Healthcare Media Philip Chapnick Vice President, Group Publisher Applied Technologies Michael Friedenberg Vice President, Group Publisher Information Technology Paul Miller Vice President, Group Publisher Electronics Fritz Nelson Vice President, Group Publisher Network Technology Peter Westerman Vice President, Group Publisher Software Development Media Shannon Aronson Corporate Director, Audience Development Michael Zane Corporate Director, Audience Development Marie Myers Corporate Director, Publishing Services
American Business Press
Printed in the USA
EDITORIAL
A Special Kind of Platform
To my way of thinking, every issue of Dr. Dobb's Journal is something special. But in fact, this issue really is special in its look, feel, and content. As you can tell just by thumbing through the magazine, we've changed the paper stock to one that's softer and less glossy, but easier to read. That's what we were promised anyway. While a trial press run wasn't possible, we did pore over other magazines that publish source code on similar paper and felt good about it. We hope you do, too. That's the "feel" part of the look-and-feel.
As for the look part, you'll also notice that DDJ Art Director Margaret Anderson has been tinkering with the page design, in an effort to make DDJ easier to read. She's cleaned up the headlines, punched up the pullquotes, and generally cleared the decks. In all likelihood, she's not yet done, which means you may see more changes in the coming months.
In terms of the content, we're starting something special this month that, to my recollection, DDJ hasn't done before — launching a multi-issue, multi-article, in-depth examination of a specific topic. In this issue, our special Eclipse coverage includes four articles on various aspects of the platform — a general background by Todd Williams and Marc Erickson (no relation) examining what Eclipse is, an article by Erich Gamma and Kent Beck on how you can participate in the Eclipse Project, one by Doug Schaefer and Sebastien Marineau-Mes on the Eclipse CDT, which delivers a C/C++ IDE for Eclipse, and another by Steven Kelly on the Eclipse Modeling Framework. We'll continue the series in the October issue with articles on the Eclipse Visual Editor for Java, a plug-in that modifies the Eclipse development environment for ease of use, and refactoring with Eclipse. We'll then wrap up the series in November by looking at how Eclipse is being used in embedded-systems development, how it is being used as a Rich-Client Platform, and what's what with EMF, the Eclipse modeling framework. That said, DDJ's coverage of Eclipse won't go away with the November issue. We will continue covering the platform in subsequent months.
So what's the attraction? Simply put, Eclipse is a powerful open-source IDE built on top of a plug-in architecture. Moreover, Eclipse projects and subprojects are available under the Common Public License, which has been approved by the Open Source Initiative. This royalty-free license lets you use and redistribute Eclipse for both commercial and noncommercial purposes (for licensing details, see http://www.eclipse.org/legal/cpl-v10.html). Finally, with the recent release of Version 3.0, coverage of Eclipse is timely. The 3.0 release is a significant step forward with its focus on the development of a Rich-Client Platform, UI responsiveness, an improved user experience, and tools that go beyond Java source file manipulation.
As our Eclipse series rolls out over the coming months, I'll be curious to know what you think both in terms of the content itself and this broad, in-depth approach. If you'd like more series such as this, let us know — and don't be afraid to suggest topics you think are worth investigating.
Shifting gears, it is interesting how the definition of the term "platform" has, well, shifted. It used to be that "platform" referred to the CPU and/or operating system, as in "a computing platform is the combination of computer hardware and operating-system software that defines a particular computing environment" (http://www.geodyssey.com/tutorial/tapp.html).
Along the way, the definition was stretched to include stuff such as an “infrastructure…that gives people and businesses value” (http://discuss.fogcreek.com/joelonsoftware/default.asp?cmd=show&ixPost=57675). The first thing we knew, Java was categorized as a platform, followed by web browsers, and then IDEs such as Eclipse. More recently, applications and APIs have moved from the realm of “interfaces” to that of “platforms.” eBay, Amazon.com, PayPal, Google, and Yahoo!, among others, are referring to themselves as web-services-based “platforms” that make liberal use of the usual suspects —XML, SOAP, Java, XSLT, HTTP, and the like. In short, it seems the best way to define the word “platform” is simplest — it’s anything that you build on top of. When it’s all said and done, I suppose DDJ is in this sense a special kind of development platform, too. We present algorithms, source code, and the like, and you build systems using that information. Jeez, that sounds too much like marketing. If you don’t mind, I’ll just continue calling DDJ what it is — a magazine, albeit a special one.
Jonathan Erickson editor-in-chief
[email protected]
LETTERS
Backtracking Algorithms
Dear DDJ,
In regard to Timothy Rolfe's article "Backtracking Algorithms" (DDJ, May 2004), I found another discard rule to insert in the algorithm, making the first algorithm (Cloque.java) as good as (or better than) the second one (FacePerm.java). This new rule sums the complement of the numbers used, divides the count of the numbers left by 3 (to get the number of remaining triplets), and then divides the complement sum by that count; if the result is bigger than the allowed value, it discards the rest. For instance, assume we are calculating 12 numbers with 21 as the boundary. If we begin our search having 1, 2, and 3 at the first positions, face[0,1,2], the complement sums to 72; divide the 9 numbers left by 3 (always three, because it is a triplet): 9/3 = 3; then this new result divides 72: 72/3 = 24. If this number is bigger than the allowed value (21 in this example), we can discard the rest of the permutations because we'll always get a number that is over. The following code was inserted in the check method, making the program more effective:

    // true/false only to enter or not this new check (optimization)
    if ( true ) {
        total = 0;
        for (idx = 0; idx <= posn; idx++)
            total += face[idx];
        total = size*(size+1)/2 - total;
        if ( total/((size-posn)/3D) > maxTot )
            return false;
    }
The new performance is:

    Numbers  Limit  Calls-Original  Calls-New
    12       21       2,168,123       637,269
    13       22       8,092,606       549,160
    13       23      18,990,336     5,486,228
    14       24      77,374,157     8,199,708
    15       25     308,576,798     4,829,667

The new Cloque is similar to FacePerm in performance with the examples shown, but if you try 16 and 27 as numbers and limit, respectively, on my computer (PIII 500-MHz, 256 MB), the results are: Cloque, 52,265 secs; FacePerm, 72,204 secs.
German Gonzalez-Morris
[email protected]
Dear DDJ,
Regarding my article "Backtracking Algorithms" (DDJ, May 2004), Paul Purdom of Indiana University, Bloomington made the following observation about clock-face permutation backtracking: There is one simple test you can add to your code that should make it much faster yet. Suppose you have assigned numbers to the clock face for positions 1 to k, but have not yet assigned numbers for positions k+1 to N. You have the following situation:

    X[k-1]  X[k]   X[k+1]   ...  X[N]     X[1]   X[2]
    known   known  unknown  ...  unknown  known  known

You also know that the sum of each three numbers should be no more than MaxSum, and you have N-k+2 groups of unknown sums-of-three. Let R be the sum of the numbers not yet assigned. Then X[k-1] appears in one group-of-three, X[k] appears in two groups, each X contributing to R appears in three groups, X[1] appears in two groups, and X[2] appears in one group. Thus, we need:

    (N-k+2)*MaxSum >= X[k-1] + 2*X[k] + 3*R + 2*X[1] + X[2]
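For readers who want to drop this test into their own backtracking search, here is a minimal Java sketch of the bound. The array is 1-based to match the notation above, the names are mine (not from Rolfe's published source), and the test only applies once k >= 3:

    // Returns false when Purdom's bound proves the partial assignment
    // x[1..k] (out of n clock positions, maximum triple sum maxSum)
    // cannot be extended; 'remaining' is R, the sum of unassigned numbers.
    static boolean purdomBoundHolds(int[] x, int k, int n, int maxSum, int remaining) {
        long lhs = (long) (n - k + 2) * maxSum;
        long rhs = x[k - 1] + 2L * x[k] + 3L * remaining + 2L * x[1] + x[2];
        return lhs >= rhs;   // prune the branch when this is false
    }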
This, indeed, massively reduces the number of function calls and the time required to solve the problem. For instance, using the straight backtracking implementation in Java took 19.44 hours to discover that a clock face of 20 entries has 506 cyclic permutations with a maximum sum of 33. Doing the same thing after adding the bound that Professor Purdom provided, the program required only 13.85 seconds. 5.3E+11 function calls got trimmed down to 7.0E+04.
Timothy Rolfe
[email protected]

C, as in Duct Tape
Dear DDJ,
I've been writing code (mostly in C) for the past 20 years and I'd like to comment on some of the points in Jerry Pournelle's February 2004 column.
• Strong type/range checking can prevent buffer overflows. This is true, and wonderful when a program runs in a veritable vacuum, like a spreadsheet that takes user input directly and read/writes its own files. It is also nearly true when a program communicates with a known partner for known purposes. Unfortunately, in the world of the Internet, one can rarely be assured that the program on the other end will always be the correct one or even the correct version of the correct program. That means programs must be
able to handle unexpected buffer lengths and sometimes must be able to determine the data type of incoming data after it is read—very, very difficult in strongly typed languages. Yes, better hardware can speed up the necessary hoop jumping, but a loosely typed language can easily adjust on the fly to new data types, at least when coded correctly. • Shooting feet. I once heard (from a fellow at DEC) this comment about programming in C: “We provide the gun and bullets, you provide the foot.” However, as most sane people know, just because you can do a thing, doesn’t mean you should do that thing! • C’s “near unreadability.” When I was learning C, the guys in my shop called it a “write-only language” because it was so hard to guess what another person meant by reading their code. However, that was just an excuse for writing poorly commented code that made use of every possible syntactical shortcut combined with lousy formatting. When I was given the opportunity (at my next job) to set the coding-style guidelines, things were much better… For the past 10 years, I’ve been coercing the other programmers to follow my style guidelines and none of us ever have trouble following each other’s code. Even customers who have purchased source licenses use our source as self training because it is easier to understand than many samples on the MSDN CD! • Other languages. You missed good old Ada in your list! I had to learn that while I was in the U.S. Air Force (in 1986, when the first validated compilers were becoming available) and I think it would’ve been a very good choice, too. Based on tests at the time, it was almost always true that a program developed in Ada was done once it would compile. Very few bugs were ever found after a compile completed successfully. On the other hand, there’s an old joke that may bear repeating here: “REAL PROGRAMMER’s can write Fortran in any language!” And my personal corollary: “Most programmers (or at least, people who call themselves programmers) write bad code in every language (even English or their native tongue).” I think of C as kind of like Duct tape — very flexible, easy to get something small done very quickly, great for a very wide number of jobs. But would anyone want to fly in an airplane made entirely out of Duct tape? Steve Valliere
[email protected]
DDJ
DR. ECCO’S OMNIHEURIST CORNER
Occam’s Ringleaders Dennis E. Shasha
As I entered Ecco's apartment, I could see that he and his 10-year-old nephew Tyler were engaged in a heated discussion. "This makes no sense, uncle," Tyler said. "The rule in checkers that forces me to jump prevents a whole world of strategies. I don't think it's necessary." Ecco chuckled. "You would have gotten along well with William of Occam, young Tyler." "Occam's razor?" asked Liane. "Right," Ecco responded, "the 13th century Franciscan theologian." "I have his quote on my bedroom wall," Liane interjected. "'Plurality should not be assumed without necessity,' he wrote. He used the principle to fight beliefs in witchcraft. At the time, people accused witches of every possible mishap. He thought that the mishaps could come from completely natural causes. I figure that they would have heard me argue about mathematics or cosmology and convicted me forthwith. So he would have been my natural defender and ally." "I agree," Ecco said, smiling at his 16-year-old niece who somehow managed to look quite stylish in her lacrosse uniform. "He wanted explanations to be much simpler. Justice is often simple. But we'll come back to witchcraft and checkers another day, I promise. Today, I want to talk about using the Occam principles to find the ringleaders of a gang of 11 thieves.
"Here is the data we have. There have been several stock-trading events on 11 stocks that the authorities believe are collusions in purchases and sales of stock. Represent a purchase by a 1 and a sale by a 0. The 11 conspirators are smart enough not to communicate by phone or e-mail. We believe they collude by their actions. There are a few ringleaders who perform actions and then their subordinate conspirators compute a particularly simple logical/arithmetic function to decide on their action.

Dennis is a professor of computer science at New York University. His latest books include Dr. Ecco's Cyberpuzzles: 36 Puzzles for Hackers and Other Mathematical Detectives (W.W. Norton, 2002) and Database Tuning: Principles, Experiments, and Troubleshooting Techniques (Morgan Kaufman, 2002). He can be contacted at
[email protected].
"Our informant tells us that they use just two circuit elements: AND takes some collection of 1s and 0s as inputs and outputs a 1 if all the inputs are 1; otherwise, it outputs a 0. ODD returns a 1 if the number of 1s in its input collection is odd and otherwise it returns 0. (Note that if there are no 1s in its input collection, ODD returns 0.) Just those two circuit elements. No more.
"One or both of these circuit elements are used by each subordinate conspirator, we believe. Here's a warmup. There were 12 events and 6 conspirators who did the following (on six different stocks):

    conspirator 1: 1 1 0 1 0 0 0 1 0 1 1 1
    conspirator 2: 0 1 0 1 1 1 1 0 0 1 1 1
    conspirator 3: 1 1 0 1 0 0 0 1 0 1 1 1
    conspirator 4: 1 1 0 0 1 1 1 0 1 0 0 0
    conspirator 5: 0 0 0 0 1 1 1 1 1 0 0 0
    conspirator 6: 0 1 0 0 0 0 0 1 1 0 0 0
Which ones are the ringleaders? Try before you read on."

Solution to the Warm-Up Puzzle
Conspirators 2, 4, and 6 are, or at least could be, the ringleaders. Conspirators 1 and 3 simply take the ODD function on the buy/sell values of the ringleaders. For example, in the first column, the values of conspirators 2, 4, and 6 are 0, 1, 0, respectively, so there are an odd number of 1s. In the second, 2, 4, 6 have the values 1, 1, 1, so again there are an odd number of 1s. Conspirator 5 computes the function: AND the results of 2 and 4, then take the ODD function on that result and the input from 6 (see Figure 1). Algebraically, C5=ODD(C6, AND(C2, C4)). So, the first column produces a 0 from the AND, and the input from 6 is also 0, so the result is 0. In the second column, the AND produces a 1, but the input from 6 is also a 1, so the number of 1s is even; hence, we get a 0 result.
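A quick way to check the warm-up solution (or to attack the puzzles below) is to code the two circuit elements directly. Here is a minimal Java sketch; the method and array names are mine, and the data is just the warm-up rows for conspirators 2, 4, and 6:

    class Ringleaders {
        static int AND(int... bits) {              // 1 only if every input is 1
            for (int b : bits) if (b == 0) return 0;
            return 1;
        }
        static int ODD(int... bits) {              // 1 if the number of 1s is odd
            int ones = 0;
            for (int b : bits) ones += b;
            return ones % 2;
        }
        public static void main(String[] args) {
            int[] c2 = {0,1,0,1,1,1,1,0,0,1,1,1};
            int[] c4 = {1,1,0,0,1,1,1,0,1,0,0,0};
            int[] c6 = {0,1,0,0,0,0,0,1,1,0,0,0};
            for (int i = 0; i < 12; i++) {
                int c1and3 = ODD(c2[i], c4[i], c6[i]);       // conspirators 1 and 3
                int c5     = ODD(c6[i], AND(c2[i], c4[i]));  // conspirator 5
                System.out.println("event " + (i + 1) + ": " + c1and3 + " " + c5);
            }
        }
    }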
1. Ok, now it’s your turn. Here are the conspirators’ buy/sell decisions on 11 different stocks. conspirator 1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 conspirator 2: 0 1 1 1 0 1 1 1 0 1 0 1 1 1 1 0 conspirator 3: 0 0 1 1 1 1 1 0 1 1 0 1 1 0 1 0 conspirator 4: 0 1 0 1 1 0 1 1 1 0 0 0 1 0 0 0 conspirator 5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 conspirator 6: 1 0 0 0 1 1 0 1 1 1 1 1 0 1 0 1 conspirator 7: 1 1 0 1 0 0 1 1 0 0 1 0 1 0 0 1 conspirator 8: 1 0 1 0 0 1 0 0 0 1 1 1 0 1 1 1 conspirator 9: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 conspirator 10: 1 0 1 0 1 1 0 1 1 1 1 1 0 1 1 1 conspirator 11: 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0
Reader: Try to find the ringleaders and how each of the subordinate conspirators make his or her decision. You may be able to do this on your kitchen table. Hint: Tyler and Liane were able to do this assuming only four ringleaders. It’s open whether fewer could be the ringleaders, assuming each subordinate computes a function using only at most one AND and one ODD circuit. 2. There is a new group of trades. Liane and Tyler believe there are only 3 ringleaders this time. They also think the conspirators have grown careless. Can you see why? conspirator 1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 conspirator 2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 conspirator 3: 0 1 0 0 0 0 1 1 0 0 0 1 1 0 1 1 conspirator 4: 0 0 0 1 1 0 1 0 1 0 1 0 0 0 1 1 conspirator 5: 0 0 1 0 0 0 0 1 1 0 1 0 0 1 1 1 conspirator 6: 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 conspirator 7: 0 0 0 1 1 0 1 0 1 0 1 0 0 0 1 1 conspirator 8: 0 0 0 1 1 0 1 0 1 0 1 0 0 0 1 1 conspirator 9: 0 0 0 1 1 0 1 0 1 0 1 0 0 0 1 1 conspirator 10: 0 1 1 0 0 0 1 1 0 0 0 1 1 1 0 0 conspirator 11: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1
One thing that worries Ecco is that perhaps any random set of buy/sell decisions among 11 traders over a time scale of 16 could lead to the conclusion that four ringleaders and possibly even three were responsible. That would mean innocent people would be falsely accused. Do you think this would be a common occurrence for the simple AND/ODD circuits presented here? Perhaps you can find an elegant counting argument. For the solution to last month’s puzzle, see page 88.
Figure 1: Warm up.
DDJ
Dr. Dobb's News & Views
DARPA Prepares Next Robot Race A million dollar prize went unclaimed in last year’s DARPA Grand Challenge (http:// www.darpa.mil/grandchallenge/), as none of the autonomous robot racers managed to travel more than eight miles of the 142 mile off-road course. DARPA, however, has decided to make a yearly event of the race. The next Grand Challenge has been scheduled for October 8, 2005, and the prize money has been doubled. This year, DARPA is also holding a Participants Conference “intended for participants, interested sponsors, and groups looking for others to help complete their teams.” The conference will take place on August 14 in Anaheim, California. Fifteen teams from a field of 106 applicants managed to qualify for the race in 2004. The best showing was made by “Sandstorm,” a robotic Humvee built by a team at Carnegie-Mellon, which managed to complete 7.4 miles with an average speed of 15 mph (http://www.redteamracing .org/index.html). The Carnegie-Mellon team is already preparing for next year’s race, and DARPA expects overall participation to rise significantly this year.
Largest Prime Number Yet The Great Internet Mersenne Prime Search (GIMPS), a distributed computing effort that uses the spare computing cycles of volunteers to search for as-yet-unknown prime numbers, has discovered a 7,235,733-digit prime number. The new number, which can be concisely expressed as 2 to the 24,036,583rd power minus 1, is the largest known prime and only the 41st known Mersenne prime. The discovery
was made by Josh Findley, a consultant to the National Oceanic and Atmospheric Administration in La Jolla, California. He has been part of the GIMPS project (http://www.mersenne.org/) for five years; the specific calculation that led to the new discovery took two weeks to perform on his 2.4-GHz Pentium 4 PC. Mersenne primes are named for Marin Mersenne, a French monk who published a sequence of these numbers in 1644, although he was not the first to study the relationship between prime numbers and numbers of the form 2^n - 1. For example, these numbers are prime when n equals 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, and 127. This is the seventh Mersenne prime discovered by the GIMPS project over the last eight years. The Electronic Frontier Foundation offers a $100,000 prize for the discovery of the first 10-million-digit prime: "An award-winning prime could be mere weeks or as much as a few years away — that's the fun of math discoveries," said GIMPS founder George Woltman in a prepared statement.
New Algorithm for 3D Compression A team of researchers from the University of Southern California says their Variational Shape Approximation scheme lets 3D images be compressed with much more efficiency than existing methods (http:// www-grail.usc.edu/pubs.html). Assistant professor Mathieu Desbrun worked with two postdoctoral researchers — Pierre Alliez, now with France’s National Institute for Research in Information and Automation, and David Cohen-Steiner, now of Duke University — to produce the “Variational
Shape Approximation” scheme, which represents 3D images as simplified meshes. It uses a technique called “Lloyd Clustering,” from the field of machine learning, to dissect 3D objects into nonoverlapping connected regions that can then be separately optimized.
Microsoft Revises Shared Source License Microsoft is eliminating many of the restrictions on the use of its “shared source” license for Windows CE 5.0. The revised shared source license increases the source code that Microsoft is making available to more than 2.5 million lines of code, including the GUI, operating-system kernel, and the like. Among other revisions, the license for CE 5.0 lets developers customize the OS code in their software without having to sublicense the modified code back to Microsoft. For more information, see http://msdn.microsoft.com/embedded/.
Java 3D & Project Looking Glass Open Sourced Sun has announced the open-source availability of the Java 3D API and Project Looking Glass. The Java 3D API provides a set of object-oriented interfaces that support a high-level programming model you can use to build, render, and control 3D objects and visual environments. With the Java 3D API (http://www.java.sun.com/products/javamedia/3D/), you can incorporate highquality, platform-independent 3D graphics into Java-based applications and applets. Project Looking Glass (http://www.sun .com/software/project-looking-glass/) is an interface that offers an intuitive, 3D environment to interact with desktop apps.
Grid Computing & the Linda Programming Model An alternative to web-service interfaces ROB BJORNSON AND ANDREW SHERMAN
Rob is chief architect at TurboWorx. He received a Ph.D. from Yale University, where he focused on massive parallel computing. He can be contacted at [email protected]. Andy is CTO at TurboWorx. He also received a Ph.D. from Yale and has 30 years of experience in high-performance computing. He can be reached at [email protected].

With all the current interest in Grid computing, it is surprising that there is still a lack of agreement on exactly what it is. Our definition is a pragmatic one. Grid computing is an approach to computing that lets you:
• Organize widespread, diverse collections of CPU resources into a virtual supercomputer.
• Organize widespread, diverse collections of data resources into a virtual file system.
• Organize widespread, diverse collections of applications into standardized, reusable libraries of components (virtual applications).
• Collect and organize disparate, valuable, but hard-to-use resources into a more uniform, manageable, visual whole.
• Make this virtual grid of resources accessible to multiple users simultaneously.
Much of the current mainstream thrust in grid computing has grown out of the web-services community, which naturally envisions the grid as a large collection of machines that offer resources via web-service-like interfaces accessed directly by clients that speak primarily in XML. This view of grid computing is appropriate in some contexts, such as loosely coupled service architectures. However, this abstraction level is too low for most users, and XML requires far too much data translation and parsing for efficient communication. Furthermore, direct RPC-style communication is too restrictive in the dynamic, even disorganized, environment in which grid applications have to exist. Alternatively, "Linda" is a communication style abstraction (called a "tuplespace") based on a bulletin board rather than direct messaging.

Linda
Linda is not a full programming language. Rather, it is a set of coordination operations that can be added to any existing language, providing a way to communicate among and synchronize different processes. Linda has been added to languages as diverse as C, Java, Prolog, Fortran, and Smalltalk. Moreover, there
are at least three robust, commercially supported versions of Linda for Java. Although their APIs differ somewhat, all offer largely similar capabilities. All support the basic Linda operations, multiple tuplespaces, and transactions on tuplespaces (for consistency and fault tolerance):
"Scheduling is a decentralized, cooperative process between clients and workers"

• Paradise, from Scientific Computing Associates (http://www.lindaspaces.com/) emphasizes speed and interlanguage connectivity. It has APIs for Fortran, C, and Java, letting tuples serve as a bridge between languages as well as between processors. (Rob was one of the designer/implementers of Paradise.)
• JavaSpaces, from Sun Microsystems (http://java.sun.com/developer/products/jini/index.jsp), is part of the Jini project. Not surprisingly, JavaSpaces is confined to operating within Java, and leverages Java in its API. In particular, Linda operations are performed directly on a single object; its public fields become the fields of the tuple. It is integrated with other Java technologies, most notably transactions. (An excellent book on the subject is JavaSpaces Principles, Patterns, and Practice by Freeman, Hupfer, and Arnold; http://java.sun.com/docs/books/jini/javaspaces/.)
• Tspaces from IBM (http://www.almaden.ibm.com/cs/TSpaces/) is also supported only in Java, but has a focus on extensibility, database-like functionality, and fault tolerance. Tspaces supports more complex queries against tuplespace by creating custom matches( ) methods in tuples.
In Linda, data is created and communicated via tuples. Tuples are simply ordered collections of data. For example, the tuple:

    ("task", taskid, arguments[4], taskdesc)
could describe a task to be performed, its inputs, and a unique task identifier. Tuples are produced by writing them to tuplespace:
ts.out("task", taskid, arguments[4], taskdesc)
Not surprisingly, the exact syntax for Linda operations varies among different host languages and even among implementations of Linda. (In this article, we use a generic Java syntax.) Tuples are read by performing read( ) or in( ) operations, using a template to search for matching tuples. Some values in a template are filled in, in which case they must match exactly; others are wildcards and must simply be type compatible. Wildcards are denoted by a "?"; for example:

    ts.in("task", ? taskid, ? arguments[], ? taskdesc)
searches for a tuple that matches; if one is found, it consumes it. As a side effect, taskid, arguments, and taskdesc are set to the values found in the tuple. If several matches are available, one is chosen, typically at random. If no matching tuple is found, the operation blocks until one appears.
Using the bulletin-board metaphor, tuplespaces are boards and tuples are the notes posted on them. At first blush, this might appear an odd way to communicate. If you want to send a message to Bob, why would you write a message on a board when you could just call him (message passing being equivalent to directly calling someone)? There are several reasons:
• Bob might not be available right now to take a call.
• You might not know Bob's number.
• More importantly, you may not know Bob's identity. Just as you might post a note on a board looking for a ride to Seattle, you can post a tuple requesting a certain type of service. You don't need to know who will provide the service, where they are located, or if they are currently available.
Linda provides a communication mechanism that is both associative (using qualities of the data itself to find it) and anonymous (producers and consumers of data communicate without directly addressing one another). This is in sharp contrast to traditional message-passing approaches (of which web/grid services are simply the latest incarnation), where messages are always directed to a particular recipient.
Traditionally, Linda has been considered a parallel programming tool for improving the performance of a single application, analogous to the Message-Passing Interface Standard (http://www-unix.mcs.anl.gov/mpi/). However, Linda is even better suited to distributed or grid computing, where individual components of the system are less well known to one another. A common Linda idiom is the client/worker model in which clients submit tuples that represent tasks to a particular tuplespace. Included in the tuples is all the information required for scheduling and executing a particular task. Workers, which represent compute nodes, use in( ) to withdraw tuples for compatible tasks. They execute the tasks and put result tuples back into tuplespace.
Figure 1: A protein classification workflow.
This idiom is simple, yet powerful, because of the properties of Linda communication. Scheduling is a decentralized, cooperative process between clients and workers. Workers can dynamically come and go as they please because tasks are not being directed at any particular worker. Achieving fault tolerance is greatly simplified because tuples are manipulated under transactions; if a worker fails holding a task tuple, it is automatically regenerated. Example 1 shows simplified code for the tuple client and worker.

    (a)
    worker() {
        RESULT r;
        TASK t;
        while (1) {
            transaction(ts);
            ts.in("task", ?t);
            r = compute_task(t);
            ts.out("result", r);
            commit(ts);
        }
    }

    (b)
    master() {
        int i;
        RESULT r;
        TASK t;
        for (i=0; create_task(&t); i++)
            ts.out("task", t);
        for (; i; --i) {
            ts.in("result", ?r);
            update_result(r);
        }
        output_result(r);
    }

Example 1: Master/worker paradigm: (a) worker; (b) master.

Visual Programming, Workflows, & Grids
Figure 1 shows a workflow that implements a compute-intensive application. The workflow is built using TurboWorx Builder, a component and development environment from TurboWorx (the company we work for; http://www.turboworx.com/). Figure 1 shows a life-science application for classification of protein domains. In the figure, the boxes represent components performing computations, while the (directed) lines represent the flow of data and dependencies. The visual workflow created by the user is automatically transformed by the GUI into an XML representation that guides the runtime system as it carries out workflow execution.
Components in workflows perform computations by invoking applications or other workflows. Applications may be compiled executables, scripts written in languages such as Perl or Python, or Java classes. Before use in workflows, each application is "componentized" using a wizard that creates an XML wrapper file describing the application's inputs and outputs and how it should be invoked. Typically, no modifications are made in the underlying program or script. For a command-line invocation, the XML would describe the invocation command, including input and output files, environment variables, switches, and the like. For a Java method, the wrapper would describe the accessor and execution methods, much as with JavaBeans.
Workflows are themselves components and they may be nested to arbitrary depth as components in other workflows. At runtime, components may execute as soon as all of their required input dependencies are satisfied, and the runtime system normally tries to schedule them on appropriate compute
(continued from page 17) nodes (that is, workers) as soon as possible after they become ready to run. After executing, components normally provide data for each of their outputs that must be delivered eventually to one or more downstream components in the workflow. Figure 1 also illustrates one of the advanced TurboWorx programming features for exploiting parallelism. The components joined by parallel lines (beginning with “Fasta Splitter” and ending with “Concatenation Joiner”) actually constitute a parallel loop. Multiple independent data elements flow through this subgraph, possibly causing multiple instances of the components to run on independent nodes. Such advanced programming features make it easy for users to apply workflow systems to a wide variety of computational problems. Runtime System The runtime system required for executing workflows in a grid environment poses a number of challenges: • Tasks must be scheduled to run when their required inputs are present. • Results must be propagated to downstream tasks. • Tasks should be efficiently assigned to worker machines on the grid, even as the workers come and go dynamically. • Workers failing while executing a task should not interrupt the overall computation. The Linda communication model supports these requirements quite naturally, using a more advanced form of the client/worker model, in which tasks express interdependencies and can generate subtasks. The runtime system transforms the XML describing the workflow into a partially ordered collection of tasks and executes them on a set of worker nodes, using tuples to represent the tasks. Figure 2 portrays the overall architecture of the TurboWorx runtime system. The system contains three major elements — a single Master, one or more Clients, and one or more Workers. Master. The Master refers to a collection of services, often colocated on a central server, that coordinate the overall runtime system. These services include a Linda server for storing tasks and other metadata, a web server and a data server. Each of these services is backed by nonvolatile storage so that, in the event of failure, the services can be restarted without losing vital state. The Linda server is used to store information about tasks and to communicate metadata or small quantities of real data between tasks. Each task is represented by a task tuple that contains information required to execute the underlying computational program; data or metadata about the inputs available thus far (that is, satisfied dependencies for the task); the number of remaining inputs required; information about the user submitting the task; and a reference to the component definition (to be described shortly). One field of the tuple indicates task status. When all the inputs for a task are available (all dependencies are satisfied), that field of the task tuple is changed to indicate that the task is ready to run. Workers take tasks by selecting from among the readyto-run task tuples. All tuple manipulations occur under transactions, so that a recovery mechanism can be invoked should failures occur. Recovery is accomplished by rolling back the workflow’s execution state to the most recent consistent state. Thus, the Linda server acts as the “short-term” state memory of our system. In concert with a database, the web server is used to manage the collection of users and the library of defined components. 
It also provides the service side of a browser-based user interface to the system, letting users execute components from a portal-like web interface, forwarding execution requests to the master's Linda server, and holding completed results until users request
(continued from page 20) them. Thus, the web server serves as the “long-term” memory of our system. The data server is used to store and stream large data objects among the clients and the workers performing component execution. For large data objects, the metadata passed via the Linda server only includes reference-counted pointers to the data. The data itself is only transmitted when the component requiring it has been scheduled on a specific worker. The system currently supports a variety of data server implementations, including NSF, FTP, and WebDav. Clients. Clients represent the users of our system. Clients submit execution requests by naming a component (typically, a workflow) to be executed and providing a set of input values to be passed to the executing instance of that component. The client then waits for the results to be returned. Currently, a client may use a SOAP or Java API to interact with our system, and we provide users with a Java Swing GUI that invokes a servlet-based web API. Workers. The workers are the actual loci of computation for the runtime system. They are responsible for selecting ready-to-run task tuples from the Linda server, collecting required data and executables, carrying out the actual computations, and delivering output data for use by downstream components. Task scheduling is completely decentralized in the sense that each worker makes its own decisions about task selection based on locally configurable selection/scheduling criteria. (This is a major advantage in the grid setting.) Task selection may be based on a variety of criteria, including task metadata (for example, task priority, user identity, application name, characteristics of the inputs) as well as local information about the worker’s state (resource availability, current load, prior local availability of input data, and so on). Once a worker has selected a task, it requests the component definition from the component library, examines the XML wrapper, and invokes an appropriate component interpreter. A component interpreter is a program that understands how to interpret the component metadata to set up and execute a particular type of component (for example, command-line executable, script, or Java method). The interpreter collects required input data, sets up the necessary runtime environment, and invokes the underlying application executable or scripting system. Fault Tolerance The runtime system as a whole is fault tolerant in that any element (Client, Master, or Worker) can fail without causing the system to lose the states of currently executing workflows. Fundamentally, fault tolerance is accomplished via Linda but is handled somewhat differently for each of the three elements of the system. If a Client fails after submitting a workflow, the workflow execution will continue as usual. The master retains the results until such time as a new authorized Client requests them via an in( ) operation. If a Worker fails while executing a task, the Linda server automatically cancels the transaction under which the execution was proceeding, and the task returns to the list of pending tasks available for selection by another Worker. In general, the work performed on the partially completed, failed task is lost, but completed tasks in the workflow are safe. If the Master fails, the system ceases executing until a new Master is started from the saved state of the failed Master (the Linda server maintains its own checkpoint). 
Restarting the Master restores the state of the various services from their most recent consistent checkpoints, and workflow executions can proceed from that point. Scalability A possible objection to the aforementioned architecture concerns the scalability of the Linda server. Our current implementations employ a single server, primarily to simplify administration. 22
Figure 2: Runtime system architecture.

However, multiple servers can be scattered around the grid, each serving as a local grid-access point. Tuplespaces could serve as directories listing available access points. Workers and Clients could wander from server to server, as their needs dictate.

Conclusion
The Linda programming model provides a number of advantages for building Grid applications, compared to more traditional approaches using web services:
• Compute nodes (workers) can dynamically appear and disappear without any additional programming effort or architectural complexity.
• Scheduling is a distributed activity. Each worker makes its own decision about which tasks to take. No centralized, oracular scheduler is required.
• The state of the system is open and available for inspection. The state of tasks can be determined by browsing the tuples in the Linda server via a variety of tools, including web-browser interfaces.
• Native data types, such as Java objects, can be easily passed among cooperating processes. This does not exclude the possibility of exchanging tuples containing XML in cases where that is appropriate, of course.
DDJ
Debugging, Message-Oriented Middleware, & Distributed Systems Runtime software modification PAUL PAZANDAK AND STEVE FORD
Dr. Paul Pazandak ([email protected]) and Steve Ford ([email protected]) are research scientists at Object Services and Consulting (http://www.objs.com/). Distribution Statement A: Approved for public release; distribution is unlimited.

If you are involved in designing and building distributed messaging systems, you are well aware of the problems in tracking down and isolating bugs. A common approach is to log into each of the remote hosts, start the application on each, then watch for exceptions or debug statements you inserted in the code. The problem with this approach is that it doesn't scale well. For one thing, when the distributed system involves tens of computers and hundreds of processes, it is impossible to observe all of the console windows. Furthermore, when the distributed processes are sending hundreds or thousands of messages (and it's your job to find out why a handful of messages out of those thousands aren't making it to their destination), it is impossible to visually track every message through the system by monitoring the debug statements emitted to the screen. This is exactly what we were faced with in the Cougaar Agent System. Cougaar (short for "Cognitive Agent Architecture") is a Java-based architecture for the construction of large-scale distributed agent-based applications and the product of a multiyear DARPA research project (UltraLog, http://www.ultralog.net/) into large-scale agent systems. The Cougaar agent system is big. This year, the largest of the Cougaar test societies will have 50+ host computers running 1000+ agents; a small test society involves about six computers. Therefore, it should not be a surprise that debugging messaging problems in Cougaar is labor intensive.
Given the quantity of messages being sent, we simply could not monitor console windows. Instead (in the beginning), we sent all of the debug statements to log files using Log4J (http://logging.apache.org/log4j/docs/), then inspected the files offline after the application had terminated. We improved on this by implementing a console tool to inspect the log files for us and identify what messages had been lost. This tool sorted all messages first by agent, and then by whether the message was sent or received by that agent. A second iteration through these files identified which messages were sent but not received. A problem with this approach was that it was not dynamic — we had to wait for the application to run to completion (for instance, generating a logistics plan, which might be an hour or more). However, an even bigger problem was that it allowed only coarse visibility — it would tell us if a message didn't make it, but it would not tell us how far it had gotten. We would then have to hand inspect the log files to track down debug statements to see where the message was last seen. We should say that we are assuming that the underlying network is not the likely problem; rather, we are debugging the multiprotocol sending and receiving message stacks that we wrote. What we did next was to
implement a tool that solved both of these problems. Solving Problem 1: Offline Debugging Instead of collecting data into files, we routed all of the message-related Log4J debug statements from every agent to our new centralized collection tool running on one machine. To do this, we defined a
“ProbeMeister supports both XML configuration-directed probe insertion and drag-and-drop insertion” Log4J SocketAppender in Cougaar, then passed in the host and port for the tool via each host’s command line. The Log4J debug statements were then modified to emit to an event router class that we implemented (in Cougaar), which simply sent them across the socket if it existed (if the command-line arguments were provided); otherwise, we sent them via the default Cougaar logging mechanism. The centralized tool at the other end collected, organized, and displayed all of the data as soon as it was received (Figure 1). The main window showed a summary of all known agents, how many messages they had sent, received, and were still outstanding. From here, you can drill-down and look at all of the messages sent by any http://www.ddj.com
(continued from page 26) given agent, and quickly see which messages had made it or had not yet arrived. While this was a big improvement over the previous tool, we still wanted to pinpoint where a message was being lost. Solving Problem 2: Coarse Visibility Rather than just knowing whether a message was received, we wanted to know where it was last seen in the code for the sending/receiving protocol stacks so that we could track down the offending bug. To solve this, we needed the tool to recognize and process debug statements emitted at multiple points in the stacks. This would let us see how far each message traveled through the code. These points, or logpoints, were simply additional Log4J debug statements that we inserted in the protocol stacks (having debug statements at multiple layers in the message stack results in thousands of entries for each agent). With the logpoints in place, our tool lets us drill down further to see the logpoints each message made it to and when it arrived at each logpoint (Figure 2). The tool learned about logpoints via an XML-based configuration file (Exam-
Figure 1: Centralized collection tool.
ple 1). It defined the logpoints, which stack they belonged to (send or receive), supplied a user-friendly name, and the order they appeared in the stacks. The last receive stack logpoint defined defaults to be treated as the endpoint. Any message having reached this point was considered to have been successfully received. We also permitted the endpoint to be changed at runtime. Since the tool’s main window displays the number of messages sent and received, changing the endpoint could affect how many messages were considered to have been successfully received. This lets us quickly see how many messages had made it to a specific logpoint. Using our tool, we were then able to see (in near real time) the flow of messages through the system and quickly observe when messages were apparently lost in the ether. In the MessageDetail window (Figure 2), we could examine the logpoints for each individual message and the time it was seen. The send stack logpoints were color-coded in black, receive logpoints in blue, and the final logpoint in red. When we identified messages disappearing (not making it past some logpoint), the next step was to insert one or more logpoints after the last logpoint the message was seen at. This, of course, required us to stop the application, modify the code, and then restart it. For small applications, or when reaching new debug statements occurs relatively quickly, the run-observe-stop-add_more_logpointsrestart debug cycle might be tolerable. In our situation, testing the various message protocols might take an hour. So, while the tool reduced the time to identify lost messages substantially, we were still left with an expensive debug cycle to track down where the messages were being lost. Fortunately, we had another tool to bring to bear.
TRUE" ID="B TRUE" ID="B TRUE" ID="B FALSE" ID="A 1" SEND_STACK="F FALSE" ID="A
Example 1: Logpoint configuration file. 28
Dr. Dobb’s Journal, September 2004
http://www.ddj.com
Inserting Logpoints at Runtime The last major hurdle to efficiently debugging our code was to terminate the application to insert new logpoints (Log4J debug statements). What we really needed was to be able to insert (and remove) logpoints into the agents while Cougaar was running. For this feature we turned to ProbeMeister, another tool we had developed (available electronically; see “Resource Center,” page 3). ProbeMeister (Figure 3) is a runtime software instrumentation tool (it was the first tool of its kind to use the advanced distributed debugging facilities introduced in JDK 1.4). ProbeMeister’s key strength is that it is capable of inserting Java bytecode into any local or remotely running Java application without stopping it. We developed ProbeMeister a few years ago to support ad hoc insertion of software probes into multiple remotely running Java applications. To insert probes into an application, users can either connect to the remote application at will from ProbeMeister or configure the remote application (using additional Java Virtual Machine directed command-line arguments) to connect to ProbeMeister when it starts up. No specialized JVM or modification of the application itself is required. ProbeMeister supports XML configurationdirected probe insertion as well as dragand-drop insertion. Using XML-based configuration files, you can specify exactly where to insert a given set of probes into a Java process (the probes can be inserted into application classes as well as core JVM classes). The custom configurations can then be automatically applied when ProbeMeister first attaches to the application or at any point after. While configuration files could be defined by hand, in general, you build up a configuration simply by deploying probes — ProbeMeister records all deployments that can then be exported to a configuration file (the file can then be edited if desired). Custom configurations could be used to focus on specific problems such as networking or file I/O, or to define application patches. Probes can be just about any code you might want to write; they are certainly not limited to debugging statements. Ad hoc deployment of probes involves simply dragging a selected probe to a source-code window and dropping it in between two lines of code. The sourcecode window displays the decompiled bytecode of the selected class and method, and labeled icons indicate where other probes currently exist in the code. Because ProbeMeister decompiles the current in-memory version of a class’s bytecode, no source code files are used by ProbeMeister. Once a probe has been dropped into the source-code window, it http://www.ddj.com
Once a probe has been dropped into the source-code window, it is immediately deployed to the remote JVM. Specifically, a copy of the method's code is modified in the remote JVM and it is swapped in as soon as no process is executing the method. Once a probe is deployed, it remains in existence for the application's execution lifetime (until the application is terminated), or until it is removed. Probe removal is just as simple as insertion: you simply drag a probe out of the source-code window. While currently unimplemented, it would also be possible to support modification of the class files so that probes persisted across executions. Recall that the purpose of ProbeMeister was to insert probes into remotely running applications. Thus, if you insert debug statements into a remote application, they will by default emit their output to the remote machine's console. However, that won't do much good if you cannot see the output! To solve this, ProbeMeister includes a probe that forwards debug statements to a remote collection facility. We first implemented a basic web server that collected and displayed the data from all probed processes in a web page. We later extended ProbeMeister to act as a collector itself and to display the data within an easily accessible window. Data from multiple processes can be interwoven (the text from each process is color coded) or displayed in separate windows.
Figure 2: Message detail window.
ProbeMeister comes with a variety of probes built in (for instance, one to call a specified method in the remote application, another to emit strings, one to emit a method's arguments, and so on). Other basic probes can be written without excessive trouble using high-level probe-building library calls. More sophisticated probes are implemented in the target application as probe plugs. Using probe plugs, which are written in Java (not in bytecode like the other probes), you can easily and quickly write code capable of accessing and manipulating specific application objects and state. Then, in ProbeMeister, one drags and drops a probe stub onto the source code; the stub is then customized by the user to call one of the predefined probe plugs (ProbeMeister displays a list of all compatible registered probe plugs that could be invoked). The most verbose probe stub passes several items to a probe plug for its use: the name of the remote JVM, the probe stub name, the name of the instrumented method, a user-supplied message, the current thread, an array containing all of the method's arguments, and the current object (if the instrumented method is not a static method). Recall that the probe stub is executing within the remote application, so it has access to all of this information. Its job is simply to collect it all up and pass it to another method (the probe plug) to do something with it. This might include simply emitting the information, though the method could easily modify the data or object to affect application behavior.
Using ProbeMeister To Solve Our Problem
Using ProbeMeister, we developed a custom probe plug that would emit new logpoints by calling our event router in Cougaar. Specifically, we used a ProbeMeister-supplied probe stub to make a method call to a custom probe plug that we defined in Cougaar. We could use this plug in any method where a message was accessible.
Figure 3: Main ProbeMeister window showing a probe stub that was just deployed into the source-code window via drag-and-drop.
This gave us the ability to inspect the message and to identify the sending and receiving agents, message sequence number, message type, and so on. The probe plug then composed a debug statement containing this information and emitted it to the SocketAppender via our event router. Thus, using ProbeMeister, new logpoints could be added anywhere in the message stack. Our collection tool (see Figure 4) then automatically identified the new logpoints and correctly organized them. New logpoints appeared as NEW entries in our LogPoint Configuration window. From here, we could identify which ones to pay attention to, essentially indicating whether we cared about them or not. As they appeared in this window, we could redefine the endpoint and immediately see if any new messages were making it past the new logpoints (Figure 5). This significantly shortened the coding cycle by reducing the need for the application restarts required after the insertion of new debug statements. You might ask why we needed to use ProbeMeister when we could simply place debug statements everywhere. Given the timing sensitivity of Cougaar, and likely most messaging middleware, the main problem with this approach is that it perturbs the behavior of the application such that it behaves differently than it would without all of the debug statements. A secondary problem with this approach, in general, is that of information overload, because you may have 50 to 100 or more debug statements emitting data per message.
Future Enhancements
While we found ProbeMeister to suit our needs in some respects, it was never designed to scale to instrument hundreds of processes. It was also not designed with the specific intent to connect to several identical remote processes and to deploy identical probes to each. When we used it with Cougaar, we watched for where specific messages were being lost, and then attached to the host machine(s) that the message was lost on and deployed a probe. Ideally, and potentially in some future version of ProbeMeister, we could instruct a probe to be deployed simultaneously to several or all running JVMs in the agent society. A second enhancement that we could envision would be to automate the deployment of probes. This would come into play when a message didn't arrive: Our message auditing tool would identify the last logpoint a message was seen at, select a point in the code between that logpoint and the next existing logpoint, and insert a new logpoint via ProbeMeister (using ProbeMeister's existing RMI interface).
Figure 4: LogPoint display configuration window.
Figure 5: Agent message list.
If a message still didn't make it to the new logpoint, another would be chosen between the last one it made it to and the one that was just inserted. Logpoints not of interest could be removed from the application via ProbeMeister to minimize perturbations. To support this capability, our tool would need to understand how messages flow through the application. Automated tools, and potentially user assistance, could be used to analyze the code to construct a data flow description that could then be used to identify logpoint locations. ProbeMeister is looking for a home. While it is fairly robust, it could use a good home where it could become a widely useful and available debugging tool.
Acknowledgments
Thanks to Dr. John Salasin for funding the development of ProbeMeister (funded under the DARPA DASADA program), as well as Dr. Mark Greaves for funding the tool development under UltraLog. We would also like to thank Tom Bannon for his contributions on this effort. This research is sponsored by the Defense Advanced Research Projects Agency and managed by the Department of the Interior/National Business Center under contract NBCHC010011. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, Department of the Interior/National Business Center, or the United States Government.
DDJ
Stream-Control Transmission Protocol
An alternative protocol to TCP
IAN BARILE
The Stream Control Transmission Protocol (SCTP) is a new transport layer protocol proposed in RFC 2960 (http://www.faqs.org/rfcs/rfc2960.html). Like TCP and UDP, SCTP uses the network layer protocol for intersystem communication. SCTP is a connection-oriented protocol, similar to TCP, that provides reliable communication between two endpoints. Unlike the byte stream TCP uses for data transport, SCTP sends data in message chunks similar to UDP packets. SCTP was originally developed for telephony signaling networks. Integrated Services Digital Network (ISDN) is one SS7 protocol implemented on SCTP. SCTP has many features that make it a beneficial protocol for use outside of the telecom industry. SCTP's unique features, such as native virtual streams and multihomed support, make SCTP an attractive alternative to TCP. Over the next several years, SCTP will start to play a role in Internet applications. SCTP stacks are available for almost every operating system. To find an implementation for a particular OS, visit http://www.sctp.org/implementations.html. In this article, I use the LKSCTP implementation (LKSCTP is integrated into the 2.6 Linux kernel; http://www.kernel.org/).

Ian currently works as a development consultant for Marimba and his interests are in computer security and networking. He can be contacted at ian_barile@yahoo.com.

Types of Socket Communication
Several types of socket communication are available for use in developing network applications. Developers must choose the transport layer protocol and technique that transmits data efficiently for their applications. Transport layer protocols transmit data between two endpoints on IPv4 networks using the following methods:
• Unicast. One endpoint to one endpoint.
• Broadcast. One endpoint to all endpoints on a subnet.
• Multicast. One endpoint to many.
TCP and SCTP are connection-oriented protocols that only support unicast data transmission. UDP is a connectionless protocol that supports unicast, broadcast, and multicast data transmission. Connectionless protocols send data to endpoints regardless of the endpoints' ability to receive and process the data. Connection-oriented protocols ensure reliable data transmission, flow control, and error control.

Unicast Data Transmission
When developing applications that require unicast data transmission, you must choose the correct transport layer protocol. SCTP, TCP, and UDP have unique features that can benefit your application. SCTP is a connection-oriented protocol similar to TCP. SCTP provides error and flow control for data packets, as TCP does. SCTP also gives you the ability to tag data through the use of virtual streams and protocol payload identifiers. SCTP sends data in messages similar to UDP packets, unlike the TCP byte stream. TCP uses a full-duplex byte stream to transmit data between endpoints.
TCP ensures reliable, sequenced data transmission between two endpoints by using flow- and error-control algorithms. Like IP, UDP transmits data using unreliable, unordered datagram packets. Any packet reordering and validation occurs in the application layer. UDP is the fastest transport layer protocol due to the lack of error or flow control.

Virtualized Streams
SCTP's virtual streams and the protocol payload identifier let you reduce application complexity and simplify application layer protocols by making it easy to identify data types. Virtual streams benefit you by letting data types be tagged by the stream identifier, reducing the need to parse data types out of the TCP byte stream. When implementing the application protocols HTTP and SMTP over TCP, different data types are identified by text headers (MIME headers). Parsing a TCP byte stream for text headers is a complex, costly process compared to using the SCTP stream identifier (a WORD value) in the SCTP [DATA] packet header.
To set up the number of inbound and outbound virtual streams for an SCTP association, two setsockopt() calls must be made with the option names SCTP_INITMSG and SUBSCRIBE_EVENT (see Example 1). SCTP-specific setsockopt() calls must pass SOL_SCTP for the level parameter.
When an SCTP endpoint receives a message from an associated endpoint via the sctp_recvmsg() API, the sctp_sndrcvinfo structure specifies the virtual stream via the sinfo_stream member (Example 2).
int SCTPSock = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

// set sctp initial options
sctp_initmsg tSCTPInitMsg;
memset(&tSCTPInitMsg, 0, sizeof(tSCTPInitMsg));
tSCTPInitMsg.sinit_num_ostreams = 10;
tSCTPInitMsg.sinit_max_instreams = 20;
int nRet = setsockopt(SCTPSock, SOL_SCTP, SCTP_INITMSG,
                      &tSCTPInitMsg, sizeof(tSCTPInitMsg));

// tell SCTP to provide information about the data messages
// being sent across the SCTP association
sctp_event_subscribe tSCTPSubscribe;
memset(&tSCTPSubscribe, 0, sizeof(tSCTPSubscribe));
tSCTPSubscribe.sctp_data_io_event = 1;
tSCTPSubscribe.sctp_send_failure_event = 1;
nRet = setsockopt(SCTPSock, SOL_SCTP, SCTP_EVENTS,
                  &tSCTPSubscribe, sizeof(tSCTPSubscribe));
Example 1: SCTP_INITMSG and SUBSCRIBE_EVENT. This figure demonstrates how to set the number of virtual streams for an SCTP association.

nRet = sctp_recvmsg(SCTPSock, szBuf, &buflen,
                    (sockaddr*)&msgname, &msgname_len,
                    &tSCTPSndRcvInfo, &msg_flags);
if( nRet == -1 ){
    perror(NULL);
    return -1;
}
switch(tSCTPSndRcvInfo.sinfo_stream){
    ...
}
Example 2: Using sctp_recvmsg() to get the virtual stream associated with the data message.

Other application layer protocols, such as FTP, require multiple connections to communicate between a client and server. FTP servers and clients could be reimplemented with the "data connection" and the "control connection" being sent over one SCTP association. The multiple "connections" would be on separate virtual streams or have unique protocol payload identifiers.

Multihomed Support
SCTP's native multihomed support enables an endpoint to send data transparently over multiple networks by binding an endpoint to multiple IP addresses. SCTP determines the IP address it will transmit data on by analyzing the routes between the addresses associated with an endpoint. When multiple addresses are available to an association, SCTP uses a default address. If the default route fails, the SCTP endpoint switches to another address in the transport address list. This enables clients to connect to servers over completely different networks, reducing dropped connections due to connectivity issues and latency on a network; see Figure 1. When using SCTP to support multiple interfaces on a single SCTP socket, the sctp_bindx() call is used to bind the socket to more than one interface. sctp_bindx() (Example 3) differs from the regular socket bind in that it allows addresses to be bound to and removed from the socket dynamically. There is an optional call, sctp_connectx(), which enables an endpoint to connect to multiple addresses on a remote endpoint. At this writing, LKSCTP does not yet implement the sctp_connectx() call.

Figure 1: SCTP multihomed connection between two endpoints.

Security
Security is an ever-increasing concern when developing applications. When developing a network application, you must understand the security issues associated with the transport layer protocol used. SCTP has been designed to mitigate certain types of security threats.

Table 1: Network APIs.
TCP and SCTP Server (LISTEN):
    int server = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);  // the last parameter can be 0 or IPPROTO_SCTP for a TCP socket
    bind(server, SockAddr, sizeof(sockaddr_in));
    listen(server, 10);
    accept(server, NULL, NULL);
TCP and SCTP Client (CONNECTING):
    int client = socket(PF_INET, SOCK_STREAM, IPPROTO_SCTP);
    connect(client, NULL, NULL);
SCTP Server (RECEIVING MESSAGE):
    int server = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
    bind(server, SockAddr, sizeof(sockaddr_in));
    sctp_recvmsg(...);  // the SCTP association isn't established until the first message is received
SCTP Client (SENDING MESSAGE):
    int client = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
    sctp_sendmsg(...);  // the SCTP association is established when the message is successfully sent
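As a minimal sketch of the SCTP server entries in Table 1, the following program (my own illustration, not part of the article's source files; the port number, buffer size, and omission of error handling are assumptions) receives its first message on a one-to-many socket. It assumes the lksctp-tools header netinet/sctp.h and linking with -lsctp; note that on LKSCTP a listen() call is still needed before inbound associations are accepted.

#include <cstdio>
#include <cstring>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <netinet/sctp.h>   // lksctp-tools

int main()
{
    // One-to-many style socket, as in the "SCTP Server (RECEIVING MESSAGE)" entry of Table 1.
    int server = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

    sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(5000);               // hypothetical port
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    bind(server, (sockaddr*)&addr, sizeof(addr));
    listen(server, 10);                         // allow inbound associations on the one-to-many socket

    char buf[1024];
    sockaddr_in peer;
    socklen_t peerlen = sizeof(peer);
    sctp_sndrcvinfo info;
    memset(&info, 0, sizeof(info));
    int flags = 0;

    // The association isn't established until this first message arrives.
    int n = sctp_recvmsg(server, buf, sizeof(buf),
                         (sockaddr*)&peer, &peerlen, &info, &flags);
    if (n > 0)
        printf("received %d bytes on virtual stream %d\n", n, info.sinfo_stream);
    return 0;
}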
A common transport layer attack, Denial of Service (DoS), tries to reduce the number of new inbound connections a system can create by tying up resources through a [SYN] flood. SCTP mitigates DoS attacks by using a cookie to maintain state information during the initialization handshake. The SCTP cookie contains data, like the Transmission Control Block, that is only committed to system resources once an endpoint enters the established state. An established SCTP socket protects against random and malicious packets through validation tags. If a packet is received with an invalid validation tag, the packet is discarded. SCTP relies on other technologies to provide protection against data corruption (IP Authentication Header, RFC 2402) and to protect confidentiality (IP Encapsulating Security Payload, RFC 2406).

If your application requires a connection-oriented protocol to communicate between endpoints, you can choose between SCTP and TCP. When developing the applications, knowledge of how connections are initialized and terminated, and how data is transferred, leads to better application design and faster debugging.

Initialization
Establishing an association between two endpoints requires the use of network APIs in the client and server to negotiate the association and allocate system resources. The socket APIs used for establishing TCP associations are used in establishing an SCTP association. SCTP also allows an implicit association to be created between endpoints. The socket APIs have been extended to support SCTP-specific functionality. Table 1 lists the network APIs. When client and server applications establish an association between endpoints, the endpoints enter the protocols' initialization sequences. The SCTP initialization sequence has a four-way handshake, where TCP has a three-way handshake. The TCP initialization sequence in Figure 2 illustrates the [SYN]/[ACK] packet exchange required to establish communication between the endpoints. The endpoints use the sequence numbers established in the [SYN] packets to allow TCP stacks to deliver sequenced data to the application. SCTP's initialization sequence uses a four-way handshake. The extra step in SCTP's handshake is used to pass a cookie that maintains state and validates the endpoints; see Figure 3. The initialization steps for two SCTP endpoints A and Z are as follows:
1. Association [INIT] is sent. The [INIT] contains endpoint A's maximum number of inbound virtual streams, the number of outbound virtual streams requested, and the address list of the host.
2. Endpoint Z sends an [INIT ACK]. The [INIT ACK] contains endpoint Z's [COOKIE] packet, Cookie_Z, its number of outbound virtual streams, its maximum number of inbound streams, and its address list.
3. Endpoint A sends the [COOKIE ECHO] packet.
4. Endpoint Z receives the [COOKIE ECHO] and sends the [COOKIE ACK]. Endpoint Z is now in an established state, waiting to receive data.
5. Endpoint A receives the [COOKIE ACK] and the association enters an established state. User data is now allowed to be sent.
During the initialization sequence, the number of inbound and outbound virtual streams, along with the address lists for the endpoints, are exchanged. Each endpoint specifies its maximum number of inbound streams (MIS) and requests a number of outbound streams (OS). If the number of outbound streams is greater than the MIS, the OS must be adjusted to match the MIS stream count or the association must be aborted. If an SCTP endpoint needs multihomed support, an address list must be present in the initialization sequence. There are three possibilities for how addresses are listed in the [INIT] and [INIT-ACK] datagrams:
• No IP address list and no hostname. SCTP will use the IP address specified in the datagram header, and no multihomed support is provided.
• An IP address list. SCTP will use the IP addresses for its transport address list and provide multihomed support.
• A hostname. SCTP resolves the hostname to IP addresses; these addresses are used in the transport address list.

TCP and SCTP Bulk Data Flow
Once an association has been established, data can be sent between the endpoints. Both TCP and SCTP provide reliable data transmission through similar error- and flow-control algorithms. SCTP can send data using the same APIs as TCP and UDP, as well as new SCTP-specific APIs:
• send() and recv() (TCP and SCTP only).
• sendto() and recvfrom().
• sendmsg() and recvmsg().
• sctp_sendmsg() and sctp_recvmsg() (SCTP only).
The following client-server source files (available electronically; see "Resource Center," page 5) illustrate how to use the APIs with the SCTP protocol to communicate between two applications:
• server_recv.cpp and client_send.cpp illustrate how SCTP can be used like TCP to communicate between a client and a server.
• tcpserver_recv.cpp and tcpclient_send.cpp are a simple TCP client and server to illustrate that SCTP and TCP can overlap when compared to server_recv.cpp and client_send.cpp.
• server_recvmsg.cpp and client_sendmsg.cpp illustrate how SCTP can send data using scatter/gather I/O APIs. The sctp_sendmsg() API is implemented using the recvmsg() and sendmsg() APIs.
• server_sctprecvmsg.cpp and client_sctpsendmsg.cpp demonstrate how SCTP APIs allow communication between a client and server using multiple virtual streams.
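To make the send side of the stream-tagging idea concrete, here is a minimal, hypothetical sketch (the names, port, addresses, and payload protocol identifiers are mine, not taken from the article's source files). It assumes the lksctp-tools header netinet/sctp.h and uses sctp_sendmsg() to tag each message with a virtual stream number and a payload protocol identifier, relying on the implicit association setup shown in Table 1.

#include <cstdint>
#include <cstring>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <netinet/sctp.h>   // lksctp-tools

// Send one message on a chosen virtual stream of a one-to-many SCTP socket.
// The association is set up implicitly by the first successful send.
int send_on_stream(int sock, sockaddr_in& peer, const char* data, size_t len,
                   uint16_t stream, uint32_t ppid)
{
    return sctp_sendmsg(sock, data, len,
                        (sockaddr*)&peer, sizeof(peer),
                        ppid,     // payload protocol identifier, carried opaquely by SCTP
                        0,        // flags
                        stream,   // virtual stream number
                        0,        // time to live (0 = unlimited)
                        0);       // context
}

int main()
{
    int sock = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
    sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port = htons(5000);                        // hypothetical server port
    inet_pton(AF_INET, "127.0.0.1", &peer.sin_addr);
    send_on_stream(sock, peer, "control", 7, 0, 1);     // stream 0 carries "control" data
    send_on_stream(sock, peer, "payload", 7, 1, 2);     // stream 1 carries bulk data
    return 0;
}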
int SCTPSock = socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);
void* pvSockAddrs;
int nAddrCount = 2;
pvSockAddrs = malloc(nAddrCount * sizeof(sockaddr_in));  // create addrs
struct sockaddr_in* sa;
sa = (sockaddr_in*)pvSockAddrs;
sa->sin_family = AF_INET;
...
sa = (sockaddr_in*)(pvSockAddrs + sizeof(sockaddr_in));
...
int nRet = sctp_bindx(SCTPSock, pvSockAddrs, nAddrCount, SCTP_BINDX_ADD_ADDR);
Example 3: Using sctp_bindx() to bind to multiple IP addresses.
Figure 2: TCP initialization.
To understand the differences between how SCTP and TCP manage data transmission, you need to look at the protocol headers and control messages. TCP designates data being sent to an application by using the [PSH] flag in the TCP header. To ensure that data sent from an endpoint has been received, TCP uses an [ACK] that specifies which [PSH] packets have been received. [ACK]s are sent in response to [PSH] datagrams in two different circumstances:
• When data has been received.
• When the ACK delay has been reached.
TCP provides flow control and error control through the use of a receiving window. The receiving window informs the sender of the size of the buffer that is available to receive data. If the sender hasn't received an [ACK] after filling the receiving window, it must either:
• Wait for an [ACK] before sending new data.
• Wait on the retransmission timeout to retransmit unacknowledged data.
A TCP endpoint will also retransmit data if an [ACK] for a specific data packet is continuously sent. This situation occurs if a packet is dropped or damaged. If packets are lost or damaged, an exponential back-off is applied to the retransmission timer.
The retransmission timeouts are not readjusted until an [ACK] is received for a packet that is not being requested for retransmission.
Figure 4: TCP termination.
Figure 3: SCTP initialization.
SCTP's model for bulk data flow uses many of the techniques that TCP uses for sending data and flow control. SCTP handles dropped and delayed packets differently than TCP because of its ability to manage virtual streams. If data is dropped, then the [SACK] sent will contain the transmission sequence number (TSN) for the last packet received. The [SACK] packet contains Gap ACK blocks specifying which packets need to be retransmitted. SCTP also uses Gap ACK blocks to report that packets were dropped because buffers filled up and the data couldn't be forwarded to the application. When packet loss occurs for one virtual stream, SCTP continues to deliver data on other virtual streams while delaying the affected virtual stream. SCTP supports the ability to deliver data on a virtualized stream in an ordered or unordered fashion to an application. SCTP supports message bundling and fragmentation. SCTP fragments a message if it is larger than the smallest maximum transfer unit (MTU). SCTP doesn't support refragmenting data chunks. If SCTP messages are smaller than the smallest MTU value, SCTP will bundle them into a single packet to improve performance. When SCTP bundles packets, all control chunks are packed at the beginning of the SCTP packet.

Termination
Once an association has been established, data can be sent between the endpoints until the association is terminated or severed.
The two network APIs that allow for the termination of an open association are shutdown(socket, how) and close(socket). The close() call terminates the SCTP association. shutdown() terminates an association differently for SCTP than for TCP because SCTP doesn't have half-closed semantics. See draft-ietf-tsvwg-sctpsocket-08.txt for the semantics of shutdown(). A connection-oriented protocol uses the termination sequence to notify endpoints that data can no longer be received and that system resources should be released. The TCP termination sequence is a four-way handshake. Each endpoint sends a [FIN] and receives an [ACK] from its corresponding endpoint (see Figure 4). TCP allows one endpoint to terminate its association while the other still transmits data, resulting in a half-open socket. SCTP's association-termination sequence is a three-way handshake. SCTP doesn't require a fourth step because it doesn't support half-open socket communications. When an endpoint enters the shutdown-pending state, it sends all remaining data chunks and the [SHUTDOWN] chunk. When an endpoint receives a [SHUTDOWN], it sends the remaining data chunks and [ACK]s before the [SHUTDOWN ACK], refusing any new data, and enters the shutdown-received state. Upon receipt of the [SHUTDOWN ACK], the endpoint originating the [SHUTDOWN] datagram sends a [SHUTDOWN COMPLETE], finishing the shutdown sequence (see Figure 5).
Figure 5: SCTP termination.
Conclusion
SCTP is an exciting new transport layer protocol that offers an alternative to TCP. SCTP's virtual streams and multihomed support can reduce application complexity and increase reliability. The similarities between TCP and SCTP APIs will ease porting and creating new applications using SCTP.
DDJ
Continuous Integration & .NET: Part II
Continuous Integration and beyond…
THOMAS BECK

A common practice at Microsoft and some other shrink-wrap software companies is the "daily build and smoke test" process. Every file is compiled, linked, and combined into an executable program every day, and the program is then put through a "smoke test," a relatively simple check to see whether the product "smokes" when it runs.
— Steve McConnell, "Best Practices: Daily Build and Smoke Test," IEEE Software, July 1996
While unfamiliar to many developers, Continuous Integration is not a new concept at Microsoft, as is evident from Steve McConnell's observation. Consequently, in this second installment of a two-part article (see DDJ, August 2004 for Part I), I examine the setup of a version-control system and the configuration of a Continuous Integration tool that runs off of the version-control system in the .NET environment. I then integrate these tools into the build process to test for conformance with the Microsoft .NET Framework Design Guidelines, and finally address the issue of code-coverage testing.
Thomas is a manager with Deloitte. He is currently working on the firm's Commonwealth of Pennsylvania account and is located in Harrisburg, PA. Thomas can be reached at [email protected].

CVSNT: Team Builds & Source-Code Control
A source-code control system should be in place for all team development efforts. As a matter of fact, this should be one of the first things put in place as part of an organized development effort. The reason I've waited until now to introduce version-control mechanisms in this article is twofold:
• Version control is not critical to understanding the NAnt, NUnit, and NDoc examples in the first part of this two-part article.
• Version control is a prerequisite for the next topic of this article — Continuous Integration with CruiseControl.NET.
Both NAnt and CruiseControl.NET integrate well with a variety of version-control systems. However, I use Concurrent Versions System (CVS), an open-source version-control system that runs on Linux and Windows. CVSNT is the Windows port of CVS (http://www.cvsnt.org/wiki/). CVS can be accessed from the command line, from your IDE (both #Develop and Eclipse support CVS), or from one of several popular open-source tools. For example, TortoiseCVS (http://www.tortoisecvs.org/) is a Windows plug-in that lets you access CVS capabilities from within the Windows File Explorer. Likewise, WinCVS (http://www.wincvs.org/) is similar in appearance to Visual SourceSafe's user interface. When you have completed CVSNT setup, follow these steps to enter your code into version control:
1. Use the CVSNT Control Panel to add a repository named "Testing" underneath the default CVS directory.
2. Make a new module (/5-CVS, in this case) using the Password Server (:pserver) protocol within the repository that you created above. This creates a corresponding /5-CVS folder underneath your module folder. Figure 2 illustrates what this step would look like using TortoiseCVS (Figure 1 appeared in Part I).
3. Add all the contents of the root build directory to the CVS repository. This creates a number of additional CVS directories in your file structure. You should only be checking in source code, not the results of your build.
4. When you have added the files, you need to do an initial commit of all of the files to CVS.
Your code is now in version control and you can proceed to automatically check it in and out using the build program I present here. The changes to the build file to accommodate CVS are relatively minor; see Listing Six (Listings One through Five were presented in Part I). The goal of the changes is to make sure that the most current code is available prior to the build. The build-file changes amount to the addition of the update_source target, which contains the cvs-update NAnt task to update the build folder with the most current versions of our source code from CVS. When your build runs, notice that checkouts from CVS are occurring. You can test this by deleting some of the source files from your local directory and confirming that they are replaced with the versions from CVS prior to the build. The next step is to automate source-control access with a continuous integration tool.

CruiseControl.NET & Continuous Integration
Figure 2: Creating a new module with TortoiseCVS.
Figure 3: CruiseControl.NET Continuous Integration.
CruiseControl.NET is an open-source tool from SourceForge designed to meld several other open-source building and testing tools and completely automate the build process (http://sourceforge.net/projects/ccnet/ and http://confluence.pulic.thoughtworks.org/display/CCNet/Download/). In fact, the goal of Continuous Integration tools such as CruiseControl.NET is to automate the reaction to build events (such as source-code modifications). In Figure 3, changes to the project's source code trigger a build. This build causes the build site to be updated and, if necessary, results in e-mail notification of the parties responsible for correcting unit-testing errors. Coordination of these activities is the responsibility of CruiseControl. The source code for this section of the article is set up similarly to the source code of the previous example, with one exception, namely that you are using two build files instead of one. The first build file (bootstrap) is responsible for updating the code on the build server from CVS, then invoking the main build file (CCNet). The main build file performs all the functionality of the build file from the previous section, with the exception of the CVS access now performed by bootstrap. Perform the initial addition and commit of the necessary files to a new CVS module, 6-CCNet, then perform an initial build by invoking the bootstrap build file. The first step in getting CruiseControl.NET up and running is to create a ccnet.config XML file that drives CruiseControl.
This file controls the frequency of the integration cycles, defines the locations of the version-control system and NAnt, details what type of (if any) e-mail notifications should be associated with the build, and controls the XML logging that is used as the basis for the build web site. An example ccnet.config file is available electronically; see "Resource Center," page 5. The file most likely needs to be tailored to accommodate the locations of specific files and programs on your filesystem. When you have the ccnet.config file in place, you can kick off CruiseControl by running the StartCCNet.bat job from the command line. The job should initiate an initial build and then continuously poll CVS until a change is made to the source code, at which point a new build will be started. Setting up the CruiseControl.NET web site is as simple as mapping an IIS virtual directory to the \web subfolder of your CCNet folder. Your continuous integration process is set up and ready to go.

FxCop and .NET Framework Design Guidelines
With the Continuous Integration process up and ready to go, it's worth introducing a valuable tool that plugs right into CruiseControl. FxCop (http://www.gotdotnet.com/team/fxcop/) is an application that checks .NET managed-code assemblies for conformance with the Microsoft .NET Framework Design Guidelines (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconnetframeworkdesignguidelines.asp). The FxCop tool tests for conformance in:
• Library design.
• Localization.
• Naming conventions.
• Performance.
• Security.
Once FxCop is installed, you need to perform two steps:
1. Set up an FxCop task in the NAnt build file (see Listing Seven).
2. Merge the XML output produced by FxCop with the other input for CruiseControl (Listing Eight).
Step 1 invokes the FxCop command-line utility using the exec task. A custom NAnt task for FxCop is not yet available. The second step, preparing the FxCop XML output for consumption by CruiseControl, is done in the ccnet.config file. In this case, the xmllogger's merging node (Step 2) is changed as in Listing Eight. After making these changes, create a separate CVS folder for this build and redeploy the modified ccnet.config file. Once you've done this and kicked off your build with CruiseControl, you should be able to use the FxCop link on CruiseControl to view your conformance with the .NET Framework Design Guidelines, as in Figure 4. You can then correct your code to comply with the guidelines. It is also possible to add custom rules to FxCop to accommodate any special guidelines in place in your particular organization.

Clover: How Thorough Is Your Testing?
With the entire testing and integration process automated, you must consider how well all those unit tests that you're running are covering the different paths of execution within your code. To determine this, you need a tool that produces code-coverage metrics.
Previously, such tools were usually only available as part of large, usually fairly expensive testing suites. Within the last few years, Clover, an inexpensive (though not open-source) tool, has emerged that performs Java code-coverage testing, produces detailed reports, and integrates with Ant (http://www.thecortex.net/clover/index.html). Originally developed as an internal tool at Cortex to support J2EE development, Clover is currently being ported to .NET and promises to be as powerful a tool for Continuous Integration in the .NET environment as it is in Java. A special limited 30-day licensed version of Clover is available electronically with the source code of this article. This Alpha release version of Clover has some limitations; for instance, only C# is currently supported. If the Java version is any indication, expect significant enhancements to this tool, its reporting functionality, and its integration with standard open-source tools as it moves towards a production-ready product.
The process of testing your code coverage using Clover.NET can best be explained using the example of the new Clover target added to our build file; see Listing Nine. This target contains many familiar elements from previous sections and certain elements specific to Clover. Enumerated, the Clover target proceeds as follows:
1. Execute the program CloverInstr to create an instrumented version of the C# file and a Clover Coverage Database (.cdb). The instrumented C# file contains special "tracers" to facilitate linking back to the coverage database.
2. Compile the instrumented C# program, making sure to include a reference to the Clover and CloverRuntime DLLs.
3. Compile and run the NUnit tests against the business-object DLL created from the instrumented C# file.
4. Execute the Clover HtmlReporter program to produce the HTML code-coverage report.
Figure 5 is a sample report. It is worth paying attention to some of the Clover coverage features in Figure 5. The results include the code coverage for our class (BusObjCS.cs) as a percentage. If you had more than one class in your project, code-coverage results would also be summarized at the project level. The detailed coverage results for our class are below the coverage summary. Simple code highlighting indicates what was covered (line 23, six times in our tests), what wasn't covered (lines 18 and 24), and what need not be covered (lines 1–9). This type of visual detail lets you see exactly what is being tested and what is not being tested, and gives you a quick overview of how thorough testing is across the different modules of a project. This type of feedback, combined with the unit test results, allows for the most effective refactoring of the code and unit tests.
Figure 4: FxCop .NET guideline compliance report.
Figure 5: Clover code-coverage report.

Conclusion
Establishing a Continuous Integration environment using standard tools was a task that, just a few short years ago, was out of reach on a Microsoft development project. In the past several years, the number and quality of open-source build, testing, documentation, and integration tools supporting the .NET environment has increased drastically. Using the examples in this article, you should be able to weave together a Continuous Integration environment for your project that pays a hefty return on the comparatively minimal amount of time and resources that you need to invest.
DDJ

Listing Six
Listing Seven
<exec program="${fxcop.exe}"
      commandline="/f:${fxcop.src} /o:${fxcop.out} /s"
      failonerror="false"/>
Listing Eight
<mergeFiles>
    c:\NETArticle\07FxCop\cs_build\BusObjTstCS.dllresults.xml
    c:\NETArticle\07FxCop\cs_build\BusObjCS.ccnetfxcop.xml
</mergeFiles>
Listing Nine
<mkdir dir="${clover.dir}"/>
<exec program="c:\program files\cenqua\clover.net\CloverInstr.exe"
      commandline="src\BusObjCS.cs -d ${clover.dir} -i ${clover.dir}\BusObjCS.cdb"/>
imports="System,System.Data,System.Data.SQLClient,System.Collections.Specialized,System.XML">
<sources>
<sources>
<exec program="c:\program files\cenqua\clover.net\HtmlReporter.exe"
      commandline="-i ${clover.dir}\BusObjCS.cdb -o report -t Test"/>
DDJ
RFID Blocker Tags
Protecting personal privacy
BURT KALISKI
Radio frequency identification (RFID) tags can be used as an advanced electronic version of the UPC barcodes that identify just about everything we purchase — from groceries to laptops. An RFID tag consists of a small integrated circuit attached to a small antenna and is capable of transmitting a unique serial number a distance of several meters to a reading device in response to a query. Most RFID tags are passive, meaning they do not contain batteries, relying instead on power obtained from the query signal itself. RFID tags are already quite common. Some popular examples include proximity cards used as replacements for metal door keys, theft-detection tags attached to consumer goods such as clothing, and the small dashboard devices for automating highway toll payments. Improvements in cost and size will encourage the rapid proliferation of RFID tags into many new areas of use. For example, a U.S. consumer goods manufacturer has recently ordered half a billion tags for use in retail environments. Over the next few years, manufacturers and retailers plan to embed or attach RFID tags to all types of products, theoretically reducing theft and making automated checkout, product returns, and inventory audits remarkably fast.

Burt is chief scientist at RSA Security and director of RSA Laboratories. He can be contacted at [email protected].
While the many benefits of RFID are attractive, the technology rightfully has some consumers concerned about privacy. The expected surge in use of RFID tags in all aspects of our lives introduces serious threats. The simplest RFID tag broadcasts its ID serial number — that is, its electronic product code (EPC) — to any nearby reader. This presents a clear potential for privacy violations. What happens when a consumer wears his expensive new sneakers with a still functioning RFID tag to the store where he purchased them? Will the store read the tag again and correlate it with his previous purchase? Who's to stop a retailer (or anyone else) from reading RFID tags on any purchased item, tracking consumer behavior or movements? What's to protect anyone from secretly being given tags so they can be tracked or spied on by a private detective, spouse, parent, or even by an employer? Privacy has been recognized as a potential barrier in the widespread adoption of RFID technology for some time. Along with other researchers, RSA Laboratories (where I work) is developing RFID designs that promote adoption, while protecting the privacy rights of individual consumers. In this article, I explore a "blocker tag" technique that can make significant strides in the fight for privacy protection in relation to RFID tags.

Protecting Privacy
The most straightforward approach to privacy protection is to "kill" RFID tags before they are placed in the hands of consumers. A killed tag can never be reactivated. While the "kill tag on purchase" or deactivation approach may address many or even most instances of potential concern for privacy, it is unlikely to fit the bill in all scenarios. There are many applications for which such simple measures would be undesirable despite the privacy assurances.
For example, consumers may wish RFID tags to remain operative while in their possession to facilitate merchandise returns, or perhaps to enable a future "electronic medicine cabinet" to verify that they have the correct prescriptions. With another approach, an RFID tag may be shielded from scrutiny using what is known as a "Faraday Cage" — a container made of metal mesh or foil that is impenetrable by certain frequencies of radio signals. Small-time thieves are already using foil-lined bags in retail shops to avoid detection when shoplifting. In the same spirit, currency notes of the future with active RFID tags could be protected with foil-lined wallets. However, RFID tags will inevitably be deployed in a vast range of applications that cannot be placed conveniently in metal containers. Faraday cages thus represent, at best, a partial solution to consumer privacy.
High-Tech Privacy Protection A more high-tech approach would involve the use of cryptographic methods. These techniques are exceptionally challenging to design, given the cost constraints associated with the basic RFID tag. Protocols based on public-key cryptography can readily address privacy concerns, but are likely to be too expensive to implement universally in RFID
method of protecting privacy. In this scenario, consumers carry a device that actively broadcasts radio signals to effectively block and/or disrupt the operation of any nearby RFID readers. This approach may be illegal — at least if the broadcast power is too high — and is a most assuredly crude approach, possibly causing severe disruption of all nearby RFID systems (even legitimate applications).
Protocols based on symmetric cryptography may become practical to implement sooner, but they must be carefully designed to preserve privacy. Consider that for a reader to determine which symmetric key to use when communicating with an RFID tag, the tag must somehow identify the key to the reader. This process itself might still enable the RFID tag to be tracked.
Blocker Tag Details
ARI JUELS
The blocker tag devised by Ron Rivest, Michael Szydlo, and myself is itself meant to be a simple, passive RFID device, similar in cost and form to an ordinary RFID tag. However, a blocker tag performs a special function. When a reader attempts to scan RFID tags that are marked as "private" (in a manner to be discussed below), a blocker tag jams the reader. More precisely, the blocker tag cheats the tag-to-reader communications protocol in such a way that the reader sees many billions of nonexistent tags and, therefore, stalls! One of the chief design challenges in RFID systems is the prevention of radio collisions among messages transmitted by tags. If all tags were to broadcast their serial numbers simultaneously, the result would effectively be noise. That is, it would be hard for the reader to disentangle the many overlapping signals. To resolve this problem, reader-to-tag RFID communication protocols impose a careful scheduling on tag messages. One such protocol is tree walking. In this protocol, the space of n-bit tag serial numbers is viewed in terms of a binary tree of depth n. If you imagine each left branch as being labeled "0" and each right branch labeled "1," then a path from the root to a given node at depth k specifies a k-bit serial-number prefix. Hence, each leaf of the tree corresponds to a unique tag serial number. Tags generally carry identifiers of at least 96 bits in length. The tree representing serial numbers, therefore, has a depth of at least 96. To determine which tags are present, a reader performs a kind of depth-first search on the tree.
Ari is Principal Research Scientist at RSA Laboratories and can be contacted at [email protected].
It starts at the root of the tree, asking all tags to broadcast the first bit in their serial numbers. If the reader receives only a "0" transmission, it knows that all tag serial numbers begin with a "0," and recurses on the left subtree; likewise, if the reader receives only a "1," it recurses on the right subtree. If the reader detects a broadcast collision (that is, both a 0 and a 1), it recurses on both subtrees of the node (see Figure 1). At any given internal node of the tree, the reader performs this same operation; first, though, it transmits a command that causes only tags in the corresponding subtree to respond to its query. Once the reader has completed the process of traversing the tree and determined which tags are present, it can communicate individually with tags by addressing them using their serial numbers. (This process of identifying individual tags is often called "singulation.") In brief, then, to determine tag identities, the reader repeatedly asks subsets of tags to broadcast the bit value of their serial numbers in a particular position. The way that the blocker jams the reader is simple: When the reader asks a subset of tags to transmit their next bit, the blocker transmits both a 0 and a 1, simulating a broadcast collision. The result is that the reader believes that all possible tags are present.
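As a toy illustration of that behavior (my own sketch, not code from the RSA Laboratories work; the 8-bit serial numbers, the sample tag values, and the choice of the "1" prefix as the privacy zone discussed below are all assumptions made to keep the simulation small), the following C++ program walks the serial-number tree twice, once without and once with a selective blocker that simulates collisions throughout the private half of the tree.

#include <cstdio>
#include <set>
#include <string>

const int BITS = 8;   // toy serial-number length; real EPC identifiers are 96 bits or more
const std::set<std::string> tags = { "00000001", "00000110", "10101010" };  // hypothetical tags

// Would any real tag answer the reader's query for prefix p extended by nextBit?
bool tagAnswers(const std::string& p, char nextBit)
{
    std::string q = p + nextBit;
    for (const std::string& t : tags)
        if (t.compare(0, q.size(), q) == 0) return true;
    return false;
}

// The blocker answers every query that falls inside the privacy zone
// (here, all serial numbers whose first bit is 1), simulating a collision.
bool blockerAnswers(const std::string& p, char nextBit, bool present)
{
    return present && (p + nextBit)[0] == '1';
}

// Depth-first tree walk; 'reads' counts serial numbers the reader believes it singulated.
void walk(const std::string& prefix, bool blocker, int& reads)
{
    if ((int)prefix.size() == BITS) { ++reads; return; }
    if (tagAnswers(prefix, '0') || blockerAnswers(prefix, '0', blocker))
        walk(prefix + '0', blocker, reads);
    if (tagAnswers(prefix, '1') || blockerAnswers(prefix, '1', blocker))
        walk(prefix + '1', blocker, reads);
}

int main()
{
    int without = 0, withBlocker = 0;
    walk("", false, without);
    walk("", true,  withBlocker);
    printf("serial numbers read without blocker: %d\n", without);      // the 3 real tags
    printf("serial numbers read with blocker:    %d\n", withBlocker);  // 2 public tags plus all 128 leaves of the private half
}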
Since serial numbers are generally at least 96 bits in length, this means that a blocker can cause a reader to perceive more than an octillion (10^27) tags — in other words, far too many to count! For a blocking system to work productively, the blocking process must be selective. A blocker tag should only jam a reader if the reader tries to scan tags marked "private." To achieve this, you can employ a system of zoning, in which half of the serial-number tree is regarded as "public" and the other half is marked as "private." For example, the left half of the tree, consisting of all serial numbers starting with a 0, might be treated as the public zone, while the other half is treated as the private zone. The first bit of any serial number in this case would indicate the public or private status of a tag. A blocker would naturally simulate collisions only in the "private" portion of the tree. In other words, a blocker tag would never prevent scanning of "public" tags in the left half of the tree, but would always do so in the right half of the tree. To change the zone of a tag from public to private in, say, a supermarket, it would suffice to flip the first bit of the serial number. (Like the "kill" command, this operation would require PIN protection to defend against malicious rezoning.) Many variant ideas are possible here. For instance, it is possible to introduce multiple zones with different privacy policies.
DDJ
Figure 1: Tree-walking example: Solid circles at leaves represent tags. Subtrees with tags are searched recursively; the dark line shows the path of depth-first search to the leftmost tag, denoted 001. Open circles indicate collisions, where tags occur in both left and right subtrees.
Conventional cryptography thus does not immediately suggest a general-purpose, practical solution to the privacy concerns with RFID tags, but research into new approaches is showing promise. One approach is to go beyond protecting the data — the traditional province of cryptography — and instead look directly at the tag-to-reader interaction itself.

Blocker Tags
An RFID reader is really only able to communicate with a single RFID tag at a time. If more than one tag responds to a query by the reader, the reader detects a "collision" and cannot accurately read the information transmitted by the tags. The reader and RFID tags then need to engage in some sort of protocol so the reader can communicate with the conflicting tags one at a time. Such a protocol is called a "singulation protocol." This is the premise behind RSA Laboratories' simple "blocker tag" scheme for RFID tag privacy protection (for more information, see the accompanying text box entitled "Blocker Tag Details"). Blocker tags selectively exploit the standard tree-walking singulation protocol. The blocker tag does not actively engage in jamming.
Rather, by participating in the tag-reading process in a noncompliant manner, the blocker initiates passive jamming. When carried by consumers, blocker tags thus "block" RFID readers. To avoid interfering with the legitimate business uses of RFID tags, a blocker tag blocks selectively by jamming only selected subsets of ID codes, such as those in a designated "privacy zone."
We believe that this approach, when used with appropriate care, provides an attractive alternative for addressing privacy concerns raised by the widespread use of RFID tags in consumer products. It is possible that the blocker tag concept could be used for malicious purposes, for instance, as a tool for mounting denial-of-service attacks. Such a blocker tag might shield either the full spectrum of serial numbers from reading or might target a particular range; for example, the set of serial numbers assigned to a particular manufacturer. A blocker tag of this form might be used to disrupt business operations or to help perpetrate petty theft by shielding merchandise from inventory-control mechanisms. However, mechanisms — such as detecting blocker tags by monitoring for reports of unusually high numbers of tags — can be put in place to protect against fraudsters. Moreover, by design and intent, the blocker tag solutions that RSA Laboratories is developing are limited to privacy protection and do not facilitate such malicious uses. Our blocker tag approach to RFID privacy protection is particularly attractive because of the low cost associated with the implementation of such an effective privacy solution:
• Ordinary consumer-product RFID tags may not need to be modified at all, other than to have the ability to indicate whether they have been purchased or not (that is, whether they are in a privacy zone). The RFID tags don't need any expensive cryptography.
• Blocker tags themselves can be very affordable because they consist essentially of just one or two standard RFID tags, with minimal circuit modifications. If a standard RFID tag can be made for five cents, a blocker tag can probably be manufactured for roughly 10 cents in quantity.
• The infrastructure behind such an implementation is manageable — limited to a PIN for each standard RFID tag that authorizes changes to privacy zones. Already, such PIN management is needed for the default remedy to privacy concerns, which is tag deactivation, to ensure that it is difficult for an unauthorized party to deactivate a tag. Blocker tags build on the same infrastructure, while leaving the option of continuing to use a tag after the point of sale, rather than killing it.
Conclusion
The use of selective blocking by blocker tags lets consumers choose when to hide certain RFID tags from scanning, and when to reveal those same tags for scanning. By allowing RFID prefixes to be rewritten, tags can be moved in or out of privacy zones protected by various blocker tags.
The scientists at RSA Laboratories believe that blocker tags are a potent and useful tool for protecting consumer privacy, and recommend their incorporation into the portfolio of tools for building confidence into the promising new RFID infrastructure.

For More Reading
A. Juels, R.L. Rivest, and M. Szydlo. "The Blocker Tag: Selective Blocking of RFID Tags for Consumer Privacy." In 8th ACM Conference on Computer and Communications Security. ACM Press, 2003 (http://www.rsasecurity.com/go/rfid/).
A. Juels. "Soft Blocking: Flexible Blocker Tags on the Cheap." Unpublished manuscript, 2003 (http://www.rsasecurity.com/go/rfid/).
DDJ
The RxA Pharmacy Demo
At the 2004 RSA Conference, RSA Laboratories publicly demonstrated the use of blocker tags as a method for protecting consumer privacy. The demonstration took the form of a mock pharmacy, whose visitors received RFID-tagged prescription bottles filled with jelly beans offering "tranquility," "wisdom," and "happiness." ("Wisdom" was the most popular, favored by about 40 percent of visitors.) After explaining the initial tradeoff between convenience and privacy engendered by RFID tags, the mock "pharmacists" then showed how technologies such as blocker tags can help, by enclosing the RFID-tagged bottle in a paper prescription bag, affixed to which was a demonstration version of the blocker tag. When a purchased bottle was scanned apart from the bag, the "checkout" display indicated the bottle's identifier (and, if the prescription had been purchased, the visitor's name or pseudonym). But when the bottle and bag were scanned together, the display read "blocked." The demonstration offered the same "customer experience" as with the "full" blocker tag described earlier, but used another type of tag that RSA Laboratories is working on — the "soft" blocker. Although it should be possible to implement the full blocker tag in fairly straightforward fashion, the soft blocker is even easier to implement because it is a standard RFID tag. Rather than interfering with the tag-to-reader protocol, though, the soft blocker relies on the reader or the application to enforce a privacy policy — somewhat like the Platform for Privacy Preferences protocol (P3P) for privacy on the Web. The soft blocker is a low-cost method for honest merchants to add privacy protection to their systems. Also, because it is in software, it supports a flexible set of privacy policies.
Optimizing Pixomatic for Modern x86 Processors: Part II

Challenging assumptions about optimization

MICHAEL ABRASH
In the first installment of this three-part article, I introduced the optimization challenges we faced with Pixomatic, the 3D software rasterizer that Mike Sartain and I developed for RAD Games Tools. This month, I start by examining the "welder" — a streamlined compiler custom designed to compile code very quickly — and look at issues such as pixel pipeline code.

That said, the welder doesn't exactly compile code. It might be better described as splicing together prewritten, hand-tuned segments of assembly language, performing fixups (like branch targeting) and conditional code generation (like selecting between texture wrapping and clamping) according to pseudocode commands embedded directly in the code stream. The bulk of the code is simply copied directly from the code stream to the execution buffer and no sophisticated analysis is performed — at worst, a few pseudocode conditionals are executed — so the process is very fast.
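The general shape of that approach can be sketched in a few lines of C++. This is only an illustration of the idea, not Pixomatic's welder; the fragment contents, names, and the single rel32 fixup are assumptions made for the example:

    #include <cstdint>
    #include <cstring>

    // Toy "welding": copy prewritten code fragments straight into an execution
    // buffer, pick one of several variants with a simple conditional, and patch
    // a relative branch displacement afterward.
    static size_t weld(uint8_t* execBuf,
                       const uint8_t* prologue, size_t prologueLen,
                       const uint8_t* wrapFrag, size_t wrapLen,
                       const uint8_t* clampFrag, size_t clampLen,
                       size_t branchPatchOffset,   // offset of a rel32 field inside the prologue
                       bool wrapTexture)
    {
        size_t pos = 0;
        std::memcpy(execBuf + pos, prologue, prologueLen);          // bulk copy, no analysis
        pos += prologueLen;

        const uint8_t* frag = wrapTexture ? wrapFrag : clampFrag;   // conditional generation
        size_t fragLen      = wrapTexture ? wrapLen  : clampLen;
        std::memcpy(execBuf + pos, frag, fragLen);
        pos += fragLen;

        // Fixup: make the rel32 branch in the prologue land at the end of the
        // welded code (displacement is relative to the end of the 4-byte field).
        int32_t rel = static_cast<int32_t>(pos - (branchPatchOffset + 4));
        std::memcpy(execBuf + branchPatchOffset, &rel, sizeof(rel));
        return pos;
    }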
Michael is a developer at RAD Games Tools and the author of several legendary programming books, including Graphics Programming Black Book. He can be contacted at
[email protected]. 46
In the end, the overhead of welding proved not to be a problem at all; we have yet to see a profile where it reaches even 1 percent of the total time. Once again, it turned out that we had overanalyzed and overoptimized up front, instead of just trying it out and seeing how it worked. We also use welding in some other key places, such as handling arbitrary vertex formats, but by far the biggest application of it is in the pixel pipeline.

The welded pixel pipeline code is a good example of circumstances in which good assembly code can still beat high-level compiled code by a wide margin. In general, high-level compilers are quite good at general-purpose x86 code, especially given CPUs that can do out-of-order processing, such as the Pentium III and Pentium 4. For general-purpose code, it's hard to beat a high-level compiler by a significant amount, and I have had the experience a couple of times recently of rewriting C code in assembly and not getting any faster at all. I should mention, however, that we recently did a pass through the C code for the Miles Sound System for the PlayStation 2, hoisting invariants out of loops and otherwise compensating for relatively poor compiler optimization to the tune of a 30 percent speedup. When we propagated the changes back to Windows, we found we had gotten nearly a 10 percent speedup. Since there were a lot of changes, I'm not sure what the key was, but I suspect it was a combination of turning multidimensional array accesses into pointer accesses and using local pointers so the compiler could tell there was no potential for aliasing.
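As an illustration of the kind of source-level change described above, consider the following sketch. The function names and the loop are invented for the example (this is not the Miles code); the point is that hoisting the invariant and walking the row through a local pointer leaves the compiler nothing to reload defensively:

    // Before: the compiler must assume gain[] and mix[][] may alias, so it
    // recomputes the 2D index and reloads gain[ch] on every iteration.
    void mix_before(float mix[][1024], const float* gain, int channels, int samples)
    {
        for (int ch = 0; ch < channels; ++ch)
            for (int i = 0; i < samples; ++i)
                mix[ch][i] = mix[ch][i] * gain[ch];
    }

    // After: the invariant gain value is hoisted and the multidimensional access
    // becomes a walk through a local pointer, so there is nothing left to reload.
    void mix_after(float mix[][1024], const float* gain, int channels, int samples)
    {
        for (int ch = 0; ch < channels; ++ch) {
            const float g = gain[ch];        // loop-invariant, hoisted
            float* row = mix[ch];            // local pointer into the row
            for (int i = 0; i < samples; ++i)
                row[i] *= g;
        }
    }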
Where assembly can win big — by two to four times — is with code that's amenable to MMX and SSE because of the four-way parallelism they support and because of the extra registers they make available. True, MMX and SSE can be accessed via compiler intrinsics, but in my experience, those intrinsics rarely generate good code and often actually result in worse performance than normal compiled code. All those complications go away with assembly, and the full power of MMX and SSE can be tapped. On top of four-way parallelism, SSE and MMX also triple the number of available registers, and each of the new registers can contain multiple values; that's a huge win for a processor as register constrained as the x86. In fact, the MMX register set is large enough so that no dynamic register allocation is needed in the Pixomatic pixel pipeline, and no spilling to memory is required, except in a few cases involving bilinear filtering or 24-bit z buffers with stencil. The ability to have a static register allocation not only improved performance but also greatly simplified the welder because all features can be
turned on/off independently of one another with no register-allocation complications.

The welded pixel pipeline code uses MMX but not SSE registers. This is partly because the pixel pipeline is entirely fixed point for performance reasons, and partly because integer SSE2 instructions are currently half the speed of MMX instructions. (In exchange, integer SSE2 can do twice as much work as MMX per instruction, but we couldn't figure out any way to take advantage of that in the pixel pipeline.) Mostly, though, we didn't use SSE registers in the pixel pipeline because we were able to do almost everything we wanted to without them, and it simplified things considerably to not have to support two different pipelines, one with SSE registers and one without. (We do, however, weld enhanced MMX instructions that were added as part of SSE — especially pshufw — into the pixel pipeline when they're supported, by using processor feature conditionals in the welder.) We would have liked to have supported three or more textures, which SSE2 integer registers would have helped with, but then we would have had the problem of how to support that feature on non-SSE2 platforms.

Table 1 shows the pixel pipeline register allocation. All scratch registers are used for various purposes. Not only are all eight MMX registers in use, but so too are all eight general-purpose registers — including ESP, since the code requires no stack. Many people don't realize that under Windows it's safe to use ESP as a general-purpose register because the OS switches to a system stack when fielding interrupts and context switching.
Figure 1: MMX pixel format. Four fields (A, R, G, B) in a 64-bit MMX register, bits 63 down to 0. Each field has 8 integral bits; the number of fractional bits varies throughout the pipeline.

Table 1: Pixel pipeline register allocation.

EAX      Scratch register
EBX      Z-buffer pixel address
ECX      Loop counter
EDX      Texture 0 pointer
ESI      Span-list pointer
EDI      Pixel-buffer pixel address
EBP      Texture 1 pointer
ESP      1/z
MM0      Texture 0 coordinates (u0, v0)
MM1      Texture 1 coordinates (u1, v1)
MM2      Gouraud color
MM3      Specular color
MM4-MM7  Scratch registers
(Note, though, that the code is no longer multithread-safe unless thread-local storage is used because ESP must be stored somewhere other than the stack so it can be restored at the end of the routine.) Register allocation is one reason why we ended up supporting only two textures; as you can see, there were no more registers to hold additional U and V coordinates and texture pointers.

I'm going to digress for a minute and move up one level to show even more intensive register use. The welded pixel processing code rasterizes a set of horizontal spans, each between 1 and 16 pixels in length. That set of spans comes from the span-generation code, which calculates all the parameters for each span, including the perspective-correct texture coordinates at each end. (The pixel pipeline itself just interpolates U and V linearly; while this introduces some error, in practice it's indistinguishable from perfect perspective correction, and, because it avoids having to do a divide per pixel, is much faster.) As I mentioned earlier, there are three versions of the span-generation code. The most interesting by far is the SSE version, which uses not 16 but 23 registers plus the stack pointer, as in Table 2. And, while the span-generation code isn't terribly register constrained, it could still have used a few more registers had they existed.

Pixel Pipeline Performance Challenges

There were two key performance areas that we had to address in the pixel pipeline: pixel processing and texture mapping. Once we'd decided to use MMX, pixel processing fell out nicely because there aren't too many possible ways to use MMX to work with pixels. Figure 1 shows the pixel format Pixomatic uses. Each color component is stored as a 16-bit fixed-point value; the placement of the fixed point varies, depending on what the last and next operations are. (The variation is due to the alignment requirements of multiplication, which returns only the high or low 16 bits of the result, and also the need at times to perform saturating operations and to clamp results.) At all times, however, color values contain eight integral bits, so processing is always of 8-bit color components or better.

MMX mapped quite well to pixel processing. It would have been nice to not have had to fiddle with bit placement to get multiplication to work, and it would have been even nicer to have been able to protect certain fields on any operation so that RGB could be modified without affecting alpha and vice versa. Still, those were minor inconveniences, and between the parallelism, clamping, faster multiplies, and additional, larger registers, I'd estimate that MMX enabled us to perform pixel processing something like five times faster than would have been possible otherwise.
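To make that fixed-point arithmetic concrete, here is a scalar sketch of the multiply the pipeline performs on each color field. The formats and names are illustrative rather than Pixomatic's actual layouts; what matters is that MMX pmulhw keeps only the high 16 bits of each 16x16 product, so values must be pre-positioned so those high bits land back on the 8 integral bits:

    #include <cstdint>

    // pmulhw semantics for one 16-bit lane: keep the high half of the product.
    static inline int16_t mulhw(int16_t a, int16_t b)
    {
        return static_cast<int16_t>((static_cast<int32_t>(a) * b) >> 16);
    }

    // Modulate one 8-bit texel channel by a shading factor in 0.15 fixed point
    // (0x7FFF is just under 1.0). Shifting the texel left by 1 positions its bits
    // so the high half of the product is again an 8-bit value; the final clamp
    // mirrors what packuswb does when the pipeline repacks to bytes.
    static inline uint8_t modulate(uint8_t texel, int16_t shade_0_15)
    {
        int16_t positioned = static_cast<int16_t>(texel << 1);
        int16_t product    = mulhw(positioned, shade_0_15);
        if (product < 0)   product = 0;
        if (product > 255) product = 255;
        return static_cast<uint8_t>(product);
    }

Listing Two performs the equivalent operation on all four fields of Figure 1 at once, with a single pmulhw preceded by a psllw to position the bits.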
In fact, the pixel pipeline is the one part of Pixomatic for which there is no fallback C code. MMX, driven by welded code, is required because performance would otherwise be so poor as to make Pixomatic useless.

The other big performance aspect of the pixel pipeline is texture mapping, and this was the toughest challenge. In texture mapping, U and V (the horizontal and vertical texture coordinates) each have to be linearly interpolated, with sufficient fractional bits to allow for subtexel precision, and then the integral parts of U and V (after scaling up to the texture dimensions) have to be butted together and gotten into a general-purpose register so they can be used to address the texture. The desire to process U and V in parallel makes texture mapping a natural for MMX, but the part about butting the integral parts together is not a good fit. MMX is good at doing parallel processing of independent components, but poor at shuffling data around. This is particularly true with any granularity less than 16 bits, and the granularity for texture mapping varies from 0 to 12 bits, depending on the texture size.

I beat my head against this for a while, but wasn't able to come up with anything I was happy with. Finally, Jeff Roberts said that in cases like this on other chips, he'd found various generalized pack/shift/shuffle instructions to be the way to go, and he tended to think of the MMX pack, unpack, and especially pshufw instructions as limited versions of those. I'm not exactly sure why, but that immediately broke the logjam; Listing One shows the entire sequence used to handle texture mapping, from start to finish, in just six instructions. The first instruction causes the texture coordinates to wrap. The second, third, and fourth instructions store the combined U and V in eax. The fifth instruction loads the texel into mm7, and the last instruction advances U and V to the next texel.

The key here is storing U and V so that they have enough fractional bits for subtexel accuracy, yet can be butted together with just one instruction, then right justified with a shift. Figure 2 shows how that works. In Figure 2, the code is working with a 256×256 texture, so the texture is addressed with 8-bit integer U and V values. V is stored so that its integral part is right justified to bit 48, and U is stored so that its integral part is left justified to bit 31. This allows a pshufw to butt the two integral parts together at bits 15 and 16, after which the combined VU can be right justified at bit 0 with a psrld; then it's a simple matter to copy the result out to a general-purpose register and use it to address the texture.
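In scalar terms, the six instructions amount to something like the following sketch for the 256×256 case. The 8.24 format and the names here are illustrative only; the actual code keeps U and V side by side in one MMX register and produces the combined index with pand, pshufw, and psrld rather than per-component shifts:

    #include <cstdint>

    // u and v are 8.24 fixed-point texture coordinates; texture is 256x256 texels.
    static inline uint32_t fetch_texel(const uint32_t* texture,
                                       uint32_t& u, uint32_t& v,
                                       uint32_t ustep, uint32_t vstep)
    {
        uint32_t ui = (u >> 24) & 0xFF;              // integral part of U, wrapped
        uint32_t vi = (v >> 24) & 0xFF;              // integral part of V, wrapped
        uint32_t texel = texture[(vi << 8) | ui];    // butt V and U together: VVUU
        u += ustep;                                  // advance to the next texel
        v += vstep;
        return texel;
    }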
This is using MMX as it was not intended to be used. In the past, I've written about the importance of having a flexible mind when optimizing, and this is an excellent example of that. It's also a good example of how you need to step back from your work and think about different approaches every so often — or get someone else to do it for you, as Jeff did for me. Remember, the best optimizer is between your ears — but you can't weigh it down with preconceptions if you want it to do its best work.
Figure 2: Conversion from fixed-point (U,V) coordinates to a texture address. The 64-bit register starts as 00VV.vvvv in bits 63-32 and UU.uuuuuu in bits 31-0; PSHUFW butts the two integral parts together (00VV UU.uu), and PSRLD then right justifies the combined VU at bit 0 (0000VVUU).

Table 2: Span generation register allocation.

EAX        Scratch register
EBX        Scanline length
ECX        1/z
EDX        Scratch register
ESI        Pixel-buffer pixel address
EBP        Span-list pointer
EDI        Z-buffer pixel address
ESP        Stack pointer
MM0        Previous span (u0, v0)
MM1        Previous span (u1, v1)
MM2        Gouraud GB components
MM3        Gouraud AR components
MM4        Specular GB components
MM5-MM7    Scratch registers
XMM0       1/w
XMM1       u0, v0, u1, v1
XMM2       1/w^2
XMM3       Left edge 1/w^2
XMM4       Left edge 1/w
XMM5       Left edge u0, v0, u1, v1
XMM6-XMM7  Scratch registers
Pixel Pipeline Code

Keeping all that background in mind, Listing Two shows the welded pixel pipeline for the case of one point-sampled texture modulated with Gouraud shading, with z buffering. The top part is the stepping of the interpolators; we jump into the middle of the loop for the initial iteration, to save doing a wasted stepping at the end of the last time through the loop. (This also makes it possible to align the top of the loop without executing any nop instructions.) This is followed by the z compare, then the texture mapping code, the Gouraud code, the packing and writing of the final pixel value, and the loop control. At 20 instructions per pixel, it's pretty compact for all it does, reflecting the good correspondence between MMX and this particular scenario.

Next, Listing Three shows the welded pixel pipeline for the case of one bilinear filtered texture, plus Gouraud shading. And here we see what happens when MMX doesn't correspond so well to a scenario. It would sure be nice to have a hardware module that did bilinear filtering because there's just no elegant way to do it in software, despite heavy optimization and some fairly clever tricks.

That reminds me of an important lesson we learned while doing Pixomatic: It's almost impossible to know exactly what's going on with the performance of your code nowadays. The out-of-order processing of the Pentium 4 is so complex and the tools available for analyzing performance are so inadequate (alas, VTune has regressed in this respect) that to a large extent all you can do is try a lot of approaches and keep the one that runs fastest. I mention this in the context of the bilinear filter because that was where that lesson was driven home. You see, I came up with a way to remove a multiply from the filter code — and the filter got slower. Given that multiplication is slower than other MMX instructions, especially in a long dependency chain such as the bilinear filter, and that I had flat-out reduced the instruction count by one multiply, I was completely baffled.

In desperation, I contacted Dean Macri at Intel, and he ran processor-level traces on Intel's simulator and sent them to me. I can't show you those traces, which contain NDA information, but I wish I could because their complexity beautifully illustrates exactly how difficult it is to fully understand the performance of Pentium 4 code under the best of circumstances. Basically, the answer turned out to be that the sequence in which instructions got processed in the reduced-multiply case caused a longer critical dependency path — but there's no way you could have known that without having a processor-level simulator, which you can't get unless you work at Intel. Regardless, the simulator wouldn't usually help you anyway because this level of performance is very sensitive to the exact sequence in which instructions are assigned to execution units and executed, and that's highly dependent on the initial state (including caching and memory access) in which the code is entered, which can easily be altered by preceding code and usually varies over time.
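For reference, the per-channel arithmetic of a bilinear filter is roughly the following scalar sketch. It is an illustration, not Pixomatic's code; the 6-bit fractional weights are an assumption suggested by the 0x003F masks in Listing Three:

    #include <cstdint>

    // Blend the four texels surrounding the sample point for one 8-bit channel.
    // uf and vf are the fractional parts of U and V in 0..63 (6 fractional bits).
    static inline uint8_t bilerp_channel(uint8_t c00, uint8_t c10,
                                         uint8_t c01, uint8_t c11,
                                         uint32_t uf, uint32_t vf)
    {
        uint32_t top = c00 * (64 - uf) + c10 * uf;          // 8.6 fixed point
        uint32_t bot = c01 * (64 - uf) + c11 * uf;          // 8.6 fixed point
        return static_cast<uint8_t>((top * (64 - vf) + bot * vf) >> 12);
    }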
Back in the days of the Pentium, you could pretty much know exactly how your code would execute, down to the cycle. Nowadays, all you can do is try to reduce the instruction count, try to use MMX and SSE, use the cache wisely and try to minimize the effects of memory latency, then throw stuff at the wall and see what sticks.

Speedboy

We did come up with an interesting tool for dealing with the uncertainties of Pentium 4 optimization, which we nicknamed "Speedboy." You insert a segment of assembly code that you want to optimize into Speedboy's timing loop, add additional information to indicate which instructions are directly dependent on the results of which other instructions, and kick off the run. Speedboy then times all valid permutations of the code and lets you know which arrangement is fastest.

Speedboy does in fact work as advertised but is not as useful as we'd hoped, particularly for our welded pixel processing code. The first problem is that it's tedious and error prone to manually determine and enter the dependencies. Often, it would turn out that Speedboy's choice for fastest code didn't actually work properly because we'd missed a dependency. That could be fixed by writing a code analyzer to determine dependencies, but that didn't seem worth doing once we came to understand the second and third problems.

By way of introducing the second problem, let me tell you the story of the first BSP compiler I ever wrote. I got it working and then I thought, heck, computer time is free, why not have it optimize the BSP tree while I'm at it? So I added code to have it try all possible configurations and started a new compiler run. My polygon set was nothing like a real game level; it contained only 20 polygons, so I figured the run would finish in a few seconds at most. After half a minute, however, I started to wonder, and after half an hour or so, I decided to do a few calculations. It turns out that the order of a brute-force BSP optimizer is roughly N!. Even with only 20 polygons, that works out to about 2 times 10 to the 18th. If we assume that each tree takes one microsecond to analyze (in fact it took a good bit longer than that), then it would take more than 70,000 years to optimize my little toy level. Bump it up to a big level — say, one with 30 polygons (real levels, of course, have thousands or even millions of polygons) — and I'd have been waiting for my answer well into the heat death of the universe. And I mean that
literally; we’re talking 8 billion billion years here. I’d imagine you can guess where I’m going with this. Speedboy’s order is somewhat less than N! — but, unless there are a lot of dependencies, not all that much less. Sequences of 10–20 instructions work great; sequences of 40 or more tend to make you wish that either computers were a lot faster or that reincarnation was a viable option, depending on the number of dependencies in the code. Some of our critical code segments are short enough to run through Speedboy, but a lot of them aren’t; for example, the code for one bilinear filtered texture with Gouraud shading would probably take longer to finish than my BSP optimizer running on a 30-polygon level.
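The same combinatorics governs Speedboy's search: it must walk every ordering of the instruction segment that respects the declared dependencies. A small sketch of that search (illustrative only, not RAD's actual tool) shows how the count grows even when a few dependencies prune it:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Count every ordering of n "instructions" that respects the declared
    // dependencies (an edge means "a must run before b"). This is the space a
    // Speedboy-style harness has to time exhaustively.
    static uint64_t countOrders(std::vector<int>& indegree,
                                const std::vector<std::vector<int>>& succ,
                                int placed, int n)
    {
        if (placed == n)
            return 1;
        uint64_t total = 0;
        for (int i = 0; i < n; ++i) {
            if (indegree[i] != 0)
                continue;                        // already placed, or not yet ready
            indegree[i] = -1;                    // place instruction i next
            for (int s : succ[i]) --indegree[s];
            total += countOrders(indegree, succ, placed + 1, n);
            for (int s : succ[i]) ++indegree[s]; // undo and try the next candidate
            indegree[i] = 0;
        }
        return total;
    }

    int main()
    {
        const int n = 10;                        // a short instruction sequence
        std::vector<std::vector<int>> succ(n);
        std::vector<int> indegree(n, 0);
        succ[0].push_back(1);                    // one dependency chain: 0 -> 1 -> 2
        succ[1].push_back(2);
        indegree[1] = indegree[2] = 1;
        // 10 instructions with a single 3-long chain still allow 10!/3! = 604,800
        // valid orderings; 40 mostly independent instructions are hopeless.
        std::printf("valid orderings: %llu\n",
                    static_cast<unsigned long long>(countOrders(indegree, succ, 0, n)));
        return 0;
    }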
Worse still was the third problem. If you'll recall, I mentioned earlier that the performance of code on the Pentium 4 is highly sensitive to the exact sequence of instruction execution. Well, in welded code, the exact instruction sequence is different for every one of many trillions of pipeline configurations. Furthermore, because the loops iterate only 1 to 16 times, performance can be greatly affected by the exact state of the processor when the code is entered, which varies depending on the render state and the shape and size of the triangle. Consequently, the results reported by Speedboy tended to be extremely case specific. Often, Speedboy would find an optimization that would speed up a specific case by as much as 10 percent in the test bed, but when we put it into Pixomatic, the benefit would vanish completely or would show up in that one case, but not in similar cases. So, in the end, Speedboy wasn't much help with the welded pixel pipeline code. On the other hand, it was useful in nonwelded, high-repetition cases like the 2D blit code. On balance, Speedboy was a cool idea and modestly useful, but not a big win for Pixomatic.

Next Month

I'll wrap up in next month's installment with an examination of out-of-order processing, branch prediction, and how HyperThreading technology fits into the mix.

DDJ
Listing One

    pand    mm0,[WrapUV0Mask]        ; wrap the texture coordinates
    pshufw  mm5,mm0,0Dh              ; butt the integral parts of V and U together
    psrld   mm5,[WrapUV0RightShift]  ; right justify the combined VU
    movd    eax,mm5                  ; copy the combined VU to a general-purpose register
    movd    mm7,[edx+eax]            ; load the texel
    paddd   mm0,[UV0Step]            ; advance U and V to the next texel
Listing Two

LoopTop:
    add       esp,dword ptr [_RotatedFixed16ZXStep]   ; stepping
    adc       esp,0
    paddsw    mm2,mmword ptr [_argb7x_GouraudXStep]
    paddd     mm0,mmword ptr _Spans+20h[esi]
    cmp       sp,word ptr [ebx+ecx*2]                  ; z buffering
    ja        LoopBottom
    mov       word ptr [ebx+ecx*2],sp
    pand      mm0,mmword ptr [_TexMap]                 ; texture mapping
    pshufw    mm5,mm0,0Dh
    psrld     mm5,mmword ptr [_TexMap+28h]
    movd      eax,mm5
    movd      mm7,dword ptr [edx+eax*4]
    movq      mm6,mm2                                  ; Gouraud shading
    punpcklbw mm7,dword ptr [_MMX_0]
    psllw     mm7,1
    pmulhw    mm7,mm6
    packuswb  mm7,mm7                                  ; pixel pack/write
    movd      dword ptr [edi+ecx*4],mm7
LoopBottom:
    inc       ecx                                      ; loop control
    jne       LoopTop
Listing Three

LoopTop:
    add       esp,dword ptr [_RotatedFixed16ZXStep]    ; stepping
    adc       esp,0
    paddsw    mm2,mmword ptr [_argb7x_GouraudXStep]
    paddd     mm0,mmword ptr _Spans+20h[esi]
    cmp       sp,word ptr [ebx+ecx*2]                   ; z buffering
    ja        LoopBottom
    mov       word ptr [ebx+ecx*2],sp
    pand      mm0,mmword ptr [_TexMap]                  ; bilinear filtered texture mapping
    pshufw    mm6,mm0,0Dh
    psrld     mm6,mmword ptr [_TexMap+28h]
    movd      eax,mm6
    movd      mm7,dword ptr [edx+eax*4]
    pslld     mm6,mmword ptr [_TexMap+28h]
    add       eax,dword ptr [_TexMap+0F4h]
    and       eax,dword ptr [_TexMap+0F8h]
    paddw     mm6,mmword ptr [_TexMap+40h]
    psrld     mm6,mmword ptr [_TexMap+28h]
    movq      mm4,mm0
    psrld     mm4,mmword ptr [_TexMap+48h]
    pand      mm4,mmword ptr [_MMX_0x003F003F003F003F]
    movd      mm5,dword ptr [edx+eax*4]
    movd      eax,mm6
    punpcklbw mm7,dword ptr [_MMX_0]
    movd      mm6,dword ptr [edx+eax*4]
    punpcklbw mm5,dword ptr [_MMX_0]
    pshufw    mm4,mm4,0
    add       eax,dword ptr [_TexMap+0F4h]
    and       eax,dword ptr [_TexMap+0F8h]
    punpcklbw mm6,dword ptr [_MMX_0]
    movq      mmword ptr [_MMX_UFrac],mm4
    movd      mm4,dword ptr [edx+eax*4]
    punpcklbw mm4,dword ptr [_MMX_0]
    psubw     mm6,mm7
    psubw     mm4,mm5
    psubw     mm5,mm7
    psubw     mm4,mm6
    pmullw    mm6,mmword ptr [_MMX_UFrac]
    psraw     mm6,6
    pmullw    mm4,mmword ptr [_MMX_UFrac]
    paddw     mm6,mm7
    pshufw    mm7,mm0,0AAh
    psrlw     mm7,6
    psllw     mm5,6
    pmulhw    mm4,mm7
    pmulhw    mm7,mm5
    paddw     mm6,mm4
    paddw     mm7,mm6
    packuswb  mm7,mm7
    movq      mm6,mm2                                   ; Gouraud shading
    punpcklbw mm7,dword ptr [_MMX_0]
    psllw     mm7,1
    pmulhw    mm7,mm6
    packuswb  mm7,mm7                                   ; pixel pack/write
    movd      dword ptr [edi+ecx*4],mm7
LoopBottom:
    inc       ecx                                       ; loop control
    jne       LoopTop
Band-In-A-Box, Finale, & MusicXML

Converting file formats using an XML dialect

AL STEVENS
When I retired the DDJ "C Programming" column in 2003, I thought my coding days were just about over and I would be a full-time jazz musician and teacher. Computers would be only for music applications and, whenever possible, I would use existing software. I wouldn't write a program unless I couldn't find one to do what I needed. Given all the music applications for Windows and Linux, I figured I'd never have to write code again. My web site (http://www.alstevens.com/) contains little to indicate that I made my living as a programmer. Not that I didn't enjoy programming. But I have other things to do now and all you dot-netters, see-sharpers, and ex-em-ellers can carry the banner while I pursue a different muse.

I didn't realize, however, that as I got into using my computer only for music applications, I'd find more problems waiting to be solved. This article is about one such solution — how to convert Band-In-A-Box song files into Finale notation files by using MusicXML as a porting medium.
Al is DDJ’s senior contributing editor and a professional musician. He can be contacted at
[email protected]. 52
XML & MusicXML

Most programmers know what XML is. If you program Internet applications, you can't escape it. XML is becoming the language of the Internet. A text-based markup language to describe document content and structure, XML is, among other things, the next-generation HTML, but its capabilities go far beyond describing web pages. XML can describe the architecture of virtually any kind of structured information to be exchanged between platform-independent applications. XML has been used, for example, to define the structure and control the content of relational databases. The World Wide Web Consortium (http://www.w3.org/) maintains XML and its standardization, and you can read all about XML at http://www.w3.org/TR/2004/REC-xml-20040204/.

MusicXML (http://www.musicxml.org/) is a dialect of XML that describes music notation. All those cryptic lines and symbols that musicians read are implemented in XML Document Type Definition (DTD) files. The complete MusicXML DTD is at http://www.musicxml.org/dtds/. MusicXML could and should become a universal language for music applications to exchange data — if only application developers would buy into the concept of interchangeable data files. But most music notation applications — and there are many — use proprietary file formats and closed architectures. Developers do not open their file architectures lest they facilitate the porting of scores to competing notation applications. That would never do.

Composers, arrangers, and musicians have other priorities, however, and we need a way to exchange data between music applications. If the Internet is to become the computing platform of the 21st century, it needs common data formats for every kind of document, and applications that aim to be Internet aware are going to have to abandon the closed-architecture mentality to achieve that goal. Eventually, browser plug-ins will render MusicXML files as music notation. Until then, you need a MusicXML-friendly notation application to view the files in other than text format, and music notation on web pages continues to be portrayed only with graphical image files.
MIDI for Notation Import/Export

Rendering scores is one thing. Exchanging score data is another. Prior to MusicXML, the main way to move scores between music notation programs was to export them as Musical Instrument Digital Interface (MIDI) packets in Standard MIDI Format (SMF) files from one application and import those SMF files into the other. Most notation programs support SMF import/export. You can order the MIDI specification from http://www.midi.org/. Many books about MIDI describe the specification, too. SMF is the file format for storing MIDI data. You can find the SMF specification at http://help-site.com/local/midi.txt.
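For readers who have not worked with SMF, the following sketch shows the kind of performance data a track actually stores: a variable-length delta time followed by channel events such as Note On, which record only which key is struck, how hard, and when. The helper names are hypothetical; consult the MIDI and SMF specifications for the full format:

    #include <cstdint>
    #include <cstdio>

    // Read an SMF variable-length quantity: 7 bits per byte, high bit set on all
    // but the last byte. Returns the value and advances the cursor.
    static uint32_t readVarLen(const uint8_t*& p)
    {
        uint32_t value = 0;
        uint8_t byte;
        do {
            byte = *p++;
            value = (value << 7) | (byte & 0x7F);
        } while (byte & 0x80);
        return value;
    }

    // Decode one Note On event: delta time, status 0x9n, key number, velocity.
    static void printNoteOn(const uint8_t*& p)
    {
        uint32_t delta  = readVarLen(p);      // ticks since the previous event
        uint8_t  status = *p++;               // 0x90..0x9F = Note On, low nibble = channel
        uint8_t  key    = *p++;               // MIDI note number (60 = middle C)
        uint8_t  vel    = *p++;               // how hard the note is struck
        std::printf("delta %u  ch %u  note %u  velocity %u\n",
                    delta, status & 0x0F, key, vel);
    }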
Using MIDI to exchange notation gives less than perfect results. Much music notation has no standard representation in MIDI, a protocol that encodes music performances as event packets. MIDI packets encode what notes to play, how loud to play them, when to play them, what instruments to play them on, and what devices play those instruments' sounds. Music notation does some of that, too, but notation was designed centuries ago for people to read and write. MIDI was designed decades ago for machines to read and write. Notation includes symbols and text meant to be interpreted by people in the context of rehearsals and performances, and these data include rehearsal numbers, chord symbols, endings, repeats, codas, cues, slurs, dynamics, expression, who's soloing and who's tacit, and so on. Many of these symbols are notational conveniences not represented in the typical MIDI sequence. Of the items on the list I just cited, the ability to move chord symbols between music notation applications is the one most relevant to the problem I address here.

Because of these notational requirements, when you use an SMF file to export a score from one notation application to another, you must apply a lot of manual fix-up to restore the imported score to its original configuration. MusicXML addresses this problem. It contains markup tags for everything imaginable in a musical score. A score exported to MusicXML from one application retains virtually all its information when imported into another application. The only thing that might differ is how the two applications render the score with respect to music fonts, page layout, and so on.

If you want to convert an SMF file into music notation, the best program for the job is MidiNotate (http://www.notation.com/). MidiNotate, however, is not a music notation editor. It doesn't let you do much in the way of correcting the notation, and it does not export to MusicXML, so you can't easily move MidiNotate's notation data into a notation editor program.

Scanning for Notation Import/Export

Some notation programs let you scan printed music and convert the scanned bitmap into the application's notational format. Likewise, programs such as PhotoScore (http://www.neuratron.com/photoscore.htm) and SharpEye (http://www.visiv.co.uk/) convert scanned scores into MusicXML, which you can then import into some notation editors. Any program that can print music notation can indirectly generate a graphical bitmap. I do it by printing to a file in
PostScript format and using GSView (http://www.ghostgum.com.au/) to convert the file to a TIF. I convert the TIF by using SharpEye, which treats the TIF as a scanned image and converts it to MusicXML. These programs work well, but they do not convert one item of notational information that I require in my work. They do not convert chord symbols.

Band-In-A-Box

I regularly use a program called Band-In-A-Box for Windows (http://www.pgmusic.com/). If you are a musician, you've probably heard of it. Band-In-A-Box generates automatic musical accompaniments. It uses MIDI technology to produce accompaniments with drums, bass, piano, and strings. You enter a sequence of chord symbols and select a playback style, and Band-In-A-Box does the rest. Figure 1 shows the Band-In-A-Box screen with the old swing tune "Back Home Again In Indiana" loaded and ready to play in the ZZJAZZ style.

Band-In-A-Box is notorious for its nonstandard user interface and disorganized options procedures. But it's the only program that does what it does, and it does it well. Until recently, Band-In-A-Box was available on the PC platform only as a 16-bit application. The latest version, Band-In-A-Box 2004, is a Win32 application, but it still has that quaint UI that its loyal user base has come to love.

I use Band-In-A-Box to practice in my studio and as a teaching tool. It provides a ready-made rhythm section for students learning to play jazz. Many musicians use Band-In-A-Box on the bandstand, others use it for practicing, and some use it in the studio to build backing tracks. The piano/bass/drums trio MP3s on my web site use Band-In-A-Box bass lines, which, in its jazz styles, can sound realistic after some tweaking with the Sonar sequencer program (http://www.cakewalk.com/Products/SONAR/).

Musicians find a wealth of standard-tune files in Band-In-A-Box format available for download all over the Internet. Almost any popular tune from the early part of the previous century until the present time is available somewhere in Band-In-A-Box format. These files usually include the melodies of the tunes and often the lyrics. This enormous online resource is a boon to working musicians because Band-In-A-Box lets you transpose a tune to any key on either clef and print its music notation in lead sheet format. (See the accompanying text box entitled "Lead Sheets and Fake Books.") I won't say where you can find such files lest I alert those dedicated RIAA lawyers and divert them from their untiring quest to lock up impoverished
students who use university computers to share online punk/rap/hip-hop/rock/crap MP3s, but a word to the wise is "Google." Bear in mind that the software in this article is intended for personal use only or to help people publish lead sheets for which the publisher fully intends to pay the required royalties.

Playing the Charts

As any club date musician knows, when a band reads a tune in the concert key of B-flat (the "concert" key is the original key the tune was written in and is usually pitched to suit the average male vocal range), the piano player reads the tune in B-flat, trumpet and tenor sax players read it in C, alto and baritone saxes in G, and trombone and bass in C but from the bass rather than the treble clef. Reading a tune can require as many as four different lead sheets depending on what instruments are in the band. Then there are the chick singers who, because of their vocal range, must sing tunes in nonoriginal keys. They need even more lead sheets for each of the musicians to read.

A working musician who prepares for a gig needs a way to rapidly print out readable lead sheets in multiple keys and clefs. Unless you want to carry a trunk full of fake books to the gig. Assuming the fake books are available. Which they aren't. We used to have thick fake books with a lot of tunes, most of which nobody would ever play. When is the last time you heard a request for "The Naughty Lady of Shady Lane"? The tunes in those old fake books were in the concert key on the treble clef. Most were notated by hand, printed illegibly, distributed underground, and had lots of mistakes. All players read the same notation, and horn players transposed in real time. We shouldn't need those books now because music notation software abounds. All we need is to get every known tune in the Western world into lead sheet format with a music notation program. Then we can print it in any key on any clef for any musician.

Notation Editing

Every known tune in the Western world is somewhere in a Band-In-A-Box file. And Band-In-A-Box has a notation editor feature. You enter the melody lines of a tune by using the notation editor or by playing the melody on a digital piano. Almost everybody uses the digital piano because the Band-In-A-Box notation editor is lacking, to put it mildly. Consequently, the melody notation in most of those thousands of freebie tune files reflects playing errors, player interpretations, and quantizing variations. You'd think someone who was about to release such a file for public consumption would use the notation editor
to fix up the notation, but they don't because the notation editor itself discourages people from using it. But, given a melody line and the chord changes, Band-In-A-Box can print a lead sheet. Figure 2 shows the first of several measures of Band-In-A-Box's notation for "Back Home Again In Indiana."

Figure 1: Band-In-A-Box.

There's not much wrong with the notation in Figure 2, but few of the tunes you find online are anywhere nearly as clean. And the Band-In-A-Box editor has some annoying conventions that discourage one from trying to clean up the notation. It improperly decides that a note should have a particular sharp or flat accidental, using an A-flat where the harmonic context of the tune calls for G#, for example. Worse, it will make that decision in a key where A-flat is part of the key signature and needs no accidental notation whatsoever. Or it will decide that an F ought to be an E-sharp or something silly like that. It assigns double barlines where you don't want them. I could go on and on. These are only some of its contrary, arbitrary notational decisions that you cannot change.

When you mention such deficiencies to Band-In-A-Box fans, they say, "Band-In-A-Box is not supposed to be a notation editor. Use it for what it is best at." Well,
duh. If it's not supposed to be one, why even put the feature in? Oh, well, another problem to solve. With a MusicXML export of Band-In-A-Box notation, I could import a tune's notation into Finale and correct it to my heart's content. But Band-In-A-Box does not support MusicXML.
Figure 2: Band-In-A-Box notation.
Finale

What to do? I have all these songs with notation that cannot be easily corrected. They can, however, be exported to SMF files. I have the full-featured Finale notation editor program that supports both MusicXML import/export and SMF import/export. How do I bring the two applications together to get that large catalog of lead sheets into a manageable format?

Remember as you look at these examples that my objective is to have lead sheets with melody lines and chord symbols that can be edited with a real music notation editor. And I want to do it a lot of times with a lot of tunes. Reaching that goal ought to involve as little manual effort as possible.

Band-In-A-Box plays tunes based on programmed styles. These styles represent the kind of music that fits the tune. They define lines of notes for the bass to play, chords for the piano to play, rhythmic patterns for both, and, of course, drum patterns for the drums to pound out. There are hundreds of styles available in .STY files, and you can even take a tune intended to be played as a country ballad, for example, and play it as a hip hop tune. You can. I won't.
Figure 3: Band-In-A-Box’s MIDI output.
Figure 4: Band-In-A-Box’s MIDI output with the Export Style Imported into Finale.
I exported an SMF file of "Back Home Again In Indiana" with the ZZJAZZ style in effect from Band-In-A-Box and imported it into Finale to see what it looks like and how easy it would be to convert to a lead sheet. The imported notation shows some Band-In-A-Box stylemaker's notion of what to play for each chord, not necessarily enough information to infer the actual chord symbol. Figure 3 shows the imported SMF data in notation format. As you can see, the bass line is all over the place; it does not always convey the chord's root value, which you need to infer a lead sheet chord symbol. The piano part has altered and substitution chords and inversions selected by the style. Those piano chords are, of course, generated in a comping style with voicings, embellishments, and rhythmic interpretations that real piano players use when improvising accompaniments. Because they simulate piano accompaniments, the piano chords rarely unambiguously state the chord itself and rarely occur in a measure precisely where the chord symbol ought to be on a lead sheet. Obviously, a stylistic rendering of a tune such as Figure 3 shows is information overload on the one hand and yet insufficient information for my purposes on the other.

Exporting with Style

To get an SMF file with the melody and sufficient harmonic information to represent the current chord symbol at the proper place, I developed a Band-In-A-Box export style by using Band-In-A-Box's Stylemaker process. Building the style took less than five minutes because the export style plays only one bass note, the root, in the bass part and only one chord in the piano accompaniment part for each chord change. The style plays a very simple version of the tune. Kind of like when you learned "Heart and Soul" on the piano when you were a kid.

When you load the style to play with a tune that you want to export, you must allow Band-In-A-Box to reconfigure the melody to eliminate Band-In-A-Box's so-called "swing" feeling. A lead sheet should have a plain, unadorned melody line. Musicians apply their own interpretations during performances. Figure 4 shows "Back Home Again In Indiana" with the export style exported by Band-In-A-Box to an SMF file and imported into Finale. If you used that style for playback in Band-In-A-Box, you would hear a rather boring rendition of the tune. The style is only for exporting information; it is not meant to be used for playback. Unless, of course, you really enjoy listening to boring music, in which case I can recommend some of the hip hop styles for your listening pleasure.

You must apply some manual effort to get Band-In-A-Box to render the exported
SMF file correctly and to tell Finale how to import it. You have to mute all Band-In-A-Box tracks other than piano, bass, and melody, tell Band-In-A-Box not to embellish chords during playback, tell it to delete all melody choruses other than the first one, and to export only one chorus. Then you have to tell Finale to import the SMF file by quantizing to the sixteenth note resolution and to space notes evenly in every beat to neutralize some of the tune's original input interpretation.

Because of the simplicity of the export style, the SMF file that Band-In-A-Box renders as shown in Figure 4 has the complete melody but only sufficient harmonic information (a bass note for the root and a piano chord for the chord's other notes) to describe the chord and only when a chord changes. This is enough information to build a Finale lead sheet. If I had only a few such tunes to convert, I could use Finale's Chord/Two-staff Analysis command to click on each chord and create a chord symbol, move the chord symbols to the melody staff, and delete the bass and piano staves. That's simple enough, but it takes enough time for each tune that you wouldn't want to do it for a large catalog of tunes.

The melody track is important for this project. The Band-In-A-Box file needs a melody track. You can't build a lead sheet without a melody for the lead instrument to play. If all you want is a chord sheet, Band-In-A-Box's lead sheet notation is sufficient.

Converting SMF to MusicXML

This problem cries for a software solution to convert such SMF files into a MusicXML file. The solution would parse the bass and piano staves to infer chord symbols, which it adds to the remaining melody staff, and it would eliminate those staves in the converted MusicXML file. With that capability you have your choice of several notation editors that support MusicXML.

At present, music notation applications that support MusicXML are mostly those that support a software plug-in architecture, and their MusicXML plug-ins are typically developed by third parties. Most notable of those applications is Finale (http://www.finalemusic.com/), and recent versions of Finale include a light version of the MusicXML Dolet plug-in (http://store.recordare.com/doletfin1.html). Finale is the leading music notation program, and versions 2000 through 2004 support the MusicXML format. Sibelius (http://www.sibelius.com/) is another full-featured notation editor that supports MusicXML with a Dolet plug-in (http://store.recordare.com/doletsib1.html).
Alternative Solutions

My first plan was to write a Finale plug-in that would process the imported score of Figure 4 and create a lead sheet. I abandoned that plan after a careful reading of the Finale plug-in API documentation. I found no way in that API to add chord symbols to a score. That doesn't mean the API doesn't support it; I just couldn't find it. I did find an overly complex, underdocumented API so typical of the contemporary programming environments that nudged me into an early retirement last year.
My second plan was to write a script by using the new FinaleScript feature introduced with the latest version of Finale. I upgraded specifically to get that feature. The script would step through the score and apply the Chord/Two-staff Analysis command I mentioned earlier. I abandoned plan two when I could not find commands in the script language for working with chord symbols.
Figure 5: ChordBuilder data flow diagram.
Lead Sheets and Fake Books
A "lead sheet" is a sheet of music that includes a tune's melody in music notation, chord symbols above the melody line, and, sometimes, the song's lyrics below the melody line. Root around in your grandma's piano bench and find some old sheet music for a popular song of years gone by. Sheet music has all that information plus a simple piano arrangement in bass and treble clef staves below the melody line. Many lead sheets from the old days were made by cutting and pasting sheet music to eliminate the piano arrangement staves.

Professional musicians use lead sheets to play improvised interpretations of songs in their own performance styles. The melody line tells them what the melody is. The chord symbols tell a piano player what chords to play along with the tune. Chord symbols also tell lead instrument players the harmonic context to apply to improvised solos. Whereas most sheet music arrangements are three or more pages of musical notation, the typical song can be represented on a lead sheet with only one page. Lead sheets are essential tools for professional musicians who do not always play from fully scored arrangements.

A "fake book" is simply a bound volume of lead sheets, usually in alphabetical order by song title. Fake books are usually published as collections of tunes that fit a theme. Jazz tunes, Latin tunes, old standards, and so on. The name reflects the old idea that musicians who read lead sheets or who play "by ear" are "faking" it rather than reading formally arranged parts. The phrase is thought to have originated as a putdown by classically trained musicians but, with popular usage, has become a respectable reference to the ability to play without needing full notation. A popular fake book compiled by a Berkeley teacher for jazz students is called the "Real Book," a whimsical nod to the old idiom.

—A.S.
Figure 6: Song imported from ChordBuilder's MusicXML output.

Table 1: ChordBuilder procedures.

Step  Procedure
1     Band-In-A-Box loads the song file and associates it with the export style.
2     Band-In-A-Box exports the tune as a Standard MIDI Format (SMF) file.
3     Finale imports the SMF file and converts it to notation.
4     Finale exports the notation in MusicXML format.
5     ChordBuilder reads the MusicXML file that Finale exported.
6     ChordBuilder converts the data to lead sheet format and writes a new MusicXML file.
7     Finale imports the MusicXML file that ChordBuilder created and converts it to notation.
8     Finale saves the notation in its own file format.
My third plan was to write the program I just mentioned. I considered the knurly problems of converting MIDI data to notation, figuring out based on MIDI delta clock values and NOTE ON/NOTE OFF events what duration to assign a note, whether it is tied to another note, whether it ought to be dotted, and so on. It isn't as easy as it looks, and all the notation programs have already done it. Why not leverage their work?

ChordBuilder

And so I arrived at plan four, the program that this article describes. The program, with the unimaginative name ChordBuilder, converts to MusicXML. I am pretty good at writing text-parsing programs, having written several programming language and script interpreters, which is why I figured I could get this plan working before any of the others.

To use ChordBuilder, follow Figure 5, a data flow diagram. You begin with a Band-In-A-Box file that contains the chords and melody of a tune you wish to export to a music notation program. That file is represented by the icon in the upper left corner of Figure 5. You wish to end with a lead sheet notation file in Finale's notation file format. That file is represented by the icon in the lower right corner of Figure 5. Table 1 lists the steps in Figure 5.

Indiana.xml (available electronically; see "Resource Center," page 5) is built from the first five measures of "Back Home Again In Indiana." As you can see, MusicXML is a verbose language because it uses XML markup tags to define even the smallest entity of music notation. ChordBuilder reads the exported score, parses the chord data from the bass and piano parts, inserts chord symbol markup into the melody part, and writes a new version of the score in MusicXML format with only the melody part and its new chord symbols.
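The chord-inference step itself is conceptually simple: take the bass note as the root, collect the piano notes sounding at the same point, and match the intervals above the root against a table of chord qualities. The sketch below is my own illustration of that idea, not ChordBuilder's actual classes, and shows only a few common qualities:

    #include <set>
    #include <string>
    #include <vector>

    // Given the bass root and the piano notes (MIDI note numbers) sounding at a
    // chord change, name the chord by the set of intervals above the root.
    static std::string inferChordSymbol(int rootNote, const std::vector<int>& pianoNotes)
    {
        static const char* names[12] =
            { "C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B" };

        std::set<int> intervals;                       // pitch classes above the root
        for (int n : pianoNotes)
            intervals.insert(((n - rootNote) % 12 + 12) % 12);
        intervals.erase(0);                            // the root itself adds no information

        std::string quality = "";                      // default: major triad
        if (intervals.count(3) && intervals.count(10))      quality = "m7";
        else if (intervals.count(4) && intervals.count(10)) quality = "7";
        else if (intervals.count(4) && intervals.count(11)) quality = "maj7";
        else if (intervals.count(3))                        quality = "m";

        return std::string(names[rootNote % 12]) + quality;
    }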
Indiana[1].xml (available electronically) is the output from ChordBuilder. You import the output from ChordBuilder into Finale to view, edit, and print a lead sheet. Figure 6 shows "Back Home Again In Indiana" in its lead sheet format in Finale.

The MusicXML Library

I built ChordBuilder from the ground up by designing and coding all its MusicXML classes. I did not have to do it that way. The MusicXML Library (http://libmusicxml.sourceforge.net/) is an open-source project that supports a C++ class library for processing MusicXML data. If I was writing a complex MusicXML program, I might get on board with the MusicXML Library project, and I'd have to ensure that everything complied with the GNU Library or Lesser General Public License. ChordBuilder, however, is a relatively simple text-parsing program, and it just seemed less trouble to design a few simple classes than to climb someone else's learning curve and buy into their rules. Most of my code in the DDJ "C Programming" column was open source, but it wasn't Open Source, and I'm not ready to jump into that fray from the comforts of a mostly noncoding retirement.

About the Program

ChordBuilder is a command-line program with only one command-line parameter. You must specify the name and not the extension of the MusicXML file that ChordBuilder reads. The extension has to be .XML. ChordBuilder builds a file with the same name and a [1] suffix. So indiana.xml becomes indiana[1].xml. ChordBuilder also reads the SMF file that Band-In-A-Box exported.
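A few lines of Standard C++ are enough to derive the related filenames from that single parameter. The helper below is hypothetical, not ChordBuilder's actual code:

    #include <string>

    // Given the base name from the command line (for example, "indiana"),
    // build the three filenames the program works with.
    struct FileNames {
        std::string xmlIn;    // MusicXML exported by Finale
        std::string midIn;    // SMF exported by Band-In-A-Box (holds the title)
        std::string xmlOut;   // lead sheet MusicXML written by the program
    };

    static FileNames makeFileNames(const std::string& base)
    {
        FileNames f;
        f.xmlIn  = base + ".xml";
        f.midIn  = base + ".mid";
        f.xmlOut = base + "[1].xml";
        return f;
    }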
ChordBuilder expects that file to have the same filename. In this example, that would be indiana.mid. ChordBuilder has to read that file to extract the tune's title. For some reason, although virtually all SMF-processing programs reserve the META_SEQTRKNAME event in track 1 for the song title, the Finale SMF import feature does not import that title, so it does not get exported to the first MusicXML file.

I wrote the program in Standard C++ with no platform dependencies. There is a version of Band-In-A-Box for the Mac and, if music notation programs for the Mac support MusicXML import/export, perhaps you can port ChordBuilder to the Mac. I compiled and tested ChordBuilder with Microsoft Visual C++ 6.0 and MinGW GCC 3.2.1 running under the Quincy 2002 IDE (http://www.alstevens.com/quincy.html).

ChordBuilder's source code is available electronically. The download includes the export.sty file that implements the export style for Band-In-A-Box and a Quincy project file. You can download an executable version of the program along with the style file and instructions for using the package with Band-In-A-Box and Finale on the Windows platform. The download file is found at http://www.alstevens.com/midifitz/.

The program's organization is a straightforward C++ class design. Two classes, musicXMLin and musicXMLout, handle reading and writing the MusicXML files. Several structures defined in the musicdata.h source file organize music notation data for internal processing. The ChordBuilder class encapsulates the parsing of MusicXML input data and the generation of MusicXML output data. To extract the song title from an SMF file, I used a MIDIFile class that I developed several years ago and published in the "C Programming" column. All console output is isolated in a chordbuildermessage function in main.cpp. If someone wants to port ChordBuilder to a GUI, this function is where you would capture exception and warning messages from the program. The messages themselves have sufficient unique text data that a program can determine what to do with them.

Although the program is reasonably portable among Standard C++-compliant compilers, portability was not my motivation for writing ChordBuilder as a command-line program in Standard C++. I did it because that's how I like to write programs. I never much liked writing GUI programs for any platform, especially with MFC, and I'm retired now and can write programs any way I want.

DDJ
WINDOWS/.NET DEVELOPER
C#, COM Objects, & Interop Services

Developing ActiveX controls and COM objects in C#

SHEHRZAD QURESHI
The company I work for sells microfluidic instrumentation intended for use in pharmaceutical and biotechnology laboratories. Typically, there are a large number of machines and instruments that reside in these labs, with differing models from various companies, each performing a different set of functions. Integrating these disparate functionalities into a cohesive process and methodology falls under the banner of what is known as "lab automation." As you'd expect, each machine or instrument runs some form of control software, and the industry has gradually settled on ActiveX controls as the preferred component technology that serves as the glue that connects all the pieces together. The dominant software architecture is a master scheduling application that is in control of the entire process, which invokes methods and gets/sets properties on the individual ActiveX "driver" components. This master application more often than not takes the form of a Visual Basic 6 (VB6) executable, as in Figure 1. As the industry has settled on the VB6/ActiveX paradigm of distributed lab automation, it is simply not feasible from a business point of view to completely jettison the COM-centric paradigm of component-oriented development and embrace .NET controls lock, stock, and barrel.

Shehrzad is an engineer at Labcyte Inc. and author of the forthcoming book Embedded Image Processing on the C6000 DSP (Kluwer Academic Publishing, 2005). He can be contacted at [email protected].
Since many of our customers use VB6, we must componentize our control software such that a VB6 client can talk to it. After all, the marketing department isn't going to be pleased if they find out a potential sale is lost because engineering has completely transitioned to .NET and, thus, a potential VB6 customer has no means of integrating our instrumentation software into their production line. Fortunately, Microsoft has given some thought to backwards compatibility, and through the machinations of a .NET technology called ".NET Interop," it is indeed possible — and even advantageous — to develop COM objects in .NET's paradigmatic language, C#.

Much of our control software requires unimpeded access to the underlying hardware. As such, our existing ActiveX controls are implemented using (unmanaged) C++. As anyone who has coded ActiveX controls using the Active Template Library (ATL) knows, developing these controls using the wizards provided by Visual Studio can be a laborious and arduous task, involving the modification of a multitude of source files, such as .cpp, .h, and Interface Description Language (IDL) files, just to implement simple controls like the one that I describe here. I wanted to leverage the power of .NET in our software development strategy, but until recently lacked the expertise (and, more importantly, the impetus) to make that jump. That impetus was handed to me when I was faced with the daunting prospect of developing several lab instrumentation ActiveX controls, on an extremely tight deadline, for a line of products my company had acquired through a merger. I then explored the possibility of using .NET Interop to implement ActiveX controls in C# as a means of expediting this development. What I found was somewhat of a revelation to me — that, in fact, developing ActiveX controls and COM objects in general is straightforward in C#, and by properly utilizing .NET Interop, you can develop ActiveX controls that can be used by a VB6 client or any other COM-compatible language, provided the .NET Framework is available on the client machine. Moreover, in my estimation, the removal of ATL-style "wizard-driven" coding (because hand coding such an implementation in C++ without the wizard is extremely time consuming, to say the least) leads to an advantage when developing ActiveX controls in C#. The code is far more concise and easier to digest and maintain.

In this article, I first present an ActiveX control and VB6 client application, then examine the process of implementing this control using both C++ (ATL7) and C# in Visual Studio .NET 2003.

"C# lets you leverage the power of the .NET Framework and provides a bridge to the future."

The ActiveX Control and Sample Client Application

The COM object I use here is about as simple as I could conceive. It publishes a single method, Add( ), which accepts two double input parameters and returns the sum of those two inputs. The sample client application is a VB6 executable and is equally simple (Figure 2). The form has text edit boxes corresponding to the two input parameters, and the result of the Add( ) method is placed in another edit box when the button in the form is clicked.

ActiveX Implementation in C++

This particular implementation uses unmanaged C++ and ATL Version 7. An alternative method of crafting ActiveX controls is to utilize the Microsoft Foundation Classes (MFC) framework, but Microsoft recommends ATL if size and speed are the main criteria. The ActiveX controls discussed here are of the in-process variety (as opposed to out-of-process), meaning that they reside within DLLs.
Visual Studio (Figure 3). The next step is to use the wizard to provide the skeleton for the control. Switch to the class view of the project (View|Class View), and add an "ATL Simple Object" class to the project (Project|Add Class); see Figure 4. After selecting the option to add an ATL simple object to the project, the wizard prompts for a variety of options. Provide a class name and use the default options. The act of adding this ATL object to the project inserts a rather copious amount of code into the numerous C++ source files and generates Globally Unique Identifiers (GUID) for the interface and object classes. At this point, the wizard has constructed a fully functional ActiveX control (admittedly a useless one that has neither methods nor properties).

Figure 1: Typical lab automation software architecture.
Figure 2: Sample VB6 client application.
Figure 3: Creating a new ATL project.

Now you can insert the Add( ) method into the control. Again from the class view of the solution, select the interface class whose name begins with a capital "I" and is followed by whatever object name you selected earlier when you inserted the ATL object. Bring up the Add Method Wizard via Project|Add Method, and you can configure the signature of the new method. Figure 5 shows the contents
of the Add Method Wizard after creating the signature for the Add( ) method. The class method that implements Add( ) returns an HRESULT and C++ clients, when invoking any COM method, can expect an HRESULT return value and should use the FAILED macro to test for success or failure. Clients coded in higher level languages, like Visual Basic or C#, can expect the ActiveX methods to return the parameter decorated with the [out,retval] attribute (which, in this case, is a double and is the summation of the two inputs).

The act of using the Add Method Wizard hides much of the gory details of implementing ActiveX methods in C++ from you. In essence, what is happening is that the wizard adds the method signature to the IDL portion of the project, adds a method declaration to the interface class, adds a similar method declaration to the implementation class, and finally adds a shell of the new method to the implementation class's .cpp file, which of course is left for you to fill in. When all is said and done, the C++ project for this trivial ActiveX control consists of 10 source files, not including transient source files that the MIDL compiler generates.

Visual Basic 6 Client (C++)
The VB6 client application has already been introduced. After creating the form
in the VB UI designer, a reference to the new component must be added to the Visual Basic project. Add a reference to the control from within the Visual Basic IDE via Project|Reference, then browse to the location of the DLL that contains the new ActiveX control. Double clicking on the ActiveX DLL inserts the reference into the Visual Basic project. Listing One implements this simple application. What you have done to this point is nothing new; rather, it could easily have been accomplished in almost the same way using Visual Studio 6 in 1999. What is new is to implement this same control using C# and .NET Interop.

ActiveX Implementation in C#
Using a COM object from .NET is straightforward. Once you add a reference to the COM object, Visual Studio automatically generates a Runtime-Callable Wrapper (RCW), which provides a thin veneer around the COM object and performs data-specific marshaling to/from the legacy COM domain and the new .NET domain. What I implement is the converse of this, and the task is made substantially easier as the ActiveX control described here does not have a GUI aspect to it. If that were the case, then you could not use the steps described here, as the MSDN documentation emphatically states that you "cannot register Windows Forms controls as ActiveX controls or create them using CoCreateInstance."

In contrast to the C++ version, the C# ActiveX implementation is much less wizard-driven. However, the result is far more concise, and easier to read and understand. The initial step is to create a C# class library project by selecting File|New|Project, choosing Visual C# Projects and the Class Library Template (see Figure 6). There is no ActiveX template and you'll be creating the control from scratch. Listing Two is the C# version of the control and is probably a tenth of the size
of the C++ version in terms of lines of code. The C++ ATL object was named SimpleActiveX, so the interface class is ISimpleActiveX. The corresponding C# interface class is named ICSharp_ActiveX and decorated with two attributes. .NET attributes let you extend the metadata that accompanies an assembly by annotating the source code with relevant fields. In this case, the first attribute is the GUID associated with the interface; .NET Interop and, of course, the COM runtime need these GUIDs to perform their magic. The ATL wizard took care of generating GUIDs (and sprinkling them throughout various source files), while here I manually generate a GUID and link it to the interface using an attribute. GUIDs can be generated using either the command-line tool guidgen.exe or via the Create GUID item under the Visual Studio .NET Tools menu.

Figure 4: Adding an ATL object to a C++ project.
Figure 5: Inserting the Add() method into the ATL object.
Figure 6: Creating a C# class library.

The second attribute decorates the Add( ) interface method and is reminiscent of the IDL code inserted into the C++ project by the ATL wizard. In fact, it is merely embedded IDL and the same syntax is followed (for example, a help string could be associated with the method via the IDL helpstring keyword). Likewise, the class object that implements this interface requires a GUID and is decorated with a GUID attribute. The ClassInterface attribute controls whether a separate class interface is generated for the attributed class (in this case, CSharp_ActiveX_Class). While this option can be useful for debugging, the recommended value for this attribute is ClassInterfaceType.None, as it mitigates versioning problems. The Add( ) method is implemented and, except for a few postbuild steps, the COM object has been fully implemented. All of the attributes used in this example reside in the System.Runtime.InteropServices namespace. By adding the using declaration at the top of the source file, you don't need to use the fully qualified name for all of the attributes.

Before this assembly can be used as a COM object from a pre-.NET language, the object must be registered for COM Interop (Project|Properties, select Build under the Configuration folder and set Register for COM Interop to True). What this does is generate a COM-callable wrapper (CCW), which wraps the .NET class library we have just implemented. The COM client (in this case, the sample VB6 application) interacts with the CCW, which in turn forwards method invocations and performs data marshaling between itself and the underlying .NET class library. For the CCW to work correctly, the .NET assembly must be strongly named or the .NET runtime can't identify it. To sign the assembly with a strong name, I use the sn.exe utility, which is in the .NET SDK. By adding a postbuild step to the C# project, you can seamlessly integrate this postprocessing into the build process. The steps required to accomplish this are:

1. Open the AssemblyInfo.cs file, which consists of a variety of attributes, and update the AssemblyKeyFile attribute with a name referencing a key file. In this case, the attribute was updated as: [assembly:AssemblyKeyFile("..\\..\\bin\\Debug\\CSharp_ActiveX.snk")].

2. Add a postbuild event by opening the project's properties and selecting the Build Events item under the Common Properties folder. The command runs the sn.exe utility and passes it the key file specified in Step 1: sn -k CSharp_ActiveX.snk.

3. The final build step is to register the .NET assembly so that it can be used with COM. The C++ ATL project manipulates the registry, so that the COM runtime is able to instantiate the object at runtime, by running regsvr32.exe as a postbuild step. In a similar fashion, I use the .NET SDK regasm.exe utility to perform the equivalent steps, which entails reading the metadata accompanying the assembly and making the necessary registry entries such that CoCreateInstance( ) and its ilk continue to function correctly. To add this to the project, the command "regasm CSharp_ActiveX.dll" is added as a postbuild step.

This completes the C# implementation of the ActiveX control. Upon building the project, the assembly is compiled, built, and registered such that it can now be used as a COM object.

Visual Basic 6 Client (C#)
The VB6 client code (available electronically; see "Resource Center," page 5) is virtually identical to the previous VB6 listing, apart from the variable names pertaining to the COM object. The only other change concerns how to import the reference to the COM object. Recall that with the C++ ActiveX control, we added a reference to the ActiveX control by pointing Visual Basic to the DLL that contained the control. For the C# ActiveX control, when adding the reference, import the type library (.tlb) file instead, which is one of the outputs of the C# build process.

Conclusion
The advantages of using C# to develop COM objects are two-fold. C# lets you leverage the power of the .NET Framework and provides a bridge between older technology and the future. For example, in C/C++, dealing with BSTRs and the like can be cumbersome, even though ATL helps alleviate some of the issues. Many of these issues melt away in the C# domain. Also, the source code is easier to read and digest than the corresponding C++ version. Wizard-generated code, with all of its mysterious macros, transient and temporary files, registration scripts, and such, lends an air of black magic to the code. The C# implementation is cleaner, and by extension easier to maintain. One downside to using .NET Interop is the overhead involved with data marshaling between the managed and unmanaged domains. However, through careful design, this performance hit can be mitigated. In practice, I have found the benefits of the implementation strategy described in this article to be significant: As long as the component is not shuttling voluminous amounts of data, the gains in readability should offset any potential performance issues.

DDJ

Listing One
'Declaration of the ActiveX control
Dim CppControl As Cpp_ActiveX.CSimpleActiveX

Private Sub Form_Load()
    ' instantiate control
    Set CppControl = New Cpp_ActiveX.CSimpleActiveX
End Sub

Private Sub btnAdd_Click()
    ' grab data from input text boxes
    Dim a As Double, b As Double
    a = CDbl(txtA.Text)
    b = CDbl(txtB.Text)
    txtResult.Text = CStr(CppControl.Add(a, b))
End Sub

Listing Two
using System;
using System.Runtime.InteropServices; // for DispId & Guid attributes

namespace CSharp_ActiveX
{
    /// <summary>
    /// COM object Interface
    /// </summary>
    [Guid("3D6E75CD-C44F-46fd-9723-F833B366129F")]
    public interface ICSharp_ActiveX
    {
        [DispId(1)]
        double Add(double a, double b);
    }
    /// <summary>
    /// The object that implements the above COM interface
    /// </summary>
    [
        Guid("1DBB9AEB-9333-408f-925C-4DE11599DEEF"),
        ClassInterface(ClassInterfaceType.None)
    ]
    public class CSharp_ActiveX_Class : ICSharp_ActiveX
    {
        public double Add(double a, double b)
        {
            return a + b;
        }
    }
}
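The listings above show VB6 clients. For readers working from C++, where (as noted earlier) COM methods return an HRESULT that should be checked with the FAILED macro, a raw C++ consumer of the same control might look roughly like the sketch below. This is not code from the article's download; the CLSID/IID symbols and the Add( ) signature are assumptions based on what the MIDL compiler typically generates for the SimpleActiveX object described earlier.

#include <windows.h>
// The MIDL-generated header for the ATL project is assumed to declare
// ISimpleActiveX, CLSID_SimpleActiveX, and IID_ISimpleActiveX.

void Call_add_from_cpp()
{
    CoInitialize(NULL);
    ISimpleActiveX *pCtl = NULL;
    HRESULT hr = CoCreateInstance(CLSID_SimpleActiveX, NULL, CLSCTX_INPROC_SERVER,
                                  IID_ISimpleActiveX, (void**)&pCtl);
    if (SUCCEEDED(hr))
    {
        double sum = 0.0;
        hr = pCtl->Add(2.0, 3.0, &sum);   // the [out,retval] parameter comes back last
        if (FAILED(hr))
        {
            // handle the error
        }
        pCtl->Release();
    }
    CoUninitialize();
}

The same pattern works against the C# version of the control once regasm has registered it, because the CCW exposes it to CoCreateInstance( ) like any other COM object.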
DDJ
WINDOWS/.NET DEVELOPER
Improving .NET Events Strategies for generating events RICHARD GRIMES
Events are a useful paradigm: Objects can notify other objects when some event occurs — perhaps a change in state or maybe an error condition has occurred. Some other code — handler code — can register its interest to be notified that this condition has occurred. When the event is raised, the handler code is called. But while .NET provides mechanisms for event-based code, there are some shortcomings in this mechanism. Luckily, .NET provides the facilities for writing your own event mechanism, thereby making it possible to address these problems.
Richard is the author of Programming with Managed Extensions for Microsoft Visual C++ .NET 2003 (Microsoft Press, 2003). He can be contacted at richard@richardgrimes.com.

Delegates
As an example of generating and handling events, consider the user-interface events provided by the Control class and derived classes for handling Windows messages. In essence, the Control class contains a method equivalent to a Win32 windows procedure that accepts messages sent to the Win32 window attached to the Control object and handles each message by raising an appropriate event. A method written to handle a Windows message adds itself to the list of handler code. When the Windows message is sent to the window, the event is raised by calling all of the methods added to the handler list for the event. To enable this, .NET provides two facilities: a mechanism to invoke methods called "delegates," and a formalized mechanism for a class to identify the events it can raise.

Of all the .NET languages, only managed C++ lets you create C-style function pointers, but even C++ cannot create a function pointer to a .NET method. The reason is that .NET methods have a special calling convention, called __clrcall, that is not a supported calling convention. So managed C++, like all other .NET languages, must use a delegate to invoke a method. A delegate is a compiler-generated class that derives from the MulticastDelegate class. Listing One shows how to declare a delegate called OneParam that is used to call methods that take a single parameter, a 32-bit integer, and return no values. As you can see, the delegate instance is initialized with a pointer to a method that has the correct prototype. This is the only time that you can access a pointer to a managed method. The compiler performs type checking when it uses this pointer to initialize the delegate through its constructor, and so it ensures that a delegate will only be used to invoke a method with the correct parameters. This method can be an instance method or a static method, and it can be a member of any class. You can initialize a delegate with a private method of one class (from another method of that class) and pass the delegate to a different class, where it can still be invoked. Furthermore, you can pass the delegate to unmanaged code, and .NET provides the thunking code to allow the unmanaged function to invoke the managed method. Clearly, delegates are clever objects.

If you take a look at the IL generated for this, you see that the constructor of the delegate class looks like this:

.method public hidebysig specialname rtspecialname
        instance void .ctor(object 'object', native int 'method')
        runtime managed
{
} // end of method Del::.ctor
The runtime-managed modifier indicates that no IL exists for the constructor and that the runtime provides the code. The parameters are interesting: The first parameter indicates the object that contains the method that will be invoked, and C# conveniently infers this from the parameter that you pass. The second parameter is typeless and is effectively a .NET equivalent of a void* pointer. But again,
“There is no concept of invoking the delegates in the MulticastDelegate object in parallel”
the compiler performs type checks to ensure that the method used to initialize a delegate is of the right type. This information is stored in the delegate object and can be accessed through the Target and Method properties. Each delegate is multicast, which means that you can combine two delegate objects to create a third object that contains the first two. To do this, the MulticastDelegate class has a linked list, and when you combine two delegates, their linked lists are chained. Delegates can be invoked synchronously (Invoke, which C# calls for you when you invoke the delegate as if it is a method, as in Listing One) or they can be invoked asynchronously. However, “asynchronously” means with respect
to the code that invokes the delegate and not with respect to the individual delegates within the multicast delegate. There is no concept of invoking the delegates in the MulticastDelegate object in parallel. The default invocation mechanism will walk the linked list and invoke each delegate in turn, synchronously, waiting for an invocation to complete before invoking the next delegate in the list. If you invoke a multicast delegate asynchronously, all that happens is that this invocation is performed on another thread.

Also, note that if one delegate in a multicast throws an exception when it is invoked, then the entire invocation of the multicast delegate is aborted. This is a problem when you consider events (which I will do next) because an event invites anyone to add a delegate to provide a handler for the event. When the event delegate is invoked, rogue code provided by some third party could throw an exception. If this happens, no other handlers for the event will be called. If the exception is still left uncaught, then the object could be killed. This highlights the tight coupling between .NET classes that raise events and the classes that handle them.

Events
The support for events comes in two forms. First, there must be a delegate field for each event and these hold the delegates that will be invoked when the event is raised. Second, there must be a formalized mechanism to allow code outside of the class to add delegates initialized with handler methods to this event delegate. .NET compilers provide their own code to do this and, with C#, this is done with the event keyword as shown in Listing Two. In this code, I have used a framework-provided delegate called EventHandler, but events can be defined for any delegate type. The event keyword adds a private delegate field and it adds public methods to add a delegate and remove a delegate from this delegate field. In addition, so that other code can tell that this class can generate events, the compiler adds some .NET metadata to indicate methods that are used to add and remove delegates from the event. The C# compiler allows you to call these event methods using the += and -= operators.

When the event is raised, the delegate is invoked, and the default mechanism in C#, and the other .NET languages, is to invoke the delegate synchronously on the thread where the event was raised. Again, it is important to point out that all of the delegates in the multicast delegate are invoked serially. For Windows Forms classes this is a useful side effect because GUI code should only be called on the process's GUI thread and the event handlers
for controls usually access the control or other controls on the form. However, for non-GUI code, you may find this a restriction, particularly if the delegates are for methods on remote objects or methods that take a long time to perform.

The Control class defines 57 events, indicating that, by using the default event mechanism, a Control object would have 57 delegate fields even though you will use perhaps only a handful of these. This clearly represents a waste of memory, so the designers of the Control class override the default action of the event keyword and provide a more efficient mechanism to store event delegates. The Control class derives from the Component class, which has a property, Events. This property is a collection of delegates of type EventHandlerList. The entries in this collection are identified by "key" objects; each key identifies a delegate for a specific event, and each of these delegates can contain one or more event handler delegates. The keys are objects, not a simple integer index; indeed, the actual value of the object is immaterial because the collection uses the identity of the object as the key. At first sight, this does not appear to solve the space issue because it implies that the Control class will need to maintain a key object for each event type. However, the key object can be the same for each event of the same type for each Control object, so the Control class provides these key objects as static (noninstance) objects. The space expense is paid by the class rather than the individual Control objects. C# lets you provide your own event methods, and Listing Three shows how you can use these methods with an EventHandlerList collection.

Attempts to Overcome the Issues
Tightly coupled events are useful, particularly when events are handled within the application domain, because they are lightweight. However, even though the event-raising code and handler code are tightly coupled, the providers of each are likely to be completely unconnected. This means that even if your code is completely error free, if you invoke a delegate provided by someone else, their code can kill your object. There are a couple of ways to mitigate this risk.

The first is the simplest: Use the [OneWay] attribute on the method that a delegate will invoke. The [OneWay] attribute adds metadata to the method, which is read by the .NET runtime when it invokes the method. This attribute indicates that the method does not return any values, so the invocation code does not make any effort to retrieve return values. An exception is treated as an extra return parameter of the method, so
[OneWay] tells the runtime to ignore the exception. However, this attribute is not suitable in all situations. The attribute only works when a call is made across a context boundary; hence, .NET remoting is used and can intercept method calls. In many cases, your code will run in the default context, so no interception occurs and the [OneWay] attribute will be ignored. Another problem is that the attribute is attached to the handler method by the author of that method; in other words, the person who writes the code that you regard as suspect has to mark his code to tell the runtime to ignore any exceptions it may throw. This is unlikely to happen.

Another way to protect your code from errant event handlers is to invoke the event delegate explicitly. To do this you need to get access to the linked list of delegates in a multicast delegate. The MulticastDelegate class does this through the GetInvocationList method, which returns an array of Delegate references. Listing Four shows how to use this to invoke the delegates within a try block to catch any exceptions that are thrown. The example contains no code in the catch handler; hence, the exceptions are ignored. Ignoring exceptions is not necessarily the best action — the client code may want to report that one of the delegates has generated an exception; however, to do this there must be some mechanism to collect all the exceptions that have been thrown and return them back to the client.

To do this, I define a new exception class (see Listing Five) that maintains a list of ExceptionData objects that contain information about the exception and the delegate that generated the exception. This exception class has a method to add new exception information to the MultiException object, and a way to determine if exception information has been added to the object. The class also overrides the ToString method to add information about each exception in the collection. Finally, I have also provided a property that gives access to this collection of ExceptionData objects. Listing Six shows how this exception class is used. When a delegate throws an exception, the exception and delegate are added to the MultiException object in the exception handler, but the exception is not rethrown. I think that this action is reasonable because the delegates are not related to each other, so an exceptional condition in one handler does not necessarily mean that all the handlers are invalid, and it certainly does not indicate that the code invoking the delegates has some invalid state. When all delegates have been invoked, the code checks the MultiException object to see if any exceptions have been thrown and, if so, the MultiException object that contains those exceptions is thrown. Note that this exception class works fine if the exception is rethrown within the same context. In a future article, I'll explain why the exception cannot be thrown across a context boundary and how to fix this problem by providing serialization code.

The client that called the event-generating method can catch the exception and use the information in it. For example, Listing Seven shows exception handler code that prints out information about the MultiException that was caught and the exceptions thrown by the delegates. This code then accesses the errant delegate via the Exceptions collection and removes this delegate from the event. In effect, this is self-healing code because once an individual delegate has thrown, action is taken to ensure that the delegate is never called again.

Finally, it is worth pointing out that you cannot use this technique with Control objects. The methods on the Control class have names beginning with "On" (for example, OnClick for the Click event). These methods are virtual, so you can provide an override, which will be called by the Control object's windows procedure. However, the static key objects, used to access the event handlers in the Events property, are private and hence unavailable to your derived class. AppEvent.cs (available electronically; see "Resource Center," page 5) is an event mechanism that catches delegate-thrown exceptions such as those described here.

DDJ
More .NET on DDJ.com

ASP.NET2theMax: Textbox AutoEncoding Improves Security
The standard TextBox control does not verify input text by default, and this creates a security hole that a cross-site scripting attack can exploit. Here's a custom TextBox control that will "autoencode" text input.

Richard Grimes on .NET: Notification and Delegate Lifetimes
COM and .NET use two different approaches to event handlers and object lifetime management. Not surprisingly, the .NET approach is simpler, but does it provide all of the features you need?

Available online at http://www.ddj.com/topics/dotnet/.
Listing One
public class Test
{
    // Tell the compiler to generate a delegate class
    delegate void OneParam(int x);
    void Proc(int x) {/* code */}
    public void InvokeCode()
    {
        // Create a delegate object and initialize with a method pointer
        OneParam d = new OneParam(Proc);
        // Invoke the delegate
        d(42);
    }
}

Listing Two
public class Test
{
    // Indicate that the class can generate the event
    public event EventHandler SomethingHappened;
    public void DoSomething()
    {
        // Code here...
        // Now raise the event
        if (SomethingHappened != null)
            SomethingHappened(this, new EventArgs());
    }
}
public class UseCode
{
    void InformMe(object sender, EventArgs args)
    { /* handle event */ }
    public void UseTestObject()
    {
        Test t = new Test();
        t.SomethingHappened += new EventHandler(InformMe);
        t.DoSomething();
    }
}

Listing Three
public class Test
{
    EventHandlerList events = new EventHandlerList();
    // This is the 'key' object for our event
    static object EventSomethingHappened;
    static Test()
    {
        EventSomethingHappened = new object();
    }
    // Declare the event and the custom event methods
    public event EventHandler SomethingHappened
    {
        add    { events.AddHandler(EventSomethingHappened, value); }
        remove { events.RemoveHandler(EventSomethingHappened, value); }
    }
    // Helper method to get the event delegate and
    // invoke it
    protected void RaiseSomethingHappened(EventArgs args)
    {
        EventHandler d;
        d = (EventHandler)events[EventSomethingHappened];
        d(this, args);
    }
    public void DoSomething()
    {
        // Code...
        RaiseSomethingHappened(new EventArgs());
    }
}

Listing Four
protected void RaiseSomethingHappened(EventArgs args)
{
    EventHandler d;
    d = (EventHandler)events[EventSomethingHappened];
    Delegate[] dels = d.GetInvocationList();
    // Invoke each delegate individually
    foreach (Delegate del in dels)
    {
        EventHandler eh = del as EventHandler;
        try { eh(this, args); }
        catch (Exception) {}
    }
}

Listing Five
class MultiException : Exception
{
    // Helper class to hold info about the exception
    public struct ExceptionData
    {
        public Exception theException;
        public Delegate theDelegate;
        public ExceptionData(Exception e, Delegate d)
        {
            theException = e;
            theDelegate = d;
        }
    }
    // Field used to collate exception data
    ArrayList exceptions = new ArrayList();
    public MultiException(string str) : base(str) {}
    public void Add(Exception e, Delegate d)
    {
        exceptions.Add(new ExceptionData(e, d));
    }
    public bool HasExceptions
    {
        get { return (exceptions.Count != 0); }
    }
    public ExceptionData[] Exceptions
    {
        get
        {
            return (ExceptionData[])exceptions.ToArray(typeof(ExceptionData));
        }
    }
    // Return information about all the exceptions
    public override string ToString()
    {
        if (exceptions == null)
            return base.ToString();
        StringBuilder sb = new StringBuilder();
        sb.Append(base.ToString());
        sb.Append(Environment.NewLine);
        for (int idx = 0; idx < exceptions.Count; idx++)
        {
            ExceptionData ed;
            ed = (ExceptionData)exceptions[idx];
            sb.Append(String.Format(
                "{0} on {1} threw an exception:",
                ed.theDelegate.Method.Name,
                ed.theDelegate.Target.GetType()));
            sb.Append(Environment.NewLine);
            sb.Append(ed.theException.ToString());
            sb.Append(Environment.NewLine);
        }
        return sb.ToString();
    }
}

Listing Six
protected void RaiseSomethingHappened(EventArgs args)
{
    MultiException exceptions = new MultiException(
        "delegate(s) thrown exceptions");
    EventHandler d;
    d = (EventHandler)events[EventSomethingHappened];
    Delegate[] dels = d.GetInvocationList();
    foreach (Delegate del in dels)
    {
        EventHandler eh = del as EventHandler;
        try { eh(this, args); }
        catch (Exception e) { exceptions.Add(e, eh); }
    }
    if (exceptions.HasExceptions)
        throw exceptions;
}

Listing Seven
try
{
    RaiseSomethingHappened(new EventArgs());
}
catch (MultiException me)
{
    Console.WriteLine("\n{0}", me.ToString());
    foreach (MultiException.ExceptionData ed in me.Exceptions)
    {
        this.SomethingHappened -= (EventHandler)ed.theDelegate;
    }
}

DDJ
WINDOWS/.NET DEVELOPER
Building MFC Dialogs at Runtime Sharing classes leads to less clutter ADRIAN HILL
Dialogs are generally defined for GUI-based applications using a visual tool to design the appearance of the dialog, and an automatic code generator to generate most of the source code associated with the dialog. Apple's Macintosh and Windows-based PCs both use this approach. For MFC-based Windows applications, you can use Visual Studio's resource editor to position the dialog elements, then use ClassWizard to generate the corresponding C++ class code. The resource editor saves the layout information for all the application's dialogs in a single resource (.rc) file, and saves the corresponding resource ID symbol definitions in the resource.h file. ClassWizard generally writes the C++ class code for each dialog to a separate .cpp and .h file.

Adrian is a founder of InPhase Technologies, a Lucent Technologies/Bell Labs spinoff that is developing holographic data storage products. You can contact him at adrianhill@inphase-tech.com, with subject "DDJ."

While this process is fairly easy to learn and use, the resulting code has several limitations, including:

• The class encapsulating the dialog cannot be reused in other applications by simply including the dialog's .cpp and .h files in the new application's build. The dialog resources (in the application's .rc and resource.h files) must also be copied, either manually or using the resource editor, from one application to
the other. This process is tedious, particularly if the class is to be reused in several applications. If the dialog is subsequently changed and the changes are required in all applications, the modifications have to be made to each application in turn. Overall, this procedure is more like the bad old days of C programming than an object-oriented method.

• The process of copying and pasting resources from one application to another can lead to an insidious bug. Since the resource IDs are effectively at global scope (they are all #defined in resource.h), clashes are possible if resource IDs copied from one application already exist in the destination application. The application with the resource ID conflict may crash, with no obvious reason. (This has only bitten me once, but it hurt.)

• Only one developer can edit an application's resource (.rc) file at a time. This limitation exists because the resource editor updates several #defines in resource.h (for example, _APS_NEXT_RESOURCE_VALUE and _APS_NEXT_CONTROL_VALUE) each time new resources are added. If two developers try to check in updated .rc and resource.h files to a source-code version-control system (such as CVS), the process fails because of conflicts in the values of these symbols. By contrast, multiple developers can work on different parts of a .cpp file concurrently, without clashing.

• MFC's ClassWizard will generate a new class for each dialog, no matter how simple the dialog is, or how similar the dialog is to other dialogs in the application. This produces "class clutter" that, though not serious, serves to muddy the class view in a large project and adds to code bloat.

In this article I provide a class for defining MFC-based dialogs, which overcomes
these limitations by not using resources in the application’s .rc and resource.h files. (The complete source code and related files for the class are available electronically; see “Resource Center,” page 5.) Instead, dialog box templates are built in memory at runtime. The details are encapsulated in class Dynamic_dialog, together with its helper class, Dialog_item.
"An MFC-based dialog is constructed by creating a dialog box template that describes the dialog and its controls"

You define a dialog in C++ source code, either using an instance of Dynamic_dialog directly or using a class derived from it when the dialog requires additional message map functions. To leverage as much existing MFC functionality as possible, Dynamic_dialog is derived from MFC's CDialog class. In particular, it was simple to make all the familiar data exchange and data validation (DDX/DDV) functionality available with very little additional code. Example 1 is the code for a simple dialog with two edit controls, and the OK and Cancel buttons (Figure 1). Even without seeing the class declaration, the intent of the code is clear.

Inside the Dynamic_dialog Class
In general, an MFC-based dialog is constructed by creating a dialog box template that describes the dialog and its controls, such as edit boxes and pushbuttons. The
dialog box template is then used to construct an instance of a CDialog-derived class (MFC provides class CDialog to encapsulate the basic functionality of a dialog box). When you use the resource editor to define a dialog, the dialog box template is compiled from the contents of the .rc file as part of the build process. In the Dynamic_dialog class, rather than building the templates from the resource file at compile time, I generate the dialog box template in memory at runtime. This approach is mentioned, but not described, in Jeff Prosise's Programming Windows with MFC, Second Edition (Microsoft Press, 1999) and Michael Blaszczak's Professional MFC with Visual C++ 6, Fourth Edition (Wrox Press, 1999). Blaszczak notes that this method is "fraught with pointer arithmetic and alignment tomfoolery." However, by encapsulating the template details in the Dynamic_dialog and Dialog_item classes, the apparent complexity is addressed once and presents no further disadvantages.

Dialog Templates in Memory
A dialog box template in memory consists of a template header followed by control definitions for each control in the dialog. A standard template header consists of an instance of the Windows SDK DLGTEMPLATE struct, typically followed by a Unicode string for the dialog box title, the font size, and another Unicode string for the dialog box font name. Each control definition typically consists of an instance of the DLGITEMTEMPLATE struct followed by a Unicode string specifying the control's initial text (see "Templates in Memory," Platform SDK: Windows User Interface, MSDN Library CD, July 2001). The dialog box template has to be constructed with particular word alignment in order to work correctly, which is probably why this approach is not widely used. There is an example of building dialog templates in memory in "DLGTEMPL: Creating Dialog Templates Dynamically" (MSDN Library CD, July 2001), but it is incomplete and somewhat unconvincing because it implements most of its dialogs using standard (.rc) resources. The Dynamic_dialog class has a CArray data member that contains one Dialog_item instance for each control in the dialog. The final dialog template is assembled by the Dynamic_dialog::Build_template_buffer and Dialog_item::Write_to_template_buffer functions (in Dyn_dialog.cpp and Dialog_item.cpp).
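To make the header layout concrete, here is a minimal sketch of the fixed-size part of such a template. It is not taken from the Dynamic_dialog sources; it only shows the Windows SDK DLGTEMPLATE header that Build_template_buffer ultimately has to produce, with the dialog size borrowed from Example 1 for illustration.

#include <afxwin.h>   // MFC; also brings in the Win32 dialog template structs

// Fixed-size header of an in-memory dialog template (Win32 SDK struct).
// The menu, window class, title, font information, and the DWORD-aligned
// DLGITEMTEMPLATE entry for each control must follow this header in the
// same buffer before it is handed to CDialog::InitModalIndirect().
DLGTEMPLATE header;
memset(&header, 0, sizeof(header));
header.style = WS_POPUP | WS_CAPTION | WS_SYSMENU | DS_MODALFRAME | DS_SETFONT;
header.dwExtendedStyle = 0;
header.cdit = 4;                   // number of controls that will follow in the buffer
header.x = 0;
header.y = 0;
header.cx = 155;                   // width in dialog units, as in Example 1
header.cy = 100;                   // height in dialog units

Getting the alignment of the variable-length parts right is exactly the "tomfoolery" Blaszczak warns about, and it is what Dynamic_dialog hides from you.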
Using the Dynamic_dialog Class
The Dynamic_dialog class supports the five most common dialog controls: buttons, combo boxes, list boxes, edit controls, and static text. The DDlgTest application in the code accompanying this article has examples illustrating the use of all these controls. Look for the Dynamic_dialog_test namespace functions in DynDlgTest.cpp. Listing One (available electronically) shows the interface to Dynamic_dialog. You build a dialog by constructing an instance of Dynamic_dialog, then adding controls to it. The tab order of controls is determined by the order in which controls are added to the dialog. Call DoModal to display the dialog, just as you would with a class generated by ClassWizard. The overall dialog appearance can be set using the functions with names having the "Set_dialog_" prefix. Functions to add controls to the dialog have names starting with "Add_." Most of
Figure 1: Dialog generated by the code in Example 1.
int exposure_time_ms = 8;     // Two different numeric types.
double laser_power_mw = 40.0;
Dynamic_dialog dlg("Camera Image Capture", 155, 100);
dlg.Add_OK_button(20, 70, 50, 14);
dlg.Add_Cancel_button(85, 70, 50, 14);
dlg.Add_edit_control("Exposure time", 15, "ms.", 70, 20, 45, 12,
                     &exposure_time_ms, 1, 100);
dlg.Add_edit_control("Laser power", 15, "mW.", 70, 40, 45, 12,
                     &laser_power_mw, 0.1, 200.0);
if (dlg.DoModal() == IDOK)
{
    // ... more code ...
}
Example 1: Code for a dialog with two edit controls and two buttons.
these functions have a pointer argument used to associate a variable with the control. All the "Add_" functions have parameters specifying the size and position of the controls.

Edit controls, list boxes, and combo boxes often have associated static strings to tell the user what the control's contents are for. For convenience, the "Add_" functions for these controls take a pointer to a string to be displayed to the left of the control, and a second pointer to another string to be displayed to the right of the control (see Figure 1 and Example 1). If a pointer is 0, then no string is displayed in the corresponding position. Static text strings can also be added to a dialog using the Add_static_text function, but supplying these strings in the same call as their edit control (or list box or combo box) is convenient and makes the intent of the code line self-documenting. For numerical values, the Add_edit_control function is implemented as a template function. If you require range checking (data validation) on the variable, then supply the value limits as the parameters min and max. If not, set these two parameters to the same value.

Rather obviously, checkboxes are added with the Add_checkbox function. The state of the checkbox is reflected in the value of the associated bool variable passed to this function. As far as MFC is concerned, checkboxes are simply another form of button. You add a group of radio buttons with a call to Add_first_radio_button for the first button in the group, followed by a call to Add_radio_button for each additional button in the group. Call Add_group_box to complete the set of radio buttons. The variable associated with a group of radio buttons is passed as a parameter to Add_first_radio_button.

Buttons and Message Maps
MFC uses a message map to associate a call to a dialog member function with a dialog event, such as clicking on a button or updating an edit box. For example, for a dialog produced with the resource editor, ClassWizard would generate a message map like this (with comments removed):

BEGIN_MESSAGE_MAP(File_dlg, CDialog)
    ON_BN_CLICKED(IDC_INPUT_BROWSE, OnInputBrowse)
END_MESSAGE_MAP()
to associate function OnInputBrowse with a button click on the button with resource ID IDC_INPUT_BROWSE. The first line of the message map indicates that this message map is part of class File_dlg,
which is derived from CDialog. BEGIN_MESSAGE_MAP, ON_BN_CLICKED, and END_MESSAGE_MAP are all MFC-defined macros. There is no need to generate message map entries for the nearly ubiquitous OK and Cancel buttons because the CDialog base class already provides the necessary mapping. These buttons can be added to a Dynamic_dialog instance with the functions Add_OK_button and Add_Cancel_button. Typically, I make an Add_OK_button call immediately after constructing the dialog (see Example 1) so that the OK button has focus when the dialog is displayed. If users immediately press Enter, the dialog acts as if the OK button had been clicked.

To add other buttons to a dynamic dialog, you derive a class from Dynamic_dialog and add the button with a call to Dynamic_dialog's Add_pushbutton function. You must also provide a message map for your derived class, with an entry for each button. The resource ID specified in the message map must match the value passed as a parameter to Add_pushbutton. As with resource-based dialogs, you provide the function associated with the button in your derived class. An example of adding a pushbutton in a Dynamic_dialog-derived class is in the Pushbutton_dialog class (Push_dlg.cpp and Push_dlg.h) in the DDlgTest application.

Browse buttons that let users select a file or directory by popping up a second,
standard dialog for file or directory selection are common in many applications' dialogs. When users click OK in the second dialog, the file or directory name is copied back to an edit control in the primary dialog. To avoid the need to derive a class from Dynamic_dialog and add a message map every time I wanted this functionality, I decided to implement it directly in the Dynamic_dialog class. The message map and associated functions in Dynamic_dialog are shown in Example 2. I implemented four message map entries; the number can easily be extended if you anticipate needing more than four Browse buttons in a single dialog. To add a Browse button to a dialog and associate the button with the filename contained in an edit control, call function Add_Browse_button immediately after the Add_edit_control call controlling the filename. The first parameter to Add_Browse_button determines whether the secondary dialog is a standard CFileDialog, or a dialog to allow a directory to be selected. An example of a dialog with three CString edit controls with associated Browse buttons is in function Dynamic_dialog_test::Browse_test in the DDlgTest application.

// ----- Message Map -----
BEGIN_MESSAGE_MAP(Dynamic_dialog, CDialog)
    ON_BN_CLICKED(Dynamic_dialog::e_first_browse_idc,     OnBrowse0)
    ON_BN_CLICKED(Dynamic_dialog::e_first_browse_idc + 1, OnBrowse1)
    ON_BN_CLICKED(Dynamic_dialog::e_first_browse_idc + 2, OnBrowse2)
    ON_BN_CLICKED(Dynamic_dialog::e_first_browse_idc + 3, OnBrowse3)
END_MESSAGE_MAP()

void Browse(int button);

// ----- OnBrowse? functions -----
void Dynamic_dialog::OnBrowse0() { Browse(0); }
void Dynamic_dialog::OnBrowse1() { Browse(1); }
void Dynamic_dialog::OnBrowse2() { Browse(2); }
void Dynamic_dialog::OnBrowse3() { Browse(3); }

Example 2: Message map in Dynamic_dialog for implementing Browse buttons.

Message map notifications are not limited to pushbuttons. The sample application also contains an example of a dialog with two combo boxes where the contents of the right-hand combo box (in this case, a choice of cities) depends on the selection made by the user in the left combo box (in this example, a choice of states). The dialog code is in class Derived_dialog (Derived.cpp); the dialog is in Figure 2. The message map for this class is:

BEGIN_MESSAGE_MAP(Derived_dialog, Dynamic_dialog)
    ON_CBN_SELCHANGE(Derived_dialog::e_box_id, OnSelchangeState)
END_MESSAGE_MAP()
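Putting these pieces together, a derived dialog with its own pushbutton follows the same pattern. The sketch below is not the Pushbutton_dialog class from the download; the class name, control ID, and the exact parameter lists of Add_pushbutton and Add_checkbox are assumptions modeled on the Add_* calls in Examples 1 and 2, so treat it as an outline rather than a drop-in implementation.

// Sketch of a Dynamic_dialog-derived class that adds a custom button and a
// checkbox, and handles the button click through its own message map.
class Options_dialog : public Dynamic_dialog
{
public:
    enum { e_reset_idc = 2001 };           // ID also passed to Add_pushbutton
    Options_dialog()
        : Dynamic_dialog("Options", 160, 90),
          m_log_to_file(false)
    {
        Add_OK_button(20, 60, 50, 14);
        Add_Cancel_button(90, 60, 50, 14);
        Add_checkbox("Log results to file", 15, 20, 120, 12, &m_log_to_file);
        Add_pushbutton("Reset", e_reset_idc, 15, 38, 50, 14);
    }
protected:
    afx_msg void OnReset() { /* respond to the button click */ }
    DECLARE_MESSAGE_MAP()
private:
    bool m_log_to_file;
};

BEGIN_MESSAGE_MAP(Options_dialog, Dynamic_dialog)
    ON_BN_CLICKED(Options_dialog::e_reset_idc, OnReset)
END_MESSAGE_MAP()

Displaying the dialog is then just a matter of constructing an Options_dialog and calling DoModal, exactly as with Example 1.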
Conclusion
My primary aim in writing the Dynamic_dialog class was to let classes that use dialogs be shared among various projects, without the need to cut-and-paste resources from project to project. To implement simple dialogs, create an instance of Dynamic_dialog and add the required controls with calls to the member functions. Call DoModal, as you would for a dialog generated with the resource editor and ClassWizard, to display the dialog. If users click OK (or press Enter) to close the dialog, then the variables associated with the dialog controls are updated. When a message map is needed for a dialog, derive a class from Dynamic_dialog and add the controls for the dialog in the constructor of the derived class. As before, call DoModal to display the dialog.

I chose the order of parameters passed to several of the Dynamic_dialog functions to allow common default values, and also to be close to the order of the corresponding values in the definition of dialog controls in a resource file. This makes it fairly simple to write a converter to generate C++ code from .rc dialog resources that already exist in a project.

Another advantage of building dialogs at runtime is that the current state of the program can be used to decide what controls to display. This can result in dialogs that are less cluttered than resource-based dialogs by avoiding the need, for example, to gray out or hide dialog items that are not relevant to the current state of the application. An obvious extension of the Dynamic_dialog approach is a class to automatically size a dialog and position the controls in it, given a list of controls to be displayed. This, in turn, offers the prospect of dialogs that resize themselves depending on the differences in lengths of strings in different languages such as German and Spanish.

Acknowledgements
Thanks to Charles Stanhope and Martin Pane, both at InPhase Technologies, for their insightful comments.
Figure 2: Sample dialog with two combo boxes where selecting a state in the left combo box changes the contents of the right combo box.
DDJ
PROGRAMMER’S TOOLCHEST
The Qt Designer IDE Everything but the compiler DAVE BERTON
Among the many GUI programming libraries available, both free and licensed, the Qt C++ Library from Trolltech (http://www.trolltech.com/) has a strong following. And with the advent of Qt Designer 3.3.1, Qt developers now have a feature-rich IDE with which to design and code GUI applications. The lack of compiler support is the only significant drawback in an otherwise polished and useful tool.

Dave is a professional programmer responsible for the original design of the Qt SQL database layer. He can be contacted at [email protected].

The original Qt Designer that appeared in Qt 2.0 did one thing, and did it well — WYSIWYG interface design. This filled a void in the world of Qt development, which, at the time, required most of this code to be written by hand. The Designer in Qt 3.3.1 goes beyond simple GUI building, supporting full project management, source-code editing, resource management for images and database connections, and a full suite of design controls. Designer produces tight, reliable code, and does so for all platforms supported by Qt — Windows, Linux and other UNIX platforms, and Mac OS X.

Because Qt is a C++ library, all code produced by Designer is in C++. To support the advanced event handling of Qt, there are some additional preprocessing steps necessary when compiling applications based on Qt. This may seem like unwanted overhead but, in practice, these steps are integrated neatly into the compilation process, and you are rarely bothered by any of the details. Designer works with Qt project files (.pro) directly so that it can be used on an as-needed basis to design and implement the GUI portions of your application, and it does so in a straightforward and understandable way. Designer's project management gives you control over application include paths, linking with other libraries, and platform-specific settings; see Figure 1.

Designing
The original method for designing user interfaces is still available in Designer 3.3.1, and is often the fastest and most straightforward way of using Designer in C++ GUI projects. To fully appreciate what is going on while designing interfaces, an understanding of the Qt object model is necessary. In Qt, all classes inherit from QObject and all visual classes inherit from QWidget. QObject provides the framework for interobject communication and also provides memory management for child objects. QWidget (which is derived from QObject) provides event handling, painting routines, and window functionality. Designer exploits the metainformation available in the Qt object system and, through limited introspection, can make the method names and parameters of Qt classes available to designers.

Interfaces are saved in .ui files that are preprocessed by a User Interface Compiler (uic) during compilation. The uic generates regular C++ classes that correspond to the interface you designed. You can extend the behavior of these generated classes by inheriting from them normally. An Address Book application (Version 1 is available electronically; see "Resource Center," page 5) uses this method of design. The Address Book application displays a list of contacts, and lets you alter and insert contact information using QLineEdit widgets. The latest Designer improves upon this basic method of design by now providing integrated source-code editing of your interfaces, which avoids the extra step of inheriting from Designer-generated classes and significantly speeds up development time.

The actual process of building interfaces in Designer is what you would expect from a modern IDE, and Designer supports all widgets provided by Qt (Figure 2). One interesting feature is the automatic layout
tools that let you drop a bunch of controls onto a window, then quickly organize and manage layouts with a minimum of fuss (Figure 3). In addition, widget properties can be edited in groups with the property editor after selecting them in Designer (Figure 4). If you're familiar with modern GUI builders, you will be comfortable using Designer to build interfaces. You can select different widget types and place them on your windows, grouping and organizing them in any way. You can also set widget
properties, define tool tips, adjust layouts, initialize list boxes, and so on. There are a few annoyances, including the lack of a default action when placing new widgets — pressing Enter usually does nothing, and occasionally double clicking a new widget also does nothing. While right clicking always gives you access to editing basic widget properties, it is annoying to constantly have to resort to this when a reasonable default action should be made available in the main design window.

"C++ GUI application development with Qt is fast, maintainable, and flexible"

Signals and Slots
At the heart of Qt is the signal/slot mechanism, which provides a clean abstraction of event handling. As Figure 5 shows, Designer provides visual signal/slot editing that makes this mechanism intuitive while designing. Even if you have never programmed with Qt before, Designer exposes the most powerful parts of the Qt GUI library to you in an organized fashion. In Qt, signals (which loosely correspond to events) are generated by objects at runtime. These signals are regular class
methods that are marshaled by the Qt metaobject system and connected to other widgets via slots (which are also regular class methods). This is a flexible and powerful system for event handling, and it is used extensively throughout the Qt library. An extra precompilation step is necessary using the Meta Object Compiler (moc), which generates the metaobject code used internally by Qt when connecting signals to slots. There are advantages and disadvantages to implementing event handling in this way versus, for example, a template- and function-object-based approach (see, for example, http://libsigc.sourceforge.net/). However, the approach taken by Qt has proven itself over time to be flexible, portable, and fast enough, especially when it is considered
that the cost of emitting signals as implemented by Qt is minor when compared to the cost of other code within GUI programs. Creating new slots for your interface can be done in a number of different ways within Designer. While creating signal/slot connections, new slots can be defined for the window to handle them. Also, using the Signal Handler property editor, which displays all the signals generated by the selected widget, you can quickly define new slots that are automatically given reasonable names by Designer. In either case, empty virtual methods are automatically generated by Designer in the corresponding window’s source code. While previous versions of Designer stopped at that point, the latest Designer lets you edit
the source code for these new slots while you are designing. For example, the bulk of the behavior in the Address Book example is handled by connecting various signals with slots implemented in the AddressBookImpl class, which is the topmost widget in the application. As a developer, you don't need to worry about all of the internal plumbing implemented by Qt to support signals and slots. Once you understand the concept behind signals/slots, this abstraction speeds up development.

The Integrated Editor
Designer's integrated source-code editor is pretty good. It provides syntax highlighting, automatic indentation, tab completion, and limited method name completion. Designer does its best to maintain synchronization between source code and GUI design by adding new methods in the Object Explorer when source code is edited, and automatically creating skeleton code when new methods are added via the IDE. There are a few situations where source code can fall out of sync with the visual design, most notably when you change the class name of a top-level widget. Because Designer cannot keep up with these types of changes, you get errors during compilation and must go back and figure out exactly where Designer lost track. It's best to design new windows, name controls, and finish the layout before getting into source-code editing.

Familiarity with Qt programming is helpful, but not necessary, when editing source code from within Designer. The Qt library is known for its intuitive class hierarchy and method names, so it's usually easy to accomplish what you want even without a good working knowledge of the library. For example, in the Address Book application, automatically updating line edit controls whenever a new list item is selected is easy to do in the AddressBook::newItemSelected( ) slot (Example 1).

The code you are editing for a window is actually a file included by the generated implementation of the widget. The window definition for the Address Book application is contained in addressbook.ui. At compile time, uic generates the .h and .cpp files that correspond to the design. The custom source code that was added to the AddressBook class is actually contained in addressbook.ui.h, which is included by the implementation of AddressBook (Figure 5). The name of the file containing the custom code for the AddressBook class is a bit of a misnomer, since this file is actually included directly in the implementation file, addressbook.cpp. It would have been clearer to name this file addressbook.ui.cpp. In any case, this included file relies on the
class declaration generated by Designer; therefore, it is important to make sure that any methods added by hand into addressbook.ui.h are also recognized by Designer and added to the list of methods for the AddressBook class; otherwise, it will not compile. Normally, the sourcecode editor within Designer does a good job parsing the code and recognizing both new methods and changes to existing methods. Occasionally, though, the source code can get out of sync with Designer’s definition, so it pays to keep an eye on the Object Explorer while editing source code to make sure it stays up to date. Hopefully, the next version of Designer will address some of these synchronization issues, since editing the source code for slots directly within Designer is by far the most useful feature of this tool. The entire Address Book application, written completely within Designer instead of using inheritance (see Version 2), is available electronically. Resource Management Designer not only manages multiple window designs for a project, but also image and database resources. Consolidating all images from a project makes accessing image resources simple. Images are added to the project’s Image Collection, and behind the scenes Designer generates all the code necessary to include the images into the compiled application, as well as registering the names of the images into the application’s MIME factory so that they can be retrieved with a single function call: addButton->setPixmap ( QPixmap::fromMimeSource( "logo.xpm" ) );
Internally, this static method searches the internal list of registered MIME types until it finds the "logo.xpm" resource that Designer generated. This could all be done by hand, of course, but Designer relieves you of the burden and generates reasonable code. An important new feature in Qt 3.x is the SQL module. Designer not only fully supports adding database connection information to your projects, but also lets you design interactively with data-aware controls, including data tables, browsers, and views. The Qt SQL module provides uniform access to a variety of databases by presenting a single API that is implemented by a variety of database-specific plug-ins for back ends such as Oracle, MySQL, PostgreSQL, and ODBC. The data-aware widgets in Qt (including QDataTable, QDataView, and QDataBrowser) give you access to efficient data displays as well as inline editing of data directly from the database. Designer supports these widgets as fully as any other control, which makes development of
Figure 1: Qt Designer 3.3.1.
Figure 2: Populating a window with widgets.
Figure 3: Using layouts to organize widgets.
Figure 4: Editing widgets in a group.
nontrivial database applications with Designer just as easy as those created with database RAD tools, such as Microsoft Visual Basic. The main difference is that, with Qt, the generated applications run natively on a wide variety of platforms with a simple recompile. Custom Widgets and Templates Adding your own widgets to Designer is usually necessary in any nontrivial program. Designer supports this by maintaining a list of custom widgets that can be added to windows like any other control, although a custom widget appears in the design window as an anonymous gray box (Version
3, available electronically, is another implementation of the Address Book application using a custom widget). Just adding your widget's class name and its relevant signals, slots, and properties to the Custom Widget dialog is enough to let you visually design with it. Designer generates the proper code to include your widget in the interface, and all you need to do is add your custom widget's implementation to the project. Designer even supports custom widget plug-ins. This requires more coding on your part, but once completed, your custom widget will appear normally in the design window instead of as a gray box. This may be helpful for those who have separate developers working on GUI design as opposed to non-GUI application functionality, since they will be able to work with custom controls just as if they were any other control within Designer. In practice, it is usually sufficient to give Designer the top-level description of your custom widgets and skip the whole plug-in process. Another helpful feature Designer supports is templates. These can be created for entire window designs, or can be created to provide base-class functionality for entire groups of windows or other container widgets. As the name implies, templates provide preconfigured widget layouts that save you the trouble of recreating common layouts (Designer ships with some sample templates including Dialog and Main Window).
void AddressBook::newItemSelected( QListViewItem *item )
{
    // update line edits
    if ( item ) {
        firstLineEdit->setText( item->text(0) );
        lastLineEdit->setText( item->text(1) );
        emailLineEdit->setText( item->text(2) );
    }
}
Example 1: Updating line-edit controls whenever a new list item is selected.
Figure 5: Editing signals and slots.
Conclusion The environment in which C++ developers work has a big impact on productivity. Many are quite happy using emacs and a few terminals to develop their applications. Others prefer the bells and whistles of, for example, Microsoft Visual Studio. Between these two extremes lies Qt Designer. Using Designer, C++ GUI application development with Qt is fast, maintainable, and flexible. You have access to all that the Qt library offers including sophisticated widgets, intuitive event handling, and database support. Designer manages entire C++ projects, and it can be customized and extended with user-defined widgets and templates. Designer can even be used on larger projects in such a way that the GUI design is a separate and distinct task from developing application functionality, which may be useful in some development shops. The resulting code generated by Designer compiles and runs natively on almost any platform. Where Designer falls short is its lack of support for compilation. All other major IDEs support the compile/edit cycle extremely well, but Designer is remarkable in that it completely skirts this issue. If you are comfortable with compiling from the command line, you can appreciate this because Designer can essentially stay out of your way. However, it would be a major advantage to new developers if the option existed to launch a compile from within Designer, with the ability to track down compile errors and fix them from within the Designer IDE. Hardcore developers would still have the option to skip this step if they choose, but having the option to compile from within Designer would fill the last remaining hole in an otherwise useful and powerful GUI design tool. DDJ
Figure 6: Files generated by Designer. (addressbook.ui is processed by uic into addressbook.h and addressbook.cpp; addressbook.cpp #includes addressbook.ui.h, which holds the hand-written slot code.)
EMBEDDED SYSTEMS
Bluetooth & Remote Device User Interfaces A protocol for building UIs for small devices RICHARD HOPTROFF
I
originally developed the FlexiPanel protocol in 2002 as an IrDA infrared protocol for engine tuning in hobbyist racing cars. The engine was controlled by a microcontroller that needed to be finetuned with a number of preset values. Incorporating an onboard user interface (such as an LCD display and keyboard) to enter the tuning parameters was impractical — it would be too expensive, heavy, bulky, and fragile. At the time, PDAs with infrared capability were becoming popular, so it seemed logical to use one for the user interface. The FlexiPanel protocol was thus conceived as a way to let embedded systems create UIs on any suitable device that users might have at hand. However, the protocol’s limitations, and its wider potential, soon became apparent. IrDA might be wireless but, in practice, the communicating devices must be held steady within a one-foot (30 cm) range of each other and within line of sight. On the other hand, emerging radio technologies, such as Wi-Fi, Bluetooth, and ZigBee (a low-power protocol), offered 330-foot (100 m) range. Nor did they need exterior real estate — a tremendous product design advantage in terms of cost, aesthetics, and reliability in hostile environments. Consequently, we decided to migrate the technology to Bluetooth, on the grounds that it was widely implemented on remote devices and suited the ad hoc nature of the connection between appliances and remote handhelds. In addition,
Richard is a development engineer at FlexiPanel Ltd and coauthor of Data Mining and Business Intelligence: A Guide to Productivity. He may be contacted at
[email protected].
Intel had just committed to incorporating Bluetooth into its Centrino 2 chipset. While Wi-Fi and ZigBee remain viable transport layers, they currently are not implemented widely enough on handheld devices. At the same time, it became clear that the FlexiPanel protocol's potential went far beyond engine tuning. It provides a Human Machine Interface for smart dust (intelligent devices comprised of a microcontroller and a few external components), thus far extending their range of application. It offers a UI for "headless" (no screen or keyboard) single-board computers. This promises cost reductions for point-of-sale systems, computer-operated machine tools, and the like, while freeing operators from control panels. In traditional Windows applications, the protocol offers the possibility of a supplementary, roaming UI. Though rarely vital to Windows applications, roaming capability is a liberating addition to any application that might have anything to say to users while away from their desk. In short, the FlexiPanel Bluetooth Protocol is a remote user-interface service for computers, electrical appliances, and other machinery. A FlexiPanel server resides on the application and holds a UI database that reflects the appliance's human-machine interface needs. A FlexiPanel client can connect at any time, read the database, and display the UI. Users may then control the application from the client device. Using Bluetooth, clients can be up to 330 feet away without need for line-of-sight communication. FlexiPanel clients have been implemented on a range of PDAs and cellphones and are freely available (http://www.flexipanel.com/DownloadsIndex.htm). Like many higher-level protocols such as OBEX file exchange, FlexiPanel sits on top of the RFCOMM serial port emulation layer of the Bluetooth protocol stack (Figure 1). It is not part of the official Bluetooth Standard. However, the FlexiPanel specification is relatively open in that anyone is free to create FlexiPanel clients, and FlexiPanel server licenses are royalty free (http://www.flexipanel.com/RemoteControlsAPI/index.htm). As with all emerging technologies, early Bluetooth devices suffered from clunky
operation and compatibility problems. This was particularly true of service discovery — finding out what Bluetooth devices are in range and the services they provide. Many 2002-era devices were limited to preconnection over a virtual serial port, rather than providing service discovery APIs. In the last year, Microsoft's support for Bluetooth in Windows Sockets has made service discovery a one-click process that even Amazon boss Jeff Bezos would be proud of.
A User Interface Service From the application's perspective, FlexiPanel is a GUI service just like any other. It can create controls and subsequently change the properties of those controls. If users modify a control, the application is sent a notification message. The application doesn't need to know that the UI is displayed remotely. Conversely, just like a web browser, the FlexiPanel client is generic and does not need to know anything about the server it is providing a UI for. (Client software is generally free, too.) Table 1 lists the types of control that may be created. Under the hood, the FlexiPanel protocol is based on just 12 basic types of message passed between client and server (Table 2). The main differences between the FlexiPanel protocol and regular UI services are: • The nature of the client's UI may be unknown. The controls displayed will always be logically correct but appearances may vary between different client devices. Compare the same slideshow
controller UI on a PDA (Figure 2) and a cellphone (Figure 3). If the client device can be anticipated in advance, certain additional preferences can be requested, such as a particular control layout (Figure 2) or keyboard accelerators (Figure 3). • The connection might be broken at any time, for example, if the client’s batteries fail or the client goes out of range. The appliance must enter a failsafe state if connection is lost at a critical moment. • FlexiPanel servers might be small, lowcost devices, such as Parallax’s FlexiPanel peripheral for BASIC Stamps (http://www.parallax.com/), based on a low-end 8-bit microcontroller (Figure 4). Consequently, system requirements on the server side must be extremely lean and communication succinct (unlike XML). The remote client device takes over as many responsibilities as possible. For example, a server is not required to buffer any I/O, manipulate any float-
ing-point numbers, or make any conversions between single-byte characters and Unicode. Bluetooth is a multipoint communication protocol. It is possible for a FlexiPanel server to manage UIs on up to seven remote devices at once. This is only implemented on high-end systems such as Windows servers, but is useful for applications such as restaurant table service, where several wait-staff can connect to the same server at once. A quick disconnect mode is also available, where servers close the connection immediately after connection has been established and control panel information sent. This lets servers send UIs to a large number of clients connecting asynchronously, although the clients have no chance to send any messages back to the server. The FlexiPanel protocol plays no role in authentication, encryption, error detection, or power management. These
are expected to be managed by other layers of the Bluetooth library. Embedded Systems Example In the following example, I use the FlexiPanel server C library to create a remote control panel for a data-logging embedded system. The library is intended to work with the lowest level embedded controllers possible. Therefore, I make no assumptions within the library about serial port buffering or multithreading support. The data logger records the value of a proximity sensor. It has no UI of its own and relies on FlexiPanel to communicate
Figure 1: FlexiPanel in the Bluetooth protocol stack. (Diagram: the application sits at the top; the FlexiPanel remote UI service and OBEX file exchange run over RFCOMM serial port emulation, alongside SDP service discovery and audio; below are L2CAP logical link control and the baseband link manager, link controller, and radio.)
Figure 2: PowerPoint presentation controller UI on a PocketPC.
Logical Control | Example Depiction on a Remote Client | Function/Value
Button | Button | Single-press event.
Latch | Check box, radio button | Binary value.
Text | Static text, edit text | Character string.
Number | Progress bar, slider | Integer or fixed-point value.
Matrix | Table, column chart, line chart | 2D array of numeric values.
Date Time | Date/time picker | Seconds to years, plus day of week.
List | List box, popup menu | One-of-n selection.
Section | Client-specific dialogs | Arranges controls in a hierarchy.
Password | Client-specific dialogs | Controls access to UI.
Message | Message box | Alerts users.
Blob* | Client-specific dialogs | Exchanges binary data.
Files* | Client-specific dialogs | Exchanges files.
Table 1: Controls provided by the FlexiPanel protocol. (*Optional: a client device is not required to implement this control.)
Figure 3: PowerPoint presentation controller UI on a Smartphone.
Figure 4: FlexiPanel server implemented on an 8-bit microcontroller. The Bluetooth radio is mounted on the reverse side of the board.
with the outside world. The distance measured is displayed as a digital readout on a number control and as a historical log in a matrix control. The data logger (Figure 5) is based on an 80186-based FlashCore embedded controller from Tern (http://www.tern.com/). A GP2D12 analog proximity sensor (Sharp Electronics) is connected to an A/D input and serves as the data source. A BlueWAVE Bluetooth serial module from Wireless Futures (http://www.wirelessfutures .co.uk/products/) is connected to a serial port to enable connection to FlexiPanel remote clients. The data logger’s job is simple and few lines of code are required to implement it (Listing One). All calls to the FlexiPanel server library begin with the prefix FBVS. Like most FlexiPanel applications written for embedded controllers, it consists of fixed sections: initialization, user interface definition, main program loop, client message processing, and an ungraceful disconnect handler. • Initialization. The Bluetooth stack and FlexiPanel server are initialized. This includes giving the UI a name. This name appears in a client’s UI in the list of servers available for connection. • User Interface Definition. Sandwiched between calls to FBVSStartControlList and
FBVSPostControls, each control is defined by a call to an FBVSAdd… function. Each time FBVSPostControls is called, the previous UI definition is replaced; the UI can thus be changed at any time. FBVSSetOption lets an application request the way a control is depicted on a specific remote device. In Listing One, a pointplot chart is specified if the client is a PocketPC or a Windows computer.
• Main Program Loop. In the main program loop, the FlexiPanel library is polled for messages from client devices. Every second, the sensor is sampled and the sampled value is written to both the digital readout and historical log controls. In addition, every few seconds a ping test is made to check whether a client device is still in range. If a client was connected but
Figure 5: Data logging embedded system using the FlexiPanel protocol to create UIs on PocketPCs.
contact was unexpectedly lost, control is passed to the ungraceful disconnect handler. • Client Message Processing. When client-related events occur, the FlexiPanel server library posts notification messages to the application (Table 3). The data logger uses the FBVSGetNotifyCode function to collect these messages. Most notifications are informational and useful only for providing a local indication of the connection status. The four messages that the developer should always consider are: FBVSN_ClientConnected, a remote device connected to the server. In this
example, the server clears the matrix control of old data. FBVSN_ClientData, where the remote device modified a control (users press a button, for instance). In this particular application, neither of the controls is modifiable by the client and so nothing needs to be done. Usually, however, the application would be expected to respond to any user interaction at this point. FBVSN_ClientDisconnected, where a remote device is disconnected from the server. In this example, the message is ignored. FBVSN_Abandon, an error occurred. This message has only ever been witnessed during application-device
Figure 6: Main order screen in the OrderMaster application.
Figure 7: An appetizers subscreen in the OrderMaster application.
Message | Originator (Client/Server) | Purpose
Greetings | Either | Establishes a connection.
Goodbye | Either | Closes a connection.
New Control Panel | Server | Sends control descriptions to client.
Control Modified | Either | Modifies a control's value.
Ping | Either | Presence check request.
Ping Reply | Either | Presence check confirmation.
Ack | Either | Acknowledges receipt of message.
New Server | Server | Server initializing.
Props Update | Server | Modifies a control's properties.
Files | Server | Downloads requested files.
Profile Request | Client | Requests device-specific layout advice.
Profile Reply | Server | Downloads device-specific layout advice.
Table 2: FlexiPanel client-server message types.

Notification | Meaning
No Notify | No notable events have happened.
Client Connected | Client connected to server.
Client Data | Client modified a control.
Client Disconnected | Client disconnected from server.
Got Pinged | Client pinged server and server pinged back.
Got Ping Reply | Server pinged client and client pinged server back.
Got Ack | Client acknowledged receipt of message.
Got Profile Request | Client asked how to lay out controls and server replied.
Table 3: Notification messages that a FlexiPanel server can send to an application.
development due to a readily apparent programming error. In this example, as a precaution, control is passed to the ungraceful disconnect handler if this message is received. • Ungraceful Disconnect Handler. The application must provide for the possibility that the connection is lost unexpectedly. This might occur because the remote device has gone out of range or its batteries are worn out. In this example, no action is necessary. In applications that control machinery, an emergency stop procedure should be implemented. Windows .NET Example The OrderMaster console application uses the FlexiPanel .NET library to create a remote control panel for a restaurant order-taking system. A Windows server computer with a printer is located near the kitchen and prints out the orders for the kitchen and the checks for the customers. It needs to be Bluetooth-equipped and, therefore, will require a USB Bluetooth adapter (ideally a class 1 adapter with a 330-foot range). The wait-staff carry remote devices for taking orders. The application is targeted specifically for PocketPC clients. The UI has been designed using a fat-thumbs approach, where the controls are large enough that no stylus is required. In keeping with the FlexiPanel philosophy, however, any FlexiPanel client would be able to connect. A waiter who lost his PocketPC could always use his cellphone to take orders. Indeed, it would be possible for customers to place orders and print out their checks themselves if they had an appropriate device. Figure 6 is the OrderMaster main screen. At the top is a text control for displaying a summary of the current order. Below it are section controls that drill down to specific sections of the menu. At the bottom is a list control to select the table being served and buttons to order the check, clear the order, and confirm the order. Figure 7 is an Appetizer subscreen. At the top is the section control that returns to the main screen. Below it are number controls for setting order quantities for individual items on the menu. The code required to implement OrderMaster is available electronically; see "Resource Center," page 5. Since much of the code is repetitive and would in practice be replaced with loops working from a menu database, code for subscreens other than the appetizers has not been implemented. The FlexiPanel .NET namespace is called RCapiM. It consists of static function calls for management of the user interface service and classes for each of the 12 control types. Like the data logger, this
application consists of the same five sections: initialization, UI definition, main program loop, client message processing, and ungraceful disconnect handler. • Initialization. The _tmain( ) entry point in midway through OrderMaster (available electronically) begins by initializing the FlexiPanel library, setting up the Bluetooth port, giving the UI a name, and setting up a timer to ping the remote device regularly. Since a PocketPC client is anticipated, several options are requested specifically for it. It should hide its usual navigation buttons and it should regularly ping the server (the computer, that is, not the waiter!). • User Interface Definition. The UI is defined by creating controls and attaching them to an ArrayList called RemoteForm. The RemoteForm is then sent to the FlexiPanel::PostControls static function in order to display the control panel. If the application needs to respond when the user modifies a control, a delegate is added to the OnClientModify event for that control. Since the delegate will be called from a dif-
ferent thread to the main thread, the delegate contains static pointers to controls it needs to access. A delegate is also added to the static OnClientNotify event so that the order can be cleared if a Client Disconnected notification is received (see Table 3). • Main Program Loop. In the main program loop, nothing happens except for waiting for the instruction to quit. All further activities are in response to events. • Client Message Processing. The following delegates respond to events and are managed within the EvtHandler struct: OnClientNotify— if the client notifies it is disconnecting, the order is cleared; OnButCheck— if the Check button is pressed, the check is printed; OnButClear— if the Clear button is pressed, all order quantities are set to zero; OnButConfirm— if the Confirm button is pressed, the order is printed for the kitchen, then all order quantities are set to zero; OnCtlModify — if the Table list box is modified or one of the menu section controls is opened or closed, the status text box is updated; OnPingTimer— the client is pinged. If contact is lost, an Un-
graceful Disconnect Handler procedure is followed. • Ungraceful Disconnect Handler. The application clears the order and waits for the client to reconnect. Future Developments Born almost by accident while trying to optimize engine performance, the FlexiPanel Protocol has evolved into a patented user-interface service for a range of devices from tiny microcontrollers to .NET applications. Future development plans include: • Embedding the protocol directly inside Bluetooth chipsets. This will lower the cost sufficiently so that even basic electrical appliances such as light switches can create remote user interfaces. • Provision of a local Bluetooth-to-HTTP bridge so that a web browser anywhere in the world might connect to a FlexiPanel server. • Provision of a local Bluetooth-to-telecoms bridge so that a FlexiPanel server can be accessed by anyone dialing in using a touch-tone phone. DDJ

Listing One

// Bluetooth module is connected to serial port SER1
extern COM ser1_com;
#define BTH_BAUD 12          // 115,200 baud for SER1
#define BTH_INBUFF 1024      // SER1 Input buffer size
#define BTH_OUTBUFF 1024     // SER1 Output buffer size
unsigned char ser1_in_buf[BTH_INBUFF];    // SER1 Input buffer
unsigned char ser1_out_buf[BTH_OUTBUFF];  // SER1 Output buffer
// UI constants
#define CID_ATOD 1
#define CID_CHART_TY 2
#define NUM_CTRL 2
#define NUM_OPTION 2
// function prototypes
void ProcessFlexiPanelMessages(void);
void HaltProcesses(void);

void main(void)
{
    int16 loopcount;
    // initialize serial port (calls serial I/O library)
    s1_init( BTH_BAUD, ser1_in_buf, BTH_INBUFF, ser1_out_buf, BTH_OUTBUFF, &ser1_com );
    // initialize FlexiPanel library and give the UI a name
    FBVSInit( NULL, NULL, 1, &ser1_com, NUM_OPTION );
    FBVSSetDevNameAndCharSet( "Tern Demo" );
    // initiate control panel description
    FBVSStartControlList( NUM_CTRL );
    // numeric display of distance
    FBVSAddNumber( CID_ATOD, CTL_NUM_FIXEDPOINT, "Range", 0, 0, 0, 2, -2, "%% m", NULL );
    // graphical display of distance log
    FBVSAddMatrix( CID_CHART_TY, CTL_MTX_DATA_TY | CTL_MTX_Y_FIXEDPOINT | CTL_MTX_Y_2BYTE,
        "Data Log", 30, 0, NULL, 1, NULL, "Range", "Time", "%% m", "%HH%:%mm%:%ss%",
        2, -2, 0, 0, NULL );
    // suggest how chart might be displayed
    FBVSSetOption( PPC_DEV_ID, CID_CHART_TY, PPC_ATT_STYLE, PPC_CST_MATRIX_POINTS );
    FBVSSetOption( WIN_DEV_ID, CID_CHART_TY, WIN_ATT_STYLE, WIN_CST_MATRIX_POINTS );
    // complete control panel description
    FBVSPostControls( );
    // start control panel service
    FBVSConnect();
    // main program loop
    loopcount = 0;
    while (1) {
        // ensure each loop takes around 10ms
        delay_ms(10);
        // discover whether client has sent any messages
        ProcessFlexiPanelMessages();
        // every 3 seconds, ping
        loopcount++;
        if (loopcount == 300) {
            loopcount = 0;
            // ping
            if ( FBVSIsClientConnected() && FBVSIsPingSupported() && FBVSPing() ) {
                // lost contact with remote device; continue cautiously
                HaltProcesses();
            }
        }
        // every second, log proximity sensor
        if (loopcount%100==0) {
            int16 range;
            DateTimeU dt;
            // read A/D (calls A/D library)
            range = fb_ad16( 0xc6 );
            // update numeric display
            FBVSSetNumberControlData( CID_ATOD, range );
            // update chart
            SetToCurrentTime( &dt );
            FBVSAddMatrixControlData( CID_CHART_TY, &range, &dt );
            // send updated time to client
            FBVSUpdateControlsOnClient( );
        }
    }
}
void ProcessFlexiPanelMessages(void)
{
    // check for message
    switch (FBVSGetNotifyCode()) {
    // nothing has happened
    case FBVSN_NoNotify:
        break;
    // Client has connected; clear the contents of the matrix control
    case FBVSN_ClientConnected:
        FBVSSetMatrixControlData( CID_CHART_TY, 0, NULL, NULL );
        break;
    // Client has modified a control; in this app, no controls are
    // modifiable by the client, so nothing to do
    case FBVSN_ClientData:
        break;
    // Client has disconnected
    case FBVSN_ClientDisconnected:
        break;
    // following messages are informational only and will be ignored
    case FBVSN_GotProfileRequest:
    case FBVSN_GotPinged:
    case FBVSN_GotPingReply:
    case FBVSN_GotAck:
    case FBVSN_IncompatibleVersion:
        break;
    case FBVSN_Abandon:
        // Error; generally only gets here during development
        HaltProcesses();
        Reset();
        break;
    }
}
void HaltProcesses(void)
{
    // In this function, anything controlled by the embedded
    // controller is put in a fail-safe state.
    // nothing being controlled in this app, so nothing to do
}
DDJ
ECLIPSE
Eclipse & General-Purpose Applications A universal platform, Java IDE, integration tool, and more TODD E. WILLIAMS AND MARC R. ERICKSON
When building desktop applications, you need to start with good design and architecture. Because there is no universally accepted desktop-application framework, most developers design their own architecture, then build it into a framework themselves. However, the costs of this approach are considerable: expense, time, debugging effort, support, and aggravation expended on solving a problem that is peripheral to building the business functionality of the application.

Todd is vice president of technology at Genuitec and Marc is a principal at Communications & Media Arts. They can be contacted at [email protected] and [email protected], respectively.

Another approach is to find a framework that accommodates your needs to simplify and accelerate project development. A "wish list" for such a framework would (among other things):
• Implement a clear, consistent, and cohesive architecture.
• Support development and execution on all the major desktop platforms (Windows, Mac OS X, Linux, QNX Photon, Pocket PC, HP-UX, AIX, Solaris).
• Have UI response that is "snappy," while maintaining the platform's native look-and-feel.
• Provide a variety of widgets, both standard (buttons and checkboxes) and extended (toolbars, tree views, progress meters).
• Provide extensive text processing that includes editors, position/change management, rule-based styling, content completion, formatting, searching, and hover help.
• Support the use of platform-specific features (ActiveX, for instance) and legacy software (if necessary).
• Enable product branding for the application.
• Contain an integrated help system.
• Manage user configuration and preferences.
• Support remote discovery and installation of application updates.
• Support internationalization and national language translation.
• Support flexibility with features for adding new functionality.
Just to complete this wish list, we might as well throw in that it's created and maintained in an open-source community, royalty free, and licensed to provide worldwide redistribution rights. Although these requirements may sound like a pipe dream, Java application developers may already have this application framework installed on their machines: Eclipse.

But Isn't Eclipse a Java IDE? The short answer to the question of whether Eclipse is a Java IDE is, well, yes and no. According to the Eclipse Project (http://www.eclipse.org/) FAQ: "The Eclipse Project is an open-source software development project dedicated to providing a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools."
So, by definition, Eclipse is an open platform for tool integration, not an IDE. The issue has been confused because a complete industrial-strength, full-function Java IDE is included with the Eclipse platform, in the form of plug-in components that extend Eclipse’s basic framework facilities. Eclipse provides the framework for combining disparate tools into a single integrated application with a seamless user interface. New tools are integrated into the Eclipse platform and user interface Dr. Dobb’s Journal, September 2004
through plug-ins that extend Eclipse's facilities and provide new functionality. Additionally, Eclipse plug-ins can extend other plug-ins. When an Eclipse-based application initializes, it discovers and activates all plug-ins configured for the workstation. The Eclipse platform is literally the sum of its parts because it is capable of performing any function that has been added to it by the plug-ins it currently contains. Because being able to write and test plug-ins is central to Eclipse, the Eclipse platform is bundled with a plug-in development environment (PDE) and a set of Java development tools (JDT) to support it. The Eclipse developers clearly trusted the power of the frameworks they created. The entire development environment is just another set of tools integrated into the platform using the standard plug-in techniques. The Eclipse platform itself was created using the Eclipse-based Java IDE (initially in beta form). Since it's open source, you can inspect the code and understand in detail exactly how the frameworks are used.

Eclipse Framework Overview As an "IDE for anything, and nothing in particular," Eclipse embodies an extensible design that maximizes its flexibility as an IDE platform. However, the Eclipse architecture defines sets of layered subsystems that let it be used as a framework for a portable desktop application (or suite) that is not an IDE. These subsystems include:
Extensibility Model. Requirements change over time, so developers often expend effort designing applications that are flexible and extensible. Eclipse is built around a highly flexible and extensible plug-in model to enable any type of tool to be added to the platform. If you begin to think of a desktop application as a tool, or set of tools, it becomes apparent that your application functions and facilities can be added into an Eclipse-based desktop as a set of plug-ins, just as Eclipse's native Java IDE capabilities have been.

Content Model. Eclipse provides a content model built around the concept of a workspace into which tools (applications) can be installed. The tools operate on resources that are organized into projects within the workspace. Projects contain a tree structure of resources, which are folders and files containing any type of content. The core platform provides a large number of extension points that allow customization of all aspects of resource lifecycle management. The hierarchical, categorized nature of the content model lends itself to many types of desktop applications with a bit of thought. For example, a simple e-mail client could be built upon a workspace that contains a single project associated with the user's e-mail account. The user's project could contain folders for the common functional e-mail elements such as inbox, outbox, and sent items. Each of these folders could contain the corresponding set of e-mail messages as project resources.

Widgets on Steroids. The Eclipse platform contains the Standard Widget Toolkit (SWT), which is implemented natively on all supported Eclipse platforms. SWT contains a large set of events, layout managers, and widgets. When a supported platform does not contain a native widget that is supported by Eclipse, such as a toolbar on Motif, an emulated widget for that platform is provided. SWT also interacts with native desktop features, such as drag and drop. Additionally, SWT can use OS-specific components, such as Windows ActiveX controls, if such functionality is more desirable than full platform portability. So far, SWT has been proven on the Windows Win32 and PocketPC, Photon, Motif, and GNU window managers, covering deployment platforms from high-end workstations to embedded devices.

User-Interface Framework. To build a graphical interface, SWT may either be used directly or through JFace, the user interface framework of the Eclipse platform. JFace includes dialog, preference, progress reporting, and wizard frameworks as well as image and font registries that make user-interface creation straightforward.
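Before moving on, here is a minimal sketch of the content model in action, showing how the hypothetical e-mail client described above might lay out its workspace. It assumes the standard org.eclipse.core.resources API; the class name and the project and folder names are purely illustrative:

import org.eclipse.core.resources.IFolder;
import org.eclipse.core.resources.IProject;
import org.eclipse.core.resources.IWorkspaceRoot;
import org.eclipse.core.resources.ResourcesPlugin;
import org.eclipse.core.runtime.CoreException;

public class MailWorkspaceSetup {
    // Creates one project per mail account, with folders for the usual mail categories.
    public static IProject createAccountProject(String accountName) throws CoreException {
        IWorkspaceRoot root = ResourcesPlugin.getWorkspace().getRoot();
        IProject project = root.getProject(accountName);
        if (!project.exists())
            project.create(null);                 // null = no progress monitor
        if (!project.isOpen())
            project.open(null);
        String[] folders = { "inbox", "outbox", "sent" };
        for (int i = 0; i < folders.length; i++) {
            IFolder folder = project.getFolder(folders[i]);
            if (!folder.exists())
                folder.create(true, true, null);  // force, mark as local
        }
        return project;
    }
}

Each message could then be stored as an ordinary file resource inside the appropriate folder.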
The Eclipse platform supports a multiwindow, MDI-like user-interface presentation. On top of JFace and SWT, the Eclipse workbench provides a framework for building perspectives, editors, and views that provide the structure for user interaction. Editors handle resource life-cycle interactions such as creation, editing, saving, and deleting. Views are used to provide supplementary information about an object with which the user is interacting. Examples include outline, pending tasks, and property views. A perspective is a stacked, tiled, or detached arrangement of views and editors. Only one perspective is visible within a window at a time, but you may open multiple windows to view multiple perspectives simultaneously. The Eclipse user-interface framework is extensive, flexible, and powerful. And, even if it doesn't do everything you need, it can easily be extended for much less investment in time and resources than designing and building your own.

Update Manager. Historically, one of the biggest problems associated with desktop applications is the support cost incurred to package, distribute, maintain, and upgrade the application as new versions are released. This cost increases when a large and dispersed user community uses the application. Component maintenance and upgrade facilities were part of the design of Eclipse from the beginning. To control ongoing cost and remove maintenance issues that could become barriers to project development and deployment, the Eclipse platform contains a flexible update manager. The update manager can be configured to perform both initial installations of new components and updates to existing components from a remote server. As you release new versions of your application or add-on components, distribution can be as easy as packaging them using Eclipse facilities and placing them on your update server.

Help System. Every professional desktop application has a help system, and Eclipse is no different. However, Eclipse's help system isn't simply built from a static group of HTML files that tell you about Eclipse. Rather, it is a framework for providing both searchable and context-sensitive help that is open for extension by documentation plug-ins. Once your application is complete, everything is available for constructing, packaging, and shipping a complete, custom, context-sensitive help system without third-party tools.

Eclipse as an Application Framework Eclipse satisfies the full function and facility wish list mentioned earlier, while providing the program development environment for building the project as a series of Eclipse-style plug-ins. You'll have an ap
plication that is architecturally sound, extendible for future enhancements, interoperable with other plug-ins (even those created by others), and can upgrade itself remotely. The main question then becomes how much of Eclipse do you need? An application can be built upon the Eclipse framework by removing functions that you don’t want and then adding functions that you do. Removing Eclipse functions is the easy part; just take out plug-ins that provide unneeded features. If you’re not building another IDE, a good place to start is to remove all the JDT, PDE, and VCM (version-control management) plug-ins. With that starting point and a bit of experience, you can evaluate components and continue to remove unnecessary features by removing the corresponding plug-ins. Once you pare down the plug-ins to the bare minimum, you’ll find that Eclipse still has a few development capabilities represented in the UI. To remove these from the framework, slightly more invasive techniques have to be used. The workbench UI plug-in provides the base UI capabilities of Eclipse. Removing extensions from the plug-in’s XML descriptor is an easy way to further reduce visibility to the extra features that your application doesn’t require. After all, unnecessary plug-ins are removed and you’ve minimized the extensions in the workbench UI, there is still one more avenue available to further reduce the base framework size. Because Eclipse really was intended to be a framework for integrating development tools, a few of Eclipse’s low-level development concepts are built into the workbench UI plug-in directly rather than being provided as extensions. The easiest way to remove them is to comment out undesirable features in the source code and rebuild the plug-in. While this is unfortunate, keep in mind that “unwriting” a few bits of code is easier and faster than writing an entire application framework. Clearly, if you take this approach, you’ll need to be able to repeat the changes when migrating to new versions of Eclipse source code, at least for a little while. The good news is that the Eclipse team has recognized that such new and innovative uses of Eclipse should be supported at the platform level. Eclipse 3.0 specifically targets enabling the use of Eclipse as a rich client platform. Even if you need to make other changes to Eclipse, the Common Public License that controls access and use of the Eclipse platform lets you make and distribute commercial derivative works from its source. There is no license requirement to remain compatible with Eclipse or donate changes back to the open-source project, but it’s clearly to your advantage to stay as standard as possible. http://www.ddj.com
Once unnecessary functions have been removed from the framework, building applications is simply a matter of writing your own plug-ins, adding features to the basic Eclipse framework, and branding them with your own logos. For a large application, consider writing it as multiple custom perspectives and supporting views. If you have a suite of small applications, each one can be a single perspective. Or, you can use Eclipse as a portal to integrate all your organization’s homegrown applications. SWT Overview The Standard Widget Toolkit is designed to provide portable UI facilities that directly call the window manager of the underlying operating system. This differs from the approach taken by Swing technology, which emulates user interfaces bit by bit, then presents the entire interface bitmap to the underlying window manager for display to the user. Speed and responsiveness result from letting the operating system’s window manager do the work. SWT itself is a thin layer that conforms the API calls for standard widget-like lists, buttons, and text boxes into a transportable interface. There is no separate peer layer as in the AWT class library. The SWT widget directly calls the underlying operating system’s window manager. Platform integration is not just a matter of look-andfeel. Tight integration includes the ability to interact with native desktop features such as cut-and-paste or drag-and-drop, integrate with other desktop applications. SWT establishes a functionally rich least common denominator of user interface widgets in three ways: • Features that are not available on all platforms can be emulated on platforms that provide no native support. For example, the Motif widget toolkit does not contain a tree widget. SWT provides an emulated tree widget on Motif that is API compatible with the Windows native implementation. • Features that are not available on all platforms and not widely used have been omitted from SWT. For example, the Windows calendar widget is not provided in SWT. • Extended window manager features that are specific to a platform, such as Windows ActiveX, are accommodated through separate, well-identified packages. The design point for SWT provides all of the widgets that were necessary for integrating typical development tools. SWT has to remain simple to run successfully on multiple operating systems. Sometimes, the underlying behavior of the window manager impacts the way that applications paint and SWT generally lets the http://www.ddj.com
OS "win." Functional portability is established by ensuring that all but the lowest-level direct interface calls to the OS are written in Java. This has proven itself as SWT has been ported to new environments. The calls across the Java Native Interface to the operating system's C/C++ APIs are straightforward, making it easier to accommodate new platforms. Most users like the fact that the resulting interfaces look just like others rendered by their workstations. To bring some consistency to user interfaces implemented for the Eclipse Workbench, SWT includes support for creating custom widgets, leading to a specific application look and feel. This consists of a small set of carefully designed components that are generally useful. This includes extended support for sensing user-selected default colors and border widths using mechanisms provided by the operating-system window manager.

Since SWT is royalty-free open source (licensed under the Common Public License), you can also review the actual source code (also at http://www.eclipse.org/). There are separate versions for each supported operating system/window-manager combination. If, after evaluation, you decide that you don't want to use Eclipse as an application framework, but want to use the SWT widget set as a replacement for Swing or AWT, you can do that, too. Just add the SWT jar file to your application's classpath, place the SWT shared library on your library path, and build your SWT user interface. You've now got a completely new UI that makes your application look like a native application on whatever platform it is running. Be forewarned that the programming model for SWT is different from Swing or AWT, but once you begin to use SWT, you realize that the differences are actually a benefit.

SWT applications start by creating a "Display" representing an SWT session. A "Shell" is created to serve as the main window for the application. Functional and display widgets are created within the shell. Characteristics and states of the widgets are initialized and event listeners are registered. When the shell window is opened, the application consumes the event dispatch loop until an exit condition is detected, typically when the main shell window is closed by users. At that point, the display must be disposed of. The Display represents the connection between SWT and the underlying platform's GUI system. Displays are primarily used to manage the platform event loop and control communication between the UI thread and other threads. For most applications, you can follow the pattern just described. You must create a display before creating any windows, and you must dispose of the display when your shell is closed.
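As a concrete illustration, here is a minimal, self-contained SWT program that follows exactly this pattern; the class name, window title, and button label are arbitrary:

import org.eclipse.swt.SWT;
import org.eclipse.swt.widgets.Button;
import org.eclipse.swt.widgets.Display;
import org.eclipse.swt.widgets.Shell;

public class MinimalSwtApp {
    public static void main(String[] args) {
        Display display = new Display();             // connection to the platform GUI system
        Shell shell = new Shell(display);            // top-level window, child of the display
        shell.setText("Hello SWT");
        Button button = new Button(shell, SWT.PUSH); // style bits are fixed in the constructor
        button.setText("Press me");
        button.pack();
        shell.pack();
        shell.open();
        while (!shell.isDisposed()) {                // event dispatch loop
            if (!display.readAndDispatch())
                display.sleep();                     // idle until the next OS event arrives
        }
        display.dispose();                           // explicitly free operating-system resources
    }
}

Note that the SWT.PUSH argument is a style bit fixed at construction time, and that the display is explicitly disposed when the loop exits; both points are expanded on below.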
A shell is a “window” managed by the OS platform window manager. Top-level shells are those that are created as a child of the display. These windows are the windows that users move, resize, minimize, and maximize while using the application. Secondary shells are those that are created as the children of another shell. These windows are typically used as dialog windows or other transient windows that only exist in the context of another window. When your application creates a widget, SWT immediately creates the underlying platform widget. This eliminates the need for code that operates differently depending on whether the underlying OS widget exists. It also allows a majority of the widget’s data to be kept in the platform layer rather than replicated in the toolkit. This means that the toolkit’s concept of a widget lifecycle must conform to the rules of the underlying GUI system. Some widget properties are set by the operating system at the time a widget is created and cannot be changed. For example, a list may be single or multiselection, and may or may not have scroll bars. These properties, called “styles,” must be set in the constructor. In some cases, a particular style is considered a hint, gracefully ignored on platforms that do not support it. The style constants are located in the SWT class as public static fields. A list of applicable constants for each widget class is contained in the API Reference for SWT. SWT explicitly allocates and must explicitly free operating-system resources. In SWT, the dispose( ) method is used to free resources associated with a particular toolkit object. If you create the object, you must dispose of it. When the user closes a shell, the shell and all of its child widgets must be recursively disposed. It’s possible in SWT to register a disposal listener for each widget that can automatically free associated graphic objects. There is one exception to these rules: Simple data objects such as Rectangle and Point do not use operating-system resources. They do not have a dispose( ) method and you do not have to free them. A Control is a widget that typically has a counterpart representation (denoted by an OS window handle) in the underlying platform. The org.eclipse.swt.widgets package defines the core set of widgets in SWT. Conclusions Eclipse can be “both a floor wax and a dessert topping.” It is a complete, universal tool-integration platform, a platform for building IDEs for any language, a Java IDE, a better Java UI widget set, and a portable, a la carte application framework. Serving a broad array of project development needs, Eclipse has something for everyone. DDJ 69
ECLIPSE
Writing Plug-Ins for the Eclipse C/C++ Development Tools Project A platform for building specialized C/C++ tooling DOUG SCHAEFER AND SEBASTIEN MARINEAU-MES
Like a growing number of developers, we've come to appreciate the many useful features found in the Eclipse Java development environment (JDT). These features include a content-assist facility that offers highly accurate code completion, a structured search that uses semantic information about a program to accurately locate definitions and references, and a refactoring engine that can improve the structure of code without changing its behavior. Together, these features have raised our Java productivity. That said, we also long for an environment that can bring the same benefits to our C++ chores. As it turns out, the Eclipse CDT Project (http://www.eclipse.org/cdt/) was established to do just that: provide a fully functional C/C++ IDE for the Eclipse platform. The CDT Project got started when a team of developers at QNX Software Systems realized the power and extensibility of the JDT, and decided to bring similar functionality to the C/C++ community. As with other Eclipse projects, the CDT is written mostly in Java and runs on a range of development hosts, including Windows, Linux, Solaris, and QNX Neutrino platforms. It also supports a variety of target processors and operating systems.
Doug is a senior software developer for the Rational division of IBM and Sebastien is an operating systems manager for QNX Software Systems. Sebastien is also the project leader for the Eclipse CDT. They can be contacted at
[email protected] and
[email protected], respectively.
The CDT is, in essence, a platform for building specialized C/C++ tooling (see Figure 1). It adds various extensions to Eclipse’s many XML-based extension points, letting tool creators adapt the CDT’s edit/build/debug/launch capabilities to their particular toolchains, code-generation tools, and code-analysis products. As a result, users can choose from a basket of tool offerings that can be customized to address particular development needs. How, exactly do you go about customizing the CDT? To illustrate, we create an Eclipse plug-in that takes advantage of the CDT’s built-in parser and code model to provide a qualitative assessment of the developer’s coding style. Enforcing Encapsulation Writing object-oriented applications has become second nature to many developers, thanks, in large part, to C++. Nonetheless, an early advantage of C++ continues to haunt the object-oriented purist — its compatibility with C. While this compatibility has eased the adoption of objectoriented techniques in the C world, it also lets programmers get away with sloppy object-oriented style. For instance, many developers let data members of a class be accessible, even though this violates the OO principle of encapsulation. To help you avoid such stylistic improprieties, our plug-in marks declarations of public data members as errors. To illustrate several concepts, we implement our public data checker as an Eclipse builder. A builder is the object provided by any plug-in that participates in the Eclipse build framework, and is invoked when files are saved (provided that automatic builds are turned on) or when users select either Build or Rebuild from the Project menu. At first, a builder may not seem the obvious choice to implement our checker. However, we want to emulate compile-time checking, and by creating a plug-in invoked at build time, we can achieve that goal. Creating the Plug-In To create the plug-in, we use the Eclipse Plug-in Development Environment (PDE), Dr. Dobb’s Journal, September 2004
which provides a number of wizards and editors (http://www.eclipse.org/pde/). First, launch the PDE’s new project wizard and select the new Plug-in Project template in the Plug-in Development section. The wizard asks you for the name of the plugin — call it “org.eclipse.cdt.stylist”— and lets you modify options that configure code generated for the plug-in. To keep things simple, click Finish to accept the defaults for the remaining options.
“The PDE provides a test environment called the ‘Runtime Workbench’” The wizard generates several files including plugin.xml, the manifest file that tells Eclipse about the new plug-in (see Listing One). Using this file, you declare where components of your plug-in plug into the Eclipse platform. Again, the Eclipse interconnection mechanism is based on extension points (well-known interfaces defined in XML) to which the plug-in contributes extensions. This non-Java declaration offers a significant benefit: The plug-in is loaded only when it’s needed. Given that an Eclipse installation can have hundreds of plug-ins, this feature can greatly improve startup times. To allow access to features in the Eclipse resource plug-in and CDT core, ensure the plug-ins org.eclipse.core.resources and or.eclipse.cdt.core are in the dependencies list. Creating Menu Items Because plug-ins are loaded on demand, you need a way to get the plug- in rolling. In the plugin.xml editor (on the Extensions pane), you see an Add button. This button launches a wizard that allows you to create Eclipse components. Select the Extension Wizards pane, then http://www.ddj.com
select the “Popup Menu” template. Specify where you want the menu item to appear and what names to show on it (see Figure 2). To have the menu item appear on CDT projects only, declare org.eclipse.cdt.core .model.ICProject as the target object class. That way, the item only shows up in views where objects of this type are shown; that is, projects displayed in the C/C++ Projects view of the C/C++ Projects Perspective. For the class name, use EnablePublicDataChecking (see Listing Two). Once you’ve defined the menu item, feel free to test it out. The PDE provides a test environment called the “Runtime Workbench” that launches a new copy of Eclipse with the plug-ins in your workspace enabled. You can do this now to create a CDT project and see the menu item active on that project. The PDE generates a skeletal implementation of the menu item that, when selected, pops up a message dialog, proving that things are hooked up.
Figure 1: Components of the CDT UI, including the project view, the C/C++ editor, the outline view, and task markers.
in the list of builders, bearing the name you gave to the extension.
Figure 2: Creating the Popup menu.
Creating the Builder Next, create the builder. Again, this is done by creating an extension point. Since the PDE does not provide a template for creating builders, you need to use a schemabased generator. Click the Add button again and, in the Extension Points pane, select the org.eclipse.core.resource.builders extension and click Finish. In the Element Details Section of the editor, set the ID and name as in Figure 3. Use the Extensions editor to add a new builder and run element to the extension. In the Details section for the run element, you see a property named class. Click the class hyperlink and you see another wizard that lets you generate a new Java class. This class provides a skeletal implementation for the builder, so let’s call it PublicDataChecker (see Listing Three). Once you’ve defined much of the plugin’s structure, you can start hooking things together. In the PublicDataChecker class, add a static method, addBuilder, which adds the builder to a project. To do this, take the ID string you assigned to the builder extension and add it to the list of commands that uses the project. The ID string allows Eclipse to refrain from loading the plug-in until you load a project associated with the plug-in. At this point, you need to implement the menu item in the EnablePublicDataChecking class. Start by implementing the selectionChanged method to capture the list of selected items. Then, implement the run method to call the PublicDataChecker’s addBuilder method to add our builder to the selected project. To test whether the menu item works correctly, you can use the Eclipse debugger to step through it. But if you’re in a http://www.ddj.com
Dr. Dobb’s Journal, September 2004
71
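Registering the builder in the project's build spec follows the standard Eclipse pattern. Here is a minimal sketch of what addBuilder might look like inside the PublicDataChecker class; the duplicate check and the null progress monitor are our assumptions for this sketch, not a reproduction of the exact listing:

    public static void addBuilder(IProject project) throws CoreException {
        IProjectDescription desc = project.getDescription();
        ICommand[] commands = desc.getBuildSpec();
        for (int i = 0; i < commands.length; i++)
            if (ID.equals(commands[i].getBuilderName()))
                return;                        // builder already installed on this project
        ICommand command = desc.newCommand();  // describe our builder by its extension ID
        command.setBuilderName(ID);
        ICommand[] newCommands = new ICommand[commands.length + 1];
        System.arraycopy(commands, 0, newCommands, 0, commands.length);
        newCommands[commands.length] = command;
        desc.setBuildSpec(newCommands);        // save the updated build spec on the project
        project.setDescription(desc, null);
    }

Here, ID is the builder-extension identifier defined as a constant in Listing Three.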
tire project resource hierarchy and call checkFile on all header files; that is, files ending in .h. On incremental builds, Eclipse provides a resource delta that describes the changes since the last build. In this case, we call checkFile on all header files that appear as new or changed in the delta list.
Figure 3: Extensions.
Figure 4: Stylist plug-in at work.
Implementing Checking Behavior Now that the builder is hooked up, you can think about how its behavior should be implemented. Essentially, you want the builder to walk through each file that has changed since you last checked, or through all files on a project rebuild, and look for classes with public data members. When the builder finds one, it alerts users by adding a line to the Problems view. By double-clicking on that line, users can jump to where the definition is located in the file. To add information to the Problems view, you must define a marker. This is a tag that records a file offset that can be used for bookmarks or reporting errors. To define the marker, create another extension, this time to the org.eclipse.core.resources.marker extension point. Eclipse markers are arranged in a somewhat complex inheritance hierarchy. To have the marker placed in the Problems list and to show in the C/C++ Project View, subclass the marker from org.eclipse.cdt.core.problem. Doing this also lets the marker persist across Eclipse sessions. With the marker defined, you can focus on implementing the builder. The Eclipse build mechanism provides the builder with a handle to the project you need to build and, for incremental builds, a list of files that have changed since the last build. To implement the logic for checking member visibility, however, you need information about code contained
in the files. To access this information, you can take advantage of the parser provided by the CDT and of the CDT’s core interface — the C code model. Given a file, the CDT creates an instance of the code model for that file and returns an ITranslationUnit element. From that point, you can walk the code model, find the classes and member variables, and do your checking. To implement checking in the builder class, start by implementing the checkClass method. This method takes both the IFile object, which represents a file, and the IStructure object, which represents a class in that file. In the checkClass method, create an ICElementVisitor that visits the class and its children, looking for data members that have public visibility. If the ICElementVisitor finds such a member, it creates an instance of the marker and attaches it to the file object. Next, add the checkFile method, which finds all of the classes in a file and calls checkClass. This method calls the CDT to create an ITranslationUnit object that contains the results of the parse on that file. Any existing markers you’ve put on the file are removed, and another ICElementVisitor is created to walk the ICElement tree, looking for the classes declared in the file. Finally, you need to implement the build method that provides the main entry point to the builder. You can choose from two main types of build, full or incremental. On full builds, we walk the en-
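That full/incremental dispatch can be written around the resource and resource-delta visitors the platform supplies. The following is a rough sketch of those two methods inside the builder class; the isHeaderFile helper and the exact delta handling are our assumptions here, not the article's listing:

    protected IProject[] build(int kind, Map args, IProgressMonitor monitor)
            throws CoreException {
        if (kind == FULL_BUILD || getDelta(getProject()) == null) {
            // Full build: visit every resource in the project
            getProject().accept(new IResourceVisitor() {
                public boolean visit(IResource resource) throws CoreException {
                    if (isHeaderFile(resource))
                        checkFile((IFile) resource);
                    return true; // keep descending into folders
                }
            });
        } else {
            // Incremental build: visit only what changed since the last build
            getDelta(getProject()).accept(new IResourceDeltaVisitor() {
                public boolean visit(IResourceDelta delta) throws CoreException {
                    int deltaKind = delta.getKind();
                    if ((deltaKind == IResourceDelta.ADDED || deltaKind == IResourceDelta.CHANGED)
                            && isHeaderFile(delta.getResource()))
                        checkFile((IFile) delta.getResource());
                    return true;
                }
            });
        }
        return null;
    }

    private boolean isHeaderFile(IResource resource) {
        return resource.getType() == IResource.FILE && resource.getName().endsWith(".h");
    }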
Testing and Running
That's all there is to it (see Figure 4). You can now test the builder by creating a C++ project, creating a class and a method, and playing with the visibility on that method. Just one potential problem: To run the checker, you must build after each change. If that becomes annoying, you could always implement an alternative design. For instance, you could implement a "listener" to the Eclipse resource management system and automatically run the checking each time a file is saved. Or, you could create a menu item and force the user to manually invoke the checking.

Conclusion
This example just scratches the surface of what you could do with the APIs offered by the CDT, including APIs that let tool vendors seamlessly integrate their tools. Over time, the CDT's code model will become more complete and offer the ability to make changes to the code. In addition, the CDT team is developing a refactoring engine that will let you automate complex code changes across many files in a project — much like the JDT refactoring engine does today. With the dual-headed thrust of being both a multiplatform C/C++ IDE and a platform for integrating C/C++ tools, the CDT is building momentum in the industry. C and C++ development is still going strong, and the CDT is becoming an invaluable tool in the C/C++ developer's tool belt.
DDJ

Listing One
<export name="*"/>
<requires>
<extension id="publicDataChecker" name="Public Data Checker" point="org.eclipse.core.resources.builders">
class="org.eclipse.cdt.stylist.PublicDataChecker">
<extension point="org.eclipse.ui.popupMenus">
<menu label="CDT Stylist" path="additions" id="org.eclipse.cdt.stylist.menu1">
<separator name="group1">
<extension id="publicDataProblem" name="Illegal public data" point="org.eclipse.core.resources.markers">
<super type="org.eclipse.core.resources.problemmarker">

Listing Two
package org.eclipse.cdt.stylist.popup.actions;
import ...
public class EnablePublicDataChecking implements IObjectActionDelegate {
    /** Constructor for Action1. */
    public EnablePublicDataChecking() {
        super();
    }
    /** @see IObjectActionDelegate#setActivePart(IAction, IWorkbenchPart) */
    public void setActivePart(IAction action, IWorkbenchPart targetPart) {
    }
    IStructuredSelection selection;
    /** @see IActionDelegate#run(IAction) */
    public void run(IAction action) {
        if (selection == null)
            return;
        Iterator iter = selection.iterator();
        while (iter.hasNext())
            try {
                PublicDataChecker.addBuilder(((ICProject)iter.next()).getProject());
            } catch (CoreException e) {
                // Report the error
            }
    }
    /** @see IActionDelegate#selectionChanged(IAction, ISelection) */
    public void selectionChanged(IAction action, ISelection selection) {
        if (selection instanceof IStructuredSelection)
            this.selection = (IStructuredSelection)selection;
        else
            this.selection = null;
    }
}

Listing Three
package org.eclipse.cdt.stylist;
import ...
/** @see IncrementalProjectBuilder */
public class PublicDataChecker extends IncrementalProjectBuilder {
    public PublicDataChecker() {
    }
    // These ids must match the ids in their respective extensions
    private static final String ID = StylistPlugin.ID + ".publicDataChecker";
    private static final String PROBLEM_ID = StylistPlugin.ID + ".publicDataProblem";
    private static final String PROBLEM_MSG = "publicDataProblem";
    public static void addBuilder(IProject project) throws CoreException {
        IProjectDescription desc = project.getDescription();
        ICommand[] builders = desc.getBuildSpec();
        for (int i = 0; i
    ...
                    return true;
                }
            });
        }
        return null;
    }
    private void checkFile(final IFile file) throws CoreException {
        ICElement cfile = CModelManager.getDefault().create(file, null);
        if (!(cfile instanceof ITranslationUnit))
            return;
        // Remove existing markers
        file.deleteMarkers(PROBLEM_ID, false, IResource.DEPTH_ZERO);
        // Look for the classes in this file
        ITranslationUnit unit = (ITranslationUnit)cfile;
        unit.accept(new ICElementVisitor() {
            public boolean visit(ICElement element) throws CoreException {
                if (element.getElementType() == ICElement.C_CLASS) {
                    checkClass(file, (IStructure)element);
                    // return false since checkClass already visited the kids
                    return false;
                }
                return true;
            }
        });
    }
    private void checkClass(final IFile file, IStructure cls) throws CoreException {
        cls.accept(new ICElementVisitor() {
            public boolean visit(ICElement element) throws CoreException {
                if (element instanceof IField) {
                    IField field = (IField)element;
                    if (field.getVisibility() == ASTAccessVisibility.PUBLIC) {
                        IMarker marker = file.createMarker(PROBLEM_ID);
                        marker.setAttribute(IMarker.SEVERITY, IMarker.SEVERITY_ERROR);
                        marker.setAttribute(IMarker.MESSAGE,
                            StylistPlugin.getResourceString(PROBLEM_MSG));
                        ISourceRange range = field.getSourceRange();
                        int start = range.getIdStartPos();
                        marker.setAttribute(IMarker.CHAR_START, start);
                        marker.setAttribute(IMarker.CHAR_END, start + range.getIdLength());
                        marker.setAttribute(IMarker.LINE_NUMBER, range.getStartLine());
                    }
                }
                return true;
            }
        });
    }
}
ECLIPSE
Contributing to Eclipse Every programmer is a toolsmith KENT BECK AND ERICH GAMMA
Eclipse is an open-source Java platform for integrated development tools. However, Eclipse — originally developed by IBM, then donated to the eclipse.org open-source project — is more than just a Java IDE. The whole of Eclipse is built out of layers and layers of "plugins," with only one layer specific to Java programs. While the plug-in-based architecture seems a purely technical design decision, it turns out to have important social implications:
• Every programmer is potentially a toolsmith, since extending Eclipse is both relatively easy and highly leveraged.
• Distributing and updating tools is easy.
• Creating opportunities for extension makes it possible for toolsmiths to enable others to further extend new tools.
Kent is director of the Three Rivers Institute and author of Extreme Programming Explained. Erich is with the IBM OTI Labs (Switzerland), leads the Eclipse Java Development tools project, and is a member of the Eclipse and Eclipse Tools project management committees. He also is a coauthor of Design Patterns. They can be contacted at [email protected] and [email protected], respectively.
In this article, we examine Eclipse plug-ins and how they address these issues.

Everything Is a Plug-In
When designing IDEs, you are faced with a set of pretty nasty contradictions:
• If you make it easy for others to change the internals, your interfaces and implementations have to be so general as to be prohibitively expensive. You'd like to keep the system cheap by providing a fixed set of functionality, but…
• You can't possibly build all the tools most folks need because programming styles differ widely. So you have to give folks the ability to change things, but…
• An IDE is big, so you need scads of up-to-date documentation for potential extenders, which slows development of new features, but…
• Competitors can shave off a little flexibility and introduce features faster than you do.
Your attempt to publish details about your internals has slowed your ability to grow the design of the system, which also eventually chokes the flow of new features. Most IDEs slice this conundrum by providing a large, monolithic core with limited access to points where programmers can add functionality. Eclipse applies a combination of open source, careful segregation of public API and private implementation, and a pure plug-in-based architecture to provide more extensibility to programmers; see Figure 1.
When Eclipse boots, all that initially loads is a small kernel that knows where to find plug-ins and how to load them. As
needed, Eclipse loads plug-ins that know how to interact with the operating environment, plug-ins for the basic user interface, the Java tooling plug-ins, and whatever plug-ins you've written.
Obviously, since everything is a plug-in, there are lots of plug-ins in Eclipse. A key challenge of this architecture is its scalability to hundreds of plug-ins.

Extensions and Extension Points
As Figure 2 illustrates, the menu items that appear when you pop up a menu can be extended by plug-ins. Extensions need to know where and how they appear in the system. Extension points are the dual of extensions. An extension point is like a power strip, giving anyone the ability to plug in new functionality.
You tell Eclipse about extensions in the plug-in manifest file. In the manifest, you describe extensions in XML. If you want to add a new item to the menu in Figure 2, you can do so by declaring Example 1 in the plug-in manifest. When reading this, Eclipse notes that when the time comes to pop up a menu, if an object with the type IFile is selected (defined by the attribute objectClass), it should add a menu item labeled "Hello." When "Hello" is selected, it should create and invoke a HelloAction (defined by the attribute class), using the usual menu-action protocol.
Remember that all of Eclipse is built out of extensions to published extension points and all of those points are available for you to extend. You can add a new compiler, an interface to a new code repository, or even whole new applications as plug-ins. Looked at from this perspective, Eclipse isn't a Java programming environment (or a programming environment at all) — it is a place to declare extension points and add extensions.
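Returning to the Hello contribution for a moment: the Java side is a small action class. The following is a minimal sketch of what the HelloAction named in the manifest could look like — the package name and dialog text are invented for illustration; the interface and the menu-action protocol are the standard ones:

    package com.example.hello; // hypothetical package for this sketch

    import org.eclipse.jface.action.IAction;
    import org.eclipse.jface.dialogs.MessageDialog;
    import org.eclipse.jface.viewers.ISelection;
    import org.eclipse.ui.IObjectActionDelegate;
    import org.eclipse.ui.IWorkbenchPart;

    public class HelloAction implements IObjectActionDelegate {
        private IWorkbenchPart part;

        public void setActivePart(IAction action, IWorkbenchPart targetPart) {
            part = targetPart; // remember where we were invoked from
        }
        public void selectionChanged(IAction action, ISelection selection) {
            // the selected IFile could be captured here; "Hello" doesn't need it
        }
        public void run(IAction action) {
            MessageDialog.openInformation(part.getSite().getShell(),
                    "Hello", "Hello from a plug-in contribution");
        }
    }

None of this code is loaded until the menu item is actually selected, which is the point of the next section.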
Cook's Tour
The goal of the Eclipse architecture is to scale to hundreds of plug-ins and for system startup to be proportional to the number of used plug-ins and not to the number of installed plug-ins (O(1), not O(# plug-ins)). The most important architectural decision to meet this goal is splitting plug-ins into a declarative part (the plug-in manifest) and an imperative part (Java code in a JAR file).
Each plug-in is represented as a directory. In the directory is a manifest file named "plugin.xml" that contains the declarative description of the plug-in. Most plug-ins also contain code (referred to in the manifest) in one or more JAR files. All the plug-ins usually sit in the plugins directory. When Eclipse boots, it reads all the manifests and creates a map of all the plug-ins. The code is not loaded at boot time.
Reading a JAR file and loading its contained classes takes significant time. When your whole environment is built out of tens or hundreds of plug-ins, reading them all at boot time would result in glacial start-up times and a monstrous memory footprint. Consequently, Eclipse waits until a contribution is invoked before loading the code for its plug-in. In the example, we could monitor and see that our HelloAction class wasn't loaded until the Hello menu item was selected. Even just popping up the menu with the item on it isn't enough to cause the class to load. By design, the manifest contains enough information to display a contribution to users without invoking any contributed code. The goal of this architecture is to preserve fast booting independent of the number of plug-ins. The cost of loading the code is spread out over the initial uses of features in the system.

Little Tools
Eclipse enables programming-as-tool-building by providing a rich set of raw materials — in this case, Java APIs — from which to construct tools. Wise tool-building programmers are powerful, though, because their time is leveraged. For instance, we were working in a Seattle hotel room the other day when we discovered we needed to know what the available built-in icons were and what they looked like. Two minutes later we had the answer. Figure 3 is the view we built that displays the standard icons and their names.
In a way, Eclipse reminds us of Emacs, in that folks are writing plug-ins for email, news, Google, and chat. The resulting programs may not be any great shakes as standalone programs, but they don't have to be.
By taking advantage of the fact that they are inside a programming environment, you can create a communication tool that is more useful to programmers than any general-purpose tool, no matter how rich. There are already many plug-ins available (see http://www.eclipse.org/community/index.html).
Figure 1: Eclipse architecture.
Figure 2: Extending the context menu.
Figure 3: An Eclipse view.
Figure 4: Spider for Eclipse.
Figure 5: Erich's handle.
Figure 6: Using the PDE.
<extension point="org.eclipse.ui.popupMenus">
Example 1: Declaring an extension point in the plug-in manifest.

Enabling
The Eclipse story isn't done with contributing to an extension point. You also need to think about how to structure your plug-in so other people can add their extensions. You do this by declaring your own "extension points." For example, suppose Kent wrote a tool, "Spider and Eclipse," to draw diagrams of objects. When using this tool to analyze our quick-hack icon view, we can see that SampleView has a TableViewer, which is configured with our ImageContentProvider that fetches the list of icons and the ImageLabelProvider that presents an icon as an image in one column and a name in the second; see Figure 4.
"Spider for Eclipse" is just one example of an introspection plug-in that helps you gain insight into Java programs. When you click on an object in Spider, a set of handles pops up around the object. In Figure 4, you see three handles attached to the TableViewer:
• Expand a field.
• Expand an attribute (no-argument method).
• Delete an object.
Now let's see how to make the Spider extensible. Erich sees the Spider View and says, "Sauglatt! I'd like a handle that opens an editor on the source code of the object." Kent has thought about this in advance. He has opened the plug-in for such extensions by defining an extension point for handles:
<extension-point id="handles" name="Handles"/>
Spider's own handles are written using this extension point as well. Erich can now implement a plug-in and declare the presence of his handle, as in Example 2. After installing and loading the new plug-in, his handle appears as in Figure 5. To implement the behavior of his handle, he writes a ShowSourceHandle class including code like Example 3 (using the Eclipse APIs provided by the Java tooling plug-ins). Now when users click on the Show Source handle, the source code appears.
Here are a few things we notice about this scenario:
• Kent didn't have to do anything specific to prepare for Erich's extension of Spider. The work required to write the original tool as extensions of extension points paid off immediately for Kent.
• Erich didn't have to communicate or coordinate with Kent in any way to add the functionality he wanted. The declaration of the extension point and the interface that defines how a handle is treated were enough.
• If someone else wants to add their own special handle, they can do so without coordinating with either Erich or Kent.
• Kent, however, is deeply constrained about the changes he can make to his interface. The XML definition can be extended but not changed, and the common Handle interface can't be easily changed at all.
All this extension/extension point configuration was done with the help of the Plug-in Development Environment (PDE). Kent could have helped Erich even further by defining a schema for the handles extension point. Then PDE would have even guided Erich when defining his handle extension; see Figure 6.

Publishing
All this talk of tool writing brings up another point — how do tools get distributed in Eclipse? First, the simple way of zipping up a plug-in and making it available on the Web for others to download and unzip still works. However, there is a better way. Installing plug-ins should be straightforward. In particular, given the modularity of Eclipse where everything is a plug-in and there are many plug-ins, it is critical
that the set of plug-ins is still manageable. Otherwise, an Eclipse installation ends up as a morass of plug-ins. The implementation of a tool can range from a single plug-in up to dozens of plug-ins. To simplify the plug-in download, installation, and management, Eclipse has “features.” A feature consists of a set of plug-ins. Features are the unit of download and installation. Example 4 shows a simple feature description for distributing the Spider plug-in. You won’t have to remember all the details of the XML. PDE provides tools for all these plug-in development tasks. The Eclipse Update Manager (Help|Software Updates|Find and Install…) uses the feature information to install or disable a feature. By doing so, you are protected from ending up with inconsistent Eclipse installation configurations.
Once you have the plug-in written and the feature described, you can publish your tool for easy installation on an "update site." An update site is a URL-addressable location — for example, a directory on your web site (in our case, it will be http://www.javaspider.org/). The storage mechanism for an update site is simple. It consists of features and plug-ins packaged as JARs whose names include a version identifier (1.0.1, for instance). The content of the site is described in a site.xml file. Example 5 is a simple site.xml for an update site offering the Spider feature. Now, all you have to do to make your tool available is to create JARs for your plug-ins and the feature and update or create a site.xml file. Eclipse supports signing feature JARs.
Figure 7 shows the Eclipse Feature Updates dialog you use to install or update features. With the Update Manager, you can create bookmarks for sites you are interested in. Once you've selected a feature, you can install it at the touch of a button. Tools are usually continuously refined and improved. Therefore, the Eclipse update manager supports you in finding and installing the latest and greatest feature updates, including automatically checking for updates when Eclipse starts up.

Conclusion
We began this article with a list of conundrums for writers of development environments. How does Eclipse resolve these seemingly contradictory constraints? Clearly, Eclipse is easy to extend. Because plug-ins were used to build all of Eclipse, the extension point/extension mechanism is polished by long use. Because Eclipse is built in terms of extension points, all those points are available for tool builders to use.
Eclipse is also well documented. Here is where Eclipse breaks with traditional tools. The online help provides a thorough overview of the available extension points, and Javadoc provides an outline of the APIs. Much of the burden usually assumed by printed documentation, though, is carried by the availability of source. As an extender, you will spend a fair amount of time just reading how other folks have already used an extension point. While this may not be the most perfect form of documentation theoretically, practically speaking, it's fantastic. You know the examples are up to date and, since you use the same extension points to write tools as Eclipse uses internally, there are always examples. Finally, Eclipse comes with a full-blown Java IDE that helps you with searching and navigating the source.

Acknowledgments
We appreciate the gentle, nurturing editing of Andrei Weinand, Greg Wilson, John Wiegand, and John Kellerman.
DDJ

<extension point="org.eclipse.contribution.spider.handles">
Example 2: Declaring the presence of a handle.

public void mouseUp() {
    String qualifiedName = selectedObject.getQualifiedClassName();
    IJavaProject[] projects;
    projects = JavaCore.create(
        ResourcesPlugin.getWorkspace().getRoot()).getJavaProjects();
    for (int i = 0; i < projects.length; i++) {
        IJavaProject project = projects[i];
        IType type = project.findType(qualifiedName);
        if (type != null) {
            JavaUI.openInEditor(type);
            return;
        }
    }
}
Example 3: Implementing behavior of a handle.

<requires>
...
...
Example 4: Feature description for distributing the Spider plug-in.

<site>
Example 5: Simple site.xml for an update site offering Spider.

Figure 7: Eclipse feature updates dialog.
ECLIPSE
Tools for Domain-Specific Modeling Generating code from models STEVEN KELLY
By generating full code directly from models, domain-specific modeling (DSM) uses graphical modeling languages to build narrow ranges of applications 5–10 times faster than hand coding. To date, however, the only way of doing this that works is to make both the modeling language and generators domain specific. Attempts at completely generic modeling languages and generators haven't succeeded because raising the level of abstraction always means sacrificing fine control and complete generality, for the more important end of productivity.
Different camps have different views on just how specific DSM languages should be. At one end of the scale, the OMG (http://www.omg.org/) would like everyone to use unadulterated UML. However, the OMG has tacitly admitted that full code generation from UML is not going to happen, and is pinning its hopes on model-driven architectures (MDA). At its most basic, this involves transforming one UML model into another UML model, possibly several times and possibly automatically, then automatically generating substantial code from the final model. MDA proponents envisage higher forms of MDA incorporating elements of DSM, and these may offer hope. In these, the base UML can be extended with domain-specific enhancements, or even replaced with metamodels based on Meta Object Facility (MOF), a UML subset intended for use as a language for describing modeling languages. However, my experiences with the former have revealed that current tools lack the necessary extensibility, and no tools support the latter.
In short, UML is not domain-specific, and trying to build domain-specific models in a UML tool is, at best, like trying to write English in a Spanish version of Word. The GUI labels are all wrong, and the tool keeps trying to correct your input into something valid for its language. Moreover, because the tool is parsing your input according to the wrong language, any attempt at translation or code generation will be fraught with difficulties. This explains in part the lack of adoption of DSM, despite its promise — existing CASE tools simply cannot support it. Without tool support, any modeling language is largely useless, and certainly no code can be generated. Building a CASE tool for your own modeling language is prohibitively expensive. Even for a simple language, full CASE support would take man-years to build from scratch.
That is where tools for building DSM editors come in. These tools let you build a completely new modeling language, editor, and code generator for a domain. The tools can be divided into two classes — code frameworks such as Eclipse's EMF and GEF, and metaCASE tools such as MetaEdit+ from MetaCase (the company I work for). The frameworks are just that: pure code, with utility functions and classes useful for building graphical CASE tools. On the other hand, metaCASE tools implement generic graphical CASE behavior, and users supply the concepts and symbols via the tool's GUI.
Steven is chief technical officer at MetaCase. He can be contacted at [email protected].

EMF & GEF
While the main focus of Eclipse has been on its IDE for Java programmers, two major Eclipse tool projects offer help for modelers, too. The Eclipse Modeling Framework (EMF) lets you input your desired data model, then generates table-based editors and an XMI schema for such models. The Graphical Editor Framework (GEF) supplies functions and classes useful for specifying graphical editors for Eclipse data. Although you can use EMF and GEF separately, building DSM support requires both — and, of course, Eclipse. There is no support for building standalone editors.
The main function of EMF (http://www.eclipse.org/emf/) is to provide a data entry and storage environment following a schema that you supply. EMF does not use the OMG MOF standard, although the design of the Ecore data model (a metamodel for describing models and runtime support and reflective API for manipulating EMF objects) it uses was influenced by MOF. In the EMF team's experience of building modeling tools, MOF was found to lack necessary features. Your schema, or metamodel, can be fed to EMF in several formats. The native format is an XMI file, but EMF can also read Rational Rose class models, annotated Java files, or XSD files, providing these follow its restrictions.
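The annotated Java route, for instance, amounts to tagging ordinary interfaces with @model Javadoc tags; EMF's importer reads the tags and derives the Ecore metamodel, and from it the generated code. A hypothetical fragment — Gate and Wire are invented names for this sketch, not part of the Eclipse logic example:

    /** @model */
    public interface Gate {
        /** @model */
        String getName();

        /** @model type="Wire" containment="true" */
        java.util.List getOutputs();
    }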
Figure 1: Logic example implemented in Eclipse.
Figure 2: LED object type and symbol definition in MetaEdit+.
Based on this input, the EMF.Codegen code generator facility can generate a bare-bones editor for data following the schema. The editor uses classes from the EMF.Edit framework to provide standard table and property sheet views. If the generated editor is not sufficient, you can add your own code. Providing it is marked correctly, the generator will not overwrite your code when the schema is updated, although neither will it update it to reflect the schema changes. You can also build your own code directly on top of the EMF.Edit framework, and that indeed seems to be the way many developers go.
For instance, a Norwegian telecom project (http://www.pats.no/) has been working on cellphone service engineering. The original solutions used a State Machine pattern with both state actions and legal transitions hand-coded in Java. To reduce the work and improve the chances of validating the resulting system, they wanted to have the state machine represented in a model. For the simpler parts of the system, they were able to use the EMF code-generation framework on annotated Java and even an existing XSD file. For the more table-based state editor, they are hand-coding on top of the EMF.Edit framework. Currently, that part is 4000 lines of code and requires about another three months' work. The rest of the editors and property sheets are another few thousand lines of code with significant parts generated. In total, the project has taken about six man-months so far.
EMF only provides part of the solution for DSM — data storage, property sheets, tree, or table-based browsing, and a code generation framework. GEF (http://www.eclipse.org/gef/) provides the graphical support needed for building a diagram editor on top of the EMF framework. Diagrams bring two vital additions to the modeling experience. First, the human brain is better at quickly interpreting and remembering a graphical diagram than text, trees, or tables. Second, diagrams show multiple relationships between objects much better than text or table formats. While a tree format can show one simple kind of relationship well, it cannot handle object reuse or other relationships.
GEF is not particularly designed to take advantage of EMF. The only thing they really share is their integration with the Eclipse change notification. GEF uses a Model-View-Controller pattern, where the Model can be an EMF model or something entirely different. While using GEF, the intention is thus that the Model is largely ignored and the main work happens in the Controller. Such a heavy Controller in an MVC framework is somewhat unusual. In many ways, it might have been better to establish an MVC framework entirely within GEF, with its own true Model to represent facts like the coordinates of a model element. This graphical Model could then have received change notifications from the EMF model.
The GEF Controller is called an "EditPart" and, for each EMF model element class, you normally need to create a corresponding EditPart class. EditParts have a Figure, which is their graphical view, implemented in the lower-level Draw2D graphical framework. Designing a symbol for your DSM language thus consists of writing Java code for its individual lines and curves, which can be painstaking work. Often, a second Figure is necessary for display when a model element is being moved; for example, to show the symbol grayed out. EditParts respond to events by way of an EditPolicy: Most EditParts require their own EditPolicy class. The job of the EditPolicy is to turn the event request into a Command. GEF uses the Command pattern to implement an undo stack — all changes to data must happen through Commands, and each Command must store its own undo information on the stack and implement an undo method. Unfortunately, while EMF also has the same pattern, they are implemented in different namespaces and cannot be used together. Instead, you must maintain EMF undo information separately from GEF undo information, and try to maintain consistency between them.
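To make the Command half of that concrete, here is a rough sketch of a GEF command for something as simple as renaming a model element; LogicElement stands in for whatever model class an editor manipulates and is invented for this sketch, not part of the Logic example:

    import org.eclipse.gef.commands.Command;

    // Stand-in model class for the sketch
    class LogicElement {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public class RenameCommand extends Command {
        private final LogicElement element;
        private final String newName;
        private String oldName;

        public RenameCommand(LogicElement element, String newName) {
            super("Rename element");
            this.element = element;
            this.newName = newName;
        }
        public void execute() {
            oldName = element.getName(); // remember what undo() will need
            element.setName(newName);    // then apply the change
        }
        public void undo() {
            element.setName(oldName);
        }
    }

An EditPolicy returns such a Command in response to a request, and the EditDomain's CommandStack executes it and drives undo/redo.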
MetaEdit+
MetaEdit+ was built on the principle that all CASE tools are essentially the same — you can put objects on a diagram, fill in their properties, connect them with relationships, and move them around. All that really changes between different modeling languages is what the object types look like, what properties they have, and how you can connect them. The MetaEdit+ toolset includes generic CASE behavior for objects and relationships, including a Diagram Editor, Object and Graph Browsers, and property dialogs. DSM developers need only specify their modeling language: for example, creating a new object type, giving it a name, and choosing which property types it has. A vector-based Symbol Editor lets you define your object and relationship symbols or reuse existing symbols. There is no need for hand coding, nor is any CASE tool code generated. The MetaEdit+ editors simply follow the defined language in a similar way to how Word follows its templates.
In addition to the CASE editing functionality, MetaEdit+ includes XML import and export capabilities, an API for data and control access to MetaEdit+ functions, and a generic code generator. The code generator uses a domain-specific language (DSL) that lets you specify how to walk through models and output their contents along with other text. This makes defining code generators straightforward, with one line of a code generator definition corresponding to several lines in the scripting languages sometimes used for this purpose. As the generator has no preconceptions about the modeling language, code language, or framework the code runs on top of, you have complete freedom to produce the best code possible from the models.

Comparison
For comparison, I rebuilt an Eclipse sample application — a simple Logic Gate DSM language (available electronically; see "Resource Center," page 5) that lets you connect AND, OR, and XOR gates and link them to form circuits with LED displays and voltage sources (Figure 1). The example is pure GEF, with no EMF for data storage or editing (there were essentially no data values to edit). While there was no code generation from the Logic models, about 5 percent of the code added on some simulation behavior. A connection wire could show its true/false status by changing color, and an LED display could show the values of its inputs. In a normal DSM scenario, it would be more likely that code could be generated from the model, and running that code could show the values with different input conditions.
The GEF Java code for the sample was 332 KB (120 files, over 10,000 lines). Listing One presents part of the code for the LED display type definition, while Listing Two shows part of its display code.
To make a cleanroom implementation in MetaEdit+, I looked at the resulting editor rather than its Java code. The basic concepts of the DSM language were clear from the type palette, and by playing with the example model, I found out how the gates could be connected. For instance, there was a distinction between in and out ports on each gate, and connections had to be from an out port to an in port. These kinds of rules are one thing that distinguishes DSM editors from drawings in PowerPoint or Visio.
It took about 15 minutes to specify the eight object types along with their port and connection types and rules. With my limited graphical skills, drawing and fine-tuning the symbols in the MetaEdit+ Symbol Editor took an additional 45 minutes. Figures 2(a) and 2(b) show the LED object type and symbol definitions. The total of one hour included building the same example model as in Eclipse: a useful step in MetaEdit+, which lets you test a language while building it. The resulting editor (Figure 3) was essentially identical to that in GEF, apart from omitting the simulation behavior. That could be added using the MetaEdit+ API with no more code than it took in GEF. Graphical behavior in MetaEdit+ was somewhat better, as connections followed objects as you dragged them (GEF showed only the object outlines while dragging). MetaEdit+ also included all its other behavior: browsers, XML import/export, HTML and Word export, multiuser repository, and so on.
Figure 3: Logic example implemented in MetaEdit+.

Conclusion
The EMF framework offers a good solution if you want to add a measure of nongraphical modeling into the already strong Eclipse development environment. GEF is a good basis for organizations wanting to build a graphical editor that does not follow the normal pattern of CASE tool behavior; for example, a GUI design tool. By also supporting such behavior, however, it misses the chance to offer optimal support for standard CASE behavior. In short, CASE tools can be built with GEF, but the amount of coding necessary isn't insignificant.
DDJ

Listing One
/* Abridged version of LED.java, showing parts related to property value.
 * Whole file is 3 times as long. In total, the 4 LED classes are 663 lines. */
package org.eclipse.gef.examples.logicdesigner.model;

import org.eclipse.gef.examples.logicdesigner.LogicMessages;
import org.eclipse.ui.views.properties.IPropertyDescriptor;
import org.eclipse.ui.views.properties.PropertyDescriptor;
import org.eclipse.ui.views.properties.TextPropertyDescriptor;

public class LED extends LogicSubpart {
    public static String P_VALUE = "value";
    protected static IPropertyDescriptor[] newDescriptors = null;
    static {
        PropertyDescriptor pValueProp = new TextPropertyDescriptor(P_VALUE,
            LogicMessages.PropertyDescriptor_LED_Value);
        pValueProp.setValidator(LogicNumberCellEditorValidator.instance());
        if (descriptors != null) {
            newDescriptors = new IPropertyDescriptor[descriptors.length + 1];
            for (int i = 0; i < descriptors.length; i++)
                newDescriptors[i] = descriptors[i];
            newDescriptors[descriptors.length] = pValueProp;
        } else
            newDescriptors = new IPropertyDescriptor[] { pValueProp };
    }
    public Object getPropertyValue(Object propName) {
        if (P_VALUE.equals(propName))
            return new Integer(getValue()).toString();
        if (ID_SIZE.equals(propName)) {
            return new String("(" + getSize().width + "," + getSize().height + ")");
        }
        return super.getPropertyValue(propName);
    }
    public void resetPropertyValue(Object id) {
        if (P_VALUE.equals(id))
            setValue(0);
        super.resetPropertyValue(id);
    }
    public void setPropertyValue(Object id, Object value) {
        if (P_VALUE.equals(id))
            setValue(Integer.parseInt((String)value));
        else
            super.setPropertyValue(id, value);
    }
}
Listing Two
/* Abridged version of LEDFigure.java, showing just unselected symbol display.
 * Whole file is 4 times as long, plus 65 lines for dragged symbol display. */
protected void paintFigure(Graphics g) {
    Rectangle r = getBounds().getCopy();
    g.translate(r.getLocation());
    g.setBackgroundColor(LogicColorConstants.logicGreen);
    g.setForegroundColor(LogicColorConstants.connectorGreen);
    g.fillRectangle(0, 2, r.width, r.height - 4);
    int right = r.width - 1;
    g.drawLine(0, Y1, right, Y1);
    g.drawLine(0, Y1, 0, Y2);
    g.setForegroundColor(LogicColorConstants.connectorGreen);
    g.drawLine(0, Y2, right, Y2);
    g.drawLine(right, Y1, right, Y2);
    // Draw the gaps for the connectors
    g.setForegroundColor(ColorConstants.listBackground);
    for (int i = 0; i < 4; i++) {
        g.drawLine(GAP_CENTERS_X[i] - 2, Y1, GAP_CENTERS_X[i] + 3, Y1);
        g.drawLine(GAP_CENTERS_X[i] - 2, Y2, GAP_CENTERS_X[i] + 3, Y2);
    }
    // Draw the connectors
    g.setForegroundColor(LogicColorConstants.connectorGreen);
    g.setBackgroundColor(LogicColorConstants.connectorGreen);
    for (int i = 0; i < 4; i++) {
        connector.translate(GAP_CENTERS_X[i], 0);
        g.fillPolygon(connector);
        g.drawPolygon(connector);
        connector.translate(-GAP_CENTERS_X[i], 0);
        bottomConnector.translate(GAP_CENTERS_X[i], r.height - 1);
        g.fillPolygon(bottomConnector);
        g.drawPolygon(bottomConnector);
        bottomConnector.translate(-GAP_CENTERS_X[i], -r.height + 1);
    }
    // Draw the display
    g.setBackgroundColor(LogicColorConstants.logicHighlight);
    g.fillRectangle(displayHighlight);
    g.setBackgroundColor(DISPLAY_SHADOW);
    g.fillRectangle(displayShadow);
    g.setBackgroundColor(ColorConstants.black);
    g.fillRectangle(displayRectangle);
    // Draw the value
    g.setFont(DISPLAY_FONT);
    g.setForegroundColor(DISPLAY_TEXT);
    g.drawText(value, valuePoint);
}
PROGRAMMING PARADIGMS
Fahrenheit 411 Michael Swaine
What is the temperature of information? Can an information-rich communication ever really be cool, or does anything worth saying necessarily raise the temperature of the discussion? Is controversy proportional to content? Is everything interesting inherently political? Did Marshall McLuhan answer all these questions in one of those cryptic books he wrote back in the 1960s? Is Ray Bradbury going to sue me over the title of this month's column? Is Michael Moore?
A reader recently took me to task for sneakily slipping my political views into this otherwise strictly technological column. He was objecting to the way I introduced sections of a particular installment of this column with an only vaguely relevant "Bushism." I think he was overreacting. As I see it, the unintentionally humorous sayings of the President of the United States are entertainment, not politics. "But," that same reader could reply, "since 9/11, everything is changed." No he couldn't. Not since the establishment of a corollary to Godwin's Law forbidding the use of that and several other 9/11 memes. (Godwin's Law: "'As a Usenet discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches one.' [O]nce this occurs, that thread is over, and whoever mentioned the Nazis has automatically lost whatever argument was in progress"; The Jargon File).
The online magazine Slate seems all twisted up about Bushisms, too. It recently published a series of "Kerryisms." The editor may have been trying to be fair and balanced, or may have been making a postmodernist joke about jokes, or may have thought that if Bushisms are funny then Kerryisms ought to be, too. Well, they're not. John Kerry is not, as a rule, unintentionally funny. He's unintentionally boring. There's a difference, and it's one you want to keep a sharp eye on if you edit a magazine, even an online one, and don't want readers to think that your points are missing or your plugs are fouled. Being unintentionally funny or unintentionally boring doesn't make one a bad person or even a bad President, but only one of them makes for entertaining reading.
Michael is editor-at-large for DDJ. He can be contacted at [email protected].
But the reader did have a point: I got more fun out of recounting those Bushisms than I would have if my political views were closer to, say, Karl Rove's. I think that the best way for me to avoid sneakily slipping my political views into the column is to state them baldly once and for all so that readers can filter out the politics. I'll do so at the end of the column.
But will you really be able to filter out the politics? Can you ever? The ways we form judgments and make decisions are awfully subjective and error prone. It's tempting to think that the whole enterprise badly needs the insights of people who spend most of their waking hours producing artifacts that make decisions more or less flawlessly — computer programmers, for instance. Maybe Real Life just needs a few good algorithms. I want to share with you, FWIW, some of my recent reading on how people make decisions and how technology might help them make better decisions or at least make their bad decisions more efficiently, and how maybe you can't remove politics from the implementation of such technology.

Words As Actions
"I devoutly believe that words ought to be weapons." — Christopher Hitchens, who writes "Fighting Words" for Slate
Many years ago, I came across the engaging little book How to Do Things with Words by J.L. Austin (Oxford University Press, 1962). "Engaging it may be," you may say, "but I don't believe in long engagements." Touché, I'd say, if you were to say that. I do confess to an annoying habit of recommending books of hernia-inducing heft — either that or books with prose that moves along with all the grace of John Kerry on his voting record or George Bush on a bicycle. How to Do Things with Words is academic and archaic, but it is also short. Another argument in its favor is that it is based on Austin's William James Lectures at Harvard, so it was written with a view to keeping undergraduates awake.
In this little (only 166 pages!) book, Austin is challenging what he calls an age-old assumption in philosophy "that to say something…is always and simply to state something." Of course, some utterances are obviously not stating anything: They are questions or commands or exclamations. But Austin isn't talking about them. He means that some utterances that have the form of statements are not — or at least are not simply — statements. Sometimes, he says, to say something is to do something.
Getting back to engagements and following that process to its natural conclusion, there is the act of marrying. (Which, this year, is a hot political issue; but then, this year, what isn't?) To say "I will" in the right context is to perform an act: to marry. Say it and you've done it. Austin calls such verbal acts "performatives" and identifies many of them. Christening. Bequeathing. Betting. Guessing. Rendering a verdict. Promising. Apologizing. Congratulating. Conceding. Authorizing. Warning. Notifying. Ordering someone to do something. Ordering pizza. Naming a child. Challenging someone to a duel. Voting. Vetoing. Repealing. Appointing someone to an office. Declaring war.
These verbal acts have some peculiar attributes that plain old statements don't have. Many of these involve prerequisites that must be satisfied for the speech acts to be considered to have taken place. Saying "I will" doesn't get you married if you're already in that blessed state. (Blessing: another performative.) Others have to do with obligations and other social pressures incumbent on the performance of the speech act. Promising, for example, places one in a system of social obligations and expectations — marrying even more so.
It may have been necessary for Austin to point out this aspect of language, but I don't think that we would consider his point particularly controversial. It is true that the use of words can perform actions, but this is pretty obvious to most of us. And it is a good thing, too, if you are a professional programmer, because what you do for a living is to perform actions by uttering, or anyway typing, words. Programmers most certainly know How to Do Things with Words. In fact, one can cast Austin's distinction between performatives and declarative statements in terms of the procedural and declarative programming paradigms. In programming, the procedural seems more fundamental to most people, and the declarative paradigm seems a somewhat roundabout way to perform actions.
Well then, is every use of words a performance of an act? (Beyond the obvious act of uttering the words?) No, Austin doesn't go that far. But others, influenced by the example of computer programming, might be tempted to.

The Tyranny of Categories
Human Values and the Design of Computer Technology, edited by Batya Friedman (Cambridge University Press, 1997; ISBN 1575860805), is not a book that even I would call engaging, but it has its moments. At one point, the temperature of the information gets pretty high, as a legendary computer scientist gets into an argument with a Principal Scientist at Xerox PARC and accuses her of imputing sinister intentions to him and implying that he wants to impose regimes of enforced discipline on innocent knowledge workers. Hot stuff.
The legendary computer scientist is Terry Winograd and the magilla causing the dustup is something called "speech act theory," which is the conceptual base of Winograd and Flores's decision-support system, called "The Coordinator." What The Coordinator does is facilitate working together in groups, for example, software projects. Part of the way it works involves defining categories of speech acts, like Promising or Accepting. When communicating via The Coordinator, as I understand it, you don't just chat aimlessly, you organize communications inside these categorical frames of Requests and Promises and Acceptances and Withdrawals and Reneges. All of which are speech acts — what Austin called "performatives." And the software imposes constraints consistent with the kind of speech act you're committing. A Promise puts you in a situation where only certain specific actions are open to you. ("You are in a room; doors lead to the left and right…" Well, not that exactly.)
The connection with Austin is pretty direct: Speech act theory grows out of Austin's work as extended by John Searle. But does it push Austin's views to the extreme of treating all utterances as actions? The PARC Principal Scientist thinks so. Her name is Lucy Suchman and her point, in essence, seems to be that the categories in terms of which we converse exert a lot of control over the directions our conversations can take — a version of linguist Benjamin Whorf's famous hypothesis that language determines thought — so if you make people converse through a computer system and you define the conversational categories that the system implements, then YOU have a lot of control over the directions the conversations can take. Categories have politics, she says.
Winograd's response boils down to saying that if you're going to build a computer system to deal with any real-world problem, you darned well have to use hard-edged categories that may fail to perfectly capture the messiness of the real-world problem. You don't get far in computer-system design or any rational enterprise until you come up with clearly defined categories. And Lucy Suchman shouldn't be calling him a tyrant for simply being clear.
I think that Winograd misses Suchman's point, although I don't know how useful that point is. She is saying, in part, that human communication is a lot richer than The Coordinator's categories allow for. Could a system for collaborative work allow categories to emerge from the communication process in the group, in some organic way, rather than being imposed from outside? I don't know how you'd do that, but it would be consistent with the point of another recent book, which argues that groups are sometimes not as stupid as we think.

The Wisdom of Crowds
The New Yorker economics editor James Surowiecki's The Wisdom of Crowds: How the Many Are Smarter than the Few (Random House, 2004; ISBN 0385503865) asserts that "under the right circumstances, groups are remarkably intelligent, and are often smarter than the smartest people in them." He thus challenges the conventional wisdom that crowds are stupid. This got the attention of Slate, The Economist, Wired, Forbes, and so on. His argument also appeals to the Smart Mobs crowd. (For the lowdown on Smart Mobs, see Howard Rheingold's Smart Mobs, Perseus Publishing, 2002; ISBN 0738206083, or http://www.smartmobs.com/.) Howard Rheingold says "smart mobs emerge when communication and computing technologies amplify human talents for cooperation." But Surowiecki's point is not the same as Rheingold's, and he's not proposing anything as trippy as "emergent group intelligence." What he is asserting is pretty concrete and testable. Given certain conditions, for certain kinds of decisions, groups can make better decisions than any of their members. And he cites a lot of evidence.
So what are the conditions, what are the appropriate kinds of decisions, and what is the evidence? There are four conditions that must be met:
• Diversity of opinion. This increases the likelihood that the best answer will even be noticed. If the best idea isn't even on the table, there is little chance that it will be chosen.
• Independence of members. This is key: It gets probability on your side.
• Decentralization. It's essential that there be no dictator directing the process toward a predetermined answer. The best answer needs to have a chance to emerge.
• A good method of aggregating opinions.
Given that these conditions are met, Surowiecki claims that groups can be smarter than their members in several broad decision-making contexts:
• Cognition. Groups are smart in situations where there is an objective correct answer, like guessing the number of jelly beans in a jar.
• Cooperation. Groups can be very good in situations where success requires members to subjugate their own self-interest to the interests of the group.
• Coordination. Groups can be extremely good in situations like buyers and sellers finding one another and finding the best price for a transaction.
Groups, however, are not good in situations where skill is required. The reason that groups can often be better than experts, he says, is that the expert can't have the breadth of viewpoint of the group.
So what's his evidence? Well, he's very good at coming up with convincing examples, and they are not merely anecdotes. His examples tend to be specific cases in which the untutored group clearly did beat the educated expert, and did so in a statistically significant way. One example: On the TV show "Who Wants to Be a Millionaire," contestants can call on experts to help them answer questions or poll the studio audience. In the sample taken, the experts had a 65 percent success rate; the studio audience was right 91 percent of the time. Surowiecki presents so many such examples that you start to think there must be something to his argument. The problem is selection bias. Are these cases typical? His argument can't be evaluated without controlled experimental studies. But Surowiecki is arguing a case, not proving it, and that's something worth doing.
His conclusion, if it proves to be correct, would be very pleasingly populist. It would also be very useful. If we know the conditions under which groups are smart, we can structure decision-making situations appropriately, bringing in experts when appropriate and presenting the right kind of questions to the group when that's the best way to go. It could improve decision-support programs like Winograd's. And it could direct us away from building voting machines and toward building systems that help citizens exercise their collective wisdom.
Government, Morality, and Politics I promised to reveal my political views, so here goes: 1. I am a small- d democrat. I think it would be nice if it turned out that the public at large had some kind of innate wisdom, but I consider that question to be independent of the democracy question. I believe that democracy, direct or representative, is the only valid form of government, whether it works or not. Anything else is tyranny. 2. I regard human beings as fundamentally good. I carefully didn’t say that I believe this; I suppose I could be swayed purely intellectually by an argument to the contrary, but it wouldn’t affect my actions. I am a human being, my values derive from my humanity, and to regard humans as inherently bad is perverse, IMHO. 3. I believe in the possibility of progress. I think that human effort brings into the world rich new ideas and artifacts that enlarge our world. It’s thermodynamic: Life, and especially intelligent life, is a local decrease in entropy. 4. I think that morality, government, and politics are all about how we can live in a world with other people. (I don’t deny that there are value judgments to be made regarding our treatment of animals and the earth and our own bodies and minds, but I think it’s helpful to separate these from our treatment of other people. Get that right and the rest is a corollary.) 5. I think that morality is the two-part acknowledgment that other people are people and, therefore, like us and that they are other and, therefore, not like us. One insight gives us the Golden Rule, the other gives us tolerance. We need both, and the wisdom to know which applies when. 6. I think that government is the acceptance of our joint responsibility for those aspects of the world that we share. There will always be disagreement about how much of experience is shared and how much is personal and we have to manage to accommodate these differing views without killing one another. How we do that is called “politics.” 7. I think that politics is the practical art of government. And since the other people with whom we share the world are other people, it is always and essentially the art of compromise. I like what Tony [“Angels in America”] Kushner says about politics: “It’s not an expression of your moral purity.” Which is why I won’t be voting for… No, let’s leave it at that. DDJ Dr. Dobb’s Journal, September 2004
EMBEDDED SPACE
RTECC Boston
Ed Nisley

Ed's an EE, PE, and author in Poughkeepsie, NY. Contact him at ed.nisley@ieee.org with "Dr Dobbs" in the subject to avoid spam filters.

The local IEEE section toured Central Hudson's Dashville Hydroelectric Plant, a small generating station on the Wallkill Creek near New Paltz, New York. The plant dates back to the 1920s and, apart from the monitoring instruments, the equipment is entirely original. A pair of hulking Allis-Chalmers generators, each rated at 1500 kVA, produce about 20 GWh/year (see Figure 1). Some years ago, a faulty circuit breaker back-fed grid power into the generators after a shutdown, prompting their first complete disassembly since installation to replace the burned-out windings. More recently, a dam renovation increased the plant's annual output by raising the average water level a few feet and reducing the time required to clean the intake filters. Other than those interruptions, the generators have spun more-or-less continuously for eight decades.
A few weeks earlier, I attended the Real-Time and Embedded Computing Conference (http://www.rtecc.com/), which was held in the Sheraton Framingham. If you've ever driven I-90 west of 128/I-95, that's the ersatz castle just south of the MassPike. The exhibits were in a ballroom on the main level, with lectures and presentations in what would be the dungeons if the architects had truly gotten into the spirit of the thing. The contrast between heavy-iron generators and heavy-duty embedded components was striking, with my money on the former for durability. Here's my take on some interesting aspects of the show.
The End of Busing
Intel presented a talk on how to get from PCI and PCI-X to PCI-Express, the next-generation, in-the-box bus. A little background is in order to understand how we got into the current mess and why it's time for something completely different.
IBM introduced its Personal Computer in 1981 with a byte-wide backplane bus that matched the 4.77-MHz 8088 CPU. By 1984 the backplane widened to accommodate the 80286's two-byte interface (plus more address and control lines) and became known as the ISA bus, for Industry Standard Architecture Bus. The ISA bus begat late-80s evolutionary dead ends such as the VESA Local Bus (VLB), Enhanced ISA (EISA) Bus, and the Microchannel Architecture (MCA) bus. The now-ubiquitous Peripheral Component Interconnect (PCI) bus, introduced in 1992, had sufficient bandwidth to withstand nearly a decade of Moore's Law. The PCI data rate tops out at 132 MB/second for the common 32-bit interface found in most PCs and 533 MB/second for the 64-bit version in servers and high-end workstations. While that sounds like a lot, the IEEE 1394 (aka Firewire) interface runs at 100 MB/second, USB 2.0 gizmos hit 60 MB/second, and Serial ATA disk drives can reach 188 MB/second. A single PCI bus cannot keep up with those peripheral data rates, even if you figure most gizmos average only a fraction of their theoretical (aka advertised) rates.
The PCI-X extensions to the PCI specification bump the peak data rate to just over 1 GB/second, but with a curious side effect: The "bus" becomes a point-to-point link. While the original PCI bus could accommodate up to four 32-bit or two 64-bit devices, with a tree of bridges linking several of those buses together to get enough I/O slots to be useful, this is a far cry from the ISA bus limit of half a dozen or so slots in parallel. Admittedly, anybody foolish enough to stuff an ISA box full of cards encountered power, cooling, and data-integrity problems, but the concept was workable.
The new PCI-Express specification transforms the byte-parallel PCI-X bus into a bit-serial, point-to-point link with a transfer rate of 2.5 GT/second ("T" for transfers) in each direction across two differential pairs of wires, with each transfer moving 1 bit. Surprisingly, the issue here is largely economic: The cost of designing, laying out, and building a circuit board depends strongly on both the maximum signal frequency and the number of conductors carrying that frequency. A very high transfer rate on a few conductors is far more affordable than a lower rate on many conductors.
A crude rule of thumb invokes radio-frequency design rules when a conductor's length exceeds 20 percent of the wavelength of the highest frequency in use. The wavelength of a signal in free space is simply the speed of light divided by the frequency: λ = c/f. The speed of light is 300×10^6 meters/second or 12×10^9 inches/second, so, in round numbers, a nanosecond is a light-foot. Electromagnetic signals propagate about half as fast along circuit-board traces as in free space, so a given trace spans twice as many wavelengths as its physical length suggests. Digital signals (bits!) have abrupt transitions that require frequencies a factor of five or more higher than their nominal frequency. Taken together, those effects make the critical distance on the circuit board about 10 percent of the corresponding free-space wavelength. PCI-X runs at 133 MHz, with a critical circuit-board dimension of about 23 cm or 9 inches. PCI-Express transmits 250 MB/second over a pair of wires carrying low-voltage differential signals, decreasing the critical length to about 2 inches. Although components are getting smaller, you can appreciate the challenge of placing many connectors no more than a few inches from their bus driver chip!
Those Dashville generators will still be spinning when PCI becomes as relevant as ISA, but ISA lives on in the embedded world: It's just packaged as PC-104. The embedded world moves faster than the power industry, but much slower than the PC world. A readable overview of bus technology and PCI-Express is at http://www.dell.com/downloads/global/vectors/2004_pciexpress.pdf. An Intel PCI-Express intro is at http://www.intel.com/technology/pciexpress/devnet/docs/WhatisPCIExpress.pdf.
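To put numbers on that rule of thumb, here's a minimal back-of-the-envelope calculation in C. It simply applies the 10-percent-of-free-space-wavelength figure quoted above to the two clock rates mentioned in this column; it's a sanity check, not a substitute for real signal-integrity analysis, and it ignores the dielectric and harmonic details discussed in the text.

```c
/* critical.c -- back-of-the-envelope critical trace length,
 * using the crude 10%-of-free-space-wavelength rule quoted above.
 * Illustrative only; the frequencies are the ones cited in the column. */
#include <stdio.h>

#define C_FREESPACE 3.0e8    /* speed of light, meters/second */

/* Critical conductor length in meters for a given frequency in Hz. */
static double critical_length(double freq_hz)
{
    double wavelength = C_FREESPACE / freq_hz;   /* lambda = c / f   */
    return 0.10 * wavelength;                    /* 10% rule of thumb */
}

int main(void)
{
    double freqs[] = { 4.77e6, 133.0e6 };
    const char *labels[] = { "Original PC bus (4.77 MHz)",
                             "PCI-X (133 MHz)" };
    int i;

    for (i = 0; i < 2; i++) {
        double len = critical_length(freqs[i]);
        printf("%-28s %7.1f cm  (%6.1f in)\n",
               labels[i], len * 100.0, len * 39.37);
    }
    return 0;
}
```

Run it and the 133-MHz case comes out at roughly 23 cm, matching the 9-inch figure above, while the original PC bus works out to several meters, which is why nobody worried about trace lengths in 1981.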
OS Wars, Redux
MontaVista Software (http://www.mvista.com/) described some New Developments in Embedded Linux, including the observations that "Linux has grown up," the homebrew operating system is dead, and per-unit royalties are history. In their view, a Linux system with 8 MB of RAM and 8 MB of Flash is doable, but more of both is highly desirable. Conversely, flensing out chunks of Linux to shrink its footprint is also doable, but to the extent that you eliminate vital pieces such as filesystems and communication stacks, you lose the advantages that drew you to Linux in the first place. The embedded distros may be lighter weight than their desktop compatriots, but they're not svelte. Linux requires much more hardware than is found in the bulk of today's embedded systems, making it too bulky for the high-volume microcontroller market. Above that level, though, it's assimilating everything, as evidenced by the number of real-time OS vendors introducing either Linux-compatible interfaces, embedded Linux distros, or both.
In related news, the open-source Eclipse (née IBM Websphere) development platform is Borging the IDE world, printf-style debugging is obsolete, and the time-to-market for new systems is down to 12 months. The last fact seems to be driving the first two, as a system sufficiently complex to require Linux can pose a significant debugging challenge and it seems nobody wants to maintain Yet Another Proprietary IDE.
LynuxWorks (http://www.lynuxworks.com/) regards Linux as a soft-real-time OS at best, even with the scheduling and latency improvements in the 2.6 kernel (more at http://www.lynuxworks.com/products/whitepapers/linux-2.6.php3). They offer LynxOS, a genuine hard-real-time OS, as well as BlueCat Embedded Linux. The fact that LynxOS is "non-GPL encumbered" may be a deciding factor for some, although I'd hesitate to predict which way the GPL would drive the decision.
Both Linux-oriented sessions took place in a small room, with 40 to 50 standing-room-only attendees. Down the hall, Microsoft sessions on Windows XP Embedded and Windows CE .NET, in a much larger room, each attracted about 20 people. It seems that embedded-system designs, at least for systems with stringent performance requirements, do not require many Windows-specific features.
Hardware Gets Soft
The process of creating software has always suffered in comparison with hardware methods, but that may well be changing. Unfortunately, hardware is becoming more like software, not the converse.
Spectrum Signal Processing (http://www.spectrumsignal.com/) gave a talk that was actually interesting despite its title. Ready? "Radio Waveforms and the SCA: Instantiating SDR Waveform Applications Across Heterogeneous Processing Elements (FPGAs, DSPs, and GPPs)." Got that?
First, let’s untangle the acronyms. SDR= Software-Defined Radio, SCA=Software Communications Architecture, FPGA= Field-Programmable Gate Array, DSP=Digital Signal Processor, and GPP=GeneralPurpose Processor. A classic radio translates between radiofrequency (RF) energy and audio or digital signals through a chain of analog circuitry. That hardware tends to be both bulky and finicky, requiring complex alignment and temperature compensation. When you look inside a wireless phone, you’ll find most of the parts are stuck on the analog end of the board, with a few big epoxy slabs comprising the entire digital section. Radios must use RF frequencies dictated by the physics of their application. Wireless phones operate at about 1 or 2 GHz because they operate within line-ofsight of the base station and must have small antennas. Shortwave stations (yes, shortwave lives!) operate around 10 MHz because those frequencies bounce off the ionosphere on their way around the Earth. Although various government rules and regulations also affect the frequencies available for use, those tend to follow the laws of physics. Radios typically translate the external RF signal into an intermediate frequency (IF) that’s chosen to simplify the rest of the circuitry. Complex radios may use three or more different IFs in succession. Each IF stage requires several analog components, ending with circuitry that converts the final IF into voice or data. The same process applies in reverse to transmit a signal: Voice or data passes through a modulator, one or more IF stages, then a power amplifier drives the signal through the antenna. A software-defined radio has a single IF stage followed by a digital-to-analog converter (or two, depending on the application), replacing all the fiddly analog components with digital signal processing algorithms. Changing the type of signal, even from voice to data, requires nothing more than changing the algorithm. Those algorithms may execute as software in a GPP or a dedicated DSP but, for the highest performance radios, they’ll be implemented directly in gate-array hardware. The problem that SCA solves is allocating the hardware, loading the algorithm configuration into the gate arrays, and generally lashing all the machinery together. To my eyes, it looks a lot like an operating system. There’s even a CORBA server buried inside. However, if you thought debugging software was bad, just imagine tracking down problems in hardware that’s defined by software, with all the key signals existing as evanescent bit streams. It can be done, Dr. Dobb’s Journal, September 2004
but the value of getting things right the first time is increasing dramatically. Analog Matters Analog circuitry hasn’t vanished completely and, in fact, it’s more vital than ever before. Diamond Systems (http://www .diamondsystems.com/) presented Data Integrity in Embedded Analog Measurements, pointing out obscure problems that can completely invalidate your measurements without any obvious indication of trouble. In principle, an analog-to-digital converter samples its input voltage and presents a digital equivalent to the DSP code. If all voltage sources were ideal, all cables were schematic-perfect lines, and all switches behaved properly, what goes in would come out unchanged. Many errors are second-order, at best, making them easy to miss when you’re designing (and debugging!) a system. Most sensors produce voltages through a reasonably low output resistance. When the ADC has a reasonably high input resistance, most of the voltage appears at the ADC and the system works as you’d expect. Some sensors have a high output resistance that absorbs some voltage from the ADC and produces low digital readings, but when you disconnect the ADC to check what’s going wrong, your veryhigh-impedance voltmeter says everything looks fine. You can fix that with a software gain adjustment after you find it. Assuming you do find it, that is. The cable between the source and ADC can cause a more subtle error. The capacitance between the signal and common conductors forms a low-pass filter in combination with the source and input resistances. Because a low-pass filter attenuates only high-frequency signals, you
Figure 1: The Dashville hydroelectric plant sports an unmistakable early industrial chic. Apart from new copper windings and bearing maintenance, the generators are all original equipment. The lubricating oil pumps in the right foreground have been running forever, according to our guide.
won’t notice anything wrong with slowly changing inputs. When you apply higher frequency signals to the same circuitry, some things don’t reach the far end — sharp edges and narrow spikes. Maybe that’s what you intended, but if you expected an accurate digital copy of the input, you’re in for some protracted debugging. Nobody suspects the cables, at least for the first few days. Do you? Many data acquisition systems route multiple analog inputs to a single ADC through analog switches. You must select the proper input, then trigger an ADC cycle in order to read a voltage, a process that works for low sampling rates. The ADC’s input bandwidth limits the maximum switching rate if it cannot recover from one sample voltage before the next is applied. Surprisingly, many ADC chip datasheets don’t specify the input bandwidth, perhaps because they assume one ADC samples one analog signal with no switching. The effect appears as crosstalk between separate channels, but the interfering signal is a low-pass-filtered version of the original. It might take quite a while to figure out where that rumble originates. Even in this digital age, analog skills remain marketable. If you’re having trouble with your data collection system, you need
somebody who knows which end of a scope probe to use. Reentry Checklist The between-sessions and over-lunch chatter revealed another issue: the visceral dislike (perhaps that’s putting it too mildly) of Microsoft and its offerings in any and all contexts. I heard, from several directions, that technical folks have been instructed to find a way to simply not use Microsoft products, regardless of the cost or inconvenience. That would explain the attendance figures I observed, as well as a factoid from last year’s ESC/SD shows: Dot-Net books weren’t leaping off the shelves in expected numbers. I admit to a completely unscientific survey methodology and no hard facts. However, what I don’t hear is that Microsoft is winning the hearts and minds of the technical staff or management. Thanks go to our Dashville plant tour guides, Fred Laurito and Larry Sauter, for a fascinating look into century-old hardware that’s still getting the job done. The plant’s computerized monitoring system, on the other hand, has become irreparably obsolete after about a decade. The picture comes from my Zire 71 PDA, which incorporates the worst camera you wouldn’t throw across the room
in disgust. It’s valuable for those times when “I wish I had a camera,” but it’s useless for critical images. DDJ
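A quick footnote to the "Analog Matters" discussion above: the first two pitfalls, the divider formed by sensor and ADC resistances and the low-pass filter formed by cable capacitance, are easy to estimate on paper before the protracted debugging starts. The sketch below is mine, not Diamond Systems'; the component values are invented for illustration and your sensors, cables, and ADC will differ.

```c
/* analog_est.c -- rough estimates of two measurement errors described above:
 * the divider formed by sensor output resistance and ADC input resistance,
 * and the low-pass corner set by cable capacitance.
 * Component values below are invented examples, not from the talk. */
#include <stdio.h>

int main(void)
{
    const double pi = 3.14159265358979;
    double r_source = 10e3;   /* sensor output resistance, ohms (assumed)      */
    double r_input  = 1e6;    /* ADC input resistance, ohms (assumed)          */
    double c_cable  = 1e-9;   /* cable capacitance, farads (assumed, ~10 m run) */

    /* DC divider: fraction of the sensor voltage the ADC actually sees. */
    double gain = r_input / (r_source + r_input);

    /* Low-pass corner: the source and input resistances in parallel
     * drive the cable capacitance, f_c = 1 / (2 * pi * R * C). */
    double r_par = (r_source * r_input) / (r_source + r_input);
    double f_c = 1.0 / (2.0 * pi * r_par * c_cable);

    printf("ADC sees %.1f%% of the sensor voltage (reads %.2f%% low)\n",
           gain * 100.0, (1.0 - gain) * 100.0);
    printf("Cable low-pass corner: about %.1f kHz\n", f_c / 1e3);
    return 0;
}
```

With these made-up numbers the divider error is under 1 percent, but the cable rolls off everything much above 16 kHz, which is exactly the sort of thing that looks fine on a voltmeter and wrecks a fast pulse.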
Dr. Ecco Solution
Solution to "Smooth As Ice," DDJ, August 2004. [The solution diagram, showing the numbered order of the 37 node-to-node traversals from Start to End, is omitted here.]
Zamboni Solution: No path is traversed more than once, though some nodes are visited more than once. There are 37 node-to-node traversals and 32 nodes. The theoretical minimum is 31 node-to-node traversals. Remember that the machine cannot turn more than 45 degrees at each node.
CHAOS MANOR
Infection, Prevention, and Remedy Jerry Pournelle
The other night at the opera, our dinner guests (a lawyer and a management consultant) both complained that their computers had slowed to a crawl and they were getting a million unwanted popup advertisements, often for sex toys and other embarrassing things. It wasn't a crucial embarrassment, but it was annoying. "Ad-Aware," I said. "Google to find Ad-Aware, download that, run it, and tell me the result." (http://www.lavasoftusa.com/)
That was a Thursday, opening night of Il Trovatore, with what I thought were new sets but the Los Angeles Times says we've seen them before. Good production, moved fast, no pauses for set changes. They left the rather pointless ballet in, but it was staged well, and I like ballet in opera, although I gather I am in a distinct minority on that. Anyway, we saw our friends at church Sunday, and they were effusive in their gratitude. They each found something like 25 unwanted running processes, over 100 registry keys, and another hundred or so data mining or tracking programs. Ad-Aware eliminated them all. "It's like we have new machines."
I can well believe it. The first time we ran Ad-Aware on my wife's machine, the results were similar. A good half the computer cycles were eaten with unwanted tracking processes, and she couldn't go to the Internet without getting dozens of advertisements, some for the most anatomically amazing stuff. Ad-Aware halved the time it took to compile her reading program, and was even more effective in speeding up Internet access and browsing.
So what's going on here? What are these programs, how do they get on your machines, and do they do you any good? From http://www.spychecker.com/spyware.html, which is as good a source of information as any:
Spyware is Internet jargon for Advertising Supported software (Adware). It is a way
Jerry is a science-fiction writer and senior contributing editor to BYTE.com. You can contact him at
[email protected].
for shareware authors to make money from a product, other than by selling it to the users. There are several large media companies that offer them to place banner ads in their products in exchange for a portion of the revenue from banner sales. This way, you don’t have to pay for the software and the developers are still getting paid. If you find the banners annoying, there is usually an option to remove them, by paying the regular licensing fee.
The problem is that they don’t always give you an option to remove them, and the spyware programs gather information about your web-browsing habits and “phone home” to report it. Other programs take that information and feed you popup advertisements of things they think you want. In theory they don’t send you anything you haven’t given some indication of wanting. In practice, if you ever got to a pornographic web site — and it’s not hard to do, because pornographers are forever taking names as close to a popular web site name as they can get, and sending you e-mail of offers that have nothing to do with what they are really selling — then you will be put on a list of people who want porn, and you’ll get plenty of it. My wife’s site is always infected with the stuff, as well as advertisements for enlargement of organs she doesn’t possess. All this is complicated by tamer renditions of Adware that send you advertisements you didn’t want, but don’t track what you’re doing and phone home to report it. There have been lawsuits over whether a particular kind of Adware is Spyware or “malware.” I don’t propose to be part of that debate. If I use software, I expect either to pay for it or to have been given a reviewer’s key. If I use it and review it and like it enough to keep using it, I often pay for it anyway out of sheer gratitude. If the only way I can use shareware or freeware is to load up my machine with gunk I didn’t consent to and don’t want, and endure endless exhortations to visit places I don’t want to go and buy stuff I wouldn’t use for anything, then I’ll do without. Dr. Dobb’s Journal, September 2004
For more on this, see Steve Gibson at http://grc.com/optout.htm. Gibson has a strong viewpoint that I don't always agree with, but if the subject interests you then you need to understand his argument — and he offers programs that offer fairly drastic remedies. And whatever your view of Adware, there are times when you don't want it: With 20 processes running in the background, your machine can slow to a halt, and you may not want a lot of the popup ads for books, movies, cookware, marital aids, cheap clothes, or sex-starved housewives. Ad-Aware will fix most of that.
SpyBot Versus Ad-Aware
In addition to Ad-Aware there's Spybot Search and Destroy (http://www.safer-networking.org/), a program that does much the same thing as Ad-Aware. Actually the two are complementary, in that each finds a few things the other did not. I have never had either of them miss a running process, but I have had Spybot find a couple of registry keys missed by Ad-Aware and vice versa. Since both these programs are free there's no reason not to have and employ both. In addition, Spybot offers a prevention program that seems to get a lot of this stuff before it installs. Again, not all: I have had Ad-Aware find registry keys that shouldn't be there despite having prevention programs installed.
Infection, Prevention, and Remedy
So how does this stuff get into your computer in the first place? The short answer is that you give it permission to install itself; so in theory you could always refuse such permission, and never have either adware or spyware. In practice it's hard to avoid. For example, you go to a web site for information. It offers you a graph. To see the graph you have to allow popups or ActiveX controls to run. You have no reason to mistrust the web site so you allow that. In comes the malware.
Or you visit a web site to get a new utility you've seen favorably reviewed, download and install it, and it works fine — but it loaded in a spybot. Had you read the license agreement you agreed to when you installed the software, you would have seen somewhere in there a murky statement that, if you paid enough attention, would in fact be giving them permission to put identification and tracking software on your machine. Or you're just Googling around for information and find your way to an odd web site that wants to show you something interesting, and since you have good antivirus software you let it do that, and since spyware isn't detected by most antivirus programs (I am not at all sure why; perhaps because spyware is legal, and is thought to serve some useful purpose?) it gets past your virus protection and there you are. Or you go for a free program to prevent spyware and something worse happens. Read on.
Ping!
I have a confession: For years, I ran a machine I used for Internet browsing without Norton or any other antivirus program. I got away with it, too — until last night. Now in my defense, my machines are all safely behind a pretty good firewall. A trip to Steve Gibson's Shields Up (http://www.grc.com/) shows I am invisible to sniffers, and my router does Network Address Translation (NAT) so that Trojans broadcast to randomly generated IP addresses won't find me, either. The machine in question has no mail program, so it can't possibly be used to open mail attachments, and a popup stopper prevents most of the annoying trips to places I hadn't intended to go.
Alas, I did use it last night to test new antispyware programs. I installed half a dozen of them. Then, suddenly, my system popped up a Command Window. "Ping 66.50.102.201," it ordered. Continuously, a new command every second. The pinging began, and the commands issued so fast that the message that the attempt timed out was overwhelmed. Of course, nothing happened because the router blocked the pings from getting out into the world, but it was quite a shock. I shut down the Command Window. In a minute or so it opened, and the pinging began. Now what?
Quick Henry, the Norton!
First thing was to disconnect this machine from the rest of my network. If it was sending out pings, what else might it be sending inside my firewall? In fact, it wasn't doing anything else; but it might have been. After I pulled the Ethernet plug, I
reset the system. Then I began installing Norton antivirus. Within a minute or so poor Sable was trying to ping again. I left Norton trying to install while I opened Startup Manager (http://www .rayslab.com/). I have had a copy of this program on every machine I have owned from the earliest days of Windows to present. It finds just about everything that is told to run at startup, and offers you the chance to disable it (leaving those processes ready to be turned on again), or to delete the startup command entirely. And there was a program being run from a directory: Windows/System/Systemy7/hack.exe. Now Windows/System doesn’t have subdirectories; but here was one. Norton Commander showed it had a lot of programs and initialization files, too. One I remember was pepsi.com. I was still a bit curious so I created a directory called “qfooxxx” where the “xxx” was a random number, and copied all the files to that. One wouldn’t copy, and attempts to delete the directory failed, because one of the programs was running. Task manager showed that one, so I killed it. By then Norton was finished installing. Of course, Norton now wanted to be connected to the Internet. It wanted to register, and it also wanted its updates. Did I dare? The pinging had stopped. I have routers that create a DMZ, an area only weakly protected from the Internet but separated by firewalls from the rest of my internal net; but I didn’t have one running just at the moment. I decided I wasn’t that paranoid. I reset, deleted the last traces of Windows\ System\Systemy7\ (it went away fine since Startup Manager kept its program from trying to run on startup), and let Norton connect long enough to register and get its updates. Then I pulled the plug on the Ethernet connection and ran the updated Norton antivirus program. There are a lot of files on Sable, so it took a while; but eventually it found that the QFOOXXX directory contained a program delivering the Hacktool.flooder virus. I looked that one up (it helps to have several machines), and discovered it’s pretty benign: its sole purpose in life is to make a denial of service attack on a specific address by sending an inordinate number of pings. Norton found two more virus programs, both in archives of files copied from another machine and stored for safe keeping. The machine that contained them had long been disassembled, but so far as I know it had never been infected, and my guess is that the virus programs had been detected and quarantined by Dr. Solomon’s virus remedy, which is what I used until Allan Solomon sold his company. Anyway http://www.ddj.com
I saw no point in safekeeping those, and deleted them. The Moral of the Story Clearly, I was lucky. This is only the second time any of my machines have been hit by a virus or Trojan. The last time was years ago, when an earlier communication system was infected with the Melissa virus: I opened an attachment to a press release from someone I knew. When the machine’s hard disk and Ethernet connection lights began blinking furiously, I pulled its plug, so Melissa only mailed about 10 copies of itself to people in my address book. That was long ago, and Melissa was fairly easy to clean up. My good luck this time is that Sable is one of the machines on which I’ve installed Windows XP Service Pack 2 Release Candidate 1. Windows XP SP2 is largely devoted to security, and one of the security features is that it can’t run command-line programs like ping without opening the command-line window. There are other SP2 security features, some pretty nifty. Windows XP with SP 2 isn’t what I’d call secure (and certainly not what my friend Roland would call secure) but it’s a lot better than the earlier versions of Windows XP. And that’s my first conclusion: Unless you have good reason not to, if you’re running Windows XP, go get Service Pack 2 Release Candidate 1 and install it. I’ve run it on half a dozen machines here for a month now, and the only negative is a rather minor inconvenience, namely, a 10to 20-second delay in linking to systems with SP 2 from machines that don’t have it yet. SP 2 is more than worthwhile for the security features. When you build a new machine, you should have all the updates and exploit plugs and critical revisions already downloaded and burned onto a CD. Install them before your new machine ever sees the Internet (or, more importantly, the Internet ever sees your new machine). Then install an antivirus program. While you are at it, you might consider internally updating Norton AntiVirus. When Norton goes out to the Internet for updates, it puts all those files in its own directory. I haven’t tested this and I am not at all sure it would work, but I am tempted to install Norton on a new machine, then XCOPY a newly updated Norton directory over using the /D (copy later files only) switch. The goal isn’t to cheat Norton, it’s to face the Internet for the first time with a fully updated antivirus program. In any event, update that virus protection instantly. Eric points out that the proper way to do this would be to use the Norton Live Update Administrator. This lets you designate a single machine to keep an uphttp://www.ddj.com
date set of virus definitions that all of the other machines on the network can then point to for their updates. Saves on bandwidth and provides an easy way to keep the data available locally. Incidentally, Bob Thompson prefers AVG antivirus from grisoft (http://www .grisoft.com/). Thompson claims that it runs so much faster than Norton that it’s like having a new machine. I haven’t noticed that, but I only have Norton on very fast machines. My second conclusion is, don’t be too eager to try freeware and shareware. That’s not a conclusion I like. One of the fun parts of this business is to find really nifty new stuff, like Golden Bow VOPT (http:// www.goldenbow.com/), still the best defragging program I know of, and NotePro (http://www.tucows.com/preview/341278 .html) before anyone else has ever heard of it. Alas, the bad guys have found that one way to infect other people’s computers is to offer freeware or free trials of shareware. They can even tell you to turn off your antivirus program or their little gift to you won’t install. They can also turn off your spyware scanner without asking you at all. I think I had better look into ways to set up at least one computer normally not connected to the rest of Chaos Manor: a
system I can use to browse the Internet, and run software as a test in a quarantine environment. I should also get out that router that has a DMZ, so I can test various sharing networks without exposing myself. Between those, I can continue to do silly things so you don’t have to without infecting all of Chaos Manor. Winding Down The Book of the Month is Richard Ben Cramer, How Israel Lost: The Four Questions (Simon & Schuster, 2004; ISBN 0743250281). This book will break your heart. Cramer, an American of Jewish ancestry, was the correspondent for the Philadelphia Inquirer and knows the Middle East better than most anyone now writing about that place. I thought I had an understanding of the problems before I read this book; now I know I didn’t understand at all. The computer book of the month is Joli Ballew and Jeff Duntemann’s Degunking Windows (Paraglyph Press, 2004; ISBN 1932111840). It’s the usual collection of tricks for speeding up your Windows system and while some of the tricks it shows are obvious, others are not. DDJ
PROGRAMMER’S BOOKSHELF
Free Culture And the Internet Lynne Greer Jolitz
The Internet of today doesn't much resemble the Internet of decades ago. It wasn't developed with the express purpose of distributing works by artists, or making money off of sales of music CDs, or even for sending video on demand. The Internet, as Vinton Cerf said, was merely an attempt to "get a bag of bits from one point to another with a greater than zero percent chance of getting there." The fact that we can do this so reliably is a tribute to the Cerf-Kahn algorithm (TCP/IP) and the work of technologists who turned this lab experiment into a practical mechanism.
Yet, success does breed its own failure. In the circumscribed world of the early Internet, there was little need to build in control over work, code, and access. Ideas such as open source and file sharing and e-mail sprang from the desire to communicate ideas — not control them. The question of mediation of works was left unresolved. So, as we watch the lawsuits and public policy debates grow ever more vitriolic, the stakes over the control of copyrighted works grow ever greater. Now, as Lawrence Lessig of the Stanford University Law School so aptly demonstrates in his new book Free Culture, the stakes are driven by interpretations of the U.S. Constitution itself. According to Lessig, we are engaged in nothing less than a war waged by the "monopolists of culture":
To fight "piracy," to protect "property," the content industry has launched a war…As with any war of prohibition, these [direct and collateral] damages will be suffered most by our own people.
Lynne is coauthor of the 386BSD operating system. She is currently CTO of ExecProducer, an Internet real-time video production company. Lynne can be contacted at
[email protected].
Free Culture
Lawrence Lessig
Penguin Press, 2004
345 pp., $24.95
ISBN 1594200068

Lessig's book is a dark reading of traditional forces arrayed for battle, using lawyers, lobbyists, and money to rewrite laws to suit their immediate interests while placing barriers in the path of others through onerous lawsuits, criminal actions, and increased penalties for perceived nonsanctioned use of properties. Lessig argues that the erosion of copyright and fair use through these tactics is undermining our intellectual commons.
Free Culture at its core is the story of the personal battle Lessig took before the U.S. Supreme Court, and the well-written and clear arguments that underlie his convictions are fascinating reading. Surprisingly, the actual case that resulted in the "Eldred Decision" takes up very little of the book, perhaps because it has been extensively written about elsewhere. However, since few people have been allowed to argue before the U.S. Supreme Court, a more intimate account and background of the Justices and their questions would have been most welcome for those of us who aren't "Court TV" junkies.
Lessig's candor in discussing his loss is refreshing, with little of the self-pity you might expect in such a one-sided (seven to two) Supreme Court opinion. He admits that he should have focused more on the question of harm, as his legal advisor suggested, instead of on the larger Constitutional issues he favored. In other words, better a narrow victory than a broad defeat. However, "like a professor correcting a student," Justice Kennedy's invitation to discuss the "obvious and profound harm"
was declined, in fact, Lessig took it further, stating, “Nothing in our Copyright Clause claim hangs upon the empirical assertion of impeding progress,” which in retrospect he thought “was a correct answer, but it wasn’t the right answer.” Chief Justice Rehnquist, in particular, whom Lessig had hoped to appeal to per his earlier Lopez ruling, apparently wasn’t very friendly either: “To him, we were a bunch of anarchists.” Who says warfare only happens in chess? Lessig’s vision was large even by Internet standards, and perhaps it was just too much for the U.S. Supreme Court to accept. The question of harm, a public policy migration strategy, might still be the correct course of action to sew back together various factions. But like a failed grade on a student’s exam, this question is quickly forgotten, left as an exercise to the reader. In the end, the Internet may simply be viewed as another phone network, and much of this legal upheaval may vanish into the well-understood and arcane realm of telecommunications law and regulation. Recently, for example, the New York State Public Service Commission ruled that Vonage Holdings, a VOIP company, is actually a telephone company in disguise and is thus subject to state regulation. It may not be a visionary decision, but it is an understandable one. DDJ
Programmer’s Bookshelf Submissions to Programmer’s Bookshelf can be sent via e-mail to
[email protected] or mailed to DDJ, 2800 Campus Drive, San Mateo, CA 94403.
OF INTEREST

Webogy has announced that Version 2.6 of its Lightning Development System for creating C++ web applications is available. Webogy Lightning Development System (LDS) is a platform for developing, housing, running, and deploying secure scalable, mission-critical web applications. Among other features, it offers a programming model that lets you leverage high-performance C++ and windows-like HTML pages. It also provides a multithreaded execution engine that dynamically generates web pages and services submission requests. Webogy LLC 3000 Lakeside Drive, Suite 105N Bannockburn, IL 60015 847-317-1100 http://www.webogy.com/lds/lds.htm

@stake has developed the SmartRisk Analyzer, an automated solution for identifying security vulnerabilities in software applications. Using deep static analysis of the application binary code, SmartRisk Analyzer can map application control and data flow paths into a comprehensive security model. Scans are designed to find flaws related to improper use of programming languages and standard libraries, flaws that may result from the deployment platform on which the application runs, and other vulnerabilities such as input validation, command and script injection, and backdoors and malware. @stake Inc. 196 Broadway Cambridge, MA 02139-1902 617-621-3500 http://www.atstake.com/

9Rays.Net has released Spices.NET 3.5, a set of plug-ins for .NET developers. Spices.NET comes with five components — Obfuscator, Decompiler, Modeler, Investigator, and Informer. Spices.Obfuscator obfuscates .NET assemblies, and Spices.Decompiler decompiles .NET assemblies into six languages (MSIL, VB.Net, C#, MC++, J#, Delphi.Net) with syntax highlighting, with or without code optimizations. Spices.Modeler generates diagrams and charts related to various aspects of assembly members. Spices.Investigator is a low-level .NET metadata and PE-format browser, and Spices.Informer shows information about the currently selected assembly or assembly member. 9Rays.net LLC Pils 12 190000 Riga Latvia +371 7044001 http://www.9rays.net/
JGsoft has introduced RegexBuddy, a Windows and Linux utility designed to help you learn, create, understand, test, use, and save regular expressions. With RegexBuddy, instead of typing regex tokens directly, you pick what you want from a descriptive menu. A tree of regex tokens keeps track of the pattern you have built, and the RegexBuddy tester and debugger lets you step through the search matches and get detailed reports about each match. The Windows version of RegexBuddy provides both a command-line interface and a COM automation interface. The Linux version has an API that uses standard input/output to provide the same functionality as the COM interface on Windows. JGsoft 56 Uppalisan Road Muang, Ubonratchathani 34000 Thailand http://www.regular-expressions.info/

The Eclipse Foundation has announced availability of Eclipse 3.0, a royalty-free release that supports a rich-client platform (RCP) for construction of desktop applications. Enhancements include streamlined installation, improved customization of menus and toolbars, and a restructured workbench for running underlying program facilities in the background in a multithreaded environment. Eclipse Foundation 2670 Queensview Drive Ottawa, ON Canada K2B 8K1 http://www.eclipse.org/

ILOG has launched the developer edition of its Business Rule Studio (BR Studio), a plug-in for the Eclipse IDE based on ILOG JRules. Key features include Rule Project, a new project type in Eclipse that allows developers to manage business rules just like Java code; a business rule editor with real-time error checking; an embedded rule engine for rapid prototyping and rule execution; and integrated tools for debugging. ILOG Inc. 1080 Linda Vista Avenue Mountain View, CA 94043 650-567-8000 http://www.ilog.com/
Quinn-Curtis is offering Real-Time Graphics Tools for .NET, an object- oriented toolkit aimed at developers who want to add real-time graphics to their C# and Visual Basic for .NET applications. The toolkit includes more than 30 different types of real-time displays including scrolling graphs, bar indicators, dials, meters, clocks, and digital panel meter indicators. It also supports time/date coordinate systems required for real-time reporting in financial markets, process monitoring, and automation. Quinn-Curtis Inc. 18 Hearthstone Drive Medfield, MA 02052 508-359-6639 http://www.quinn-curtis.com/ PE Explorer 1.95 from Heaventools Software is a tool for inspecting the inner workings of Windows 32-bit executable files. PE Explorer offers a look at the PE (portable executable) file structure and all of the resources in the file. File structure can be analyzed and optimized, spyware tracked down, problems diagnosed, changes made, and resources repaired. Version 1.95 now supports removing debug information from the PE files, adds support for the SSE3 instruction set to the disassembler and improves the data-analysis algorithm. Heaventools Software Pacific Business Centre 101-1001 West Broadway, Dept. 381 Vancouver, BC, Canada V6H4E4 http://www.heaventools.com/ Parasoft .TEST 1.6 is a unit testing tool that automatically tests classes written on the Microsoft .NET Framework without requiring developers to write test scenarios or stubs. The new version of .TEST supports NUnit, automatically generating scenarios that can be compiled and run in NUnit without setting up the NUnit test harness by hand. .TEST can also automatically compile and run NUnit test cases in its own GUI. Parasoft 101 E. Huntington Dr., Second Floor Monrovia, CA 91016 626-256-3680 http://www.parasoft.com/ DDJ Dr. Dobb’s Software Tools Newsletter What’s the fastest way of keeping up with new developer products and version updates? Dr. Dobb’s Software Tools e-mail newsletter, delivered once a month to your mailbox. This unique newsletter keeps you up-to-date on the latest in SDKs, libraries, components, compilers, and the like. To sign up now for this free service, go to http://www.ddj.com/maillists/.
SWAINE’S FLAMES
The Burglar Who Knew Lisp
“Y
ou’re dressed for success, Bernie,” I told myself, first because my name is Bernie and second because I was, in fact, outfitted with everything the successful 21st century B&E artist needs for a night’s work. B&E: That’s breaking and entering, a hobby of mine. How does the successful B&E artist outfit him- (or her-; don’t want to be sexist) self these days? De rigueur is plastic film gloves, flushable in an emergency, and a penlight flashlight of the sort that anyone might carry. I add a perfectly ordinary Palm handheld with a not-so ordinary program, four inches of two-strand insulated copper wire under the leather label on the beltline of my jeans, and something that looks like a credit card if you overlook the electrical contacts on the edge. In jeans, T-shirt, and hightops, I could pass for a jogger, say, or a programmer. From where I dropped off the car three blocks away, I was that jogger, taking advantage of the cool night air. Standing in front of the door at Blindside Enterprises, fitting my card in the slot, I was the programmer arriving for his 2AM stint, obsessively tapping away at his Palm even as he unlocked the door. The door clicked open and I slipped inside, returning card, wire, and Palm to their respective homes. I had officially entered the magic kingdom of Felonyland. I wasn’t sure how much time I had, so I didn’t bother with corporate data as I shuffled through the CEO’s desk and hard disk files. For personal data, e-mail and browser history are obvious places to check, as well as cookies from sites your target has visited. I had my hand in the cookie jar, so to speak, when things got all too personal. “Hey! What are you doing in here?” I forced my heart back down out of my nasal cavity and sized him up. In his sweatshirt, floppy hippie pants, and sandals with socks, I suspected that he was not the CEO, nor was he likely to be security. Still, he was brandishing a cell phone in an ominous way. “I’m a burglar,” I said. “I’m going through the boss’s files to get some dirt on him.” He blinked twice and didn’t press any buttons on the phone. Good so far. “Don’t you want to know how I got past the security system?” I didn’t wait for an answer. “If you call it a security system. I’ve hacked way better card systems than this outfit’s, let me tell you.” His pupils contracted. I had engaged his professional attention. “Before you turn me in, maybe you’d like to know how I hacked it? You know that the security guard and the cops aren’t going to be able to understand what I’m saying.” “Yeah. You better tell me all about it.” He pointed at me sternly with his phone hand and sat down on the arm of a chair. I showed him the card, the wire, and the Palm, and how they worked. He slid into the chair and started to launch the key program, but I stopped him. “You’re a hacker, I can see that,” I said, and saw the look of pride he tried to conceal. “You know what I mean if I say this is Mission Impossible code?” “You mean self destructing?” “Affirmative.” I thought that sounded Mission Impossiblish. “So there’s a password, right?” I gave it to him. It took him a few seconds to tire of trying out the program and then he was into the source code. Source code. On a Palm. “This is Lisp,” he said incredulously. “You’re running a Lisp interpreter on this thing.” “No other language could do what I want on this platform.” I set the hook and prepared to tug it. “You know Paul Graham and Robert Morris wrote the first e-commerce toolkit in Lisp.” I could see the jolt as he took the hook. 
“Robert Morris? The legendary hacker?” It took another 10 minutes to get safely out the back door, but I was really home free when he said that. Home free — but without any revealing facts about the guy who had bought Foo Bar. What was I going to tell Swaine? Continued next month.
Michael Swaine editor-at-large
[email protected]